Report Data Mining

Name: Samah Ziad Abudaia
Student ID: 220133612
Introduction: This report shows the mechanics of data mining on a large database: reducing the size of the data and finding useful relationships in it. We obtain the data, prepare it, and apply methods for extracting information from it. The steps are described in detail and the results are explained.
The first step: obtaining the data
Source: https://archive.ics.uci.edu/ml/datasets.html
Explanation of the data: The extracted data is a diabetes database. In the literature, an algorithm was used on it to predict the onset of diabetes in women. The result is a binary variable, 0 or 1, where 1 means a positive test. The data has the following variables:
1. Number of times pregnant
2. Plasma glucose concentration at 2 hours in an oral glucose tolerance test
3. Diastolic blood pressure (mm Hg)
4. Triceps skin fold thickness (mm)
5. 2-Hour serum insulin (mu U/ml)
6. Body mass index (weight in kg/(height in m)^2)
7. Diabetes pedigree function
8. Age (years)
9. Class variable (0 or 1)
1- Put the data in a file and load it into the program so that operations can be run on it.
2- After importing the data into the program, name the columns and save the process, then drag the data into the process area.
3- Running the program shows the following table, which contains the data.
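For readers who want to follow the loading steps outside the graphical tool, here is a minimal Python sketch. It assumes the data file is header-less comma-separated text with the nine attributes listed above, in that order. The short column names and the two sample rows are made up for illustration and are not taken from the report.

```python
import csv
import io

# Short column names are hypothetical; their order follows the attribute list above.
COLUMNS = [
    "pregnancies", "glucose", "blood_pressure", "skin_thickness",
    "insulin", "bmi", "pedigree", "age", "label",
]

def load_rows(text):
    """Parse header-less CSV text into a list of dicts keyed by COLUMNS."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record:
            continue  # skip blank lines
        values = [float(v) for v in record]
        rows.append(dict(zip(COLUMNS, values)))
    return rows

# Two made-up rows in the nine-column layout (illustrative values only).
sample = "2,120,70,30,80,28.5,0.45,33,0\n5,160,82,35,0,34.1,0.62,51,1\n"
rows = load_rows(sample)
print(len(rows), "rows loaded with columns:", ", ".join(COLUMNS))
```

With a real file, the text passed to load_rows would come from reading that file instead of the sample string.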
Step Two: Prepare the data so that it is ready for use, using three methods.
The first process: Remove duplicates
1- In the operators box, open Filtering, then choose Remove Duplicates, run the process, and view the results.
2- After running the program: the data has no duplicates.
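The duplicate check can be sketched in a few lines of Python. This is only an illustration of the idea, assuming a duplicate means a row whose values all match an earlier row. The sample rows are made up.

```python
def remove_duplicates(rows):
    """Return rows with exact repeats dropped, keeping first occurrences in order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

data = [(1, 85, 0), (6, 148, 1), (1, 85, 0)]  # made-up rows, the third repeats the first
cleaned = remove_duplicates(data)
print(len(data) - len(cleaned), "duplicate row(s) removed")
```

When the count of removed rows is zero, as reported above, the data set is unchanged by this step.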
The second process: Replace missing values
1- In the operators box, open Data Transformation, then Data Cleansing, choose Replace Missing Values, run the process, and view the results.
2- After running the program: the data has no missing values.
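A minimal sketch of missing-value replacement follows. It assumes missing entries are marked as None and that the column mean is used as the replacement; the report does not say which replacement rule was configured. The sample values are made up.

```python
def replace_missing_with_mean(rows):
    """Fill None entries in each column with the mean of that column's known values."""
    n_cols = len(rows[0])
    means = []
    for c in range(n_cols):
        known = [r[c] for r in rows if r[c] is not None]
        # A column with no known values is left as None.
        means.append(sum(known) / len(known) if known else None)
    return [[means[c] if r[c] is None else r[c] for c in range(n_cols)] for r in rows]

data = [[1.0, 10.0], [None, 20.0], [3.0, None]]  # made-up values
filled = replace_missing_with_mean(data)
missing_before = sum(v is None for r in data for v in r)
print(missing_before, "missing value(s) replaced")
```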
The third process: Outliers
1- In the operators box, open Data Transformation, then Data Cleansing, choose Detect Outliers (Distances), run the process, and view the results.
2- After running the program: the data has no missing values.
After these three data processing steps, the data is ready and correct for applying association rules, two classification methods, clustering, and outlier detection.
We will apply association rules to the existing data.
Process: Drag the data set into the work area. Look under Data Transformation, then Type Conversion, drag Numerical to Binominal and Numerical to Numerical into the work area, and connect them to the data set. Then look under Modeling, choose Association and Item Set Mining, choose FP-Growth, and drag it into the process area.
The association rules explain the special relationships between the attributes.
Result: This table shows the special relationships between variables.
Chart: association rules
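To make the idea of support and confidence concrete, here is a small Python sketch. It is not FP-Growth: it enumerates item pairs by brute force, which is only practical for a handful of attributes. The thresholds used to turn numbers into present/absent items, and the four sample rows, are hypothetical.

```python
from itertools import combinations

def binarize(rows, thresholds):
    """Turn numeric rows into item sets: an item is present when value > threshold."""
    return [{name for name, t in thresholds.items() if row[name] > t} for row in rows]

def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def rules(transactions, min_support, min_confidence):
    """Enumerate single-item rules head -> tail that meet both thresholds."""
    all_items = sorted(set().union(*transactions))
    found = []
    for a, b in combinations(all_items, 2):
        pair_support = support(transactions, {a, b})
        if pair_support == 0 or pair_support < min_support:
            continue
        for head, tail in ((a, b), (b, a)):
            confidence = pair_support / support(transactions, {head})
            if confidence >= min_confidence:
                found.append((head, tail, pair_support, confidence))
    return found

# Made-up rows and hypothetical cut-off values, for illustration only.
rows = [
    {"glucose": 150, "bmi": 35, "label": 1},
    {"glucose": 160, "bmi": 31, "label": 1},
    {"glucose": 90, "bmi": 33, "label": 0},
    {"glucose": 100, "bmi": 22, "label": 0},
]
thresholds = {"glucose": 125, "bmi": 30, "label": 0.5}
transactions = binarize(rows, thresholds)
found = rules(transactions, min_support=0.5, min_confidence=0.9)
for head, tail, s, c in found:
    print(f"{head} -> {tail}  support={s:.2f} confidence={c:.2f}")
```

Support is the share of rows in which both items occur together, and confidence is that share divided by the support of the rule's left-hand side.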
We will apply classification to the existing data.
1- Process: Drag the data set into the work area and look for Numerical to Binominal. Then define the label: from the list, look under Data Transformation, then Name and Role Modification, and drag in Set Role. The properties on the right of the page are where we define the label.
2- Create a splitter to divide the data into training and testing data. Look in the list under Evaluation, then Validation, and choose Split Validation. Double-clicking the validation operator shows that it is divided into two parts. The first part is the training section: look under Modeling, then Classification and Regression, then Tree Induction, and drag in Decision Tree.
The second part is the testing section: search for Model Application, then Confidences, and drag in Apply Model. Then look in the list under Evaluation, then Performance Measurement, then Classification and Regression, and choose Performance (Classification) to measure the accuracy. Finally, run the process.
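The split validation described above can be sketched in Python as follows. The sketch shows only the mechanics: shuffle, split into training and testing parts, fit on one part, and measure accuracy on the other. The learner here is a one-threshold stand-in, not a decision tree, and the 70/30 ratio, the seed, and the sample data are all assumptions.

```python
import random

def split_validation(rows, labels, train, ratio=0.7, seed=0):
    """Shuffle, split into training/testing parts, fit on one, score accuracy on the other."""
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)
    cut = int(len(order) * ratio)
    train_idx, test_idx = order[:cut], order[cut:]
    if not train_idx or not test_idx:
        raise ValueError("ratio leaves the training or testing part empty")
    model = train([rows[i] for i in train_idx], [labels[i] for i in train_idx])
    hits = sum(model(rows[i]) == labels[i] for i in test_idx)
    return hits / len(test_idx)

def train_threshold_rule(rows, labels):
    """Stand-in learner: one threshold on the first feature, midway between class means."""
    pos = [r[0] for r, y in zip(rows, labels) if y == 1]
    neg = [r[0] for r, y in zip(rows, labels) if y == 0]
    if not pos or not neg:
        only = 1 if pos else 0  # only one class seen, so always predict it
        return lambda row: only
    cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda row: 1 if row[0] > cut else 0

# Made-up, clearly separated data: class 0 near 0..9, class 1 near 100..109.
rows = [[float(v)] for v in range(10)] + [[float(v)] for v in range(100, 110)]
labels = [0] * 10 + [1] * 10
accuracy = split_validation(rows, labels, train_threshold_rule)
print(f"accuracy on the testing part: {accuracy:.0%}")
```

Accuracy is the number of correctly predicted testing rows divided by the number of testing rows.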
Result: New data is classified based on old data, and the accuracy is measured. Classification analyzes the input data and develops an accurate description or model for each class using the features present in the data.
After running this process, the accuracy was measured on the old and new data and averaged. It equals 81%.
Chart:
Naive Bayes: The second classification method follows the same steps.
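As a rough illustration of what a Naive Bayes classifier computes, here is a small Gaussian variant in Python. It assumes each numeric attribute is modeled with a normal distribution per class, which may differ from how the tool used in this report handles the attributes. The training rows are made up.

```python
import math

def fit_gaussian_nb(rows, labels):
    """Estimate the prior and per-feature mean/variance for each class."""
    model = {}
    for cls in set(labels):
        members = [r for r, y in zip(rows, labels) if y == cls]
        stats = []
        for column in zip(*members):
            mu = sum(column) / len(column)
            var = sum((v - mu) ** 2 for v in column) / len(column)
            stats.append((mu, var + 1e-9))  # small floor avoids division by zero
        model[cls] = (len(members) / len(rows), stats)
    return model

def predict(model, row):
    """Return the class with the highest log posterior, treating features as independent."""
    def log_posterior(cls):
        prior, stats = model[cls]
        total = math.log(prior)
        for value, (mu, var) in zip(row, stats):
            total += -0.5 * math.log(2 * math.pi * var) - (value - mu) ** 2 / (2 * var)
        return total
    return max(model, key=log_posterior)

# Made-up rows of (glucose, BMI) with a 0/1 label, for illustration only.
rows = [[90, 22], [100, 25], [95, 24], [150, 33], [160, 35], [155, 31]]
labels = [0, 0, 0, 1, 1, 1]
model = fit_gaussian_nb(rows, labels)
print(predict(model, [98, 23]), predict(model, [158, 34]))
```

The "naive" part is the independence assumption: the per-feature log likelihoods are simply added together for each class.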
Clustering: Drag the data set into the work area and look for Numerical to Binominal and Remove Duplicates. Then define the label: from the list, look under Data Transformation, then Name and Role Modification, and drag in Set Role. The properties on the right of the page are where we define the label.
Look in the list under Evaluation, then Performance Measurement, and choose Performance (Clustering). Then look under Modeling, then Clustering and Segmentation, and drag in K-Means. Then search under Data Transformation, then Attribute Set Reduction and Transformation, then Transformation, and drag in Singular Value Decomposition.
Result: The data is split into two clusters.
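The following Python sketch shows how k-means arrives at two clusters. It is a plain textbook version with k = 2, random initial centroids, and Euclidean distance, all of which are assumptions. It leaves out the SVD step mentioned above, and the sample points are made up.

```python
import math
import random

def kmeans(points, k=2, iterations=100, seed=0):
    """Assign each point to the nearest centroid, then move centroids to cluster means."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    assignment = None
    for _ in range(iterations):
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the result is stable
        assignment = new_assignment
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids

# Made-up points forming two visibly separate groups.
points = [[1, 1], [1.5, 2], [1, 0.5], [8, 8], [9, 9.5], [8.5, 9]]
assignment, centroids = kmeans(points, k=2)
print("cluster sizes:", [assignment.count(c) for c in range(2)])
```

Which group receives cluster number 0 depends on the random start, so only the grouping is meaningful, not the cluster numbers themselves.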
The last process: Outliers
Drag the data set into the work area and look under Data Transformation, then Data Cleansing, then Outlier Detection, and drag in Detect Outliers (Distances). Find SVD and drag it in as well.
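A distance-based outlier check can be sketched in Python as below. The sketch assumes each point is scored by the distance to its k-th nearest neighbour and that a fixed number of top-scoring points are flagged. Whether the operator used in this report works exactly this way is an assumption, and the parameter values and sample points are made up.

```python
import math

def detect_outliers(points, k=2, n_outliers=1):
    """Flag the n_outliers points whose distance to their k-th nearest neighbour is largest."""
    scores = []
    for i, p in enumerate(points):
        distances = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(distances[k - 1])
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    flagged = set(ranked[:n_outliers])
    return [i in flagged for i in range(len(points))]

# Made-up points: four close together and one far away.
points = [[1, 1], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3], [9, 9]]
flags = detect_outliers(points, k=2, n_outliers=1)
print("outliers:", flags.count(True), "not outliers:", flags.count(False))
```

Because the number of flagged points is a parameter in this scheme, the count of outliers reflects the setting chosen as much as the data.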
Result: The data was classified into outliers (Outlier = true) and non-outliers (Outlier = false).
This image represents the outlier statistics.
There are only ten outliers.
Outliers = 10
Not outliers = 758
Chart: This chart shows the percentage of outlier and non-outlier values. Red represents the few outliers (10 of 768 records, about 1.3%), and blue represents the non-outlier values, which make up the largest percentage.
Conclusion: This report showed how to obtain data and process it to make it usable for applying data mining techniques, including association rules, outlier detection, clustering, and classification.
Each technique has a different mechanism and produces a different result on the existing data set. Large data sets need these simple methods to identify their statistics.
