
Student ID:220133612

Introduction: This report shows the mechanics of data mining on a large database: reducing the size of the data and finding useful relationships. We obtain the data, prepare it, and apply methods for extracting information from it. The steps are described in detail and the results are explained.

The first step: obtaining the data
From the site https://archive.ics.uci.edu/ml/datasets.html

Explanation of data: The extracted data come from a diabetes database in which an algorithm was used to predict the onset of diabetes in women. The result is a binary variable, 0 or 1, where 1 means a positive test. The test depends on several variables:
1. Number of times pregnant
2. Plasma glucose concentration at 2 hours in an oral glucose tolerance test
3. Diastolic blood pressure (mm Hg)
4. Triceps skin fold thickness (mm)
5. 2-Hour serum insulin (mu U/ml)
6. Body mass index (weight in kg/(height in m)^2)
7. Diabetes pedigree function
8. Age (years)
9. Class variable (0 or 1)

1- Take the data, put it in a file, and load the file into the program so that operations can be run on it.

2- After loading the data into the program, naming the columns, and saving the process, drag the data set to the work area.
3- Run the program; it shows the following table, which contains the data.

Step Two: Prepare the data so that it is ready for use, using three methods.
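Before moving on to preparation, the loading steps above (put the data in a file, name the columns, view the table) can be sketched outside the program. This is a minimal Python sketch, not the report's actual process: the short column names and the two sample rows are made up for illustration, and the file is assumed to be comma-separated with no header row.

```python
import csv
import io

# Hypothetical short names for the nine attributes listed in the report.
COLUMNS = [
    "pregnancies", "glucose", "blood_pressure", "skin_thickness",
    "insulin", "bmi", "pedigree", "age", "label",
]

# Made-up rows for illustration only. A real file would be read with
# open(path, newline="") instead of io.StringIO.
SAMPLE_CSV = """\
2,120,70,30,100,28.5,0.45,33,0
5,160,80,35,150,35.1,0.60,50,1
"""


def load_rows(text):
    """Parse comma-separated text into a list of dicts keyed by COLUMNS."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record:
            continue  # skip blank lines
        values = [float(v) for v in record]
        rows.append(dict(zip(COLUMNS, values)))
    return rows


rows = load_rows(SAMPLE_CSV)
print(len(rows), "rows,", len(COLUMNS), "columns")
```

Naming the columns at load time plays the same role as the column designation in step 2: every later step can refer to "glucose" or "label" instead of a column position.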

The first process: Remove duplicates
1- In the Operators box, look under Filtering, choose Remove Duplicates, add it to the process, and view the results.
2- After running the program: the data contain no duplicates.

The second process: Data cleansing
1- In the Operators box, look under Data Transformation, then Data Cleansing, choose Replace Missing Values, add it to the process, and view the results.
2- After running the program: the data contain no missing values.

The third process: Outliers
1- In the Operators box, look under Data Transformation, then Data Cleansing, choose Detect Outliers (Distances), add it to the process, and view the results.
2- After running the program: the data contain no missing values.
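The first two checks above (duplicates and missing values) can be sketched in plain Python. This is only an illustration under assumptions: missing cells are represented as None, and missing values are replaced by the column average, which may not be the setting the report used. The three sample rows are made up.

```python
def remove_duplicates(rows):
    """Keep the first occurrence of each row, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def count_missing(rows):
    """Count cells that are None (a stand-in for an empty cell)."""
    return sum(1 for row in rows for value in row if value is None)


def replace_missing(rows):
    """Replace None in each column with the mean of that column's known values."""
    means = []
    for column in zip(*rows):
        known = [v for v in column if v is not None]
        means.append(sum(known) / len(known) if known else 0.0)
    return [[means[i] if v is None else v for i, v in enumerate(row)] for row in rows]


# Made-up rows: one exact duplicate and one missing cell.
data = [[1.0, 2.0], [1.0, 2.0], [3.0, None]]
unique = remove_duplicates(data)
filled = replace_missing(unique)
print(len(data) - len(unique), "duplicate removed,", count_missing(unique), "missing cell filled")
```

On a data set like the one in the report, both counts would be zero, which matches the results stated above.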

After these three processes, the data are ready and correct for applying Association Rules, two classification methods, clustering, and outlier detection.

We will apply Association Rules to the existing data.
Process: Drag the data set to the work area. Look for Data Transformation, then Type Conversion, drag Numerical to Binominal and Numerical to Numerical to the work area, and connect them to the data set. Then look for Modeling, choose Association and Item Set Mining, choose FP-Growth, and drag it to the work area. Association Rules explain the special relationships between the attributes.
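To make the idea concrete, the sketch below turns numeric rows into true/false items and then derives rules with a support and a confidence. It is a brute-force enumeration written for readability, not FP-Growth itself, although on small inputs it finds the same frequent item sets. The thresholds, attribute names, and sample rows are hypothetical.

```python
from itertools import combinations


def binarize(rows, names, thresholds):
    """Turn each numeric row into the set of attribute names above threshold."""
    return [
        {n for n, v, t in zip(names, row, thresholds) if v > t}
        for row in rows
    ]


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    items = sorted(set().union(*transactions))
    total = len(transactions)
    result = {}
    for size in range(1, len(items) + 1):
        found = False
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            support = sum(1 for t in transactions if itemset <= t) / total
            if support >= min_support:
                result[itemset] = support
                found = True
        if not found:
            break  # no frequent set of this size, so none larger either
    return result


def association_rules(itemsets, min_confidence):
    """Build (premise, conclusion, confidence) rules from frequent itemsets."""
    rules = []
    for itemset, support in itemsets.items():
        for size in range(1, len(itemset)):
            for combo in combinations(sorted(itemset), size):
                premise = frozenset(combo)
                confidence = support / itemsets[premise]
                if confidence >= min_confidence:
                    rules.append((set(premise), set(itemset - premise), confidence))
    return rules


# Made-up rows: glucose, bmi, label.
names = ["glucose", "bmi", "label"]
rows = [[150, 32, 1], [160, 35, 1], [100, 25, 0], [145, 28, 1]]
transactions = binarize(rows, names, thresholds=[140, 30, 0.5])
itemsets = frequent_itemsets(transactions, min_support=0.5)
rules = association_rules(itemsets, min_confidence=0.9)
for premise, conclusion, confidence in rules:
    print(sorted(premise), "->", sorted(conclusion), round(confidence, 2))
```

A rule such as glucose -> label with confidence 1.0 reads as: in every sample row where glucose is above its threshold, the label is positive.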

Result: This table shows the special relationships between variables.
This chart shows the Association Rules:

We will apply classification to the existing data.
Process:
1- Pull the data set to the work area and look for Numerical to Binominal. Then define the label: from the list, look for Data Transformation, then Name and Role Modification, and drag Set Role; on the right of the page there are properties in which we define the label.
2- We create a splitter to split the data into testing and training data. Look in the list for evaluation and validation, then Split Validation. Double-click on Validation; it appears divided into two parts. The first section is training: look for Modeling, then Classification and Regression, then Tree Induction, and drag the Decision Tree.

The second section is testing: look for model application, then confidences, and drag Apply Model. Then look in the list for performance evaluation, then validation of performance and regression, then (Performance Classification) to measure the accuracy. Finally, run the process.

Result: The new data are classified based on the old data, and the accuracy is measured. Classification analyzes the input data to develop an accurate description or model for each class using the features present in the data.
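The split, train, apply, and measure sequence described above can be sketched as follows. To keep the example short, the model is a one-level tree (a decision stump), not the full decision tree the report uses. The 70/30 split ratio, the seed, and the sample rows are assumptions for illustration.

```python
import random


def split_data(rows, train_ratio=0.7, seed=0):
    """Shuffle rows and split them into training and testing parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def train_stump(train):
    """Find the (feature, threshold) pair with the fewest training errors.

    Each row is (features, label). The stump predicts 1 when
    features[feature] > threshold and 0 otherwise.
    """
    best = None
    for feature in range(len(train[0][0])):
        for threshold in sorted({x[feature] for x, _ in train}):
            errors = sum(1 for x, y in train if int(x[feature] > threshold) != y)
            if best is None or errors < best[0]:
                best = (errors, feature, threshold)
    return best[1], best[2]


def predict(stump, features):
    """Apply the trained stump to one feature vector."""
    feature, threshold = stump
    return int(features[feature] > threshold)


def accuracy(stump, rows):
    """Fraction of rows whose predicted label equals the true label."""
    correct = sum(1 for x, y in rows if predict(stump, x) == y)
    return correct / len(rows)


# Made-up rows: ([glucose, bmi], label).
data = [
    ([90.0, 22.0], 0), ([100.0, 25.0], 0), ([105.0, 31.0], 0),
    ([110.0, 27.0], 0), ([120.0, 29.0], 0), ([150.0, 28.0], 1),
    ([155.0, 34.0], 1), ([160.0, 33.0], 1), ([170.0, 36.0], 1),
    ([180.0, 30.0], 1),
]
train, test = split_data(data)
stump = train_stump(train)
print("test accuracy:", accuracy(stump, test))
```

The accuracy is computed only on rows the model did not see during training, which is the purpose of splitting the data before measuring performance.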

After conducting this process, the accuracy was measured on the old and new data and averaged; it equals 81%.
Chart:
Naive Bayes: The second classification method follows the same steps.
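For readers unfamiliar with the second method, the following is a minimal Gaussian Naive Bayes sketch in plain Python. It assumes each numeric attribute is normally distributed within a class, which may differ from how the program models the attributes. The training rows are made up.

```python
import math


def fit_naive_bayes(rows):
    """Estimate per-class priors plus per-feature mean and variance.

    Each row is (features, label).
    """
    by_class = {}
    for features, label in rows:
        by_class.setdefault(label, []).append(features)
    model = {}
    for label, group in by_class.items():
        stats = []
        for column in zip(*group):
            mean = sum(column) / len(column)
            var = sum((v - mean) ** 2 for v in column) / len(column)
            stats.append((mean, var + 1e-9))  # small floor avoids division by zero
        model[label] = (len(group) / len(rows), stats)
    return model


def predict_naive_bayes(model, features):
    """Return the class with the highest log posterior."""
    best_label, best_score = None, -math.inf
    for label, (prior, stats) in model.items():
        score = math.log(prior)
        for value, (mean, var) in zip(features, stats):
            score += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up rows: ([glucose, bmi], label).
training = [
    ([100.0, 25.0], 0), ([110.0, 27.0], 0),
    ([160.0, 35.0], 1), ([170.0, 33.0], 1),
]
model = fit_naive_bayes(training)
print(predict_naive_bayes(model, [105.0, 26.0]))
print(predict_naive_bayes(model, [165.0, 34.0]))
```

The same split and accuracy measurement used for the decision tree applies here; only the model inside the training section changes.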

Clustering: Pull the data set to the work area and look for Numerical to Binominal and Remove Duplicates. Then define the label: from the list, look for Data Transformation, then Name and Role Modification, and drag Set Role; on the right of the page there are properties in which we define the label. Then look in the list for performance evaluation, then validation of performance and regression, then (Performance Clustering). Look in the list for Modeling, then Clustering and Segmentation, and drag K-Means. Then search for Data Transformation, then Attribute Set Reduction and Transformation, then Transformation, then Singular Value Decomposition.
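The K-Means step can be illustrated with a short plain-Python sketch. It assumes k = 2, Euclidean distance, and a naive initialization that takes the first k points as starting centroids; the program's own initialization is likely different. The four points are made up, and the SVD step is not shown.

```python
def kmeans(points, k=2, iterations=100):
    """Cluster points into k groups; return (labels, centroids)."""
    centroids = [list(p) for p in points[:k]]  # naive init: first k points
    labels = None
    for _ in range(iterations):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Made-up points forming two well-separated groups.
points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```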

Result: The data are split into two clusters.

The last process: Outliers
Pull the data set to the work area and look for Data Transformation, then Data Cleansing, then Outlier Detection, and drag Detect Outliers (Distances). Find SVD and drag it as well.
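A distance-based outlier check of this kind can be sketched as follows: each point is scored by the distance to its k-th nearest neighbor, and the n highest-scoring points are flagged. This is an assumed reading of how such an operator works, not a confirmed description of the program; k, n, and the sample points are hypothetical.

```python
import math


def detect_outliers(points, k=2, n=1):
    """Flag the n points whose distance to their k-th nearest neighbor is largest.

    Returns a list of booleans, True for outliers.
    """
    scores = []
    for i, p in enumerate(points):
        distances = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(distances[k - 1])
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    flagged = set(ranked[:n])
    return [i in flagged for i in range(len(points))]


# Made-up points: four close together and one far away.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (10.0, 10.0)]
flags = detect_outliers(points, k=2, n=1)
print(flags)
```

Because the number of outliers n is fixed in advance in this sketch, the method always returns exactly n flagged points; the flags correspond to the true/false outlier attribute described in the result.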

Result: The data were classified into outliers (true) and non-outliers (false).
This image represents the statistical outliers:

Outliers = 10
Not outliers = 758
This chart shows the percentage of outlier and non-outlier values. Red represents the few outliers, while blue represents the non-outlier values, which make up the largest percentage. There are only ten outliers.

Conclusion: This report showed how to obtain data and how to process it so that it is usable for the application of data mining techniques, including Association Rules, classification, clustering, and outliers. Each of them has a different mechanism and a different result. Because the existing data set is large, these easy methods are needed to identify its statistics.