
Mah-Rukh Fida June 2012

Topics to be discussed
DATA MINING
REGRESSION

CLASSIFICATION

CLUSTERING

DATA MINING

Definition
Exploring hidden information
Models of data mining

Two categories of data mining models


Prediction Model

Makes predictions about data values using known results found in other data.

Descriptive Model

Identifies patterns or relationships in data.
Explores properties of the data examined.
Does not predict new properties.

REGRESSION

Definition
Numeric prediction of the value of a dependent variable.
The relationship between the dependent variable and the independent variable(s) is expressible through a mathematical equation.

Types of regression

Linear regression
y = c + m x, where c and m are regression coefficients.

Multi-linear regression
y = c0 + c1 x1 + c2 x2 + ... + cn xn, where c0, c1, ..., cn are regression coefficients and x1, x2, ..., xn are independent variables.
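The slides do not say how the regression coefficients are obtained. One common choice is ordinary least squares; the sketch below assumes that method and uses a small hypothetical data set (the values in `xs` and `ys` are made up for illustration) to estimate c and m for y = c + m x.

```python
def fit_simple_linear(xs, ys):
    """Estimate c and m in y = c + m*x by ordinary least squares."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    denominator = sum((x - x_mean) ** 2 for x in xs)
    m = numerator / denominator
    # Intercept: the fitted line passes through the point of means.
    c = y_mean - m * x_mean
    return c, m


# Hypothetical data lying on the line y = 1 + 2x (not from the slides).
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
c, m = fit_simple_linear(xs, ys)
print(f"c = {c:.2f}, m = {m:.2f}")
```

The multi-linear case follows the same idea with one coefficient per independent variable, usually solved with a linear algebra library rather than by hand.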

Basic Steps of Prediction


Slope: m = (270,000 - 180,000) / (33,000 - 17,000) = 90/16 ≈ 5.6
Model: y = 100,000 + 5.6 x
Prediction for x = 30,000: y = 100,000 + (5.6)(30,000) = 268,000
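The arithmetic of this worked example can be checked with a few lines of Python. This is only a sketch: the two points and the intercept of 100,000 are taken from the slide as given (the figure they were read from is not available), and the variable names are illustrative.

```python
# Two points used on the slide to estimate the slope.
x1, y1 = 17_000, 180_000
x2, y2 = 33_000, 270_000

# Exact slope between the two points; the slide rounds it to 5.6.
m_exact = (y2 - y1) / (x2 - x1)
m = round(m_exact, 1)

# Intercept as stated on the slide (assumed, not derived here).
c = 100_000


def predict(x):
    """Evaluate the linear model y = c + m*x."""
    return c + m * x


y = predict(30_000)
print(f"slope (exact) = {m_exact}, slope (rounded) = {m}")
print(f"predicted y for x = 30,000: {y:,.0f}")
```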

Regression Continued
A regression model is selected when:
- A continuous or numerical value needs to be predicted
- The relationship between predictor and response can be expressed as a curve or a mathematical equation

Regression is not suitable when:
- The data may not fit a linear model
- The linear fit is poor because of noise or outliers
- The data are non-numeric

CLASSIFICATION

Definition
Predicts class membership of data instances
Classes are non-overlapping

Classes are already defined

Basic Steps for Prediction


Model Construction
Model Usage

Example:

The output class is based on height, using the following division criteria:
2 m ≤ Height: Tall
1.7 m < Height < 2 m: Medium
Height ≤ 1.7 m: Short

Classify <Pat, F, 1.6> using KNN with K=5.

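- The five nearest neighbors of Pat are {<Kristina, F, 1.6>, <Kathy, F, 1.6>, <Stephanie, F, 1.7>, <Dave, M, 1.7>, <Wynette, F, 1.75>}.
- Four of these are Short (height ≤ 1.7 m) and one, Wynette, is Medium, so by majority vote Pat is Short.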
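The KNN step in this example can be sketched in Python as follows. Several things here are assumptions: distance is taken to be the absolute difference in height only, the full training table is not reproduced in the slides, so the list contains the five neighbors named above plus two clearly hypothetical taller records (`HypoA`, `HypoB`), and the function names are illustrative.

```python
from collections import Counter


def height_class(height_m):
    """Label a height using the division criteria from the example."""
    if height_m >= 2.0:
        return "Tall"
    if height_m > 1.7:
        return "Medium"
    return "Short"


# (name, gender, height in metres). The last two rows are hypothetical
# and only exist so that K=5 has to select among more than five records.
training = [
    ("Kristina", "F", 1.6),
    ("Kathy", "F", 1.6),
    ("Stephanie", "F", 1.7),
    ("Dave", "M", 1.7),
    ("Wynette", "F", 1.75),
    ("HypoA", "M", 1.95),  # hypothetical record
    ("HypoB", "M", 2.1),   # hypothetical record
]


def knn_classify(query_height, records, k=5):
    """Return the majority class among the k records closest in height."""
    nearest = sorted(records, key=lambda r: abs(r[2] - query_height))[:k]
    votes = Counter(height_class(r[2]) for r in nearest)
    return votes.most_common(1)[0][0], nearest


label, neighbors = knn_classify(1.6, training, k=5)
print("Pat is", label)
print("Neighbors:", [name for name, _, _ in neighbors])
```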

Validation Criteria


CLUSTERING

Definition
Grouping of similar data objects
Groups are not predefined

Four Clusters

Clustering Algorithms

Four Basic Steps in Clustering

Feature Selection

E.g. suppose the students in a class have to be divided into groups, and the grouping is based on the students' intelligence level.

Similarity Measure
The intelligence level of students can be estimated by giving a quiz.
Marks obtained by nine students: {2, 4, 10, 12, 3, 20, 30, 11, 25}
Students whose marks differ only slightly should be grouped together.

Clustering Algorithm
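The slides do not name the algorithm used at this step. As one possible illustration, the sketch below applies a simple one-dimensional k-means to the nine quiz marks. The choice of k-means, the value k=2, and the use of the first k marks as initial centroids are all assumptions made for this example.

```python
def kmeans_1d(values, k, max_iter=100):
    """Cluster numbers with a basic k-means; returns (clusters, centroids)."""
    # Assumption: the first k values serve as the initial centroids.
    centroids = [float(v) for v in values[:k]]
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        # Assignment step: each value joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: each centroid moves to the mean of its cluster.
        new_centroids = [
            sum(members) / len(members) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments are stable, so the algorithm has converged
        centroids = new_centroids
    return clusters, centroids


marks = [2, 4, 10, 12, 3, 20, 30, 11, 25]
clusters, centroids = kmeans_1d(marks, k=2)
for members, centre in zip(clusters, centroids):
    print(sorted(members), "mean =", centre)
```

A different k or different starting centroids can produce a different grouping, which is one reason the result validation step matters.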

Result Validation

If clusters do not make sense, go back to a prior stage
Check for tendency of clusters in the data set

Selection Criteria
Simplification
Useful in data concept construction
Unsupervised learning

Validation Criteria
External criteria: Entropy, F-Measure, NMI-Measure, Purity
Internal criteria: Sum of Squared Error (SSE), BIC, CH, DB, SIL, DUNN
Relative criteria: Entropy, SSE
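As an illustration of one internal criterion, the sketch below computes the Sum of Squared Error for a clustering of the quiz marks. The particular split into two groups is an assumption made for the example and is not stated in the slides; a lower SSE indicates tighter clusters.

```python
def sum_of_squared_error(clusters):
    """Sum, over all clusters, of squared distances to the cluster mean."""
    total = 0.0
    for members in clusters:
        mean = sum(members) / len(members)
        total += sum((v - mean) ** 2 for v in members)
    return total


# Assumed grouping of the nine quiz marks into two clusters.
assumed_clusters = [[2, 3, 4, 10, 11, 12], [20, 25, 30]]
sse = sum_of_squared_error(assumed_clusters)
print("SSE =", sse)
```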

END
