
Unit Eight

Classification maps each data element to one of a set of predetermined classes,
based on the differences among data elements belonging to different classes.
Clustering groups data elements into different groups based on the similarity
between elements within a single group.

• Classification and prediction are two forms of data analysis that can be used to
extract models describing important data classes or to predict future data trends.
• Such analysis can help provide us with a better understanding of the data at large.
• Whereas classification predicts categorical (discrete, unordered) labels,
prediction models continuous-valued functions.
• Many classification and prediction methods have been proposed by researchers
in machine learning, pattern recognition, and statistics.

Issues Regarding Classification and Prediction

• Preparing the Data for Classification and Prediction
• Comparing Classification and Prediction Methods

Preparing the Data for Classification and Prediction

The following preprocessing steps may be applied to the data to help improve the
accuracy, efficiency, and scalability of the classification or prediction process:
• Data Cleaning
This refers to the preprocessing of data in order to remove or reduce noise and the
treatment of missing values.
• Relevance Analysis
Many of the attributes in the data may be redundant. Correlation analysis can be used
to identify whether any two given attributes are statistically related.
• Data Transformation and Reduction
The data may be transformed by normalization, particularly when neural networks or
methods involving distance measurements are used in the learning step.
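As a concrete illustration, normalization rescales attribute values to a common range before distance-based learning. A minimal sketch of min-max normalization to [0, 1] (the values are illustrative):

```python
# Min-max normalization: rescale values linearly so the smallest maps to 0
# and the largest to 1 (input values are illustrative).

def min_max(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

print(min_max([200, 300, 400, 600, 1000]))  # -> [0.0, 0.125, 0.25, 0.5, 1.0]
```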

Comparing Classification and Prediction Methods

Classification and prediction methods can be compared and evaluated according to
the following criteria:
• Accuracy
This refers to the ability of a given classifier to correctly predict the class label of new
or previously unseen data (i.e., tuples without class label information).
• Speed
This refers to the computational costs involved in generating and using the given
classifier or predictor.
• Robustness
This refers to the ability of the classifier or predictor to make correct predictions given
noisy data or data with missing values.
• Scalability
This refers to the ability to construct the classifier or predictor efficiently given large
amounts of data.
• Interpretability
This refers to the level of understanding and insight that is provided by the classifier
or predictor.

Classification Techniques
Decision Tree Identification

Classification Problem:
Weather → Play (Yes, No)

Hunt’s Method for decision tree identification:

Given N element types and m decision classes:
1. For i 1 to N do

i. Add element I to the i-1 element item-sets from the previous iteration

ii. Identify the set of decision classes for each item-set

iii. If an item-set has only one decision class, then that item-set is done, remove
that item-set from subsequent iterations.

.2 done

Decision Tree Identification
Splitting on Weather gives:
Sunny → Yes
Cloudy → Yes/No
Overcast → Yes/No

The Cloudy branch contains mixed decisions:
Cloudy → Yes
Cloudy → No
Cloudy → Yes

The Overcast branch also contains mixed decisions:
Overcast → No
Overcast → Yes
Decision Tree Identification Example:
• Top-down technique for decision tree identification.
• The decision tree created is sensitive to the order in which items are considered.
• If an N-item-set does not result in a clear decision, the classification classes
have to be modeled by rough sets.
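One top-down splitting step can be sketched in Python: group the tuples by an attribute's value and mark branches whose decision class is unique as done. The counts below mirror the Weather → Play example but are illustrative, not taken from the text:

```python
# A sketch of one top-down splitting step: group tuples by an attribute's
# value; a branch with a single decision class is done, others need
# further splitting (the tuples below are illustrative).
from collections import defaultdict

data = [  # (weather, play)
    ("Sunny", "Yes"), ("Sunny", "Yes"),
    ("Cloudy", "Yes"), ("Cloudy", "No"), ("Cloudy", "Yes"),
    ("Overcast", "No"), ("Overcast", "Yes"),
]

def split(rows):
    branches = defaultdict(set)
    for value, decision in rows:
        branches[value].add(decision)
    # a branch with one decision class is done; others must be split again
    return {v: "done: " + next(iter(d)) if len(d) == 1 else "split further"
            for v, d in branches.items()}

print(split(data))
```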

Clustering Techniques
• Clustering partitions the data set into clusters or equivalence classes.
• Similarity among members of a class is greater than similarity among members
across classes.
• Similarity measures: Euclidean distance or other application-specific measures.

Clustering Techniques
Nearest Neighbour Clustering Algorithm:
Given n elements x1, x2, …, xn, and threshold t:
1. j ← 2, k ← 1; place x1 in cluster 1
2. Repeat
 I. Find the nearest neighbour of xj among x1, …, xj-1
 II. Let the nearest neighbour be in cluster m
 III. If the distance to the nearest neighbour > t, then k ← k + 1 and create a
 new cluster k containing xj; else assign xj to cluster m
 IV. j ← j + 1
3. Until j > n
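The steps above can be sketched in Python, assuming 1-D points and Euclidean (absolute) distance; the points and threshold are illustrative:

```python
# Nearest-neighbour clustering sketch for 1-D points: join the cluster of
# the nearest already-seen point, unless it is farther than threshold t.

def nn_cluster(points, t):
    """Return a list of cluster ids, one per point."""
    labels = [0]  # x1 starts cluster 0
    k = 1         # number of clusters so far
    for j in range(1, len(points)):
        # nearest neighbour among the points already clustered
        i = min(range(j), key=lambda i: abs(points[j] - points[i]))
        if abs(points[j] - points[i]) > t:
            labels.append(k)          # create a new cluster
            k += 1
        else:
            labels.append(labels[i])  # join the neighbour's cluster
    return labels

print(nn_cluster([1.0, 1.2, 5.0, 5.1, 9.0], t=1.0))  # -> [0, 0, 1, 1, 2]
```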

Iterative Partitional Clustering

Given n elements x1, x2, …, xn, and k clusters, each with a center:
1. Assign each element to its closest cluster center
2. After all assignments have been made, compute the cluster centroids for each
of the clusters
3. Repeat the above two steps with the new centroids until the algorithm
converges (i.e., the assignments no longer change)

Numeric prediction is the task of predicting continuous (or ordered) values for given
input.
For example:
We may wish to predict the salary of college graduates with 10 years of work
experience, or the potential sales of a new product given its price.
The most widely used approach for numeric prediction is regression,
a statistical methodology that was developed by Sir Francis Galton (1822-1911), a
mathematician who was also a cousin of Charles Darwin.

Many texts use the terms "regression" and "numeric prediction" synonymously.
Regression analysis can be used to model the relationship between one or more
independent or predictor variables and a dependent or response variable (which is
continuous-valued).
In the context of data mining, the predictor variables are the attributes of interest
describing the tuple.
The response variable is what we want to predict.

Types of Regression
The types of regression are:

• Linear Regression
• Nonlinear Regression

Linear Regression
Straight-line regression analysis involves a response variable, y, and a single
predictor variable, x.
It is the simplest form of regression, and models y as a linear function of x.
That is,

y = b + wx

where the variance of y is assumed to be constant, and b and w are regression
coefficients specifying the y-intercept and slope of the line, respectively.

The regression coefficients, w and b, can also be thought of as weights, so that we can
equivalently write

y = w0 + w1x

These coefficients can be estimated by the method of least squares. For training
points (x1, y1), …, (xn, yn) with means x̄ and ȳ, the estimates are

w1 = Σ (xi - x̄)(yi - ȳ) / Σ (xi - x̄)²
w0 = ȳ - w1x̄
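A minimal implementation of these closed-form least-squares estimates, with illustrative salary-vs-experience numbers:

```python
# Least-squares fit of the straight line y = w0 + w1*x using the
# closed-form estimates for w1 and w0 (data values are illustrative).

def fit_line(xs, ys):
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    w1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    w0 = y_bar - w1 * x_bar
    return w0, w1

# years of experience vs. salary in $1000s (illustrative numbers)
xs = [3, 8, 9, 13, 3, 6, 11, 21, 1, 16]
ys = [30, 57, 64, 72, 36, 43, 59, 90, 20, 83]
w0, w1 = fit_line(xs, ys)
print(f"y = {w0:.1f} + {w1:.1f}x")
```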

Multiple Linear Regression

Multiple linear regression is an extension of straight-line regression that
involves more than one predictor variable.
An example of a multiple linear regression model based on two predictor attributes or
variables, A1 and A2, is

y = w0 + w1x1 + w2x2

where x1 and x2 are the values of attributes A1 and A2, respectively, in X.

Multiple regression problems are commonly solved with the use of statistical
software packages, such as SPSS (Statistical Package for the Social Sciences).

Nonlinear Regression
In straight-line linear regression, the dependent (response) variable, y, is
modeled as a linear function of a single independent (predictor) variable, x.
What if we could get a more accurate model using a nonlinear form, such as a parabola or
some other higher-order polynomial?
Polynomial regression is often of interest when there is just one predictor variable.
Consider a cubic polynomial relationship given by

y = w0 + w1x + w2x² + w3x³

By defining the new variables z1 = x, z2 = x², and z3 = x³, this model can be
converted to a linear one and solved by the method of least squares.
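A cubic polynomial model can therefore be fitted as a linear regression in transformed variables. A sketch using NumPy's least-squares solver on synthetic, illustrative data:

```python
# Fit a cubic by treating [1, x, x^2, x^3] as linear-regression columns.
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = 2 - x + 0.5 * x**2 + 0.1 * x**3      # synthetic data on an exact cubic

# design matrix [1, x, x^2, x^3]: the cubic is *linear* in these columns
Z = np.vstack([np.ones_like(x), x, x**2, x**3]).T
w, *_ = np.linalg.lstsq(Z, y, rcond=None)
print(np.round(w, 3))  # coefficients w0..w3, approximately [2, -1, 0.5, 0.1]
```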

Nonlinear Regression
In statistics, nonlinear regression is a form of regression analysis in which
observational data are modeled by a function that is a nonlinear combination of the
model parameters and depends on one or more independent variables. The data are
fitted by a method of successive approximations.

The process of grouping a set of physical or abstract objects into classes of similar
objects is called clustering.
A cluster is a collection of data objects that are similar to one another within the same
cluster and are dissimilar to the objects in other clusters. A cluster of data objects can
be treated collectively as one group and so may be considered as a form of data compression.
First the set is partitioned into groups based on data similarity (e.g., using
clustering), and then labels are assigned to the relatively small number of groups.
It is also called unsupervised learning. Unlike classification, clustering and
unsupervised learning do not rely on predefined classes and class-labeled training
examples. For this reason, clustering is a form of learning by observation, rather than
learning by examples.

Clustering is also called data segmentation in some applications because clustering
partitions large data sets into groups according to their similarity.
Clustering can also be used for outlier detection, where outliers (values that are “far
away” from any cluster) may be more interesting than common cases.

Advantages of such a clustering-based process:
• Adaptable to changes
• Helps single out useful features that distinguish different groups

Applications of Clustering
• Market research
• Pattern recognition
• Data analysis
• Image processing
• Biology
• Geography
• Automobile insurance
• Outlier detection

K-Means Algorithm
Input:
k: the number of clusters,
D: a data set containing n objects.
Output: A set of k clusters.
(1) arbitrarily choose k objects from D as the initial cluster centers;
(2) repeat
(3) (re)assign each object to the cluster to which the object is the most similar, based
on the mean value of the objects in the cluster;
(4) update the cluster means, i.e., calculate the mean value of the objects for each
cluster;
(5) until no change;
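A compact sketch of these steps for 1-D data with Euclidean (absolute) distance; the data and k are illustrative:

```python
# k-means sketch following steps (1)-(5) above, for 1-D data.
import random

def k_means(data, k, iters=100, seed=0):
    random.seed(seed)
    centers = random.sample(data, k)       # (1) arbitrary initial centers
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in data:                     # (3) assign to the nearest center
            i = min(range(k), key=lambda i: abs(x - centers[i]))
            clusters[i].append(x)
        new = [sum(c) / len(c) if c else centers[i]   # (4) recompute means
               for i, c in enumerate(clusters)]
        if new == centers:                 # (5) stop when nothing changes
            break
        centers = new
    return sorted(centers)

print(k_means([1.0, 1.2, 0.8, 9.0, 9.5, 8.5], k=2))
```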

K-Medoids Algorithm
Input:
k: the number of clusters,
D: a data set containing n objects.
Output: A set of k clusters.
(1) arbitrarily choose k objects in D as the initial representative objects or seeds;
(2) repeat
(3) assign each remaining object to the cluster with the nearest representative object;
(4) randomly select a nonrepresentative object, o_random;
(5) compute the total cost, S, of swapping representative object, o_j, with o_random;
(6) if S < 0 then swap o_j with o_random to form the new set of k representative objects;
(7) until no change;
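A sketch of the random-swap step above for 1-D data, where the "cost" is the total distance of objects to their nearest representative (medoid); data, k, and the iteration budget are illustrative:

```python
# k-medoids (PAM-style) sketch: try random medoid/non-medoid swaps and
# keep a swap only when it lowers the total cost (S < 0).
import random

def total_cost(data, medoids):
    return sum(min(abs(x - m) for m in medoids) for x in data)

def k_medoids(data, k, iters=100, seed=0):
    random.seed(seed)
    medoids = random.sample(data, k)            # (1) initial representatives
    for _ in range(iters):
        o_j = random.choice(medoids)            # a current representative
        o_random = random.choice([x for x in data if x not in medoids])  # (4)
        swapped = [o_random if m == o_j else m for m in medoids]
        # (5)-(6): accept the swap only if it reduces the total cost
        if total_cost(data, swapped) < total_cost(data, medoids):
            medoids = swapped
    return sorted(medoids)

print(k_medoids([1.0, 1.2, 0.8, 9.0, 9.5, 8.5], k=2))
```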

Bayesian Classification
Bayesian Classification is based on Bayes’ theorem.
Studies comparing classification algorithms have found a simple Bayesian classifier
known as the naïve Bayesian classifier to be comparable in performance with decision
tree and selected neural network classifiers.
Bayesian classifiers have also exhibited high accuracy and speed when applied to
large databases.
Naïve Bayesian classifiers assume that the effect of an attribute value on a given
class is independent of the values of the other attributes. This assumption is called
class-conditional independence.
Bayesian belief networks are graphical models which, unlike naïve Bayesian
classifiers, allow the representation of dependencies among subsets of attributes.
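Class-conditional independence is what makes the naïve Bayesian classifier simple to compute: the posterior is proportional to the class prior times a product of per-attribute likelihoods. A toy counting-based sketch (the weather tuples are illustrative, and no smoothing is applied):

```python
# Toy naive Bayes: pick the class maximizing P(C) * product of P(x_i | C),
# counting probabilities directly from the training tuples.
from collections import Counter

train = [  # (outlook, windy) -> play; illustrative toy data
    (("sunny", "false"), "yes"), (("sunny", "true"), "no"),
    (("overcast", "false"), "yes"), (("rainy", "true"), "no"),
    (("rainy", "false"), "yes"), (("overcast", "true"), "yes"),
]

def predict(x):
    classes = Counter(label for _, label in train)
    best, best_p = None, -1.0
    for c, n_c in classes.items():
        p = n_c / len(train)                       # prior P(C)
        for i, v in enumerate(x):                  # likelihood P(x_i | C)
            n_match = sum(1 for feats, label in train
                          if label == c and feats[i] == v)
            p *= n_match / n_c
        if p > best_p:
            best, best_p = c, p
    return best

print(predict(("sunny", "false")))  # the "yes" class wins here
```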

Bayes’ Theorem
Let X be a data tuple (the "evidence") and let H be a hypothesis, such as that X
belongs to a specified class C. Bayes’ theorem states

P(H|X) = P(X|H) P(H) / P(X)

where P(H) is the prior probability of H, P(X|H) is the probability of observing X
given that H holds, and P(X) is the prior probability of X.
