
By Ngatunga Elizabeth

Roll No: 13PTIT011



MSc IT and Management

Business Intelligence and Analytics
2013 Batch

12/19/15

STATISTICAL BASED
METHOD DATA MINING
ALGORITHM

OUTLINE
Introduction

Correlation Analysis

Regression Analysis

Bayesian model

Conclusion

References


INTRODUCTION

Data mining: searching data for patterns so as to obtain knowledge that can be used for decision making

Statistics is useful for mining various patterns from data as well as for understanding the underlying mechanisms generating and affecting the patterns

Both statistics and data mining are concerned with drawing inferences from data

STATISTICAL BASED ALGORITHMS


Correlation Analysis

Regression Analysis

Bayesian model


CORRELATION
Correlation is a statistical technique used to
determine the degree to which two variables are
related, expressed as the correlation coefficient, r
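
As a minimal sketch of how r can be computed, the block below assumes r is the Pearson correlation coefficient (the slide does not name a specific coefficient). The function name `pearson_r` and the sample values are hypothetical and only illustrate the calculation.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient r for two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values: y rises in exact proportion to x
x_values = [1, 2, 3, 4, 5]
y_values = [2, 4, 6, 8, 10]
r = pearson_r(x_values, y_values)
print(r)
```

A value of r near +1 or -1 indicates a strong linear relationship, while a value near 0 indicates little or no linear relationship.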


CORRELATION USING LIFT

A correlation rule is measured not only by its support and confidence but also by the correlation between itemsets A and B

Expressed as

lift(A, B) = P(A ∪ B) / (P(A) P(B))

where P(A ∪ B) is the probability that a transaction contains both A and B

Such that lift(A, B)

< 1
|
negatively correlated

= 1
|
independent

> 1
|
positively correlated

CORRELATION USING LIFT


The lift of an association (or correlation) rule assesses the degree to which the occurrence of one itemset lifts the occurrence of the other
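
A minimal sketch of the lift calculation, assuming transactions are represented as Python sets of item names. The helper names `support` and `lift` and the five sample transactions are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def lift(transactions, a, b):
    """Joint support of A and B divided by the product of their supports."""
    supp_a = support(transactions, a)
    supp_b = support(transactions, b)
    if supp_a == 0 or supp_b == 0:
        raise ValueError("lift is undefined when an itemset never occurs")
    return support(transactions, set(a) | set(b)) / (supp_a * supp_b)


# Hypothetical market-basket data
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter"},
    {"bread"},
]

# support(bread) = 0.8, support(milk) = 0.6, support(both) = 0.4
bread_milk_lift = lift(transactions, {"bread"}, {"milk"})
print(round(bread_milk_lift, 4))
```

With these made-up transactions the lift is 0.4 / (0.8 × 0.6) ≈ 0.83, which is below 1, so buying bread slightly lowers the chance of buying milk in this toy data.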

CORRELATION USING CHI-SQUARE METHOD

Applicable to categorical (binary) data, e.g., customer loyalty to a supermarket.
The χ² statistic tests the hypothesis that A and B
are independent, that is, that there is no correlation
between them.
If the hypothesis can be rejected, then A and B are
statistically correlated.
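
A minimal sketch of the test, assuming the usual Pearson chi-square statistic on a contingency table of observed counts. The function name `chi_square`, the 2×2 counts, and the choice of a 0.05 significance level are assumptions made for illustration; 3.841 is the standard critical value for one degree of freedom at that level.

```python
def chi_square(observed):
    """Pearson chi-square statistic for a table of observed counts.

    Assumes every row and column total is non-zero.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Count expected in this cell if rows and columns were independent
            expected = row_totals[i] * col_totals[j] / total
            stat += (obs - expected) ** 2 / expected
    return stat


# Hypothetical counts: rows = loyal / not loyal, columns = buys A / does not buy A
observed = [
    [30, 10],
    [20, 40],
]

CRITICAL_VALUE = 3.841  # chi-square, 1 degree of freedom, significance level 0.05
stat = chi_square(observed)
correlated = stat > CRITICAL_VALUE  # True means independence is rejected
print(round(stat, 3), correlated)
```

For these made-up counts the expected cell values are 20, 20, 30, and 30, giving a statistic of about 16.67, so the independence hypothesis would be rejected at the 0.05 level.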

APPLICATION IN DATA MINING

The discovery of interesting correlation relationships among huge amounts of business transaction records can help in many business decision-making processes, such as:
Catalog design
Cross-marketing (joint promotion)
Customer shopping behavior analysis

REGRESSION

Regression analysis can be used to model the relationship between one or more independent (predictor) variables and a dependent (response) variable, which is continuous-valued.

Types
Linear regression (single independent variable)
Multiple regression (more than one independent variable)
Logit regression (categorical dependent variable)
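
A minimal sketch of the single-variable case, assuming an ordinary least-squares fit of a straight line. The function name `fit_line` and the advertising/sales figures are hypothetical; the data are constructed to lie exactly on y = 2x + 1 so the result is easy to check by hand.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data lying exactly on y = 2x + 1
ad_spend = [1.0, 2.0, 3.0, 4.0]
sales = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(ad_spend, sales)
predicted = slope * 5.0 + intercept  # prediction for an unseen x value
print(slope, intercept, predicted)
```

Multiple regression extends the same idea to several predictors, and logit regression replaces the continuous response with the log-odds of a categorical outcome.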

APPLICATION IN DATA MINING

Applications of this statistical method in data mining are numerous and include:

Commerce: predicting sales amounts of a new product based on advertising expenditure

Meteorology: predicting wind velocities and directions as a function of temperature, humidity, air pressure

Stock exchange: time series prediction of stock market indices (trend estimation)

Medicine: effect of parental birth weight/height on infant birth weight/height, for instance (Gorunescu, 2011)

BAYESIAN MODEL

APPLICATION IN DATA MINING

Minimizes the probability of making a wrong decision, or the expected risk

Experts are thus able to estimate the weight or importance of their prior knowledge relative to the available training data
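
A minimal sketch of a Bayesian decision, assuming Bayes' rule with a zero-one loss, under which choosing the class with the highest posterior probability minimizes the probability of a wrong decision. The class names, priors, and likelihoods are hypothetical; the priors stand in for expert knowledge and the likelihoods for what is estimated from training data.

```python
def posteriors(priors, likelihoods):
    """Apply Bayes' rule: P(class | evidence) for every class."""
    joint = {c: priors[c] * likelihoods[c] for c in priors}
    evidence = sum(joint.values())
    if evidence == 0:
        raise ValueError("evidence has zero probability under every class")
    return {c: joint[c] / evidence for c in joint}


def bayes_decision(priors, likelihoods):
    """Return the class with the highest posterior, plus all posteriors."""
    post = posteriors(priors, likelihoods)
    return max(post, key=post.get), post


# Hypothetical numbers
priors = {"buys": 0.3, "does_not_buy": 0.7}       # P(class), prior knowledge
likelihoods = {"buys": 0.8, "does_not_buy": 0.2}  # P(evidence | class)

label, post = bayes_decision(priors, likelihoods)
error_probability = 1 - post[label]  # chance the chosen class is wrong
print(label, round(post[label], 4), round(error_probability, 4))
```

Changing the priors shifts the posterior even when the likelihoods stay fixed, which is how the weight given to prior knowledge enters the decision.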

CONCLUSION

Industries such as banking, insurance, medicine, and retailing commonly use data mining to reduce costs, enhance research, and increase sales.

Through correlation, regression, and Bayesian models, inferences are made to understand patterns of correlation and causal links among data values, or to predict future data values.

REFERENCES

Hand, Mannila and Smyth (2001), Principles of Data Mining, The MIT Press, Cambridge, Massachusetts; London, England.

Han and Kamber (2012), Data Mining: Concepts and Techniques, 3rd Edition, Morgan Kaufmann, USA.

Gorunescu (2011), Data Mining: Concepts, Models and Techniques, Springer-Verlag Berlin Heidelberg, Romania.

Berry and Linoff (2004), Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management, 2nd Edition, Wiley Publishing, Inc., Indianapolis, Indiana.

Robinson and Officer (2008), Data Mining: Predicting Laptop Retail Price Using Regression, Accessed on 8/3/2015,
http://www.spelman.edu/docs/aspire-research/joibritneypdf.pdf?s

END OF PRESENTATION