
# By Ngatunga Elizabeth


## MSc IT and Management

Business Intelligence and Analytics
2013 Batch

12/19/15

STATISTICAL-BASED METHODS: DATA MINING ALGORITHMS

OUTLINE
Introduction

Correlation Analysis

Regression Analysis

Bayesian model

Conclusion

References


INTRODUCTION


Statistics is useful for mining various patterns from data, for understanding the underlying mechanisms generating the data, and for decision making.

Both statistics and data mining are concerned with drawing inferences from data.

## STATISTICAL-BASED ALGORITHMS

Correlation Analysis

Regression Analysis

Bayesian model


CORRELATION
Correlation is a statistical technique used to determine the degree to which two variables are related, expressed as the correlation coefficient r.
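A minimal sketch, assuming the usual Pearson (product-moment) definition of r: the sum of co-deviations divided by the square root of the product of the summed squared deviations. The function name `pearson_r` and the sample values are hypothetical. r always lies between -1 (perfect negative linear relation) and +1 (perfect positive linear relation), with 0 meaning no linear relation.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient r between two paired numeric samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length (at least 2 values)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations: positive when x and y move together
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired observations
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0 (perfect positive)
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0 (perfect negative)
```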


A correlation rule is measured not only by its support and confidence but also by the correlation between itemsets A and B.


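The correlation is expressed as the lift

$$\mathrm{lift}(A,B) = \frac{P(A \cup B)}{P(A)\,P(B)}$$

where $P(A \cup B)$ is the probability that a transaction contains both itemsets A and B, such that:

lift(A,B) < 1: A and B are negatively correlated

lift(A,B) = 1: A and B are independent

lift(A,B) > 1: A and B are positively correlated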


The lift of an association (or correlation) rule assesses the degree to which the occurrence of one itemset "lifts" the occurrence of the other.
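A minimal sketch of estimating lift from transaction counts, assuming the standard definition lift(A,B) = P(A and B) / (P(A) P(B)), with each probability estimated as a fraction of all transactions. The function names and the counts (10,000 transactions, 6,000 containing A, 7,500 containing B, 4,000 containing both) are hypothetical.

```python
def lift(n_both, n_a, n_b, n_total):
    """lift(A,B) = P(A and B) / (P(A) * P(B)), estimated from transaction counts."""
    if min(n_a, n_b, n_total) <= 0:
        raise ValueError("counts for A, B and the total must be positive")
    p_both = n_both / n_total
    p_a = n_a / n_total
    p_b = n_b / n_total
    return p_both / (p_a * p_b)


def interpret_lift(value, tol=1e-9):
    """Map a lift value onto the three cases: < 1, = 1 (within tol), > 1."""
    if value < 1 - tol:
        return "negatively correlated"
    if value > 1 + tol:
        return "positively correlated"
    return "independent"


value = lift(n_both=4000, n_a=6000, n_b=7500, n_total=10000)
print(round(value, 3), interpret_lift(value))  # 0.889 negatively correlated
```

Here A and B co-occur less often (40% of transactions) than independence would predict (0.6 × 0.75 = 45%), so the lift falls below 1.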

χ² METHOD


Applicable to categorical (binary) data, e.g. customer loyalty to a supermarket.
The χ² statistic tests the hypothesis that A and B are independent, that is, that there is no correlation between them.
If the hypothesis can be rejected, then A and B are statistically correlated.
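A minimal sketch of the χ² test of independence on a contingency table, assuming Pearson's statistic without continuity correction: χ² sums (observed - expected)² / expected over all cells, where expected = row total × column total / grand total. The function name and the counts are hypothetical; 3.841 is the standard critical value of χ² with 1 degree of freedom at the 0.05 significance level, so this cutoff applies only to 2 × 2 tables.

```python
def chi_square(table):
    """Pearson chi-square statistic for an r x c contingency table of counts."""
    total = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if A and B were independent
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


# Critical value for 1 degree of freedom ((2-1)*(2-1)) at the 0.05 level
CRITICAL_1DF_05 = 3.841

# Hypothetical 2x2 table: rows = A / not A, columns = B / not B
observed = [[4000, 2000],
            [3500, 500]]
stat = chi_square(observed)
print(round(stat, 1))  # 555.6
print("reject independence" if stat > CRITICAL_1DF_05 else "cannot reject independence")
```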

The discovery of interesting correlation relationships among huge amounts of business transaction records can help in many business decision-making processes, such as catalog design, cross-marketing (joint promotion), and customer shopping behavior analysis.

REGRESSION


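Regression analysis can be used to model the relationship between one or more independent (predictor) variables and a dependent (response) variable, which is continuous-valued.
Types: linear, multiple, and logit (logistic) regression; logit regression instead models the probability of a categorical, typically binary, response.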
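A minimal sketch of the linear case only, assuming ordinary least squares with a single predictor, y = w0 + w1·x, where w1 = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and w0 = ȳ - w1·x̄. Multiple and logit regression are not shown; the function names and data are hypothetical.

```python
def fit_linear(xs, ys):
    """Ordinary least-squares fit of y = w0 + w1 * x; returns (w0, w1)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need paired samples with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x must vary to estimate a slope")
    w1 = sxy / sxx             # slope
    w0 = mean_y - w1 * mean_x  # intercept: line passes through (mean_x, mean_y)
    return w0, w1


def predict(w0, w1, x):
    """Predicted (continuous) response for a new predictor value x."""
    return w0 + w1 * x


# Hypothetical data lying exactly on y = 1 + 2x
w0, w1 = fit_linear([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])
print(w0, w1)              # 1.0 2.0
print(predict(w0, w1, 6))  # 13.0
```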


This statistical method has many applications in data mining, including:

Meteorology: predicting wind velocities and directions as a function of temperature, humidity, and air pressure.

Stock exchange: time-series prediction of stock market indices (trend estimation).

Medicine: for instance, the effect of parental birth weight/height on infant birth weight/height (Glorunescu, 2011).


BAYESIAN MODEL


The Bayesian decision rule minimizes the probability of making a wrong decision, or the expected risk.
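A minimal sketch of the Bayes decision rule, assuming discrete classes with known priors P(C) and class-conditional likelihoods P(x | C) for an observed x. Posteriors come from Bayes' theorem, P(C | x) = P(x | C) P(C) / Σ_k P(x | C_k) P(C_k). With 0-1 loss, picking the class with the largest posterior minimizes the probability of a wrong decision; with a general loss table, picking the decision with the smallest expected loss minimizes the expected risk. The class names, probabilities, and loss values are hypothetical.

```python
def posteriors(priors, likelihoods):
    """Bayes' theorem: P(C|x) = P(x|C) P(C) / sum over k of P(x|C_k) P(C_k)."""
    joint = {c: priors[c] * likelihoods[c] for c in priors}
    evidence = sum(joint.values())  # P(x)
    if evidence == 0:
        raise ValueError("observation has zero probability under every class")
    return {c: p / evidence for c, p in joint.items()}


def bayes_decision(post, loss=None):
    """Return the decision with minimum expected risk.

    loss[a][c] is the cost of deciding a when the true class is c.
    Without a loss table (0-1 loss) this is the maximum-posterior class.
    """
    if loss is None:
        return max(post, key=post.get)
    risk = {a: sum(loss[a][c] * post[c] for c in post) for a in loss}
    return min(risk, key=risk.get)


# Hypothetical two-class problem
priors = {"buys": 0.3, "does_not_buy": 0.7}
likelihoods = {"buys": 0.8, "does_not_buy": 0.2}  # P(x | class)
post = posteriors(priors, likelihoods)
print(bayes_decision(post))  # buys (posterior about 0.63 vs 0.37)

# Hypothetical costs: a false "buys" costs 5, a missed buyer costs 1
loss = {"buys": {"buys": 0, "does_not_buy": 5},
        "does_not_buy": {"buys": 1, "does_not_buy": 0}}
print(bayes_decision(post, loss))  # does_not_buy
```

The second call shows why the expected risk, not only the posterior, can drive the decision: when one kind of mistake is much costlier, the rule may choose the less probable class.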


Thus, experts are able to estimate the weight, or importance, of their prior knowledge relative to the available training data.
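One common way to make this weighting explicit, assumed here rather than taken from the slides, is a Beta prior on a probability p, equivalently the m-estimate: the expert states a prior guess p0 and an "equivalent sample size" m, i.e. how many training examples that knowledge is worth. After observing k successes in n trials, the posterior mean is (k + m·p0) / (n + m), so a small m lets the data dominate and a large m lets the prior dominate. The function name and numbers are hypothetical.

```python
def m_estimate(successes, trials, prior_p, m):
    """Posterior mean of p under a Beta(m*prior_p, m*(1 - prior_p)) prior.

    m is the weight of the expert's prior, measured in equivalent
    training examples; m = 0 ignores the prior entirely.
    """
    if not 0 <= prior_p <= 1:
        raise ValueError("prior_p must be a probability")
    if m < 0 or trials < 0 or not 0 <= successes <= trials:
        raise ValueError("invalid counts or prior weight")
    if trials + m == 0:
        raise ValueError("no data and no prior weight")
    return (successes + m * prior_p) / (trials + m)


# Hypothetical: expert believes p is about 0.5; training data shows 8 of 10
for m in (0, 10, 1000):
    print(m, round(m_estimate(8, 10, prior_p=0.5, m=m), 3))
# 0 0.8      (data only)
# 10 0.65    (prior worth as much as the data)
# 1000 0.503 (prior dominates)
```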


CONCLUSION

Industries such as banking, insurance, medicine, and retailing commonly use data mining to reduce costs, enhance research, and increase sales.

Correlation analysis, regression analysis, and Bayesian models are used to draw inferences that help in understanding patterns of correlation and causal links among data values, or in predicting future data values.

REFERENCES

Hand, Mannila and Smyth (2001), Principles of Data Mining, The MIT Press, Cambridge, Massachusetts; London, England.

Han and Kamber (2012), Data Mining: Concepts and Techniques, 3rd edition, Morgan Kaufmann, USA.

Glorunescu (2011), Data Mining: Concepts, Models and Techniques, Springer-Verlag Berlin Heidelberg, Romania.

Berry and Linoff (2004), Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management, 2nd edition, Wiley Publishing, Inc., Indianapolis, Indiana.

Robinson and Officer (2008), Data Mining: Predicting Laptop Retail Price Using Regression, accessed on 8/3/2015, http://www.spelman.edu/docs/aspire-research/joibritneypdf.pdf?s

END OF PRESENTATION