# By Ngatunga Elizabeth

## MSc IT and Management

Business Intelligence and Analytics
STATISTICAL-BASED DATA MINING METHODS
OUTLINE
Introduction

Correlation Analysis

Regression Analysis

Bayesian model

Conclusion

References

INTRODUCTION

decision making

## Statistics is useful for mining various patterns from data, as well as for understanding the underlying mechanisms that generate them

## Both statistics and data mining are concerned with drawing inferences from data

## STATISTICAL BASED ALGORITHMS

Correlation Analysis

Regression Analysis

Bayesian model

CORRELATION
Correlation is a statistical technique used to determine the degree to which two variables are related, expressed as the correlation coefficient, r.
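
As a rough illustration of the coefficient r, the sketch below computes the Pearson correlation coefficient in plain Python. The function name `pearson_r` and the sample data (advertising spend against units sold) are assumptions made for this example and are not part of the original slides.

```python
import math

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient r for two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length with at least two points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-deviations and sums of squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, invented for illustration only.
ad_spend = [1, 2, 3, 4, 5]
units_sold = [2, 4, 5, 4, 5]
r = pearson_r(ad_spend, units_sold)
print(f"r = {r:.3f}")
```

A value of r near +1 or -1 indicates a strong linear relationship, while a value near 0 indicates little or no linear relationship.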

## A correlation rule is measured not only by its support and confidence but also by the correlation between itemsets A and B

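Expressed as

$$\mathrm{lift}(A,B) = \frac{P(A \cup B)}{P(A)\,P(B)}$$

where $P(A \cup B)$ is the probability that a transaction contains both itemsets A and B.

## Such that lift(A,B)

< 1 | negatively correlated
= 1 | independent
> 1 | positively correlated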

## The lift of the association (or correlation) rule assesses the degree to which the occurrence of one itemset lifts the occurrence of the other
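
The following minimal sketch shows one way lift could be computed from a list of transactions, assuming the usual definition of lift as the probability that a transaction contains both itemsets, divided by the product of the probabilities of each itemset. The function name `lift` and the basket data are hypothetical and chosen only for illustration.

```python
def lift(transactions, a, b):
    """Return lift(A, B) = P(A and B together) / (P(A) * P(B))."""
    a, b = set(a), set(b)
    n = len(transactions)
    if n == 0:
        raise ValueError("no transactions given")
    # Fraction of transactions containing every item of each itemset.
    p_a = sum(1 for t in transactions if a <= set(t)) / n
    p_b = sum(1 for t in transactions if b <= set(t)) / n
    p_ab = sum(1 for t in transactions if (a | b) <= set(t)) / n
    if p_a == 0 or p_b == 0:
        raise ValueError("lift is undefined when an itemset never occurs")
    return p_ab / (p_a * p_b)

# Hypothetical supermarket baskets, invented for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter"},
    {"bread", "milk"},
]
bread_milk_lift = lift(baskets, {"bread"}, {"milk"})
print(f"lift(bread, milk) = {bread_milk_lift:.4f}")
```

In this invented data set, bread and milk each appear in 4 of 5 baskets and together in 3 of 5, so the lift is slightly below 1. Under the definition assumed here, a lift below 1 suggests negative correlation, a lift of 1 suggests independence, and a lift above 1 suggests positive correlation.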
χ² METHOD

## Applicable to categorical (binary) data, e.g. customer loyalty to a supermarket

The χ² statistic tests the hypothesis that A and B are independent, that is, that there is no correlation between them.
If the hypothesis can be rejected, then A and B are statistically correlated.
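
The independence test can be sketched for a 2×2 contingency table as follows. This is a minimal illustration using the Pearson chi-square statistic without continuity correction. The function name `chi_square_2x2` and the observed counts are assumptions made for this example. The critical value 3.841 is the standard chi-square threshold for one degree of freedom at the 0.05 significance level.

```python
def chi_square_2x2(table):
    """Return the Pearson chi-square statistic for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            if expected == 0:
                raise ValueError("expected count of zero; statistic is undefined")
            chi2 += (observed - expected) ** 2 / expected
    return chi2

# Hypothetical counts, invented for illustration only.
# Rows: loyalty-card holder (yes, no); columns: bought item B (yes, no).
observed = [[30, 20],
            [20, 30]]
chi2 = chi_square_2x2(observed)

CRITICAL_VALUE_1DF_05 = 3.841  # chi-square critical value, 1 degree of freedom, alpha = 0.05
reject_independence = chi2 > CRITICAL_VALUE_1DF_05
print(f"chi2 = {chi2:.3f}, reject independence: {reject_independence}")
```

Here every expected count is 25, so the statistic is 4.0, which exceeds 3.841; on this invented data the hypothesis of independence would be rejected at the 0.05 level.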
## The discovery of interesting correlation relationships among huge amounts of business transaction records can help in many business decision-making processes, such as

Catalog design
Cross-marketing (joint promotion)
Customer shopping behavior analysis

REGRESSION

## Regression analysis can be used to model the relationship between one or more independent or predictor variables and a dependent or response variable (which is continuous-valued)

Types
Linear
Multiple
Logit
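
For the linear case with a single predictor, a least-squares fit can be sketched in a few lines of plain Python. The function name `fit_line` and the data points are hypothetical and used only to illustrate the idea. Multiple and logit regression would need a different approach and are not covered by this sketch.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length with at least two points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor values are all identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical predictor/response pairs, invented for illustration only.
predictor = [1, 2, 3, 4, 5]
response = [2.1, 3.9, 6.2, 7.8, 10.0]
slope, intercept = fit_line(predictor, response)

# Use the fitted line to predict the response for a new predictor value.
predicted_at_6 = slope * 6 + intercept
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
```

The fitted slope estimates how much the response changes per unit change in the predictor, and the fitted line can then be used to predict responses for unseen predictor values.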

## This statistical method has many applications in data mining, including

## Meteorology: predicting wind velocities and directions as a function of temperature, humidity, and air pressure

## Stock exchange: time series prediction of stock market indices (trend estimation)

## Medicine: the effect of parental birth weight/height on infant birth weight/height, for instance (Gorunescu, 2011)

BAYESIAN MODEL
## Minimizes the probability of making a wrong decision, or the expected risk

## Thus, experts can estimate the weight or importance of their prior knowledge relative to the available training data
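
One way to picture the decision rule is the sketch below, which combines prior probabilities with class-conditional likelihoods using Bayes' theorem and then picks the class with the highest posterior. Under equal misclassification costs, this choice minimizes the probability of a wrong decision. The function names, class labels, and probabilities are all assumptions invented for this example.

```python
def posteriors(priors, likelihoods):
    """Return P(class | evidence) for each class using Bayes' theorem."""
    # Joint probability of each class together with the observed evidence.
    joint = {c: priors[c] * likelihoods[c] for c in priors}
    evidence = sum(joint.values())
    if evidence == 0:
        raise ValueError("observed evidence has zero probability under every class")
    return {c: joint[c] / evidence for c in joint}

def bayes_decision(priors, likelihoods):
    """Pick the class with the highest posterior probability."""
    post = posteriors(priors, likelihoods)
    return max(post, key=post.get), post

# Hypothetical numbers, invented for illustration only.
# Prior knowledge: 70% of applicants are "good" credit risks.
priors = {"good": 0.7, "bad": 0.3}
# Likelihood of the observed evidence (e.g. a missed payment) under each class.
likelihoods = {"good": 0.1, "bad": 0.6}

decision, post = bayes_decision(priors, likelihoods)
print(decision, {c: round(p, 2) for c, p in post.items()})
```

With these invented numbers the posterior for "bad" is 0.72, so the rule chooses "bad" even though the prior favored "good". Changing the priors shows how much weight prior knowledge carries relative to the observed data.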

CONCLUSION

## Industries such as banking, insurance, medicine, and retailing commonly use data mining to reduce costs, enhance research, and increase sales

Through the use of correlation, regression, and Bayesian models, inferences are made to understand the patterns of correlation and causal links among data values, or to predict future data values.
END OF PRESENTATION
