CONTENT
• Process of KDD
• Architecture of a data mining system
• Different levels of knowledge
• Functionalities of data mining
• Multiple disciplines of data mining
• Major issues in data mining
• Applications
INTRODUCTION
• Data mining is the discovery of interesting patterns in large amounts of data
• It is the process of finding correlations or patterns among dozens of fields in large relational databases
• The KDD process includes data cleansing, data integration, data selection, transformation, pattern evaluation, and knowledge presentation
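To make "finding correlations among fields" concrete, here is a minimal sketch of Pearson's correlation coefficient between two numeric fields, in plain Python. The function name `pearson` and the two sample columns are illustrative assumptions, not data from the slides.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long numeric fields."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("fields must have the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances, all without the 1/n factor (it cancels out).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant field has no defined correlation")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical columns from one relational table
ad_spend = [10, 20, 30, 40, 50]
sales = [12, 24, 33, 45, 52]
print(round(pearson(ad_spend, sales), 3))
```

A value near +1 or -1 signals a strong linear relationship between two fields; mining "dozens of fields" amounts to computing this for many field pairs and keeping the interesting ones.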
TO KNOW
Data Warehousing:
• A process of centralized data management and retrieval
• It is a relational database management system (RDBMS) designed for query and analysis of data drawn from transaction processing systems, rather than for transaction processing itself
Data mining:
• Provides a way to get at the information buried in the data
KDD Process
Knowledge Discovery in Databases
• Extraction of interesting information or patterns from data in large databases
• Finding the right method to do data mining
• It is of interest to researchers in machine learning, pattern recognition, databases, statistics, artificial intelligence, and data visualization
STEPS
• Developing an understanding of the goals of the end user
• Selecting a data set, or focusing on a subset of variables or data samples
• Data cleaning and preprocessing
  - removal of noise
  - strategies for handling missing data fields
• Data reduction and projection
  - dimensionality reduction or transformation methods
• Choosing the data mining task
• Choosing the data mining algorithm(s)
  - searching for patterns
  - deciding which models and parameters may be appropriate
  - matching a particular data mining method
• Data mining
• Interpreting mined patterns
• Consolidating discovered knowledge
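The early steps (selecting data, cleaning it, and transforming it) can be sketched as a chain of small functions. This is a minimal illustration under stated assumptions: records are Python dicts, missing values are `None`, dropping incomplete records is used as one possible missing-data strategy, and min-max scaling stands in for "transformation". The function names and fields are hypothetical.

```python
def select(records, fields):
    """Data selection: keep only the fields relevant to the mining task."""
    return [{f: r.get(f) for f in fields} for r in records]

def clean(records):
    """Data cleaning: one simple missing-data strategy is to drop incomplete records."""
    return [r for r in records if all(v is not None for v in r.values())]

def min_max(records, field):
    """Transformation: rescale one numeric field to the range [0, 1]."""
    values = [r[field] for r in records]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid division by zero when all values are equal
    return [{**r, field: (r[field] - lo) / span} for r in records]

# Hypothetical raw records
raw = [
    {"id": 1, "age": 25, "income": 30000, "note": "x"},
    {"id": 2, "age": None, "income": 42000, "note": "y"},
    {"id": 3, "age": 40, "income": 58000, "note": "z"},
    {"id": 4, "age": 35, "income": 50000, "note": "w"},
]
prepared = min_max(clean(select(raw, ["age", "income"])), "income")
print(prepared)
```

Each step takes and returns a list of records, so steps can be reordered or swapped (for example, imputing missing values instead of dropping records) without touching the others.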
Schematic representation of the KDD process
Data mining functionalities (1)
Association rule mining:
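Association rule mining looks for rules X ⇒ Y whose support (fraction of transactions containing X ∪ Y) and confidence (support of X ∪ Y divided by support of X) exceed user-given thresholds. The sketch below is a brute-force version over item pairs on hypothetical shopping baskets; it is not Apriori or any other optimized algorithm, and the function names are illustrative.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions that contain every item of itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs, transactions):
    """conf(lhs => rhs) = support(lhs | rhs) / support(lhs)."""
    return support(set(lhs) | set(rhs), transactions) / support(lhs, transactions)

def pair_rules(transactions, min_sup, min_conf):
    """Enumerate single-item rules a => b by brute force (toy data only)."""
    items = sorted(set().union(*transactions))
    found = []
    for a, b in combinations(items, 2):
        if support({a, b}, transactions) < min_sup:
            continue  # infrequent pair: neither direction can qualify
        for lhs, rhs in ((a, b), (b, a)):
            c = confidence({lhs}, {rhs}, transactions)
            if c >= min_conf:
                found.append((lhs, rhs, c))
    return found

baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "bread"},
    {"butter", "milk"},
]
for lhs, rhs, c in pair_rules(baskets, min_sup=0.5, min_conf=0.6):
    print(f"{lhs} => {rhs} (confidence {c:.2f})")
```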
Classification and prediction:
  - Finding models that describe and distinguish classes
  - e.g., classify countries based on climate
  - Presentation: decision tree, classification rule, neural network
  - Prediction: predict some unknown or missing numerical values
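A decision tree is one of the presentations listed above. Its smallest case, a decision stump with a single split, already shows the core idea: choose the attribute test that best separates the classes. The sketch below assumes exactly two classes and one numeric attribute; the temperatures and climate labels are hypothetical, and `train_stump` is an illustrative name.

```python
def train_stump(xs, labels):
    """Learn a one-split decision tree (decision stump) on one numeric attribute.

    Assumes exactly two class labels; returns a function that classifies a value.
    """
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")
    values = sorted(set(xs))
    # Candidate split points: midpoints between consecutive distinct values.
    thresholds = [(a + b) / 2 for a, b in zip(values, values[1:])]
    if not thresholds:
        raise ValueError("need at least two distinct attribute values")
    best = None
    for t in thresholds:
        for low, high in (classes, classes[::-1]):
            # Count training errors if values <= t get `low` and the rest get `high`.
            errors = sum((low if x <= t else high) != y for x, y in zip(xs, labels))
            if best is None or errors < best[0]:
                best = (errors, t, low, high)
    _, t, low, high = best
    return lambda x: low if x <= t else high

# Hypothetical data: mean annual temperature (°C) and a climate label per country
temps = [27, 26, 12, 9, 25, 11]
labels = ["tropical", "tropical", "temperate", "temperate", "tropical", "temperate"]
classify = train_stump(temps, labels)
print(classify(24), classify(8))
```

A full decision tree repeats this split search recursively on each side and over several attributes.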
Data mining functionalities (2)
Cluster analysis:
  - Class label is unknown
  - Clustering is based on the principle of maximizing intra-class similarity and minimizing inter-class similarity
  - Methods:
    - Partitioning: k-means, k-medoids, CLARANS
    - Hierarchical: BIRCH, CURE
    - Density-based: DBSCAN, CLIQUE, OPTICS
    - Grid-based: STING, WaveCluster
    - Model-based: Autoclass, DENCLUE, COBWEB
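k-means, the first partitioning method listed above, alternates two steps until nothing changes: assign each point to its nearest centroid, then move each centroid to the mean of its points. The sketch below is a plain Lloyd's-style k-means assuming points are equal-length tuples, Euclidean distance, and initial centroids drawn from the data with a fixed seed; the toy points are hypothetical.

```python
import math
import random

def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's k-means on points given as equal-length tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct input points as initial centroids
    clusters = []
    for _ in range(iterations):
        # Assignment step: every point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its old centroid).
        new_centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:  # no centroid moved: converged
            break
        centroids = new_centroids
    return centroids, clusters

points = [(1, 1), (1.5, 2), (2, 1.5), (8, 8), (8.5, 9), (9, 8)]
centroids, clusters = kmeans(points, k=2)
print(sorted(centroids))
```

k-means needs k in advance and is sensitive to the initial centroids and to outliers; k-medoids addresses the last point by using actual data objects as cluster centers.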
Data mining functionalities (3)
Outlier analysis:
  - A data object that does not comply with the general behavior of the data
  - It can be considered noise, but it is quite useful in fraud detection and rare event analysis
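One simple statistical way to flag objects that do not comply with the general behavior of the data is the z-score: how many standard deviations a value lies from the mean. This is a minimal sketch, assuming a single numeric attribute and hypothetical transaction amounts; real fraud detection typically uses richer, multi-attribute methods.

```python
import statistics

def zscore_outliers(values, threshold=2.0):
    """Return the values whose |z-score| exceeds threshold (population std. dev.)."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # all values identical: nothing deviates
    return [v for v in values if abs(v - mean) / stdev > threshold]

# Hypothetical transaction amounts with one suspicious entry
amounts = [52, 48, 50, 47, 53, 49, 51, 500]
print(zscore_outliers(amounts))
```

With n values and the population standard deviation, no |z| can exceed the square root of n - 1 (about 2.65 for these 8 values), so the often-quoted threshold of 3 would never fire on a sample this small; that is why the sketch uses 2.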
Trend and evolution analysis:
  - Trend and deviation: regression analysis
  - Sequential pattern mining, periodicity analysis
  - Similarity-based analysis
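Regression analysis captures a trend as a fitted line, and the same line can predict an unknown numerical value. Below is a minimal ordinary least-squares sketch for one predictor; the yearly sales figures are hypothetical and `fit_line` is an illustrative name.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical yearly sales figures
years = [2019, 2020, 2021, 2022]
sales = [100.0, 110.0, 120.0, 130.0]
intercept, slope = fit_line(years, sales)
print(slope, intercept + slope * 2023)  # trend per year, predicted 2023 value
```

The slope summarizes the trend, and residuals (actual minus fitted values) measure deviation from it. On Python 3.10 and later, `statistics.linear_regression` computes the same slope and intercept.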
Major issues in data mining
• Mining methodology and user interaction
• Performance and scalability
• Issues relating to the diversity of data types
• Issues related to applications and social impacts
Applications
• Database analysis and decision support
  - Market analysis and management
  - Risk analysis and management
  - Fraud detection and management
• Other applications
  - Text mining and web analysis
  - Intelligent query answering
  - Sports
  - Astronomy