
Data Mining

Presented by
Selima wolela
Date 19/06/08

Book:
Data Mining: Concepts, Models and Techniques
Outline
Introduction to Data Mining
The Data-Mine
Exploratory Data Analysis
Classification and Decision Trees
Data Mining Techniques and Models
Introduction to Data Mining
What are data, information and knowledge?
Definition of Data Mining
What Is and What Is Not Data Mining?
Why Data Mining?
What are data, information and knowledge?
The word "data" is the Latin plural of "datum", which comes from the verb "dare", "to give".
Data: facts, a description of the World.
Information: captured data and knowledge.
Knowledge: our personal map/model of the World.
If I take a picture of you, the photograph is information, but what you look like is data.
If I lose or destroy the photo, this doesn't change how you look.
Examples: What is (not) Data Mining?

| What is not Data Mining? | What is Data Mining? |
|---|---|
| Look up a phone number in a phone directory | Certain names are more prevalent in certain US locations (O'Brien, O'Rourke, O'Reilly in the Boston area) |
| Query a Web search engine for information about "Amazon" | Group together similar documents returned by a search engine according to their context (e.g. Amazon rainforest, Amazon.com) |
| A physician seeking a medical register for analyzing the record of a patient with a certain disease | Medical researchers finding a way of grouping patients with the same disease, based on a certain number of specific symptoms |
| The analysis of figures in a financial report of a trade company | Using the trade company database concerning sales to identify the customers' main profiles |
Why Mine Data?
We are drowning in data, but starving for knowledge!
Solution: Data warehousing and data mining
Why Mine Data? Commercial Viewpoint
Lots of data is being collected and warehoused:
Web data, e-commerce
purchases at department/grocery stores
Bank/Credit Card transactions
Competitive pressure is strong:
Provide better, customized services for an edge (e.g. in Customer Relationship Management)
Why Mine Data? Scientific Viewpoint
Data are collected and stored at enormous speeds (GB/hour) by:
remote sensors on a satellite
telescopes scanning the skies
microarrays generating gene expression data
scientific simulations generating terabytes of data
Traditional techniques are infeasible for such raw data.
Data mining may help scientists:
in classifying and segmenting data
in hypothesis formation
The Data-Mine

Types of data
Data quality
How to Mine the Data?
Problems Solvable with Data Mining
Data Mining Applications
Types of data
Records: data matrix; document data; transaction data.
Graphs: e.g. molecular structures.
Ordered data: e.g. genome (DNA) sequences and meteorological data.
Graphs
[Figure: the benzene molecular formula and a directed graph]
Ordered datasets
[Figure: examples of ordered datasets (DNA genome sequences and meteorological data)]
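To make these data types concrete, here is a minimal Python sketch with made-up toy values. The helper `term_vector` and all variable names are hypothetical illustrations, not taken from the book; real document data would normally use a much larger vocabulary.

```python
# Hypothetical toy examples of the data types listed above (all values made up).

# Record data as a data matrix: one row per object, one column per attribute.
data_matrix = [
    # [height_cm, weight_kg, age]
    [170.0, 65.0, 30],
    [182.0, 80.0, 45],
]

# Record data as document data: each document becomes a vector of term counts.
def term_vector(text, vocabulary):
    """Count how often each vocabulary term occurs in the text."""
    words = text.lower().split()
    return [words.count(term) for term in vocabulary]

vocabulary = ["data", "mining", "beer"]
doc_vectors = [term_vector(d, vocabulary)
               for d in ["Data mining finds patterns in data", "Beer and diapers"]]

# Record data as transaction data: each record is a set of purchased items.
transactions = [{"Bread", "Coke", "Milk"}, {"Beer", "Bread"}]

# Graph data: a directed graph stored as an adjacency list.
directed_graph = {"A": ["B", "C"], "B": ["C"], "C": []}

# Ordered data: a DNA sequence, where the position of each symbol matters.
dna = "GGTTCCGCCTTCAGCC"

print(doc_vectors)  # [[2, 1, 0], [0, 0, 1]]
```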
Data quality
Data are of good quality if they properly reflect the real context from which they originate.
The quality of data is strongly connected to the process of:
Collecting them from the environment (first/original record);
Measuring objects to obtain values for their attributes;
Transcribing from the original source (possible second record).
Primary data come from the original record: you observe or measure the fact yourself. Secondary data are taken from someone else's record, such as a transcription, a book or another database, so the fact may have been distorted along the way and might not be true.
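One simple way to catch errors introduced when a second record is transcribed from the first is to compare the two attribute by attribute. The sketch below assumes both records are Python dictionaries keyed by attribute name; `compare_records` and the patient values are hypothetical.

```python
def compare_records(original, transcribed):
    """Return the attributes whose transcribed value is missing or differs from the original."""
    problems = {}
    for attribute, value in original.items():
        copied = transcribed.get(attribute)  # None if the attribute was lost in transcription
        if copied != value:
            problems[attribute] = (value, copied)
    return problems

# Hypothetical first (original) record and a hand-typed second record of it.
first_record = {"id": 17, "age": 54, "blood_pressure": 140}
second_record = {"id": 17, "age": 45, "blood_pressure": 140}

print(compare_records(first_record, second_record))  # {'age': (54, 45)}
```

Such a check only works while the original record is still available; once only the secondary copy survives, its errors can no longer be detected this way.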
Data Mining Process
Data mining: the core of the knowledge discovery process.
[Figure: the knowledge discovery process, bottom to top: Databases -> Data Cleaning and Data Integration -> Data Warehouse -> Data Selection and Data Preprocessing -> Task-relevant Data -> Data Mining -> Pattern Evaluation]
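The stages of this process can be sketched as a chain of small functions. This is a toy illustration under strong simplifying assumptions: the two "databases" are in-memory lists, the warehouse is just their concatenation, selection and preprocessing are reduced to picking one attribute, and "mining" is plain frequency counting. All function names are hypothetical.

```python
from collections import Counter

# Two hypothetical source databases holding purchases from different stores.
store_a = [{"customer": 1, "item": "Milk"}, {"customer": 2, "item": None},
           {"customer": 3, "item": "Bread"}]
store_b = [{"customer": 4, "item": "Milk"}, {"customer": 5, "item": "Beer"}]

def clean(records):
    """Data cleaning: drop records with missing values."""
    return [r for r in records if all(v is not None for v in r.values())]

def integrate(*sources):
    """Data integration: combine several sources into one store (the 'warehouse')."""
    return [r for source in sources for r in source]

def select(warehouse, attribute):
    """Data selection: keep only the task-relevant attribute."""
    return [r[attribute] for r in warehouse]

def mine(values):
    """Data mining: here, simply count how often each value occurs."""
    return Counter(values)

def evaluate(patterns, min_count):
    """Pattern evaluation: keep only patterns that occur often enough."""
    return {p: n for p, n in patterns.items() if n >= min_count}

warehouse = integrate(clean(store_a), clean(store_b))
knowledge = evaluate(mine(select(warehouse, "item")), min_count=2)
print(knowledge)  # {'Milk': 2}
```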
Exploratory Data Analysis

What Is Exploratory Data Analysis?


Data Mining Tasks
Common data mining tasks
OLAP
What Is Exploratory Data Analysis?
Exploratory data analysis (EDA) is the part of statistics that deals with reviewing, communicating and using data when little information about them is available.
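A typical first EDA step is to summarize a variable numerically before any modelling. The sketch below uses Python's standard `statistics` module on a made-up sample; the "more than 2 standard deviations from the mean" outlier rule is an assumed rule of thumb, not something the slides prescribe.

```python
import statistics

# Hypothetical sample: daily sales figures about which little is known yet.
sales = [12, 15, 11, 40, 14, 13, 15]

summary = {
    "count": len(sales),
    "mean": statistics.mean(sales),
    "median": statistics.median(sales),
    "stdev": statistics.stdev(sales),  # sample standard deviation
    "min": min(sales),
    "max": max(sales),
}
for name, value in summary.items():
    print(f"{name:>6}: {value:.2f}")

# A first look for unusual values: flag points more than 2 standard deviations from the mean.
outliers = [x for x in sales if abs(x - summary["mean"]) > 2 * summary["stdev"]]
print("possible outliers:", outliers)
```

Here the mean (about 17.1) lies well above the median (14), which already hints at the single large value that the outlier check then flags.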
Data Mining Tasks

Prediction Tasks
Use some variables to predict unknown or future values of other variables.
Description Tasks
Find human-interpretable patterns that describe the data.
Common data mining tasks
Classification [Predictive]
Clustering [Descriptive]
Association Rule Discovery [Descriptive]
Sequential Pattern Discovery [Descriptive]
Regression [Predictive]
Deviation Detection [Predictive]
Data Mining Models and Tasks
Association Rule Discovery: Definition
Given a set of records, each of which contains some number of items from a given collection,
produce dependency rules that predict the occurrence of an item based on occurrences of other items.

| TID | Items |
|---|---|
| 1 | Bread, Coke, Milk |
| 2 | Beer, Bread |
| 3 | Beer, Coke, Diaper, Milk |
| 4 | Beer, Bread, Diaper, Milk |
| 5 | Coke, Diaper, Milk |

Rules discovered:
{Milk} --> {Coke}
{Diaper, Milk} --> {Beer}
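The slide does not say how the rules are scored. A common choice is support (how often all items of the rule appear together) and confidence (how often the right-hand side appears when the left-hand side does). The sketch below computes both for the two discovered rules on the five transactions above; these measures and the helper names are additions for illustration.

```python
# The five market-basket transactions from the table above (TID 1 to 5).
transactions = [
    {"Bread", "Coke", "Milk"},
    {"Beer", "Bread"},
    {"Beer", "Coke", "Diaper", "Milk"},
    {"Beer", "Bread", "Diaper", "Milk"},
    {"Coke", "Diaper", "Milk"},
]

def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions)

def support(itemset, transactions):
    """Fraction of all transactions that contain the itemset."""
    return count(itemset, transactions) / len(transactions)

def confidence(lhs, rhs, transactions):
    """Fraction of transactions containing lhs that also contain rhs."""
    return count(lhs | rhs, transactions) / count(lhs, transactions)

for lhs, rhs in [({"Milk"}, {"Coke"}), ({"Diaper", "Milk"}, {"Beer"})]:
    print(sorted(lhs), "-->", sorted(rhs),
          f"support={support(lhs | rhs, transactions):.2f}",
          f"confidence={confidence(lhs, rhs, transactions):.2f}")
# ['Milk'] --> ['Coke'] support=0.60 confidence=0.75
# ['Diaper', 'Milk'] --> ['Beer'] support=0.40 confidence=0.67
```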
The Sad Truth About Diapers and Beer
So, don't be surprised if you find six-packs stacked next to diapers!


Regression

Predict the value of a given continuous-valued variable based on the values of other variables, assuming a linear or nonlinear model of dependency.
Extensively studied in the statistics and neural network fields.
Examples:
Predicting sales amounts of a new product based on advertising expenditure.
Predicting wind velocities as a function of temperature, humidity, air pressure, etc.
Time series prediction of stock market indices.
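For the first example, a linear model of dependency can be fitted by ordinary least squares. This sketch assumes a single input variable and a straight-line model, and uses made-up numbers; nonlinear models are not covered here.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept
    return a, b

# Hypothetical data: advertising expenditure vs. sales of a new product.
advertising = [1.0, 2.0, 3.0, 4.0]
sales = [3.0, 5.0, 7.0, 9.0]

a, b = fit_line(advertising, sales)
print(f"sales = {a:.1f} + {b:.1f} * advertising")  # sales = 1.0 + 2.0 * advertising
print(a + b * 5.0)  # predicted sales for an expenditure of 5.0: 11.0
```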
OLAP

OLAP (On-Line Analytical Processing) is a technology used to organize large business databases and support business intelligence.
Companies that sell OLAP products include Microsoft, Oracle, SAP, IBM and SAS.
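Two standard OLAP operations, roll-up (aggregating away a dimension) and slice (fixing one dimension to a single value), can be illustrated on a tiny fact table. The slides do not describe these operations; the pure-Python `aggregate` helper and the sales figures are hypothetical and say nothing about how commercial OLAP products are implemented.

```python
from collections import defaultdict

# Hypothetical fact table: (region, product, quarter, sales amount).
facts = [
    ("North", "Milk", "Q1", 100),
    ("North", "Beer", "Q1", 50),
    ("South", "Milk", "Q1", 80),
    ("North", "Milk", "Q2", 120),
    ("South", "Beer", "Q2", 70),
]
DIMENSION_INDEX = {"region": 0, "product": 1, "quarter": 2}

def aggregate(rows, dims):
    """Sum sales over every combination of the chosen dimensions."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[DIMENSION_INDEX[d]] for d in dims)
        totals[key] += row[3]
    return dict(totals)

by_region_product = aggregate(facts, ["region", "product"])
by_region = aggregate(facts, ["region"])  # roll-up: drop the product dimension
print(by_region)  # {('North',): 270, ('South',): 150}

# Slice: fix quarter = "Q1", then aggregate by product.
q1 = [row for row in facts if row[2] == "Q1"]
print(aggregate(q1, ["product"]))  # {('Milk',): 180, ('Beer',): 50}
```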
BI DWH/OLAP Architecture
Classification and Decision Trees

What Is a Decision Tree?


A tree where the root and each internal node are labeled with a question.
The arcs represent each possible answer to the associated question.
Each leaf node represents a prediction of a solution to the problem.
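The definition maps directly onto a small data structure: internal nodes hold a question, arcs map each answer to a subtree, and leaves hold a prediction. The sketch below shows only how a given tree is used to classify an example, not how a tree is learned from data; the loan-approval tree and its attribute names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Union

@dataclass
class Leaf:
    prediction: str  # the predicted solution at this leaf

@dataclass
class Node:
    question: str  # attribute asked about at this node
    arcs: Dict[str, Union["Node", Leaf]] = field(default_factory=dict)  # answer -> subtree

def predict(tree, example):
    """Follow the arc matching the example's answer at each node until a leaf is reached."""
    while isinstance(tree, Node):
        tree = tree.arcs[example[tree.question]]
    return tree.prediction

# Hypothetical tree deciding whether a bank customer receives a loan.
tree = Node("income", {
    "high": Leaf("approve"),
    "low": Node("has_guarantor", {"yes": Leaf("approve"), "no": Leaf("reject")}),
})

print(predict(tree, {"income": "low", "has_guarantor": "no"}))  # reject
```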
Decision Tree (DT):
Data Mining Techniques and Models
Well-known data mining methods:
Neural networks
How the Human Brain Learns
In the human brain, a typical neuron collects signals from others through a host of fine structures called dendrites.
The neuron sends out spikes of electrical activity through a long, thin strand known as an axon, which splits into thousands of branches.
At the end of each branch, a structure called a synapse converts the activity from the axon into electrical effects that inhibit or excite activity in the connected neurons.
Artificial Neural Networks
An artificial neuron is a device with many inputs and one output.
The neuron has two modes of operation: the training mode and the using mode.
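The two modes can be sketched with a single neuron. The slides do not say how training works, so this sketch swaps in one simple option, the perceptron learning rule with a step activation; the `Neuron` class and its method names are hypothetical. In using mode the neuron computes a weighted sum of its inputs and fires if it is positive; in training mode it nudges its weights whenever its output is wrong.

```python
class Neuron:
    """An artificial neuron: many inputs, one output (step activation)."""

    def __init__(self, n_inputs, learning_rate=1.0):
        self.weights = [0.0] * n_inputs
        self.bias = 0.0
        self.learning_rate = learning_rate

    def use(self, inputs):
        """Using mode: weighted sum of the inputs, then fire (1) or not (0)."""
        total = sum(w * x for w, x in zip(self.weights, inputs)) + self.bias
        return 1 if total > 0 else 0

    def train(self, samples, max_epochs=100):
        """Training mode: perceptron rule, adjust the weights after every wrong output."""
        for _ in range(max_epochs):
            errors = 0
            for inputs, target in samples:
                error = target - self.use(inputs)
                if error != 0:
                    errors += 1
                    self.weights = [w + self.learning_rate * error * x
                                    for w, x in zip(self.weights, inputs)]
                    self.bias += self.learning_rate * error
            if errors == 0:  # every sample classified correctly: stop training
                break

# Teach the neuron the logical AND of two inputs.
and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
neuron = Neuron(n_inputs=2)
neuron.train(and_samples)
print([neuron.use(x) for x, _ in and_samples])  # [0, 0, 0, 1]
```

A single neuron like this can only learn linearly separable functions such as AND; that limitation is one reason practical neural networks connect many neurons in layers.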
CONCLUSION
Data mining finds hidden patterns in large volumes of historical data that help to manage an organization efficiently.
Data mining is the core of the knowledge discovery process.
We are drowning in data, but starving for knowledge!
Solution: Data warehousing and data mining
THANK YOU
Questions or Comments
