
COIS13013 - BUSINESS INTELLIGENCE

Data Mining

Student Name:

DEFINITION OF DATA MINING


The nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data stored in structured databases. - Fayyad et al. (1996)

Data mining is a process that uses statistical, mathematical, artificial intelligence, and machine learning techniques to extract and identify useful information and subsequent knowledge from large databases.

DATA MINING PROCESS

The most common standard processes are:

CRISP-DM (Cross-Industry Standard Process for Data Mining)
SEMMA (Sample, Explore, Modify, Model, and Assess)
KDD (Knowledge Discovery in Databases)

DATA MINING PROCESS: CRISP-DM


1. Business Understanding
2. Data Understanding
3. Data Preparation
4. Model Building
5. Testing and Evaluation
6. Deployment

(Diagram: the six steps are arranged around the central data sources.)



CRISP-DM provides a systematic and orderly way to conduct data mining projects. The process has six steps. First, an understanding of the business issues to be addressed and an understanding of the relevant data are developed concurrently. Next, the data are prepared for modeling. The data are then modeled, and the model results are evaluated. Finally, the models are deployed for regular use.

DATA MINING PROCESS: SEMMA


1. Sample (generate a representative sample of the data)
2. Explore (visualization and basic description of the data)
3. Modify (select variables, transform variable representations)
4. Model (use a variety of statistical and machine learning models)
5. Assess (evaluate the accuracy and usefulness of the models)
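The five SEMMA steps can be walked through on a toy problem. The sketch below is a minimal illustration only, assuming a hypothetical set of customer records (visits per month and a churn flag), a simple systematic sample, and a deliberately basic one-rule model in place of the statistical and machine learning models a real Model step would use. The names `records`, `scale`, and `predict_churn` are hypothetical.

```python
import statistics

# Hypothetical customer records: visits per month and whether the customer churned
records = [{"visits": v, "churned": v <= 4} for v in range(1, 11)]

# Sample: a systematic 50% sample for modelling; the other half is held out
sample, holdout = records[::2], records[1::2]

# Explore: basic description of the sampled variable
visits = [r["visits"] for r in sample]
summary = {"mean": statistics.mean(visits), "min": min(visits), "max": max(visits)}

# Modify: select the "visits" variable and rescale it using the sample's min and max
# (sampled values land in [0, 1]; held-out values may fall slightly outside)
low, high = summary["min"], summary["max"]


def scale(record):
    return (record["visits"] - low) / (high - low)


# Model: a one-rule classifier whose cut-off is the midpoint between the class means
churn_mean = statistics.mean(scale(r) for r in sample if r["churned"])
stay_mean = statistics.mean(scale(r) for r in sample if not r["churned"])
cutoff = (churn_mean + stay_mean) / 2


def predict_churn(record):
    return scale(record) < cutoff


# Assess: accuracy of the model on the held-out records
accuracy = sum(predict_churn(r) == r["churned"] for r in holdout) / len(holdout)
print(summary, cutoff, accuracy)
```

Holding out half of the records for the Assess step keeps the evaluation separate from the data the model was built on.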

DIFFERENCE BETWEEN CRISP-DM AND SEMMA

The main difference between CRISP-DM and SEMMA is that CRISP-DM takes a more comprehensive approach to data mining projects, one that includes understanding the business and the relevant data, whereas SEMMA implicitly assumes that the project's goals and objectives, along with the appropriate data sources, have already been identified and understood.

DATA MINING METHODS


A wide range of data mining methods is now available for handling large volumes of data in any domain. These include:

Classification
Clustering
Association
Sequence discovery

DATA MINING METHODS-CLASSIFICATION

Classification learns patterns from past data (a set of information, such as traits, variables, or features, on characteristics of previously labeled items, objects, or events) in order to place new instances (with unknown labels) into their respective groups or classes. The objective of classification is to analyze the historical data stored in a database and automatically generate a model that can predict future behavior.
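To make the idea concrete, here is a minimal sketch of one classification technique, k-nearest neighbours, chosen purely as an example (the slide does not name a specific algorithm). The training data and the `knn_predict` helper are hypothetical: each previously labeled item is a feature vector plus a class, and a new instance receives the majority class among its k closest labeled neighbours.

```python
import math
from collections import Counter


def knn_predict(training, new_point, k=3):
    """Assign new_point the majority label among its k nearest labeled neighbours."""
    # Sort the labeled examples by Euclidean distance to the new point
    nearest = sorted(training, key=lambda item: math.dist(item[0], new_point))[:k]
    # Majority vote among the k closest labels
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical history: (visits per month, items in basket) -> did the customer buy?
history = [
    ((1, 0), "no"),
    ((2, 1), "no"),
    ((1, 1), "no"),
    ((8, 5), "yes"),
    ((9, 4), "yes"),
    ((7, 6), "yes"),
]

print(knn_predict(history, (8, 4)))  # yes
print(knn_predict(history, (2, 0)))  # no
```

Here the "model" is simply the stored history; other classifiers, such as decision trees, instead build an explicit model from the same labeled data.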

DATA MINING METHODS-CLUSTERING


Cluster analysis is an exploratory data analysis tool for solving classification problems. The objective is to sort cases (e.g., people, things, events) into groups, or clusters, so that the degree of association is strong among members of the same cluster and weak among members of different clusters.
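A widely used algorithm for this is k-means, shown here as an illustrative choice rather than one named in the slide. The minimal sketch below assumes numeric 2-D points, uses the first k points as starting centroids so the result is reproducible, and runs a fixed number of iterations; practical implementations use better initialisation (such as k-means++) and stop once assignments no longer change. The point data are hypothetical.

```python
import math


def kmeans(points, k, iterations=10):
    """Minimal k-means: group points so that each is closest to its cluster's centroid."""
    # Deterministic initialisation for illustration: the first k points become centroids
    centroids = [points[i] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        # (keep the old centroid if a cluster ends up empty)
        centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical customers described by two numeric attributes
points = [(1, 1), (8, 8), (1, 2), (2, 1), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, k=2)
print(clusters)
```

Members of each resulting cluster are close to one another and far from the members of the other cluster, which is the "strong within, weak between" association the slide describes.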

DATA MINING METHODS-ASSOCIATIONS

Association rule mining is a popular data mining method that is commonly used as an example to explain what data mining is and what it can do to a technologically less savvy audience. Association rule mining aims to find interesting relationships (affinities) between variables (items) in large databases. For example, a recession is associated with a decline in house prices.
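Association rules are usually judged by two measures: support (the share of all records in which the items occur together) and confidence (the share of records containing the antecedent that also contain the consequent). The minimal sketch below computes both for a rule such as {bread} => {butter} over a hypothetical set of shopping baskets. It only scores a given rule; searching for all rules efficiently is the job of algorithms such as Apriori, which this sketch does not implement.

```python
def count(transactions, itemset):
    """Number of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def support(transactions, itemset):
    """Share of all transactions that contain the itemset."""
    return count(transactions, itemset) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Share of transactions containing the antecedent that also contain the consequent."""
    with_antecedent = count(transactions, antecedent)
    if with_antecedent == 0:
        return 0.0
    return count(transactions, set(antecedent) | set(consequent)) / with_antecedent


# Hypothetical shopping baskets, each a set of items bought together
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"milk"},
    {"bread", "butter"},
]

# Rule {bread} => {butter}
print(support(baskets, {"bread", "butter"}))       # 0.6
print(confidence(baskets, {"bread"}, {"butter"}))  # 0.75
```

Counts are divided only once, so the measures come out as exact ratios of whole-number counts.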

DATA MINING METHODS-SEQUENCE DISCOVERY


Sequence discovery is the identification of association over time. When appropriate information is available (e.g., the identity of a customer in a retail shop), a temporal analysis can be conducted to identify behaviour over time. This provides a considerable amount of information that could be used to increase sales or to detect fraud.
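Sequence discovery adds an ordering to association analysis. The minimal sketch below assumes each customer's purchases are already identified and sorted by time (the hypothetical `histories` data) and counts how many customers bought one item and later another; the hypothetical `min_customers` threshold plays the role of a minimum support. Dedicated sequential pattern algorithms such as GSP or PrefixSpan handle longer sequences and time gaps, which this sketch ignores.

```python
from collections import Counter
from itertools import combinations


def frequent_sequences(histories, min_customers=2):
    """Return ordered pairs (a, b), meaning a was bought before b, shared by at least min_customers customers."""
    counts = Counter()
    for purchases in histories.values():
        # combinations() keeps the input order, so each pair respects purchase time;
        # the set makes each customer count a given pair at most once
        counts.update(set(combinations(purchases, 2)))
    return {pair: n for pair, n in counts.items() if n >= min_customers}


# Hypothetical purchase histories keyed by customer ID, each already sorted by purchase time
histories = {
    "c1": ["tent", "sleeping bag", "stove"],
    "c2": ["tent", "stove"],
    "c3": ["sleeping bag", "tent"],
}

print(frequent_sequences(histories))  # {('tent', 'stove'): 2}
```

Because the pairs are ordered, ("sleeping bag", "tent") and ("tent", "sleeping bag") are counted separately, which is what distinguishes a sequence from a plain association.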

DATA MINING SOFTWARE TOOLS

Commercial:
SPSS - PASW (formerly Clementine)
SAS - Enterprise Miner
IBM - Intelligent Miner
StatSoft - Statistical Data Miner

Free and/or open source:
Weka
RapidMiner

DATA MINING MYTHS AND BLUNDERS

Data mining is considered a powerful analytical tool that helps decision-makers understand the past and predict the future. However, there are common myths and mistakes associated with the field.

Common myths hold that data mining:
provides instant solutions/predictions
is not yet viable for business applications
requires a separate, dedicated database
can only be done by those with advanced degrees
is only for large firms that have lots of customer data

CONCLUSION
Data mining refers to developing business intelligence from the data that an organization collects, organizes, and processes. Organizations use data mining techniques to gain a better understanding of their customers and their own operations.
