Data Mining: Concepts and Techniques

May 5, 2015

Data Mining: Concept

Appendix A: An Introduction to Microsoft's OLE DB for Data Mining

Introduction

Overview and design philosophy

Basic components

Data set components

Data mining models

Operations on data model

Concluding remarks

Why OLE DB for Data Mining?

An industry standard is critical for data mining development, usage, interoperability, and exchange

OLE DB for DM is a natural evolution from OLE DB and OLE DB for OLAP

Building mining applications over relational databases is nontrivial

Need different customized data mining algorithms and methods

Significant work on the part of application builders

Goal: ease the burden of developing mining applications in large relational databases

Motivation of OLE DB for DM

Facilitate deployment of data mining models

Generating data mining models

Store, maintain and refresh models as data is updated

Programmatically use the model on other data sets

Browse models

Enable enterprise application developers to participate in building data mining solutions

Features of OLE DB for DM

Independent of provider or software

Not specialized to any specific mining model

Structured to cater to all well-known mining models

Part of the upcoming release of Microsoft SQL Server 2000

Overview

Core relational engine exposes OLE DB in a language-based API

Analysis server exposes OLE DB OLAP and OLE DB DM

Maintain SQL metaphor

Reuse existing notions

Figure: architecture stack (in diagram order): data mining applications, OLE DB OLAP/DM, Analysis Server, OLE DB, RDB engine

Key Operations to Support Data Mining Models

Define a mining model
    Attributes to be predicted
    Attributes to be used for prediction
    Algorithm used to build the model
Populate a mining model from training data
Predict attributes for new data
Browse a mining model for reporting and visualization

DMM as Analogous to a Table in SQL

Create a data mining model object
    CREATE MINING MODEL [model_name]
Insert training data into the model and train it
    INSERT INTO [model_name]
Use the data mining model
    SELECT relation_name.[id], [model_name].[predict_attr]
Consult DMM content in order to make predictions and browse statistics obtained by the model
Use DELETE to empty/reset the model
Predictions on datasets: prediction join between a model and a data set (tables)
Deploy a DMM by just writing SQL queries!

Two Basic Components

Cases/caseset: input data

A table or nested tables (for hierarchical data)

Data mining model (DMM): a special type of table

A caseset is associated with a DMM and meta-info while creating a DMM

Save mining algorithm and resulting abstraction instead of data itself

Fundamental operations: CREATE, INSERT INTO, PREDICTION JOIN, SELECT, DELETE FROM, and DROP

Flattened Representation of Caseset

Source tables (schema diagram):
Customers: Customer ID, Gender, Hair Color, Age, Age Prob
Product Purchases: Customer ID, Product Name, Quantity, Product Type
Car Ownership: Customer ID, Car, Car Prob

Flattened caseset:

CID | Gend | Hair  | Age | Age prob | Prod | Quan | Type | Car | Car prob
    | Male | Black | 35  | 100%     | TV   |      | Elec | Car | 100%
    | Male | Black | 35  | 100%     | VCR  |      | Elec | Car | 100%
    | Male | Black | 35  | 100%     | Ham  |      | Food | Car | 100%
    | Male | Black | 35  | 100%     | TV   |      | Elec | Van | 50%
    | Male | Black | 35  | 100%     | VCR  |      | Elec | Van | 50%
    | Male | Black | 35  | 100%     | Ham  |      | Food | Van | 50%

Problem: Lots of replication!

Logical Nested Table Representation of Caseset

Use Data Shaping Service to generate a hierarchical rowset

Part of Microsoft Data Access Components (MDAC) products

CID | Gend | Hair  | Age | Age prob | Product Purchases   | Car Ownership
    |      |       |     |          | Prod | Quan | Type  | Car | Car prob
    | Male | Black | 35  | 100%     | TV   |      | Elec  | Car | 100%
    |      |       |     |          | VCR  |      | Elec  | Van | 50%
    |      |       |     |          | Ham  |      | Food  |     |

More About Nested Table

Not necessary for the storage subsystem to support nested records

Cases are only instantiated as nested rowsets prior to training/predicting data mining models

Same physical data may be used to generate different casesets
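
The contrast between the nested and flattened forms can be sketched in Python. This is only an illustration, not part of OLE DB for DM: the dictionary layout and the helper name `flatten_case` are assumptions, and only the attribute values shown in the example case are used. Expanding one nested case into flat rows takes the cross product of its nested tables, which is where the replication comes from.

```python
from itertools import product

# One nested case built from the example values on the slides.
# The dict layout is an assumption made for illustration.
case = {
    "Gender": "Male",
    "Hair Color": "Black",
    "Age": 35,
    "Product Purchases": [
        {"Product Name": "TV", "Product Type": "Elec"},
        {"Product Name": "VCR", "Product Type": "Elec"},
        {"Product Name": "Ham", "Product Type": "Food"},
    ],
    "Car Ownership": [
        {"Car": "Car", "Car Prob": 1.0},
        {"Car": "Van", "Car Prob": 0.5},
    ],
}


def flatten_case(case):
    """Expand a nested case into flat rows (cross product of its nested tables)."""
    scalars = {k: v for k, v in case.items() if not isinstance(v, list)}
    nested = [v for v in case.values() if isinstance(v, list)]
    rows = []
    for combo in product(*nested):
        row = dict(scalars)          # scalar attributes are copied into every row
        for part in combo:
            row.update(part)         # one row from each nested table
        rows.append(row)
    return rows


flat_rows = flatten_case(case)
print(len(flat_rows))  # 3 purchases x 2 cars = 6 rows
```

Every flat row repeats Gender, Hair Color, and Age, while the nested form stores them once per case.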

Defining A Data Mining Model

The name of the model

The algorithm and parameters

The columns of caseset and the relationships among columns

Source columns and prediction columns

Example

CREATE MINING MODEL [Age Prediction]                     -- name of model
(
    [Customer ID]        LONG    KEY,                     -- source column
    [Gender]             TEXT    DISCRETE,                -- source column
    [Age]                DOUBLE  DISCRETIZED() PREDICT,   -- prediction column
    [Product Purchases]  TABLE                            -- source column
    (
        [Product Name]   TEXT    KEY,                     -- source column
        [Quantity]       DOUBLE  NORMAL CONTINUOUS,       -- source column
        [Product Type]   TEXT    DISCRETE RELATED TO [Product Name]   -- source column
    )
)
USING [Decision_Trees_101]                                -- mining algorithm used

Column Specifiers

KEY
ATTRIBUTE
RELATION (RELATED TO clause)
QUALIFIER (OF clause)
PROBABILITY: [0, 1]
VARIANCE
SUPPORT
PROBABILITY-VARIANCE
ORDER
TABLE

Attribute Types

DISCRETE

ORDERED

CYCLICAL

CONTINUOUS

DISCRETIZED

SEQUENCE_TIME
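
As a rough illustration of what a DISCRETIZED attribute implies, the Python sketch below groups continuous values into buckets. Equal-width binning is an assumption chosen for simplicity; how a given provider actually discretizes is not specified here, and the function name and sample ages are hypothetical.

```python
def discretize_equal_width(values, n_buckets):
    """Map each value to a bucket index in [0, n_buckets - 1] using equal-width bins.

    Equal-width binning is only one possible strategy, assumed for illustration.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_buckets or 1.0   # fall back to 1.0 when all values are equal
    # The maximum value would land in bucket n_buckets, so clamp it into the last bucket.
    return [min(int((v - lo) / width), n_buckets - 1) for v in values]


ages = [18, 25, 35, 47, 62]                # made-up sample ages
buckets = discretize_equal_width(ages, 3)
print(list(zip(ages, buckets)))
```

A model trained on the bucket index treats Age as a discrete attribute even though the stored values are continuous.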

Populating A DMM

Use INSERT INTO statement

Consuming a case using the data mining model

Use SHAPE statement to create the nested table from the input data

Example: Populating a DMM

INSERT INTO [Age Prediction]
(
    [Customer ID], [Gender], [Age],
    [Product Purchases](SKIP, [Product Name], [Quantity], [Product Type])
)
SHAPE
    {SELECT [Customer ID], [Gender], [Age] FROM Customers ORDER BY [Customer ID]}
APPEND
(
    {SELECT [CustID], [Product Name], [Quantity], [Product Type] FROM Sales
     ORDER BY [CustID]}
    RELATE [Customer ID] TO [CustID]
)
AS [Product Purchases]
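
The effect of SHAPE ... APPEND ... RELATE can be approximated in Python as grouping child rows under their parent row. This is a sketch under stated assumptions, not the Data Shaping Service itself: the function `shape`, its parameters, and the sample rows (including the customer IDs and quantities) are made up for illustration. Dropping the child key from the nested rows is meant to mirror the SKIP entry in the INSERT column list.

```python
from collections import defaultdict


def shape(parents, children, parent_key, child_key, alias):
    """Attach to each parent row the child rows whose key matches, under `alias`."""
    by_key = defaultdict(list)
    for child in children:
        # Leave the join key out of the nested row (compare SKIP in the INSERT above).
        nested_row = {k: v for k, v in child.items() if k != child_key}
        by_key[child[child_key]].append(nested_row)
    return [{**parent, alias: by_key.get(parent[parent_key], [])} for parent in parents]


# Hypothetical input rows standing in for the Customers and Sales tables.
customers = [
    {"Customer ID": 1, "Gender": "Male", "Age": 35},
    {"Customer ID": 2, "Gender": "Female", "Age": 28},
]
sales = [
    {"CustID": 1, "Product Name": "TV", "Quantity": 1, "Product Type": "Elec"},
    {"CustID": 1, "Product Name": "Ham", "Quantity": 2, "Product Type": "Food"},
]

cases = shape(customers, sales, "Customer ID", "CustID", "Product Purchases")
for case in cases:
    print(case["Customer ID"], len(case["Product Purchases"]))
```

Each element of `cases` is one case with a nested `Product Purchases` table; a customer with no sales still yields a case, with an empty nested table.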

Using Data Model to Predict

Prediction join
Prediction on dataset D using DMM M
Different from an equi-join
DMM: a truth table
The SELECT statement associated with PREDICTION JOIN specifies the values extracted from the DMM
Example: Using a DMM in Prediction

SELECT t.[Customer ID], [Age Prediction].[Age]
FROM [Age Prediction]
PREDICTION JOIN
(
    SHAPE
        {SELECT [Customer ID], [Gender] FROM Customers ORDER BY [Customer ID]}
    APPEND
    (
        {SELECT [CustID], [Product Name], [Quantity] FROM Sales ORDER BY [CustID]}
        RELATE [Customer ID] TO [CustID]
    )
    AS [Product Purchases]
)
AS t
ON [Age Prediction].[Gender] = t.[Gender] AND
   [Age Prediction].[Product Purchases].[Product Name] = t.[Product Purchases].[Product Name] AND
   [Age Prediction].[Product Purchases].[Quantity] = t.[Product Purchases].[Quantity]
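
One way to see why a prediction join differs from an equi-join is the Python sketch below: instead of matching stored rows, the model is applied to each input case and the predicted value is returned next to the case key. Everything here is hypothetical: `toy_age_model` is a stand-in for a trained DMM with an invented rule, the bucket labels carry no meaning, and the input cases are made up.

```python
def toy_age_model(case):
    """Stand-in for a trained DMM. The rule is invented purely for illustration."""
    buys_electronics = any(
        purchase["Product Name"] in {"TV", "VCR"}
        for purchase in case["Product Purchases"]
    )
    return "bucket_A" if buys_electronics else "bucket_B"


def prediction_join(model, cases, key):
    """Apply the model to every case and pair each case key with its prediction."""
    return [(case[key], model(case)) for case in cases]


# Hypothetical nested cases, shaped like the rowset produced by the SHAPE clause.
new_cases = [
    {"Customer ID": 1, "Gender": "Male",
     "Product Purchases": [{"Product Name": "TV", "Quantity": 1}]},
    {"Customer ID": 2, "Gender": "Female",
     "Product Purchases": [{"Product Name": "Ham", "Quantity": 2}]},
]

predictions = prediction_join(toy_age_model, new_cases, "Customer ID")
print(predictions)
```

The ON clause of the query plays the role of the argument mapping here: it says which input columns feed which model columns, rather than which rows must be equal.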

Browsing DMM

What is in a DMM?

Rules, formulas, trees, etc.

Browsing DMM

Visualization

Concluding Remarks

OLE DB for DM integrates data mining and database systems
A good standard for mining application builders
How can we be involved?
Provide association/sequential pattern mining modules for OLE DB for DM?
Design more concrete language primitives?
References
http://www.microsoft.com/data.oledb/dm.html
