
Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005

Introductions
Presenter

Javier Loria
Solid Quality Learning
javier@solidqualitylearning.com

Agenda
Overview & BI Challenges
Introducing the UDM
The UDM in Detail
Data Mining Overview

Business Intelligence Platform

Integrate
Data acquisition from source systems and integration
Data transformation and synthesis

Analyze
Data enrichment with business logic, hierarchical views
Data discovery via data mining

Report
Data presentation and distribution
Data access for the masses

Overview

Getting information from enterprise data
Using BI across the enterprise as an integral part of doing business
Capture and model all of your data
Integration with business processes
Relational reporting and OLAP converged through a single dimensional model

Business Intelligence Challenges


Multiple Data Models
Multiple Data Sources
Multiple APIs
Duplication of Data

What Is a Cube?
[Figure: a cube with three dimensions. Markets Dimension: Atlanta, Chicago, Denver, Dallas. Product Dimension: Grapes, Cherries, Melons, Apples. Time Dimension: Q1, Q2, Q3, Q4.]

Enterprise BI Today
[Diagram with three columns. Data Sources: DW, Datamart, Datamart. Data Models: MOLAP, MOLAP. Tools: OLAP Browser, Reporting Tool (1), Reporting Tool (2), Reporting Tool (3).]

Relational vs. OLAP Reports


Feature (compared across Relational and OLAP)
Flexible schema
Real time data access
Single data store
Simple management
Detail reporting
High performance
End-user oriented
Ease of navigation and exploration
Rich analytics
Rich semantics

Agenda
Overview & BI Challenges
Introducing the UDM
The UDM in Detail
Data Mining Overview

The Unified Dimensional Model


The Best of Relational and OLAP
Relational Reporting
Multiple fact tables
Full richness of the dimensions' attributes
Transaction level access
Star, snowflake, 3NF
Complex relationships
Recursive self joins
Slowly changing dimensions

OLAP Cubes
Multidimensional navigation
Hierarchical presentation
Friendly entity names
Powerful MDX calculations
Central KPI framework
Multiple perspectives
Partitions
Aggregations
Distributed sources

UDM's Role
Allows the User Model to be Enriched
Provides High Performance Queries
Allows the Capture of Business Rules to Support Analysis
Supports Closing the Loop Where the User Acts Upon the Data

Enterprise BI with UDM
[Diagram: the data sources (DW, Datamart, Datamart, MOLAP, MOLAP) are exposed through a single UDM, which serves the OLAP Browser, Reporting Tool, and BI Applications.]

Scalable, High Performance UDM Server
[Diagram: Analysis Services hosts the UDM over the data sources (DW, Datamart, Datamart, MOLAP, MOLAP); the OLAP Browser, Reporting Tool, and BI Applications connect through XML/A or OLE DB/OLAP.]

Analysis Server as UDM Server


Optimized SQL to all major RDBMS platforms
XML/A client API
SOAP-based Web service
API supported by all major BI vendors
Managed and native providers
ADOMD.NET
OLE DB for OLAP

Streamlined BI Infrastructure
Unified logical model for both relational and OLAP with superb performance and scalability
One data store to manage, ensuring data consistency and low TCO
Rich user experience with many Microsoft and 3rd-party tools

BI Development Studio
Complete, integrated tool for the development of BI applications
Enterprise software development environment
Integrated with Visual Studio
Team development, source control, versioning, developer isolation, resource independent coding

Performance
Proactive caching
Automatic MOLAP cache creation and management
MOLAP becomes transparent
No requirement to manage an OLAP store
Relational reporting enjoys MOLAP-like performance
MOLAP, ROLAP, and HOLAP

MOLAP Caching
[Diagram: Analysis Services hosts the UDM and a MOLAP cache over the data sources (DW, Datamart, Datamart, MOLAP), with notifications flowing to the cache; the OLAP Browser, Reporting Tool, and BI Applications connect through XML/A or ODBO.]

Agenda
Overview & BI Challenges
Introducing the UDM
The UDM in Detail
Data Mining Overview

UDM and The BI Studio

UDM Data Sources


Multiple Data Sources
OLTP
OLAP
XML

Data Source Views


Tables
Views
Stored Queries

Dimensions and Hierarchies


Dimensions: Attribute-Based
Consolidates all attributes of an entity
Hierarchies Organize Data
Custom hierarchies can be created from attributes

Cubes
No More Limits
Limited only by addressable objects (2147483647)
Stored as XML
Logical Grouping of Measures and Dimensions

Perspectives
UDM Provides Subject Area Centric View of the Data Warehouse
Perspectives Feature Allows User/Group Specific View of the Same Data

Categorization
Semantically Meaningful Categories
Measures
Dimensions
Attributes
Hierarchies

Time
UDM Has Built-In Knowledge of Time

Natural (Calendar)
Fiscal
Reporting
Manufacturing
ISO 8601

Translations
UDM provides for multiple languages
Metadata in BI Studio and Client Tool Displayed in Multiple Languages

Attribute Semantics
Names vs. Keys
Ordering
Discretization

Key Performance Indicators


Actual Value
Goal Value
Status
Trend
Graphical Representation

Closing the Loop


Integrated Data Mining

Writeback

The UDM is not read-only

Actions

ProClarity Business Intelligence Analytics


[Architecture diagram. Components shown:
Clients: Live Client (Excel based); Web Client Bundle (includes Dashboard Viewer); Web Standard (zero footprint); Web Professional (includes Business Reporter for Excel); Desktop Professional (includes Business Reporter for Excel); Selector and KPI Designer (all Professional clients).
Servers: Live Server; Dashboard Server; Business Logic Server; Analytics Server.
Data: OLAP Cubes.]

ProClarity Key Differentiators


Speed in decisions, real insight
One version of the truth
Analysis Platform
ProClarity + Microsoft: total BI platform
Very end-user-friendly environment
All users own information
Several visualizations for quick understanding
Fully customizable platform
Low Total Cost of Ownership and flexible to implement

Agenda
Overview & BI Challenges
Introducing the UDM
The UDM in Detail
Data Mining Overview

Data Mining Architecture


[Diagram labels: LOB Application, Model Browsing, Reporting; Web, .NET, Native; Mining Models; Historical Dataset (SQL, OLE DB, Text File, Cube) through Data Transform (SSIS); New Dataset (Cube), Prediction, Operations (SSIS).]

CRoss Industry Standard Process for Data Mining (CRISP-DM)

http://www.crisp-dm.org

Microsoft Mining Model Algorithms

Decision Trees
Clustering
Sequence Clustering
Association
Time Series
Naive Bayes
Neural Net
(Legend: Introduced in SQL Server 2000)

Microsoft Mining Models

When To Use What


Analytical Problem | Examples | Algorithms
Classification: Assign cases to predefined classes | Credit risk analysis, Churn analysis, Customer retention | Decision Trees, Naive Bayes, Neural Nets
Segmentation: Taxonomy for grouping similar cases | Customer profile analysis, Mailing campaign | Clustering, Sequence Clustering
Association: Advanced counting for correlations | Market basket analysis, Advanced data exploration | Decision Trees, Association
Time Series Forecasting: Predict the future | Forecast sales, Predict stock prices | Time Series
Prediction: Predict a value for a new case based on values for similar cases | Quote insurance rates, Predict customer income | All
Deviation analysis: Discover how a case or segment differs from others | Credit card fraud detection, Network intrusion analysis | All

Thank You
Javier Loria
Business Intelligence,
Solid Quality Learning
javier@solidqualitylearning.com

Decision Trees

Classify each case into one of a few discrete broad categories of selected attributes
The process of building is recursive partitioning: splitting data into partitions and then splitting them further
Initially all cases are in one big box

Decision Trees (cont.)

The algorithm tries all possible breaks in classes using all possible values of each input attribute; it then selects the split that partitions data to the purest classes of the searched variable
Several measures of purity
Then it repeats splitting for each new class
Again testing all possible breaks
Branches of the tree that are not useful can be pre-pruned or post-pruned
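
The split search described above can be sketched in a few lines. This is a minimal illustration, not the Microsoft Decision Trees implementation: it assumes categorical inputs, uses Gini impurity as one of the several possible purity measures, and runs on a hypothetical four-row churn dataset whose column names are invented for the example.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 for a pure partition, higher for mixed ones."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, target):
    """Try every (attribute, value) test and return (score, attribute, value)
    for the test with the lowest weighted impurity of the two partitions."""
    best = None
    attributes = [name for name in rows[0] if name != target]
    for attr in attributes:
        for value in sorted({row[attr] for row in rows}):
            left = [row[target] for row in rows if row[attr] == value]
            right = [row[target] for row in rows if row[attr] != value]
            if not left or not right:
                continue  # a test that does not divide the data is not a split
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, attr, value)
    return best


# Hypothetical toy data: "churned" is the searched (predictable) variable.
rows = [
    {"contract": "monthly", "support_calls": "many", "churned": "yes"},
    {"contract": "monthly", "support_calls": "few", "churned": "yes"},
    {"contract": "yearly", "support_calls": "many", "churned": "no"},
    {"contract": "yearly", "support_calls": "few", "churned": "no"},
]
score, attr, value = best_split(rows, "churned")
print(attr, value, score)
```

A full tree builder would call `best_split` again on each resulting partition until the partitions are pure or a stopping rule (pre-pruning) applies.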

Decision Trees (cont.)


Decision trees are used for classification and prediction
Typical questions:
Predict which customers will leave
Help in mailing and promotion campaigns
Explain reasons for a decision
Which movies are young female customers likely to buy?

Naive Bayes

Classification and Prediction Model
Calculates probabilities for each possible state of the input attribute given each state of the predictable attribute
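
As a rough illustration of the probabilities named above, the sketch below counts how often each input state occurs within each state of the predictable attribute. The loan data and attribute names are hypothetical, and no smoothing is applied, which a production model would normally add.

```python
from collections import Counter, defaultdict


def conditional_probabilities(cases, input_attr, target_attr):
    """Return {target_state: {input_state: P(input_state | target_state)}},
    estimated from raw counts (no smoothing)."""
    counts = defaultdict(Counter)
    for case in cases:
        counts[case[target_attr]][case[input_attr]] += 1
    return {
        target: {state: n / sum(states.values()) for state, n in states.items()}
        for target, states in counts.items()
    }


# Hypothetical loan applications: "approved" is the predictable attribute.
cases = [
    {"owns_home": "yes", "approved": "yes"},
    {"owns_home": "yes", "approved": "yes"},
    {"owns_home": "no", "approved": "yes"},
    {"owns_home": "no", "approved": "no"},
]
probs = conditional_probabilities(cases, "owns_home", "approved")
print(probs)
```

To classify a new case, these per-attribute probabilities are multiplied together with the prior probability of each class, and the class with the largest product wins.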

Naive Bayes (cont.)

Used for classification
Assign new cases to predefined classes
Some typical questions:
Categorize bank loan applications
Determining which home telephone lines are used for Internet access
Assigning customers to predefined segments
Quickly gaining a basic understanding of the data

Cluster Analysis

Grouping data into clusters
Objects within a cluster have high similarity based on the attribute values
The class label of each object is not known
Several techniques
Partitioning methods
Hierarchical methods
Density based methods
Model-based methods, and more
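
To make one of the partitioning methods concrete, here is a minimal k-means sketch. It is only an example of the family: the slides do not say which technique Microsoft Clustering uses, and the points and starting centroids below are hypothetical.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means on 2-D points, starting from caller-supplied centroids.
    Returns the final centroids and the clusters from the last assignment."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        # Assignment step: each point joins its nearest centroid.
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster;
        # a centroid with an empty cluster stays where it is.
        centroids = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl)) if cl else c
            for cl, c in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical data: two visibly separate groups of points.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(clusters)
```

No class labels are supplied anywhere: the groups emerge only from similarity of the attribute values.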

Cluster Analysis (cont.)


Segments a heterogeneous population into a number of more homogeneous subgroups or clusters
Some typical questions:
Discover distinct groups of customers
Identify groups of houses in a city
In biology, derive animal and plant taxonomies

Sequence Clustering
Analyzes sequence-oriented data that contains discrete-valued series
The sequence attribute in the series holds a set of events with a specific order that can be considered as a model
Typically used for Web customer analysis
Can be used for any other sequential data

Sequence Clustering (cont.)


Click-Stream Analysis
User sequences (page categories visited, in order):

frontpage news travel travel

news news news news news

frontpage news frontpage news frontpage

news news

frontpage news news travel travel travel

news weather weather weather weather

news health health business business business

frontpage sports sports sports weather

weather

Microsoft Mining Models

Association Rules
For market basket analyses
Identify cross-selling opportunities
Arrange attractive packages
Considers each attribute/value pair as an item
An item set is a combination of items in a single transaction
The algorithm scans through the dataset trying to find item sets that tend to appear in many transactions

Association Rules Support

Support is the percentage of rows containing the item combination compared to the total number of rows:
Transaction 1: Frozen pizza, cola, milk
Transaction 2: Milk, potato chips
Transaction 3: Cola, frozen pizza
Transaction 4: Milk, pretzels
Transaction 5: Cola, pretzels
The support for the rule "If a customer purchases Cola, then they will purchase Frozen Pizza" is 40%
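
The 40% figure can be checked with a short sketch that counts the transactions containing both items. The function name `support` is chosen for this example only; the five transactions are the ones listed on the slide.

```python
# The five transactions from the slide, one set of items per transaction.
transactions = [
    {"frozen pizza", "cola", "milk"},
    {"milk", "potato chips"},
    {"cola", "frozen pizza"},
    {"milk", "pretzels"},
    {"cola", "pretzels"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in the item set."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


cola_and_pizza = support({"cola", "frozen pizza"}, transactions)
print(f"{cola_and_pizza:.0%}")
```

Cola and frozen pizza appear together in transactions 1 and 3, so the support is 2 out of 5.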

Association Rules Confidence

What if 60% of customers buy milk and only 20% buy both milk and potato chips?
The confidence of an association rule is the support for the combination divided by the support for the condition
This gives a confidence for the rule "If a customer purchases Milk, they will purchase Potato Chips" of (20% / 60%) = 33%
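
The same ratio can be computed directly from transaction data. The sketch below assumes the 60% and 20% figures refer to the five transactions shown on the support slide, where milk appears in three transactions and milk with potato chips in one; the function name `confidence` is chosen for this example only.

```python
# The five transactions from the support slide (assumed to be the data
# behind the 60% and 20% figures quoted above).
transactions = [
    {"frozen pizza", "cola", "milk"},
    {"milk", "potato chips"},
    {"cola", "frozen pizza"},
    {"milk", "pretzels"},
    {"cola", "pretzels"},
]


def confidence(condition, consequence, transactions):
    """Among transactions containing the condition, the fraction that also
    contain the consequence. Equals support(both) / support(condition)."""
    condition, consequence = set(condition), set(consequence)
    with_condition = [t for t in transactions if condition <= t]
    if not with_condition:
        return 0.0
    with_both = [t for t in with_condition if consequence <= t]
    return len(with_both) / len(with_condition)


milk_to_chips = confidence({"milk"}, {"potato chips"}, transactions)
print(f"{milk_to_chips:.0%}")
```

Dividing the counts (1 of 3) gives the same result as dividing the supports (20% / 60%), because the total number of transactions cancels out.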

Time Series
Predict continuous columns, such as product sales or stock performance, in a forecasting scenario
Builds a model in two stages
First stage creates a list of optimal candidate input columns
Second stage investigates each candidate input column and determines if it improves the model

Neural Network

Data modeling tool that is able to capture and represent complex input/output relationships
Neural networks resemble the human brain in the following two ways:
A neural network acquires knowledge through learning
A neural network's knowledge is stored within inter-neuron connection strengths known as synaptic weights

It explores all possible data relationships

It can be slow

Back-Propagation
Training a neural network is setting the best weights on the inputs of each of the units
The back-propagation process:
Get a training example and calculate outputs
Calculate the error: the difference between the calculated and the expected (known) result
Adjust the weights to minimize the error
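
The three steps above can be sketched for a very small network. This is an illustrative example only, not the Microsoft Neural Network algorithm: it assumes two inputs, two sigmoid hidden units, one sigmoid output, no bias terms, a squared-error objective, and hypothetical starting weights and learning rate.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, w_hidden, w_out):
    """Calculate the hidden activations and the single output for input x."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x))) for ws in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))
    return hidden, output


def train_step(x, expected, w_hidden, w_out, rate=0.5):
    """One back-propagation step. Updates the weights in place and returns
    the error measured before the update."""
    # Step 1: get a training example and calculate outputs.
    hidden, output = forward(x, w_hidden, w_out)
    # Step 2: the error is the expected (known) result minus the calculated one.
    error = expected - output
    # Step 3: adjust the weights. The output delta is the error scaled by the
    # sigmoid derivative; hidden deltas propagate it back through the old
    # output weights.
    delta_out = error * output * (1.0 - output)
    delta_hidden = [delta_out * w * h * (1.0 - h) for w, h in zip(w_out, hidden)]
    for j, h in enumerate(hidden):
        w_out[j] += rate * delta_out * h
    for j, ws in enumerate(w_hidden):
        for i, xi in enumerate(x):
            ws[i] += rate * delta_hidden[j] * xi
    return error


# Hypothetical starting weights and a single training example.
w_hidden = [[0.1, -0.2], [0.3, 0.4]]
w_out = [0.5, -0.5]
x, expected = [1.0, 0.0], 1.0
errors = [abs(train_step(x, expected, w_hidden, w_out)) for _ in range(200)]
print(errors[0], errors[-1])
```

Repeating the step on the same example shrinks the error over time; real training cycles through many examples and usually adds bias weights.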
