
Administrivia: HW #2

Carnegie Mellon University

2006 - 2008 Robert T. Monroe

45-875 BI Tools and Techniques

Online Analytical Processing (OLAP)


BI Tools and Techniques
Robert Monroe
April 8, 2008


Key Takeaways


Core OLAP Concepts


What Are OLAP Tools?


OLAP tools provide a mechanism for interactive analysis
and exploration of dimensional data
Interactive: users need to be able to easily specify queries
Analysis: it should be possible to perform (and reuse) complex
analyses of the dimensional data
Exploration: answering one question with an OLAP tool
frequently raises numerous subsequent questions
A good OLAP tool allows the user to quickly pose follow-on queries

Dimensional: OLAP tools operate on dimensional data, that is, data
structured as facts and dimensions

OLAP's Role In Decision Making

OLAP Sweet-Spot

Source: O'Brien, Management Information Systems, 6th ed.

OLAP excels at exploring complex, structured questions



Quick OLAP Tools Demo


Contour Components OLAP cube browser
Open http://olaplib.contourcomponents.com/ in IE 6.0 or higher
Approve the installation of any ActiveX controls that the site requests
Use the Samples > Government > Regional Employee Turnover menu in
the upper left of the screen to open the sample OLAP cube

Demo requires IE 6.0 or later and an ActiveX install

Installation for class is optional

For the first demo we will browse regional employee turnover data


Why Not Just Write SQL Queries?

Performance
Complexity
Exploration
Presentation
Difficulty in dealing with hierarchies
Difficult or impossible to specify some desired queries


Why Not Just Use Spreadsheets?


Complexity (with > 2 dimensions)
Presentation is tied to representation
Does not scale to large data sets or many dimensions
Storage and representation are ill-suited to the task

Inability to deal with hierarchies


OLAP's Place In A Business Intelligence Solution

[Diagram labels: Reconcile Data, Derive Data, OLAP Cube, OLAP Tools, Analyze]


Diagram Source: Hoffer, Prescott, McFadden, Modern Database Management, 7th ed.

Dimensional Modeling with HyperCubes: Basic Concepts


Representing Dimensional Databases as Cubes


OLAP tools represent dimensional data as cubes
Cubes are also sometimes referred to as hypercubes

Dimension tables are represented as cube dimensions


Facts are represented using measures
Measures can be thought of as the values stored in individual
cells of the cube
Measures consist of two parts:
A numerical value that represents the basic fact
A formula for combining multiple measures into a single measure
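
The two parts of a measure can be sketched in a few lines of Python. This is only an illustration, not how any particular OLAP product stores cubes: the cell layout, the `measure` dictionary, and the `roll_up` helper are assumptions made for the example, and the sales figures are borrowed from the store sales table later in the deck.

```python
from collections import defaultdict

# Cube cells: one value per (store, year) intersection.
cells = {
    ("Chicago", 2004): 3_250_000,
    ("Chicago", 2005): 3_500_000,
    ("Pittsburgh", 2004): 1_600_000,
    ("Pittsburgh", 2005): 1_700_000,
}

# A measure pairs the stored numeric value with a formula for combining values.
measure = {"name": "Gross Sales", "combine": sum}


def roll_up(cells, combine, keep):
    """Combine all cells that share the same member of dimension index `keep`."""
    groups = defaultdict(list)
    for coords, value in cells.items():
        groups[coords[keep]].append(value)
    return {member: combine(values) for member, values in groups.items()}


by_store = roll_up(cells, measure["combine"], keep=0)  # collapse the year dimension
by_year = roll_up(cells, measure["combine"], keep=1)   # collapse the store dimension
```

Swapping `sum` for `max` or a custom function changes how the same stored values are combined, which is why the formula is treated as part of the measure.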


Quick Review: Dimensional Modeling Example


Fact table provides statistics for sales
broken down by product, period and store
dimensions

Dimension tables provide details on
stores, products, and time periods
Diagram Source: Hoffer, Prescott, McFadden, Modern Database Management, 7th ed.
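
A small star schema along these lines can be tried with Python's built-in `sqlite3` module. The table names, column names, and rows below are hypothetical and are not taken from the textbook diagram; the point is only that the fact table holds foreign keys plus numeric facts, and a query joins it to a dimension table to break the facts down.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive details (hypothetical columns)
CREATE TABLE product (product_key INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE store   (store_key   INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE period  (period_key  INTEGER PRIMARY KEY, year INTEGER, month INTEGER);

-- Fact table: one foreign key per dimension plus the numeric facts
CREATE TABLE sales (
    product_key  INTEGER REFERENCES product(product_key),
    store_key    INTEGER REFERENCES store(store_key),
    period_key   INTEGER REFERENCES period(period_key),
    units_sold   INTEGER,
    dollars_sold REAL
);

INSERT INTO product VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO store   VALUES (1, 'Chicago'), (2, 'Pittsburgh');
INSERT INTO period  VALUES (1, 2007, 1), (2, 2007, 2);
INSERT INTO sales VALUES
    (1, 1, 1, 10, 100.0),
    (2, 1, 1,  5, 250.0),
    (1, 2, 2,  4,  40.0),
    (1, 1, 2,  6,  60.0);
""")

# Sales broken down by the store dimension.
rows = conn.execute("""
    SELECT s.city, SUM(f.dollars_sold)
    FROM sales AS f
    JOIN store AS s ON s.store_key = f.store_key
    GROUP BY s.city
    ORDER BY s.city
""").fetchall()
# rows == [('Chicago', 410.0), ('Pittsburgh', 40.0)]
```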


Quick Review: Dimensional Example With Data


Product (dimension)

Period (dimension)

Store (dimension)
Sales (fact)

Diagram Source: Hoffer, Prescott, McFadden, Modern Database Management, 7th ed.


Multiple Fact Tables


It is frequently useful to store more than one type of fact in a single multidimensional database (star schema)
This can be handled by using multiple fact tables that share dimensions
Example: modeling products sold and products purchased
Diagram Source: Hoffer, Prescott, McFadden, Modern Database Management, 7th ed.


Factless Fact Tables: Tracking Events


Factless fact tables store only foreign keys, no facts
Factless fact tables allow the tracking of what types of events
happened, and under what circumstances they happened

Diagram Source: Hoffer, Prescott, McFadden, Modern Database Management, 7th ed.
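
As a rough illustration of a factless fact table, the sketch below records class attendance events using only foreign keys. The `attendance` table and its keys are invented for this example. With no numeric fact stored, analysis comes down to counting the rows that match a given combination of dimension values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Factless fact table: foreign keys only, no numeric facts
CREATE TABLE attendance (
    student_key INTEGER,
    course_key  INTEGER,
    date_key    INTEGER
);
INSERT INTO attendance VALUES
    (1, 10, 20080401),
    (2, 10, 20080401),
    (1, 10, 20080408),
    (1, 11, 20080402);
""")

# Each row is an event, so COUNT(*) answers "how often did this happen?"
per_course = conn.execute("""
    SELECT course_key, COUNT(*)
    FROM attendance
    GROUP BY course_key
    ORDER BY course_key
""").fetchall()
# per_course == [(10, 3), (11, 1)]
```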


Conformed Dimensions
When dimensions are shared across multiple fact tables
they must be conformed dimensions
Conformed dimensions: one or more dimension tables associated with two or more
fact tables, for which the dimension tables have the same
business meaning and primary key with each fact table

Conformed dimensions allow users to:


Query across multiple fact tables
Improve consistency of meaning and structure for derived and
retrieved information
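
A query across two fact tables can be sketched as follows, assuming a hypothetical product dimension shared by a sales fact table and a purchases fact table. All names and numbers are made up for illustration. Because both fact tables use the same `product_key` with the same meaning, their totals can be lined up side by side.

```python
from collections import Counter

# Shared (conformed) dimension: product_key -> product name.
product = {1: "Widget", 2: "Gadget"}

# Two fact tables keyed by the same product_key.
sales_facts = [(1, 10), (2, 5), (1, 6)]   # (product_key, units_sold)
purchase_facts = [(1, 20), (2, 8)]        # (product_key, units_purchased)


def total_by_product(facts):
    """Sum the numeric fact for each product_key."""
    totals = Counter()
    for product_key, units in facts:
        totals[product_key] += units
    return totals


sold = total_by_product(sales_facts)
bought = total_by_product(purchase_facts)

# Combine the two result sets on the shared dimension key.
report = {
    name: {"sold": sold[key], "purchased": bought[key]}
    for key, name in product.items()
}
```

If the two fact tables used different keys or different definitions of "product", this combination step would not be meaningful, which is the reason for the conformance requirement.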

Tabular Representation of Measures and Dimensions


Simple example of viewing OLAP data in a grid:
Row headings (Store) represent dimension members
Columns represent different measures

Store Sales Data for 2004

Store        Gross Sales    Quota         Profits     Sales vs. Quota
Chicago      $3,250,000     $2,750,000    $624,352    + $500,000
New York     $4,500,000     $3,550,000    $100,000    + $950,000
Pittsburgh   $1,600,000     $1,700,000    $250,000    - $100,000

(The Store column is the dimension; the remaining columns are measures.)
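
The Sales vs. Quota column behaves like a calculated measure. The sketch below assumes it is simply gross sales minus quota, which agrees with the figures shown in the table; the dictionary layout and function name are illustrative only.

```python
# Stored measures per store, copied from the 2004 table.
store_sales_2004 = {
    "Chicago": {"gross_sales": 3_250_000, "quota": 2_750_000},
    "New York": {"gross_sales": 4_500_000, "quota": 3_550_000},
    "Pittsburgh": {"gross_sales": 1_600_000, "quota": 1_700_000},
}


def sales_vs_quota(row):
    """Calculated measure (assumed formula): gross sales minus quota."""
    return row["gross_sales"] - row["quota"]


derived = {store: sales_vs_quota(row) for store, row in store_sales_2004.items()}
# derived == {'Chicago': 500000, 'New York': 950000, 'Pittsburgh': -100000}
```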

Tabular Representation of Measures and Dimensions


Example 2: Store sales by year and store location
Column and row headings represent dimension values in this case
Cells represent measures; the name of the table describes the measure

Store Sales Data 2004-2007

Store        2004          2005          2006          2007
Chicago      $3,250,000    $3,500,000    $3,000,000    $3,900,000
New York     $4,500,000    $4,350,000    $5,100,000    $5,450,000
Pittsburgh   $1,600,000    $1,700,000    $1,800,000    $1,650,000

(Row and column headings are dimensions; cell values are measures.)

Cube Representation of Measures and Dimensions

Diagram Source: Hoffer, Prescott, McFadden, Modern Database Management, 7th ed.


Dimension Hierarchies
Dimension tables are represented as cube dimensions
Cube dimensions use levels to represent hierarchies
Each sub-level subdivides the parent level with finer granularity

Dimensions can be of fixed or variable height (jagged)


Examples
Dimension: Time Period
Levels: Year :: Quarter :: Month :: Week :: Day

Dimension: Organization
Levels: Company :: Division :: Department :: Employee


Measures
Measures represent the interesting data at the
intersection of different dimensions
There is a space for a measure at every intersection of
every level of every dimension
Base facts are stored at the intersections of the lowest-level
dimension members (either simple or calculated measures)
Aggregate or computed values are stored at the intersections
where at least one dimension is not at its lowest level
(aggregate values must be calculated measures)
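
One way to picture "a space for a measure at every intersection of every level" is to fill in those spaces from the base facts. The sketch below assumes a two-level store dimension (city, all stores) and a two-level time dimension (quarter, year); the level names, lambdas, and figures are hypothetical.

```python
from collections import defaultdict

# Base facts at the lowest levels: (city, (year, quarter)) -> sales.
base = {
    ("Chicago", (2007, 1)): 100,
    ("Chicago", (2007, 2)): 120,
    ("Pittsburgh", (2007, 1)): 40,
    ("Pittsburgh", (2007, 2)): 50,
}

# Each level maps a lowest-level member to its ancestor at that level.
store_levels = {"city": lambda c: c, "all": lambda c: "All stores"}
time_levels = {"quarter": lambda t: t, "year": lambda t: t[0]}

# One cell per (store level, store member, time level, time member).
cube = defaultdict(int)
for (city, period), value in base.items():
    for s_name, s_up in store_levels.items():
        for t_name, t_up in time_levels.items():
            cube[(s_name, s_up(city), t_name, t_up(period))] += value

chicago_2007 = cube[("city", "Chicago", "year", 2007)]   # aggregate cell
all_2007 = cube[("all", "All stores", "year", 2007)]     # aggregate cell
```

Only the city-by-quarter cells hold base facts here; every other cell is computed with the measure's aggregation formula (a sum in this case).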

Three Categories Of Measures


Additive measures can be meaningfully combined along any
dimensions
Example: total sales by product, location, or time

Semi-additive measures cannot be combined along one or more dimensions
Example: summing inventory levels across time

Non-additive measures cannot be combined along any dimensions


Example: weighted averages without weight information
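
The difference between the categories shows up quickly with numbers. The inventory levels, prices, and unit counts below are invented for illustration: inventory can be summed across stores but not across months, and per-store average prices cannot be combined correctly without the underlying weights.

```python
# Semi-additive: month-end inventory (units on hand) for one product.
inventory = {
    ("Chicago", "Jan"): 100, ("Chicago", "Feb"): 80,
    ("Pittsburgh", "Jan"): 40, ("Pittsburgh", "Feb"): 60,
}

# Meaningful: total stock across stores at the end of January.
jan_total = sum(v for (store, month), v in inventory.items() if month == "Jan")

# Not meaningful: adding Jan and Feb counts the same shelf stock twice.
chicago_sum = sum(v for (store, month), v in inventory.items() if store == "Chicago")

# A common substitute along time is an average (or the closing balance).
chicago_levels = [v for (store, month), v in inventory.items() if store == "Chicago"]
chicago_avg = sum(chicago_levels) / len(chicago_levels)

# Non-additive: store-level average prices without their weights.
avg_price = {"Chicago": 10.0, "Pittsburgh": 20.0}
units = {"Chicago": 90, "Pittsburgh": 10}

naive = sum(avg_price.values()) / len(avg_price)  # ignores sales volume
weighted = sum(avg_price[s] * units[s] for s in avg_price) / sum(units.values())
# naive is 15.0 while the correctly weighted average is 11.0
```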

Exercise:
Identify three measures of interest for a cube that tracks sales data
Be sure to identify the numeric value tracked and the aggregation function
Definition source: Pedersen and Jensen, Multidimensional Database Technology, IEEE Computer 12/01

Why OLAP Performs So Well


Pre-computation of aggregates and other values at
cube-building time enables very rapid responses to many
common queries
Ability to specify other formulas/values to precompute
on cube build
Use of standardized structure and dimensional model
allows query engine to make many assumptions about
how to best answer queries and take advantage of precomputed values
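
The idea of trading build time for query time can be sketched as below. This is a toy model, not how a real OLAP engine is implemented: `build_cube` and `query` are hypothetical helpers, the facts are invented, and the only aggregate precomputed is a sum over every subset of three dimensions.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("product", "store", "year")

# Fact rows: a tuple of dimension members plus a sales value.
facts = [
    (("Widget", "Chicago", 2006), 10),
    (("Widget", "Chicago", 2007), 12),
    (("Gadget", "Chicago", 2007), 7),
    (("Widget", "Pittsburgh", 2007), 5),
]


def build_cube(facts):
    """Precompute SUM(sales) for every subset of dimensions at build time."""
    cube = defaultdict(int)
    n = len(DIMENSIONS)
    for size in range(n + 1):
        for kept in combinations(range(n), size):
            for members, value in facts:
                # Dimensions not in `kept` are rolled up to "*" (all members).
                key = tuple(members[i] if i in kept else "*" for i in range(n))
                cube[key] += value
    return dict(cube)


cube = build_cube(facts)


def query(product="*", store="*", year="*"):
    """Answer an aggregate query with a single dictionary lookup."""
    return cube.get((product, store, year), 0)


grand_total = query()
chicago_total = query(store="Chicago")
widget_2007 = query(product="Widget", year=2007)
```

The build step touches every fact once per dimension subset, so it is the expensive part; each query afterwards is a constant-time lookup regardless of how many fact rows exist.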


Dimensions Examples
What dimensions are available in the regional employee
turnover example?
Are there any important dimensions missing that you might want
to use for an analysis if you were a governmental official trying
to improve the employment outlook in your region?

The worldwide population cube has an example of a hierarchical dimension
Which one is hierarchical?
Is it a fixed or jagged dimension?
What are the measures in this cube?


Analytics
Analytics are specific analyses that can be performed
on an OLAP cube
Simple pre-defined analytics (sums, counts, percentages)
Complex pre-canned analytics defined as part of the cube
model/build
Ad-hoc exploration

Examples:
Actual sales vs. quota by sales region
Supplier count by commodity category by division
Deviation from contracted pricing by supplier, commodity
category, and division over the previous 3 years
Examples of analytics related to sourcing or procurement?

Analytics Examples
Revenue cube analytics
Automobile traffic analytics
Marketing dynamics cube (multiple slices preset)


Drilling Down

The drilling down operation analyzes the data presently displayed in greater detail.
Diagram Source: Hoffer, Prescott, McFadden, Modern Database Management, 7th ed.
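
Drilling down can be thought of as re-running the same aggregation one level deeper in a hierarchy, restricted to the member the user clicked on. The sketch below assumes a Year :: Quarter :: Month time hierarchy with invented monthly figures; `view` is a hypothetical helper.

```python
from collections import defaultdict

# Monthly sales keyed by the path (year, quarter, month).
sales = {
    (2007, "Q1", "Jan"): 10, (2007, "Q1", "Feb"): 12, (2007, "Q1", "Mar"): 9,
    (2007, "Q2", "Apr"): 14, (2007, "Q2", "May"): 11, (2007, "Q2", "Jun"): 13,
}


def view(sales, depth, prefix=()):
    """Total sales grouped at `depth` levels, limited to members under `prefix`."""
    totals = defaultdict(int)
    for path, value in sales.items():
        if path[:len(prefix)] == prefix:
            totals[path[:depth]] += value
    return dict(totals)


by_year = view(sales, depth=1)                         # summary view
by_quarter = view(sales, depth=2, prefix=(2007,))      # drill down into 2007
q1_months = view(sales, depth=3, prefix=(2007, "Q1"))  # drill down into Q1
```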

Slicing
The slicing operation selects specific values for one or more
dimensions of a cube and renders measures for those dimensions
in a two-dimensional table

Diagram Source: Hoffer, Prescott, McFadden, Modern Database Management, 7th ed.
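
A slice can be sketched by fixing one dimension of a three-dimensional cube and laying out the remaining two as a table. The products, stores, and unit counts below are hypothetical, and `slice_by_product` is an illustrative helper rather than any tool's API.

```python
# Cube cells: (product, store, month) -> units sold.
cube = {
    ("Shoes", "Chicago", "Jan"): 5, ("Shoes", "Chicago", "Feb"): 7,
    ("Shoes", "Pittsburgh", "Jan"): 3, ("Shoes", "Pittsburgh", "Feb"): 4,
    ("Hats", "Chicago", "Jan"): 2, ("Hats", "Pittsburgh", "Feb"): 1,
}


def slice_by_product(cube, product):
    """Fix the product dimension and return a store-by-month table."""
    table = {}
    for (p, store, month), value in cube.items():
        if p == product:
            table.setdefault(store, {})[month] = value
    return table


shoes = slice_by_product(cube, "Shoes")
# shoes == {'Chicago': {'Jan': 5, 'Feb': 7}, 'Pittsburgh': {'Jan': 3, 'Feb': 4}}
```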

Filtering
Filtering reduces the elements included in a calculation
Filtering can cross multiple slices
Example: filter previous results to only show February, April, May

Diagram Source: Hoffer, Prescott, McFadden, Modern Database Management, 7th ed.
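
The February, April, May example amounts to keeping only selected members of the month dimension before calculating. A minimal sketch, with made-up monthly values:

```python
# Hypothetical monthly results from a previous query.
monthly_units = {"Jan": 5, "Feb": 7, "Mar": 6, "Apr": 9, "May": 4}

# The filter: only these members take part in the display and calculation.
wanted = {"Feb", "Apr", "May"}

filtered = {month: v for month, v in monthly_units.items() if month in wanted}
filtered_total = sum(filtered.values())
# filtered == {'Feb': 7, 'Apr': 9, 'May': 4}; filtered_total == 20
```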


In-Class Exercise
Open the Contour Cubes Automobile Traffic sample
Which intersection and day in London has the most
overutilization of the roads?
Which intersection has the worst overutilization of
roads across all of the days?
Which intersection has the highest overall hourly traffic
flow?


Pivoting Data
OLAP tools generally let you pivot dimensions
This involves switching which dimensions are displayed horizontally and which are displayed vertically

This can be useful when exploring and trying to visualize data


Store Sales Data '97-'00 ($ Millions)

Store      1997     1998     1999     2000
Chicago    $3.25    $3.5     $3.0     $3.9
NY         $4.5     $4.35    $5.1     $5.45
Pgh        $1.6     $1.7     $1.8     $1.65

Pivot

Annual Sales, By Store '97-'00 ($ Millions)

Year    Chicago    NY       Pgh
1997    $3.25      $4.5     $1.6
1998    $3.5       $4.35    $1.7
1999    $3.0       $5.1     $1.8
2000    $3.9       $5.45    $1.65
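
The pivot between the two tables above is a transpose of rows and columns. The sketch below uses the figures from the slide; the nested-dictionary layout and the `pivot` helper are just one convenient way to show it.

```python
# Store-by-year table from the slide ($ millions).
sales_by_store = {
    "Chicago": {1997: 3.25, 1998: 3.5, 1999: 3.0, 2000: 3.9},
    "NY": {1997: 4.5, 1998: 4.35, 1999: 5.1, 2000: 5.45},
    "Pgh": {1997: 1.6, 1998: 1.7, 1999: 1.8, 2000: 1.65},
}


def pivot(table):
    """Swap the row and column dimensions of a nested-dict table."""
    pivoted = {}
    for row_key, row in table.items():
        for col_key, value in row.items():
            pivoted.setdefault(col_key, {})[row_key] = value
    return pivoted


sales_by_year = pivot(sales_by_store)
# sales_by_year[1997] == {'Chicago': 3.25, 'NY': 4.5, 'Pgh': 1.6}
```

Pivoting twice returns the original layout, which reflects that only the presentation changes and not the underlying cube data.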


Modeling Hierarchies
Dimension tables frequently model hierarchies
Example:
Customers dimension stores data about your customers
You may sell to several divisions of a single company
You want to be able to analyze sales to the individual divisions and also
capture rolled-up values for the parent company

Divisions of ABC Automotive


Diagram Source: Hoffer, Prescott, McFadden, Modern Database Management, 7th ed.

Modeling Hierarchies With Denormalized Tables (I)


Hierarchical dimensions are frequently represented with
denormalized tables
Simplifies and speeds queries at the cost of introducing anomalies
This example represents a jagged or arbitrary hierarchy
Customer_Dimension

Parent_Company   Customer_Key   Name                    Address         Type
<null>           C000001        ABC Automotive          100 1st St.     Dealer
C000001          C000002        ABC Auto Sales          110 1st St.     Sales
C000001          C000003        ABC Repair              130 1st St.     Service
C000002          C000004        ABC Auto New Sales      110 1st St.     Sales
C000002          C000005        ABC Auto Used Sales     110 1st St.     Sales
<null>           C000006        Bubba's House O' Cars   5432 Maple Ln   Dealer
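
Rolling values up a jagged hierarchy like this one means following the Parent_Company links until a null parent is reached. In the sketch below the parent links come from the Customer_Dimension table, while the sales amounts and the `rolled_up_sales` helper are hypothetical.

```python
# Parent links from Customer_Dimension (None stands for <null>).
parent = {
    "C000001": None, "C000002": "C000001", "C000003": "C000001",
    "C000004": "C000002", "C000005": "C000002", "C000006": None,
}

# Hypothetical sales recorded against individual customers.
direct_sales = {"C000003": 50, "C000004": 200, "C000005": 120, "C000006": 75}


def rolled_up_sales(parent, direct_sales):
    """Add each customer's sales to itself and to every ancestor."""
    totals = {key: 0 for key in parent}
    for customer, amount in direct_sales.items():
        node = customer
        while node is not None:
            totals[node] += amount
            node = parent[node]
    return totals


totals = rolled_up_sales(parent, direct_sales)
# totals["C000001"] == 370 (ABC Automotive, including all of its divisions)
```

Because the number of levels is not fixed, the loop walks upward for as long as a parent exists rather than assuming a set depth.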


Modeling Hierarchies With Denormalized Tables (II)


Similar example, but with a well-defined hierarchy depth
Same number of levels for all entries in the dimension table
Simpler structure: CityID serves as the primary key for the whole table
This approach requires a fixed-height hierarchy
City_Geography_Dimension

CityID   CityName       StateID   StateName      TimeZone
45       Little Rock              Arkansas       Central
263      Denver         15        Colorado       Mountain
423      Aspen          15        Colorado       Mountain
522      Pittsburgh     36        Pennsylvania   Eastern
771      Philadelphia   36        Pennsylvania   Eastern
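
With a fixed-height hierarchy, rolling up from city to state needs no traversal: the state is already a column on every row. The sketch below takes city and state names from City_Geography_Dimension and groups by state name; the sales figures are hypothetical.

```python
from collections import defaultdict

# Rows from City_Geography_Dimension: CityID -> (CityName, StateName).
city_dim = {
    45: ("Little Rock", "Arkansas"),
    263: ("Denver", "Colorado"),
    423: ("Aspen", "Colorado"),
    522: ("Pittsburgh", "Pennsylvania"),
    771: ("Philadelphia", "Pennsylvania"),
}

# Hypothetical facts keyed by CityID.
sales_by_city = {45: 10, 263: 30, 423: 5, 522: 20, 771: 40}

# Roll up one level: a single lookup per fact, no parent links to follow.
sales_by_state = defaultdict(int)
for city_id, amount in sales_by_city.items():
    _, state = city_dim[city_id]
    sales_by_state[state] += amount
# Colorado totals 35 and Pennsylvania totals 60 in this example
```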


Wrap Up


Key Takeaways


7th Inning Stretch

