
Data Warehousing

SS ZG515
BITS Pilani
Pilani Campus

PC Reddy
Guest Faculty WILP, BITS Pilani


Data Warehousing Lecture 4


Dimensional Modeling

Lecture 4 Outline

Review Lecture 3

Dimensional modeling

Retail grocery store case study.

BITS Pilani, Pilani Campus

The Dimensional Data Model


An alternative to the normalized data model
Present information as simply as possible (easier to understand)
Return query results as quickly as possible (efficient for queries)
Track the underlying business processes (process focused)


The Dimensional Data Model


Contains the same information as the normalized
model
Has far fewer tables
Grouped in coherent business categories
Pre-joins hierarchies and lookup tables resulting in
fewer join paths and fewer intermediate tables
Normalized fact table with denormalized dimension
tables.


Fact Table
Measurements associated with a specific business
process
Grain: level of detail of the table
Process events produce fact records
Facts (attributes) are usually
Numeric
Additive

Derived facts included


Foreign (surrogate) keys refer to dimension tables
(entities)


Dimension Tables
Entities describing the objects of the process
Conformed dimensions are shared across processes
Attributes are descriptive
Text
Numeric

Surrogate keys
1:m with the fact table
Null entries
Date dimensions


Bus Architecture
An architecture that permits aggregating data across
multiple marts
Conformed dimensions and attributes
Bus matrix


Keys and Surrogate Keys


A surrogate key is a unique identifier for data
warehouse records that replaces source
primary keys (business/natural keys)
Protect against changes in source systems
Allow integration from multiple sources
Enable rows that do not exist in source data
Track changes over time (e.g. new customer
instances when addresses change)
Replace text keys with integers for efficiency
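
The points above can be sketched in a few lines. This is only an illustration: the `SurrogateKeyMap` class and the source system names `pos`, `web`, and `warehouse` are hypothetical, and a real warehouse would more likely assign keys in the ETL layer or with a database sequence.

```python
from itertools import count


class SurrogateKeyMap:
    """Assigns integer surrogate keys to (source_system, natural_key) pairs."""

    def __init__(self, start=1):
        self._next_key = count(start)
        self._keys = {}

    def key_for(self, source_system, natural_key):
        # The same natural key from the same source always maps to the same
        # surrogate key; a pair seen for the first time gets the next integer.
        pair = (source_system, natural_key)
        if pair not in self._keys:
            self._keys[pair] = next(self._next_key)
        return self._keys[pair]


keys = SurrogateKeyMap()
pos_customer = keys.key_for("pos", "31421")        # natural key from one source
web_customer = keys.key_for("web", "31421")        # same text, different source
unknown_row = keys.key_for("warehouse", "UNKNOWN") # row with no source record
```

Because the warehouse owns the integer keys, two sources that happen to reuse the same natural key do not collide, and a placeholder row such as "unknown" can exist without any source record.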


Slowly Changing Dimensions


Attributes in a dimension that change more slowly
than the fact granularity
Type 1: Current only / overwrite the old value
Type 2: All history / create a new dimensional
record
Type 3: Most recent few (rare) / create a
previous value attribute
Note: rapidly changing dimensions usually indicate
the presence of a business process that should be
tracked as a separate dimension or as a fact table

Slowly Changing Dimensions


CustKey  BKCustID  CustName    CommDist  Gender  HomOwn?
1552     31421     Jane Rider

Fact Table

Date       CustKey  ProdKey  Item Count  Amount
1/7/2004   1552     95                   1,798.00
3/2/2004   1552     37                   27.95
5/7/2005   1552     87                   320.26
2/21/2006  2387     42                   19.95

The 2/21/2006 row references the new surrogate key 2387 instead of 1552.

Dimension with a slowly changing attribute

CustKey  BKCustID  CustName    CommDist  Gender  HomOwn?  Eff       End
1552     31421     Jane Rider                             1/7/2004  1/1/2006
2387     31421     Jane Rider  31                         1/2/2006  12/31/9999
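
The Type 2 change in the table above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function names `apply_type2_change` and `lookup_key` are hypothetical, and the old CommDist value (12) is a placeholder because the table does not show it.

```python
from datetime import date, timedelta

OPEN_END = date(9999, 12, 31)  # "still current" marker, as in the table above


def apply_type2_change(dimension, business_key, changes, effective, new_key):
    """Expire the current row for business_key and append a new version."""
    current = next(
        row for row in dimension
        if row["BKCustID"] == business_key and row["End"] == OPEN_END
    )
    # Close the old version the day before the new one takes effect.
    current["End"] = effective - timedelta(days=1)
    new_row = {**current, **changes, "CustKey": new_key,
               "Eff": effective, "End": OPEN_END}
    dimension.append(new_row)
    return new_row


def lookup_key(dimension, business_key, on_date):
    """Return the surrogate key whose validity range covers on_date."""
    for row in dimension:
        if row["BKCustID"] == business_key and row["Eff"] <= on_date <= row["End"]:
            return row["CustKey"]
    raise KeyError((business_key, on_date))


customers = [{"CustKey": 1552, "BKCustID": 31421, "CustName": "Jane Rider",
              "CommDist": 12,  # placeholder: old value not shown in the table
              "Eff": date(2004, 1, 7), "End": OPEN_END}]

apply_type2_change(customers, 31421, {"CommDist": 31}, date(2006, 1, 2), 2387)
```

With this in place, `lookup_key(customers, 31421, date(2005, 5, 7))` resolves to the old row (1552) and `lookup_key(customers, 31421, date(2006, 2, 21))` resolves to the new one (2387), which is how fact rows loaded after the change pick up the new surrogate key.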


Slowly Changing Dimensions


Original

ProductKey  Description  Category   SKU
21553       LeapPad      Education  LP2105

Type 1

ProductKey  Description  Category  SKU
21553       LeapPad      Toy       LP2105

Type 2

ProductKey  Description  Category   SKU
21553       LeapPad      Education  LP2105
44631       LeapPad      Toy        LP2105

Type 3

ProductKey  Description  Category  OldCat     SKU
21553       LeapPad      Toy       Education  LP2105

Hybrid

ProductKey  Description  Category   OldCat       SKU
21553       LeapPad      Education  Electronics  LP2105
44631       LeapPad      Toy        Education    LP2105
68122       LeapPad      Education  Electronics  LP2105
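
The Type 1 and Type 3 cases for the LeapPad product can be sketched as two small functions. The names `apply_type1` and `apply_type3` are hypothetical and exist only for this illustration; each returns a new row rather than modifying the input.

```python
def apply_type1(row, attribute, new_value):
    """Type 1: overwrite the attribute; the old value is lost."""
    updated = dict(row)
    updated[attribute] = new_value
    return updated


def apply_type3(row, attribute, previous_attribute, new_value):
    """Type 3: keep the prior value in a dedicated 'previous' column."""
    updated = dict(row)
    updated[previous_attribute] = row[attribute]
    updated[attribute] = new_value
    return updated


original = {"ProductKey": 21553, "Description": "LeapPad",
            "Category": "Education", "SKU": "LP2105"}

type1 = apply_type1(original, "Category", "Toy")
type3 = apply_type3(original, "Category", "OldCat", "Toy")
```

In both cases the surrogate key 21553 is unchanged, so existing fact rows are re-categorized; only Type 2 adds a new row and key.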


Date Dimensions
One row for every day for which you expect to
have data for the fact table (perhaps
generated in a spreadsheet and imported)
Usually use a meaningful integer surrogate
key (such as yyyymmdd 20060926 for Sep.
26, 2006). Note: this order sorts correctly.
Include rows for missing or future dates to be
added later.
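
As an alternative to a spreadsheet, the rows can be generated with a short script. The sketch below is one possible approach; the function name `build_date_dimension` and the particular attribute columns are assumptions chosen for illustration.

```python
from datetime import date, timedelta


def build_date_dimension(start, end):
    """Return one row per calendar day from start to end, inclusive."""
    rows = []
    day = start
    while day <= end:
        rows.append({
            # yyyymmdd integer key, e.g. 20060926 for Sep. 26, 2006
            "DateKey": day.year * 10000 + day.month * 100 + day.day,
            "FullDate": day.isoformat(),
            "DayOfWeek": day.strftime("%A"),
            "DayOfMonth": day.day,
            "Month": day.month,
            "CalendarQuarter": (day.month - 1) // 3 + 1,
            "Year": day.year,
            "IsWeekday": day.weekday() < 5,
        })
        day += timedelta(days=1)
    return rows


date_dim = build_date_dimension(date(2006, 1, 1), date(2006, 12, 31))
sep_26 = next(r for r in date_dim if r["FullDate"] == "2006-09-26")
```

Because the key is built as year, then month, then day, sorting by `DateKey` also sorts chronologically.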


More about dimensions


Views for dimensions used for different purposes, e.g. StartDate and EndDate
Junk dimensions for flags and miscellaneous categories removed from the fact table
Degenerate dimensions have no attributes
Usually reserved for order number or something similar


Aggregates
Precalculated summary tables
Improve performance
Record data at a coarser granularity

State change summary that has one row per item.

Access rows on each update.
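
A precalculated summary at a coarser grain can be sketched as a roll-up from daily rows to monthly rows. The detail rows and the function name `build_monthly_aggregate` below are hypothetical and serve only to show the idea.

```python
from collections import defaultdict

# Hypothetical detail rows: (DateKey as yyyymmdd, ProductKey, SalesQuantity)
detail_facts = [
    (20060925, 95, 2),
    (20060926, 95, 3),
    (20060926, 37, 1),
    (20061002, 95, 4),
]


def build_monthly_aggregate(facts):
    """Sum quantities to one row per (year-month, product)."""
    totals = defaultdict(int)
    for date_key, product_key, quantity in facts:
        month_key = date_key // 100  # yyyymmdd -> yyyymm
        totals[(month_key, product_key)] += quantity
    return dict(totals)


monthly = build_monthly_aggregate(detail_facts)
```

Queries at the month level can then read the smaller `monthly` table instead of scanning every daily row.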


Fact Tables
Transaction
Track processes at discrete points in time when they occur

Periodic snapshot
Cumulative performance over specific time intervals

Accumulating snapshot
Constantly updated over time. May include multiple dates representing
stages.


Case Study:
Retail Grocery Store
Process: Retail Sales
Grain: POS line item
Dimensions: Date, Store, Product, Promotion

Facts: Sales Quantity, Sales Dollar Amount, Cost Dollar Amount, Gross Profit Dollar Amount.


Star schema Model

DATE
DateKey
Attributes

STORE
StoreKey
Attributes

POS FACT
DateKey
ProductKey
StoreKey
PromotionKey
POSTransactionNumber
SalesQuantity
SalesDollarAmount
CostDollarAmount
GrossProfitDollarAmount

PRODUCT
ProductKey
Attributes

PROMOTION
PromotionKey
Attributes
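
The star schema above can be tried out with an in-memory SQLite database. This is a sketch under stated assumptions: the table names (`date_dim`, `pos_fact`, and so on), the single descriptive column per dimension, and all sample rows are hypothetical; only the fact table columns follow the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE date_dim      (DateKey INTEGER PRIMARY KEY, MonthName TEXT);
CREATE TABLE store_dim     (StoreKey INTEGER PRIMARY KEY, StoreName TEXT);
CREATE TABLE product_dim   (ProductKey INTEGER PRIMARY KEY, Description TEXT);
CREATE TABLE promotion_dim (PromotionKey INTEGER PRIMARY KEY, PromotionName TEXT);
CREATE TABLE pos_fact (
    DateKey      INTEGER REFERENCES date_dim(DateKey),
    ProductKey   INTEGER REFERENCES product_dim(ProductKey),
    StoreKey     INTEGER REFERENCES store_dim(StoreKey),
    PromotionKey INTEGER REFERENCES promotion_dim(PromotionKey),
    POSTransactionNumber INTEGER,   -- no dimension table of its own
    SalesQuantity INTEGER,
    SalesDollarAmount REAL,
    CostDollarAmount REAL,
    GrossProfitDollarAmount REAL    -- derived: sales minus cost
);
""")

conn.executemany("INSERT INTO date_dim VALUES (?, ?)", [(20060926, "September")])
conn.executemany("INSERT INTO store_dim VALUES (?, ?)", [(1, "Downtown"), (2, "Airport")])
conn.executemany("INSERT INTO product_dim VALUES (?, ?)", [(95, "Cereal"), (37, "Milk")])
conn.executemany("INSERT INTO promotion_dim VALUES (?, ?)", [(0, "No promotion")])

sales = [  # (date, product, store, promo, txn, qty, sales $, cost $)
    (20060926, 95, 1, 0, 1001, 2, 8.0, 5.0),
    (20060926, 37, 1, 0, 1001, 1, 3.0, 2.0),
    (20060926, 95, 2, 0, 2001, 1, 4.0, 2.5),
]
conn.executemany(
    "INSERT INTO pos_fact VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
    [row + (row[6] - row[7],) for row in sales],  # append the derived profit
)

# One join per dimension used; additive facts are summed at the chosen level.
by_store = conn.execute("""
    SELECT s.StoreName, SUM(f.SalesQuantity), SUM(f.GrossProfitDollarAmount)
    FROM pos_fact AS f JOIN store_dim AS s ON s.StoreKey = f.StoreKey
    GROUP BY s.StoreName ORDER BY s.StoreName
""").fetchall()
```

Each fact row is one POS line item, so two line items can share a POSTransactionNumber, and every query follows the same pattern: join the fact table to the dimensions it needs and aggregate.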


Possible Date Attributes


SQL date
Full date description
Day of week
Day of month
Day of calendar year
Day of fiscal year
Month of calendar year
Month of fiscal year
Calendar Quarter
Fiscal Quarter

Fiscal week
Year
Month
Fiscal year
Holiday ?
Holiday name
Day of holiday
Weekday ?
Selling season
Major event
etc.


Possible Product Attributes


Description
SKU number
Brand description
Department
Package type
Package size
Fat content
Diet type
Weight

Weight units of measure
Storage type
Shelf unit type
Shelf width
Shelf height
Shelf depth
etc.


Possible Store Attributes


Store Name
Store Number
Street address
City
County
State
Zip
Manager
District

Region
Floor plan type
Photo processing type
Financial service type
Square footage
Selling square footage
First open date
Last remodel date
etc.

Factless Fact Tables

To evaluate promotions that might have generated no sales, we need another approach.
Promotion could generate another fact table (or could be considered a fact table in itself).
That new fact table would have no additive attributes.
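
One way to picture a factless fact table is as a set of key combinations with no measures. In the sketch below, the `promotion_coverage` and `sales_keys` rows are hypothetical sample data; the set difference answers the question of which promoted products did not sell.

```python
# Hypothetical coverage rows: (DateKey, ProductKey, StoreKey, PromotionKey).
# A factless fact table holds only keys; each row records that a product
# was on promotion in a store on a given day.
promotion_coverage = {
    (20060926, 95, 1, 7),
    (20060926, 37, 1, 7),
    (20060926, 42, 1, 7),
}

# Hypothetical sales fact rows, reduced to the same key columns.
sales_keys = {
    (20060926, 95, 1, 7),
    (20060926, 37, 1, 7),
}

# Products that were on promotion but generated no sales.
promoted_not_sold = sorted(promotion_coverage - sales_keys)
```

The sales fact table alone cannot answer this, because a product with no sales produces no row there.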


Conformed Dimensions:
Inventory Snapshot Model
Process: Store inventory
Grain: Daily inventory by product and store
Dimensions: Date, product, store

Fact: quantity-on-hand


Dimensional Model

DATE
DateKey
Attributes

Inventory Fact
ProductKey
DateKey
StoreKey
QuantityOnHand
QuantitySold
ValueAtCost
ValueAtSellingPrice

PRODUCT
ProductKey
Attributes

STORE
StoreKey
Attributes

Note: QuantityOnHand is semi-additive. It is additive across product and store, but not across date. The other attributes are additive.
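
The semi-additive behaviour can be shown with a few snapshot rows. The data and the function names `on_hand_for_day` and `average_on_hand` are hypothetical; averaging the daily totals is one common way to summarize a snapshot fact over time, used here as an assumption.

```python
# Hypothetical snapshot rows: (DateKey, ProductKey, StoreKey, QuantityOnHand)
snapshots = [
    (20060925, 95, 1, 10),
    (20060925, 95, 2, 20),
    (20060926, 95, 1, 8),
    (20060926, 95, 2, 18),
]


def on_hand_for_day(rows, date_key):
    """Summing across stores and products for a single date is valid."""
    return sum(qty for d, _, _, qty in rows if d == date_key)


def average_on_hand(rows):
    """Across dates, average the daily totals instead of summing them."""
    days = sorted({d for d, _, _, _ in rows})
    return sum(on_hand_for_day(rows, d) for d in days) / len(days)


total_sep_26 = on_hand_for_day(snapshots, 20060926)
avg_over_period = average_on_hand(snapshots)
```

Summing all four rows would give 56, which counts the same stock on two different days; the per-day total and the average over days are the meaningful figures.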

Conformed Dimensions
Common dimensions for different processes should be
the same.
Note: Dimensions for roll-up or aggregated fact tables
may add or eliminate attributes based on the aggregation
Where attributes apply, they should mean the same
thing.


The Bus Matrix


Dimensions (columns): Date, Product, Store, Promotion, Warehouse, Vendor, Contract, Shipper

Processes (rows):
Retail Sales
Retail Inventory
Retail Deliveries
Warehouse Inventory
Warehouse Deliveries
Purchase Orders

