
The “Classic” Star Schema

Store Dimension: STORE KEY, Store Description, City, State, District ID, District Desc., Region_ID, Region Desc., Regional Mgr.
Fact Table: STORE KEY, PRODUCT KEY, PERIOD KEY, Dollars, Units, Price
Time Dimension: PERIOD KEY, Period Desc, Year, Quarter, Month, Day
Product Dimension: PRODUCT KEY, Product Desc., Brand, Color, Size, Manufacturer

Sudarshan
Copyright © 1995-1996 Archer Decision Sciences, Inc.
The “Classic” Star Schema

• A single fact table, with detail and summary data
• Fact table primary key has only one key column per dimension
• Each key is generated
• Each dimension is a single table, highly denormalized

Store Dimension: STORE KEY, Store Description, City, State, District ID, District Desc., Region_ID, Region Desc., Regional Mgr., Level
Fact Table: STORE KEY, PRODUCT KEY, PERIOD KEY, Dollars, Units, Price
Time Dimension: PERIOD KEY, Period Desc, Year, Quarter, Month, Day, Current Flag, Resolution, Sequence
Product Dimension: PRODUCT KEY, Product Desc., Brand, Color, Size, Manufacturer, Level

Benefits: Easy to understand, easy to define hierarchies, reduces the number of physical joins, low maintenance, very simple metadata
Drawbacks: Summary data in the fact table yields poorer performance for summary levels; huge dimension tables are a problem

The “Classic” Star Schema

The biggest drawback: dimension tables must carry a “level” indicator for every record, and every query must use it. In the example below, without the level constraint, keys for all stores in the NORTH region, including aggregates for region and district, will be pulled from the fact table, resulting in error.

Store Dimension: STORE KEY, Store Description, City, State, District ID, District Desc., Region_ID, Region Desc., Regional Mgr., Level
Fact Table: STORE KEY, PRODUCT KEY, PERIOD KEY, Dollars, Units, Price
Time Dimension: PERIOD KEY, Period Desc, Year, Quarter, Month, Day, Current Flag, Resolution, Sequence
Product Dimension: PRODUCT KEY, Product Desc., Brand, Color, Size, Manufacturer, Level

Example:
    select A.STORE_KEY, A.PERIOD_KEY, A.dollars
    from Fact_Table A
    where A.STORE_KEY in (select STORE_KEY
                          from Store_Dimension B
                          where region = 'North' and Level = 2)
    and etc...

Level is needed whenever aggregates are stored with detail facts.
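
The effect of leaving out the Level constraint can be reproduced on a toy database. The sketch below uses Python's built-in sqlite3 module. The sample rows, the `description` column, and the level encoding (2 = store detail, 1 = district aggregate, 0 = region aggregate) are assumptions made for illustration; the slides only show `Level = 2` in the query.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Store_Dimension (STORE_KEY INTEGER PRIMARY KEY, description TEXT,
                              region TEXT, Level INTEGER);
CREATE TABLE Fact_Table (STORE_KEY INTEGER, PERIOD_KEY INTEGER, dollars REAL);
-- Assumed encoding: Level 2 = store, 1 = district aggregate, 0 = region aggregate
INSERT INTO Store_Dimension VALUES
  (1, 'Store A', 'North', 2),
  (2, 'Store B', 'North', 2),
  (3, 'District 10 total', 'North', 1),
  (4, 'North region total', 'North', 0);
INSERT INTO Fact_Table VALUES
  (1, 1, 100.0), (2, 1, 50.0),  -- store-level detail
  (3, 1, 150.0),                -- district aggregate stored in the same fact table
  (4, 1, 150.0);                -- region aggregate stored in the same fact table
""")

def north_dollars(with_level):
    """Sum dollars for the North region, with or without the Level constraint."""
    level_clause = "AND Level = 2" if with_level else ""
    sql = f"""
        SELECT SUM(A.dollars) FROM Fact_Table A
        WHERE A.STORE_KEY IN (SELECT STORE_KEY FROM Store_Dimension B
                              WHERE region = 'North' {level_clause})
    """
    return con.execute(sql).fetchone()[0]

correct = north_dollars(with_level=True)   # only the two store rows are summed
wrong = north_dollars(with_level=False)    # aggregates are counted on top of detail
print(correct, wrong)
```

With these sample rows the unconstrained query returns three times the true total, and nothing in the result signals that it is wrong.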
The “Level” Problem
• Level is a problem because it creates potential for error. If the query builder, human or program, forgets about it, perfectly reasonable-looking WRONG answers can occur.
• One alternative: the FACT CONSTELLATION model...

The “Fact Constellation” Schema
Store Dimension: STORE KEY, Store Description, City, State, District ID, District Desc., Region_ID, Region Desc., Regional Mgr.
Fact Table: STORE KEY, PRODUCT KEY, PERIOD KEY, Dollars, Units, Price
Time Dimension: PERIOD KEY, Period Desc, Year, Quarter, Month, Day, Current Flag, Sequence
Product Dimension: PRODUCT KEY, Product Desc., Brand, Color, Size, Manufacturer
District Fact Table: District_ID, PRODUCT_KEY, PERIOD_KEY, Dollars, Units, Price
Region Fact Table: Region_ID, PRODUCT_KEY, PERIOD_KEY, Dollars, Units, Price

The “Fact Constellation” Schema
In the Fact Constellation, aggregate tables are created separately from the detail; therefore it is impossible to pick up, for example, Store detail when querying the District Fact Table.

Store Dimension: STORE KEY, Store Description, City, State, District ID, District Desc., Region_ID, Region Desc., Regional Mgr.
Fact Table: STORE KEY, PRODUCT KEY, PERIOD KEY, Dollars, Units, Price
Time Dimension: PERIOD KEY, Period Desc, Year, Quarter, Month, Day, Current Flag, Sequence
Product Dimension: PRODUCT KEY, Product Desc., Brand, Color, Size, Manufacturer
District Fact Table: District_ID, PRODUCT_KEY, PERIOD_KEY, Dollars, Units, Price
Region Fact Table: Region_ID, PRODUCT_KEY, PERIOD_KEY, Dollars, Units, Price

Major Advantage: No need for the “Level” indicator in the dimension tables, since no aggregated data is stored with lower-level detail

Disadvantage: Dimension tables are still very large in some cases, which can slow performance; front-end must be able to detect existence of aggregate facts, which requires more extensive metadata
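
One way to picture the "more extensive metadata" a front-end needs is a lookup from the requested grain to the fact table that stores it. The sketch below is a minimal illustration using sqlite3; the `AGGREGATE_METADATA` dictionary, the `dollars_by` helper, and the sample rows are hypothetical and not part of the slides.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Fact_Table (STORE_KEY INTEGER, PRODUCT_KEY INTEGER,
                         PERIOD_KEY INTEGER, Dollars REAL);
CREATE TABLE District_Fact_Table (District_ID INTEGER, PRODUCT_KEY INTEGER,
                                  PERIOD_KEY INTEGER, Dollars REAL);
INSERT INTO Fact_Table VALUES (1, 7, 1, 100.0), (2, 7, 1, 50.0);
INSERT INTO District_Fact_Table VALUES (10, 7, 1, 150.0);
""")

# Hypothetical metadata: grain -> (fact table holding that grain, its key column)
AGGREGATE_METADATA = {
    "store": ("Fact_Table", "STORE_KEY"),
    "district": ("District_Fact_Table", "District_ID"),
}

def dollars_by(grain):
    """Route the query to the fact table that stores the requested grain."""
    table, key = AGGREGATE_METADATA[grain]
    sql = f"SELECT {key}, SUM(Dollars) FROM {table} GROUP BY {key} ORDER BY {key}"
    return con.execute(sql).fetchall()

print(dollars_by("store"))
print(dollars_by("district"))
```

Because detail and aggregates live in different tables, the district query cannot pick up store rows by accident, so no Level constraint is needed.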
Another Alternative to “Level”

Fact Constellation is a good alternative to the Star, but when dimensions have very high cardinality, the sub-selects in the dimension tables can be a source of delay. An alternative is to normalize the dimension tables by attribute level, with each smaller dimension table pointing to an appropriate aggregated fact table, the “Snowflake Schema” ...
The “Snowflake” Schema
Store Dimension: STORE KEY, Store Description, City, State, District ID, District Desc., Region_ID, Region Desc., Regional Mgr.
District attribute table: District_ID, District Desc., Region_ID
Region attribute table: Region_ID, Region Desc., Regional Mgr.
Store Fact Table: STORE KEY, PRODUCT KEY, PERIOD KEY, Dollars, Units, Price
District Fact Table: District_ID, PRODUCT_KEY, PERIOD_KEY, Dollars, Units, Price
Region Fact Table: Region_ID, PRODUCT_KEY, PERIOD_KEY, Dollars, Units, Price

The “Snowflake” Schema
• No LEVEL in dimension tables
• Dimension tables are normalized by decomposing at the attribute level
• Each dimension table has one key for each level of the dimension’s hierarchy
• The lowest level key joins the dimension table to both the fact table and the lower level attribute table

Store Dimension: STORE KEY, Store Description, City, State, District ID, District Desc., Region_ID, Region Desc., Regional Mgr.
District attribute table: District_ID, District Desc., Region_ID
Region attribute table: Region_ID, Region Desc., Regional Mgr.
Store Fact Table: STORE KEY, PRODUCT KEY, PERIOD KEY, Dollars, Units, Price
District Fact Table: District_ID, PRODUCT_KEY, PERIOD_KEY, Dollars, Units, Price
Region Fact Table: Region_ID, PRODUCT_KEY, PERIOD_KEY, Dollars, Units, Price

How does it work? The best way is for the query to be built by understanding which summary levels exist, finding the proper snowflaked attribute tables, constraining there for keys, and then selecting from the fact table.
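
The query pattern described above (constrain the snowflaked attribute table for keys, then select from the matching fact table) might look like the following sqlite3 sketch. The table name `Region`, the underscore column names, and the sample rows are assumptions made for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Snowflaked attribute table for the region level (name assumed)
CREATE TABLE Region (Region_ID INTEGER PRIMARY KEY, Region_Desc TEXT, Regional_Mgr TEXT);
CREATE TABLE Region_Fact_Table (Region_ID INTEGER, PRODUCT_KEY INTEGER,
                                PERIOD_KEY INTEGER, Dollars REAL, Units INTEGER);
INSERT INTO Region VALUES (1, 'North', 'Mgr A'), (2, 'South', 'Mgr B');
INSERT INTO Region_Fact_Table VALUES
  (1, 7, 1, 150.0, 15), (1, 8, 1, 30.0, 3), (2, 7, 1, 90.0, 9);
""")

def region_dollars(region_desc):
    """Constrain the small attribute table for keys, then read the region fact table."""
    row = con.execute("""
        SELECT SUM(F.Dollars) FROM Region_Fact_Table F
        WHERE F.Region_ID IN (SELECT Region_ID FROM Region WHERE Region_Desc = ?)
    """, (region_desc,)).fetchone()
    return row[0]

print(region_dollars("North"))
```

The sub-select runs against a table with one row per region rather than one row per store, which is the point of snowflaking a high-cardinality dimension.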

The “Snowflake” Schema
Additional features: The original Store Dimension table, completely de-normalized, is kept intact, since certain queries can benefit from its all-encompassing content.

In practice, start with a Star Schema and create the “snowflakes” with queries. This eliminates the need to create separate extracts for each table, and referential integrity is inherited from the dimension table.

Store Dimension: STORE KEY, Store Description, City, State, District ID, District Desc., Region_ID, Region Desc., Regional Mgr.
District attribute table: District_ID, District Desc., Region_ID
Region attribute table: Region_ID, Region Desc., Regional Mgr.
Store Fact Table: STORE KEY, PRODUCT KEY, PERIOD KEY, Dollars, Units, Price
District Fact Table: District_ID, PRODUCT_KEY, PERIOD_KEY, Dollars, Units, Price
Region Fact Table: Region_ID, PRODUCT_KEY, PERIOD_KEY, Dollars, Units, Price

Advantage: Best performance when queries involve aggregation

Disadvantage: Complicated maintenance and metadata, explosion in the number of tables in the database
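
Creating the “snowflakes” with queries can be sketched as `CREATE TABLE ... AS SELECT DISTINCT` statements run against the existing star dimension. This is one possible reading of the slide, shown with sqlite3; the table names `District` and `Region` and the sample rows are assumptions.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Store_Dimension (
  STORE_KEY INTEGER PRIMARY KEY, Store_Description TEXT,
  District_ID INTEGER, District_Desc TEXT,
  Region_ID INTEGER, Region_Desc TEXT, Regional_Mgr TEXT);
INSERT INTO Store_Dimension VALUES
  (1, 'Store A', 10, 'District 10', 1, 'North', 'Mgr A'),
  (2, 'Store B', 10, 'District 10', 1, 'North', 'Mgr A'),
  (3, 'Store C', 20, 'District 20', 2, 'South', 'Mgr B');

-- Derive the snowflaked attribute tables from the denormalized dimension,
-- so no separate extract from the source systems is needed.
CREATE TABLE District AS
  SELECT DISTINCT District_ID, District_Desc, Region_ID FROM Store_Dimension;
CREATE TABLE Region AS
  SELECT DISTINCT Region_ID, Region_Desc, Regional_Mgr FROM Store_Dimension;
""")

districts = con.execute("SELECT * FROM District ORDER BY District_ID").fetchall()
regions = con.execute("SELECT * FROM Region ORDER BY Region_ID").fetchall()
print(districts)
print(regions)
```

Every key in the derived tables comes from a row of Store_Dimension, which is the sense in which referential integrity is inherited from the dimension table.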
Example of a Star Schema
Order: Order No, Order Date
Customer: Customer No, Customer Name, Customer Address, City
Salesperson: SalespersonID, SalespersonName, City, Quota
Fact Table: OrderNO, SalespersonID, CustomerNO, ProdNo, DateKey, CityName, Quantity, Total Price
Product: ProductNO, ProdName, ProdDescr, Category, CategoryDescription, UnitPrice
Date: DateKey, Date
City: CityName, State, Country

Star Schema
• A single fact table and a single table for each
dimension
• Every fact points to one tuple in each of the
dimensions and has additional attributes
• Does not capture hierarchies directly
• Straightforward means of capturing a multiple
dimension data model using relations
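
Since every fact points to one tuple in each dimension, a typical star query joins the fact table to the dimensions it needs and groups by their attributes. The sqlite3 sketch below uses a cut-down version of the example schema (Product and City only); the sample rows and the column name `TotalPrice` (written without the space) are assumptions.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Product (ProductNO INTEGER PRIMARY KEY, ProdName TEXT, Category TEXT);
CREATE TABLE City (CityName TEXT PRIMARY KEY, State TEXT, Country TEXT);
CREATE TABLE Fact (OrderNO INTEGER,
                   ProdNo INTEGER REFERENCES Product(ProductNO),
                   CityName TEXT REFERENCES City(CityName),
                   Quantity INTEGER, TotalPrice REAL);
INSERT INTO Product VALUES (1, 'Pen', 'Stationery'), (2, 'Lamp', 'Lighting');
INSERT INTO City VALUES ('Pune', 'Maharashtra', 'India'),
                        ('Mumbai', 'Maharashtra', 'India');
INSERT INTO Fact VALUES (100, 1, 'Pune', 10, 50.0),
                        (101, 1, 'Mumbai', 4, 20.0),
                        (102, 2, 'Pune', 1, 40.0);
""")

# One join per dimension; each fact row matches exactly one row in each.
rows = con.execute("""
    SELECT P.Category, C.State, SUM(F.Quantity), SUM(F.TotalPrice)
    FROM Fact F
    JOIN Product P ON F.ProdNo = P.ProductNO
    JOIN City C ON F.CityName = C.CityName
    GROUP BY P.Category, C.State
    ORDER BY P.Category
""").fetchall()
print(rows)
```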

Example of a Snowflake Schema
Order: Order No, Order Date
Customer: Customer No, Customer Name, Customer Address, City
Salesperson: SalespersonID, SalespersonName, City, Quota
Fact Table: OrderNO, SalespersonID, CustomerNO, ProdNo, DateKey, CityName, Quantity, Total Price
Product: ProductNO, ProdName, ProdDescr, Category, UnitPrice
Category: CategoryName, CategoryDescr
Date: DateKey, Date, Month
Month: Month, Year
Year: Year
City: CityName, State
State: StateName, Country
Country: Country

Snowflake Schema
• Represent dimensional hierarchy directly
by normalizing the dimension tables
• Easy to maintain
• Saves storage, but may reduce
effectiveness of browsing (Kimball)
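
With the hierarchy normalized, rolling up to a higher level means walking the chain of dimension tables. The sqlite3 sketch below follows City, State, and Country from the example snowflake schema; the sample rows are assumptions, and the fact table is reduced to the columns the query needs.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Country (Country TEXT PRIMARY KEY);
CREATE TABLE State (StateName TEXT PRIMARY KEY, Country TEXT REFERENCES Country(Country));
CREATE TABLE City (CityName TEXT PRIMARY KEY, State TEXT REFERENCES State(StateName));
CREATE TABLE Fact (OrderNO INTEGER, CityName TEXT REFERENCES City(CityName),
                   Quantity INTEGER);
INSERT INTO Country VALUES ('India'), ('USA');
INSERT INTO State VALUES ('Maharashtra', 'India'), ('Texas', 'USA');
INSERT INTO City VALUES ('Pune', 'Maharashtra'), ('Austin', 'Texas');
INSERT INTO Fact VALUES (100, 'Pune', 10), (101, 'Pune', 5), (102, 'Austin', 3);
""")

# Rolling up to Country takes two joins here; in the star version
# Country would be a column of the single City dimension table.
by_country = con.execute("""
    SELECT S.Country, SUM(F.Quantity)
    FROM Fact F
    JOIN City C ON F.CityName = C.CityName
    JOIN State S ON C.State = S.StateName
    GROUP BY S.Country
    ORDER BY S.Country
""").fetchall()
print(by_country)
```

The extra joins are the price paid for the storage saved, which is one reason browsing a snowflaked dimension can be less convenient.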

Fact Constellation
Sales Fact Table: Store Key, Product Key, Period Key, Units, Price
Shipping Fact Table: Shipper Key, Store Key, Product Key, Period Key, Units, Price
Product Dimension: Product Key, Product Desc
Store Dimension: Store Key, Store Name, City, State, Region

Fact Constellation
• Multiple fact tables share dimension tables.
• This schema is viewed as a collection of stars, hence it is called a galaxy schema or fact constellation.
• Sophisticated applications require such a schema.
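
Because the fact tables share dimensions, results from both can be lined up on a common dimension key. The sqlite3 sketch below aggregates each fact table separately and then joins the two summaries through the shared Product dimension; the sample rows and the aliases `units_sold` and `units_shipped` are assumptions.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Product (ProductKey INTEGER PRIMARY KEY, ProductDesc TEXT);
CREATE TABLE Sales (StoreKey INTEGER, ProductKey INTEGER, PeriodKey INTEGER,
                    Units INTEGER);
CREATE TABLE Shipping (ShipperKey INTEGER, StoreKey INTEGER, ProductKey INTEGER,
                       PeriodKey INTEGER, Units INTEGER);
INSERT INTO Product VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO Sales VALUES (1, 1, 1, 10), (2, 1, 1, 5), (1, 2, 1, 7);
INSERT INTO Shipping VALUES (9, 1, 1, 1, 12), (9, 2, 1, 1, 8), (9, 1, 2, 1, 7);
""")

# Aggregate each fact table on its own first; joining the raw fact tables
# to each other would multiply rows and inflate both sums.
rows = con.execute("""
    SELECT P.ProductDesc, S.units_sold, H.units_shipped
    FROM Product P
    JOIN (SELECT ProductKey, SUM(Units) AS units_sold
          FROM Sales GROUP BY ProductKey) S ON S.ProductKey = P.ProductKey
    JOIN (SELECT ProductKey, SUM(Units) AS units_shipped
          FROM Shipping GROUP BY ProductKey) H ON H.ProductKey = P.ProductKey
    ORDER BY P.ProductKey
""").fetchall()
print(rows)
```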

Data Warehouse vs. Data Marts
• Enterprise warehouse: collects all
information about subjects (customers,
products, sales, assets, personnel) that span
the entire organization.
– Requires extensive business modeling
– May take years to design and build
• Data Marts: departmental subsets that focus on selected subjects; e.g., a marketing data mart covers customers, products, and sales.
– Faster roll out, but complex integration in the long run.

Extraction, Transformation, &
Load (ETL)
• ETL is a set of tools and techniques used to populate a data warehouse
• Extraction
– Extract data from sources (e.g., operational DBMSs, file systems, Web pages)
• Transformation
– Clean data
– Convert from legacy/host format to warehouse format (e.g., convert “surname” to “last name”)
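
A transformation step of this kind can be as small as a field-renaming and cleaning function. The sketch below is illustrative only: the `LEGACY_TO_WAREHOUSE` mapping, the `forename` entry, and the whitespace-stripping rule are assumptions, with only the “surname” to “last name” conversion taken from the slide.

```python
# Hypothetical mapping from legacy field names to warehouse column names
LEGACY_TO_WAREHOUSE = {"surname": "last_name", "forename": "first_name"}

def transform(record):
    """Rename legacy fields and apply a simple cleaning rule to string values."""
    out = {}
    for field, value in record.items():
        name = LEGACY_TO_WAREHOUSE.get(field, field)  # unmapped fields keep their name
        out[name] = value.strip() if isinstance(value, str) else value
    return out

legacy_row = {"surname": " Rao ", "forename": "Asha", "age": 41}
print(transform(legacy_row))
```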

Extraction, Transformation, &
Load (ETL)
• Load
– Sort, summarize, consolidate, compute views, check integrity, build indexes, partition
– Huge volumes of data to be loaded, yet small time window (usually at night) when the warehouse can be taken off-line
– Techniques: batch, sequential load often too slow; incremental, parallel loading techniques may be used
• Refresh
– Propagate updates from sources to the warehouse
– When to refresh: on every update, periodically (e.g., every 24 hours), or after “significant” events
– How to refresh: full extract from base tables vs. incremental techniques
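
One common incremental technique is a watermark on a last-modified column: each refresh copies only source rows changed since the newest change already in the warehouse. The sqlite3 sketch below assumes such an `updated_at` column exists in the source; the table names and sample rows are hypothetical, and deletions in the source are not handled.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE source_sales (id INTEGER PRIMARY KEY, dollars REAL, updated_at INTEGER);
CREATE TABLE warehouse_sales (id INTEGER PRIMARY KEY, dollars REAL, updated_at INTEGER);
INSERT INTO source_sales VALUES (1, 100.0, 1), (2, 50.0, 2);
""")

def incremental_refresh(con):
    """Copy only rows changed since the last refresh; return how many were copied."""
    # Watermark = newest change already present in the warehouse (0 if empty)
    watermark = con.execute(
        "SELECT COALESCE(MAX(updated_at), 0) FROM warehouse_sales").fetchone()[0]
    changed = con.execute(
        "SELECT COUNT(*) FROM source_sales WHERE updated_at > ?",
        (watermark,)).fetchone()[0]
    # INSERT OR REPLACE overwrites warehouse rows whose source row was updated
    con.execute("""
        INSERT OR REPLACE INTO warehouse_sales
        SELECT id, dollars, updated_at FROM source_sales WHERE updated_at > ?
    """, (watermark,))
    return changed

first = incremental_refresh(con)   # initial load copies both rows

# Simulate source activity: one update and one new row
con.execute("UPDATE source_sales SET dollars = 75.0, updated_at = 3 WHERE id = 2")
con.execute("INSERT INTO source_sales VALUES (3, 20.0, 4)")

second = incremental_refresh(con)  # only the changed and the new row are copied
warehouse = con.execute("SELECT * FROM warehouse_sales ORDER BY id").fetchall()
print(first, second, warehouse)
```

A full extract would instead truncate and reload the warehouse table each time, which is simpler but scales with total volume rather than with the volume of changes.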

OLTP vs. OLAP
                     OLTP                            OLAP
users                clerk, IT professional          knowledge worker
function             day to day operations           decision support
DB design            application-oriented            subject-oriented
data                 current, up-to-date,            historical,
                     detailed, flat relational,      summarized, multidimensional,
                     isolated                        integrated, consolidated
usage                repetitive                      ad-hoc
access               read/write,                     lots of scans
                     index/hash on prim. key
unit of work         short, simple transaction       complex query
# records accessed   tens                            millions
# users              thousands                       hundreds
DB size              100MB-GB                        100GB-TB
metric               transaction throughput          query throughput, response
CSP002N-week2 11
Conceptual Modeling of Data Warehouses

Modeling data warehouses: dimensions & measures

• Star schema: A fact table in the middle connected to a set of dimension tables
• Snowflake schema: A refinement of star schema where some dimensional hierarchy is normalized into a set of smaller dimension tables, forming a shape similar to a snowflake
• Fact constellations: Multiple fact tables share dimension tables, viewed as a collection of stars, therefore called galaxy schema or fact constellation

Example of Star Schema
time: time_key, day, day_of_the_week, month, quarter, year
item: item_key, item_name, brand, type, supplier_type
branch: branch_key, branch_name, branch_type
location: location_key, street, city, province_or_state, country
Sales Fact Table: time_key, item_key, branch_key, location_key
Measures: units_sold, dollars_sold, avg_sales

Example of Snowflake Schema
time: time_key, day, day_of_the_week, month, quarter, year
item: item_key, item_name, brand, type, supplier_key
supplier: supplier_key, supplier_type
branch: branch_key, branch_name, branch_type
location: location_key, street, city_key
city: city_key, city, province_or_state, country
Sales Fact Table: time_key, item_key, branch_key, location_key
Measures: units_sold, dollars_sold, avg_sales

Example of Fact Constellation
time: time_key, day, day_of_the_week, month, quarter, year
item: item_key, item_name, brand, type, supplier_type
branch: branch_key, branch_name, branch_type
location: location_key, street, city, province_or_state, country
shipper: shipper_key, shipper_name, location_key, shipper_type
Sales Fact Table: time_key, item_key, branch_key, location_key
Sales measures: units_sold, dollars_sold, avg_sales
Shipping Fact Table: time_key, item_key, shipper_key, from_location, to_location
Shipping measures: dollars_cost, units_shipped
