
A Trio of Interesting Snowflakes



By Ralph Kimball (http://www.kimballgroup.com/author/ralph/)


June 29, 2001
Kimball Group. All rights reserved.

Beat three common modeling challenges with extensions of the dimensional model
"When can I use a snowflake?" is a question data warehouse designers have asked me
hundreds of times. I usually answer that it's a bad idea to expose the end users to a physical
snowflake design, because it almost always compromises understandability and performance.
But in certain situations a snowflake design is not only acceptable, but recommended.

CLASSIC SNOWFLAKE
The way to create a classic snowflake, let us remind ourselves, is to remove low cardinality
attributes from a dimension table and place these attributes in a secondary dimension table
connected by a snowflake key. In cases where a set of attributes form a multilevel hierarchy,
the resulting string of tables looks a little like a snowflake, hence the name.
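As a minimal sketch of the classic snowflake just described, the following Python script uses the standard sqlite3 module to move two low cardinality attributes out of a dimension and into a secondary table reached through a snowflake key. The table names, column names, and sample rows (product_dim, category_dim, and so on) are illustrative assumptions and do not come from the article.

```python
import sqlite3

# Hypothetical product dimension; all names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Secondary table holding the low cardinality attributes.
    CREATE TABLE category_dim (
        category_key   INTEGER PRIMARY KEY,   -- the snowflake key
        category_name  TEXT,
        department     TEXT
    );
    -- Primary dimension: the removed attributes are replaced by the key.
    CREATE TABLE product_dim (
        product_key    INTEGER PRIMARY KEY,
        product_name   TEXT,
        category_key   INTEGER REFERENCES category_dim (category_key)
    );
    INSERT INTO category_dim VALUES
        (1, 'Snacks', 'Grocery'),
        (2, 'Soap', 'Household');
    INSERT INTO product_dim VALUES
        (10, 'Pretzels', 1),
        (11, 'Potato chips', 1),
        (12, 'Bar soap', 2);
""")

# Reassembling the full dimension now requires a join through the snowflake key.
rows = conn.execute("""
    SELECT p.product_name, c.category_name, c.department
    FROM product_dim AS p
    JOIN category_dim AS c ON c.category_key = p.category_key
    ORDER BY p.product_key
""").fetchall()

for row in rows:
    print(row)
```

The extra join is the cost referred to in the text: every query that needs the category attributes has to navigate the second table.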
A classic physical snowflake design may be useful in the backroom staging area as a way to
enforce the many-to-one relationships in a dimension table. But in the front room presentation
part of your data warehouse, you have to demonstrate to me that the end users find the
snowflake easier to understand and, moreover, that queries and reports run faster with the
snowflake, before I am comfortable with the snowflake design.


But having issued this warning, I have found three cases where variations on a snowflake are
not only acceptable, but are the keys to a successful design.

LARGE CUSTOMER DIMENSIONS


The customer dimension is probably the most challenging dimension in a data warehouse. In a
large organization, the customer dimension can be huge, with millions of records, and wide,
with dozens of attributes.
To make matters worse, the biggest customer dimensions commonly contain two categories of
customers, which I will call visitor and customer.
Visitors are anonymous. You may see them more than once, but you don't know their names or
anything else about them. On a Web site, the only knowledge you have about visitors is a
cookie indicating they have returned. In a retail operation, a visitor engages in an anonymous
transaction.
Customers, conversely, are reliably registered with your company. You know customers'
names, addresses, and as much demographic and historical data as you care to elicit directly
from them or purchase from third parties.
Let us assume that at the most granular level of your data collection, 80 percent of the fact
table measurements involve visitors and 20 percent involve customers. You accumulate just
two simple behavior scores for visitors consisting only of recency (when they last visited you)
and frequency (how many times they have visited).
On the other hand, let us assume you have 50 attributes and measures for a customer,
covering all the components of location, payment behavior, credit behavior, directly elicited
demographic attributes, and purchased demographic attributes.
Now you combine visitors and customers into a single logical dimension called shopper. You
give the visitor or customer a single, permanent shopper ID, but make the key to the table a
surrogate key so that you can track changes to the shopper over time. Logically, the shopper
dimension has the following attributes.
The attributes for both visitors and customers are:
Shopper surrogate key
Shopper ID (fixed ID for each physical shopper)
Recency
Frequency.
Attributes for customers only are:
Five name attributes
10 location attributes
10 behavior attributes
25 demographic attributes.
Note the importance of including the recency and frequency information as dimensional
attributes rather than as facts and overwriting them as time progresses. This decision makes
the shopper dimension very powerful. You can do classic shopper segmentation directly off the
dimension without navigating a fact table in a complex application. See the discussion of this
kind of segmentation in my book, The Data Webhouse Toolkit, starting on page 73.
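To illustrate segmenting directly off the dimension, here is a small sketch that groups shoppers into recency and frequency bands without touching any fact table. It assumes one current row per shopper, recency stored as days since the last visit, and arbitrary band thresholds (30 days, 10 visits); those choices and the column names are assumptions for illustration, not details from the article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shopper_dim (
        shopper_key   INTEGER PRIMARY KEY,  -- surrogate key
        shopper_id    TEXT,                 -- fixed ID for each physical shopper
        recency_days  INTEGER,              -- assumed unit: days since last visit
        frequency     INTEGER               -- number of visits to date
    );
    -- Hypothetical sample rows.
    INSERT INTO shopper_dim VALUES
        (1, 'S-001',   2, 40),
        (2, 'S-002',   3,  1),
        (3, 'S-003', 200, 35),
        (4, 'S-004', 400,  1);
""")

# The segmentation is a single query against the dimension table alone.
segments = conn.execute("""
    SELECT
        CASE WHEN recency_days <= 30 THEN 'recent' ELSE 'lapsed' END AS recency_band,
        CASE WHEN frequency >= 10 THEN 'frequent' ELSE 'occasional' END AS frequency_band,
        COUNT(*) AS shoppers
    FROM shopper_dim
    GROUP BY recency_band, frequency_band
    ORDER BY recency_band, frequency_band
""").fetchall()

for segment in segments:
    print(segment)
```

Because recency and frequency are overwritten in place, the query always reflects the latest scores and needs no date constraint.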
Assuming that many of the final 50 customer attributes are textual, you could have a total
record width of 500 bytes or more. Suppose you have 20 million shoppers (16 million visitors
and four million registered customers). Obviously, you are worried that in 80 percent of your
records, the trailing 50 fields contain no data! In a 10GB dimension, this condition gets your
attention.
This is a clear case where, depending on the database, you want to introduce a snowflake. You
should break the dimension into a base dimension and a snowflake subdimension. All the
visitors share a single record in the subdimension, which contains special null attribute values.
(See Figure 1.)
Figure 1. A Shopper dimension where 80 percent of the records have 50 null attributes.
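The split into a base dimension and a subdimension might look like the following sketch. Two columns stand in for the 50 customer-only attributes, key 0 is an assumed convention for the single shared visitor record, and the 'Not applicable' text is one possible choice of special null value. The view named shopper, which presents the two tables as the single logical dimension, is likewise an illustrative assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_subdim (
        customer_detail_key INTEGER PRIMARY KEY,  -- the snowflake key
        last_name TEXT,                           -- stand-ins for the 50
        city      TEXT                            -- customer-only attributes
    );
    CREATE TABLE shopper_dim (
        shopper_key   INTEGER PRIMARY KEY,        -- surrogate key
        shopper_id    TEXT,
        recency_days  INTEGER,
        frequency     INTEGER,
        customer_detail_key INTEGER NOT NULL
            REFERENCES customer_subdim (customer_detail_key)
    );
    -- Key 0 is the one record shared by every anonymous visitor.
    INSERT INTO customer_subdim VALUES
        (0, 'Not applicable', 'Not applicable'),
        (1, 'Ortiz', 'Denver');
    INSERT INTO shopper_dim VALUES
        (100, 'S-100', 1,  3, 0),   -- visitor
        (101, 'S-101', 7,  1, 0),   -- visitor
        (102, 'S-102', 2, 12, 1);   -- registered customer

    -- The single logical shopper dimension, reassembled from the two tables.
    CREATE VIEW shopper AS
    SELECT s.shopper_key, s.shopper_id, s.recency_days, s.frequency,
           c.last_name, c.city
    FROM shopper_dim AS s
    JOIN customer_subdim AS c
      ON c.customer_detail_key = s.customer_detail_key;
""")

flat = conn.execute(
    "SELECT shopper_id, last_name FROM shopper ORDER BY shopper_key"
).fetchall()
subdim_rows = conn.execute("SELECT COUNT(*) FROM customer_subdim").fetchone()[0]

print(flat)
print(subdim_rows)
```

Because every visitor points at the same subdimension record, an inner join is enough to return all shoppers, and the subdimension grows only with the number of registered customers.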
In a fixed-width database, using our previous assumptions, the base shopper dimension is 20
million x 25 bytes = 500MB, and the snowflake dimension is 4 million x 475 bytes = 1.9GB. You
save about 7.6GB of the original 10GB by using the snowflake. If you have a query tool that
insists on a classic star schema with no snowflakes, then you can hide the snowflake under a
view declaration.
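The sizing arithmetic above can be checked with a few lines of Python. The figures are the ones given in the text; the variable names and the use of decimal megabytes and gigabytes are assumptions made to match the round numbers quoted.

```python
# Decimal units, matching the round figures used in the text.
BYTES_PER_MB = 10**6
BYTES_PER_GB = 10**9

shoppers = 20_000_000      # 16 million visitors + 4 million customers
customers = 4_000_000
record_width = 500         # bytes per record in the single flat dimension
base_width = 25            # bytes per record in the base dimension
subdim_width = record_width - base_width  # 475 bytes of customer-only data

flat_bytes = shoppers * record_width
base_bytes = shoppers * base_width
# The one shared visitor record adds a single row and is ignored here.
subdim_bytes = customers * subdim_width
saved_bytes = flat_bytes - (base_bytes + subdim_bytes)

print(f"flat dimension: {flat_bytes / BYTES_PER_GB:.1f} GB")
print(f"base dimension: {base_bytes / BYTES_PER_MB:.0f} MB")
print(f"subdimension:   {subdim_bytes / BYTES_PER_GB:.1f} GB")
print(f"saved:          {saved_bytes / BYTES_PER_GB:.1f} GB")
```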

FINANCIAL PRODUCT DIMENSIONS


Banks, brokerage houses, and insurance companies all have trouble modeling their product
dimensions because each of the individual products has a host of special attributes not shared
by other products. Except for a set of common core attributes, a checking account
doesn't look very much like a mortgage or certificate of deposit. They even have different
numbers of attributes.
If you try to build a single product dimension with the union of all possible attributes, you end
up with hundreds of attributes, most of which are empty in a given record.
The answer in this case is to build a context-dependent snowflake. You isolate the core
attributes in a base product dimension table, and include a snowflake key in each base record
that points to its proper extended product subdimension. (See Figure 2.)
Figure 2. A Financial Product dimension with a subdimension for each product type.
This solution is not a conventional relational join! The snowflake key must connect to the
particular subdimension table that a specific product type defines. Usually you can accomplish
this task by constructing a relational view for each product type that hardwires the correct join
path.
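One way to hardwire the join path with a view per product type is sketched below. The table names, the two product types, and their extended attributes (overdraft_limit, term_years, rate_type) are hypothetical. Both sample products deliberately carry the same snowflake key value to show that the key is only meaningful once the product type has selected the subdimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Base dimension: core attributes shared by every product.
    CREATE TABLE product_base (
        product_key   INTEGER PRIMARY KEY,
        product_type  TEXT,      -- decides which subdimension the key points into
        product_name  TEXT,
        extended_key  INTEGER    -- snowflake key; meaning depends on product_type
    );
    -- One subdimension per product type, each with its own attributes.
    CREATE TABLE checking_ext (
        extended_key INTEGER PRIMARY KEY, overdraft_limit INTEGER
    );
    CREATE TABLE mortgage_ext (
        extended_key INTEGER PRIMARY KEY, term_years INTEGER, rate_type TEXT
    );

    INSERT INTO product_base VALUES
        (1, 'checking', 'Basic Checking', 1),
        (2, 'mortgage', '30-Year Fixed', 1);
    INSERT INTO checking_ext VALUES (1, 500);
    INSERT INTO mortgage_ext VALUES (1, 30, 'fixed');

    -- Each view fixes the product type and the matching join path.
    CREATE VIEW checking_product AS
    SELECT b.product_key, b.product_name, e.overdraft_limit
    FROM product_base AS b
    JOIN checking_ext AS e ON e.extended_key = b.extended_key
    WHERE b.product_type = 'checking';

    CREATE VIEW mortgage_product AS
    SELECT b.product_key, b.product_name, e.term_years, e.rate_type
    FROM product_base AS b
    JOIN mortgage_ext AS e ON e.extended_key = b.extended_key
    WHERE b.product_type = 'mortgage';
""")

checking = conn.execute("SELECT * FROM checking_product").fetchall()
mortgage = conn.execute("SELECT * FROM mortgage_product").fetchall()

print(checking)
print(mortgage)
```

Queries that span all product types use product_base alone; queries specific to one line of business use the matching view and see only that type's attributes.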

MULTIENTERPRISE CALENDAR DIMENSIONS


Building a calendar dimension in a distributed data warehouse spanning multiple organizations
is difficult because each organization has idiosyncratic fiscal periods, seasons, and holidays.
Although you should make a heroic effort to reduce incompatible calendar labels, many times
you want to look at the overall multienterprise data through the eyes of just one of the
organizations.
Unlike the financial products dimensions, each of the separate calendars can have the same
number of attributes describing fiscal periods, seasons, and holidays. But there may be
hundreds of separate calendars. An international retailer may have to deal with a calendar for
each foreign country.
In this case you modify the snowflake design to let the snowflake key join to a single calendar
subdimension. (See Figure 3.) But the subdimension has higher cardinality than the base
dimension! The key for the subdimension is both the snowflake key and the organization key.
Figure 3. A Calendar dimension with a higher cardinality subdimension.
In this situation, you must specify a single organization in the subdimension before evaluating
the join between the tables. When done correctly, the subdimension has a one-to-one
relationship with the base dimension as if the two tables were a single entity. Now the entire
multienterprise data warehouse can be queried through the calendar of any constituent
organization.
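The calendar case can be sketched as follows. The subdimension is keyed on both the snowflake key and the organization key, and the query constrains it to one organization as part of the join. The table names, the two organizations, their fiscal labels, and the helper function calendar_for are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE calendar_base (
        date_key       INTEGER PRIMARY KEY,  -- doubles as the snowflake key here
        calendar_date  TEXT
    );
    -- One row per date per organization: higher cardinality than the base.
    CREATE TABLE calendar_org (
        date_key       INTEGER,
        org_key        INTEGER,
        fiscal_period  TEXT,
        holiday_flag   TEXT,
        PRIMARY KEY (date_key, org_key)
    );
    INSERT INTO calendar_base VALUES
        (20010629, '2001-06-29'),
        (20010704, '2001-07-04');
    INSERT INTO calendar_org VALUES
        (20010629, 1, 'FY2001-Q2',  'N'),
        (20010704, 1, 'FY2001-Q3',  'Y'),
        (20010629, 2, 'FY2001-P12', 'N'),
        (20010704, 2, 'FY2002-P01', 'N');
""")


def calendar_for(org_key):
    """Return the calendar as seen by one organization, one row per date."""
    return conn.execute(
        """
        SELECT b.date_key, b.calendar_date, o.fiscal_period, o.holiday_flag
        FROM calendar_base AS b
        JOIN calendar_org AS o ON o.date_key = b.date_key
        WHERE o.org_key = ?
        ORDER BY b.date_key
        """,
        (org_key,),
    ).fetchall()


org1 = calendar_for(1)
org2 = calendar_for(2)
base_rows = conn.execute("SELECT COUNT(*) FROM calendar_base").fetchone()[0]
org_rows = conn.execute("SELECT COUNT(*) FROM calendar_org").fetchone()[0]

print(org1)
print(org2)
```

With the organization fixed, each base date matches exactly one subdimension row, which is the one-to-one relationship the text describes. Leaving out the organization constraint would return one row per date per organization and multiply any facts joined to the calendar.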

PERMISSIBLE SNOWFLAKES
These three examples show how variations of snowflake designs can be very useful. I hope you
feel more confident about answering the question: "When can I use a snowflake?" When you are
thinking about design alternatives, you should separate the issues of physical design from
those of logical design. Physical design drives performance. Logical design drives
understandability. You can certainly use snowflake designs if you maximize both of these goals.

About the Author: Ralph Kimball


Ralph Kimball is the founder of the Kimball Group and Kimball
University where he has taught data warehouse design to more
than 10,000 students. He is known for the best selling series of
Toolkit books. He started with a Ph.D. in man-machine systems
from Stanford in 1973 and has spent nearly four decades
designing systems for users that are simple and fast.
