Snowflake or Flatten in A DM

Understanding Role of Dimensions
Should I snowflake my design, or is it going to be a good or bad idea to snowflake?
When parent-child hierarchies are carried over from the OLTP application, the question arises whether
they should keep the same relationships in the data warehouse. Traditionally there are two approaches
to constructing a dimensional data model: the snowflake schema and the star schema. In most cases the
star schema is preferred because it is easier to use. Avoid snowflakes and outriggers as far as
possible when you plan to develop a star model. Sometimes a snowflake becomes inevitable, and in that
case it should be used sparingly.
When the data sources contain interrelated master tables with parent-child hierarchical relationships,
flattening the hierarchies in the dimensional model may sometimes solve the snowflake problem, but
this again depends on how users slice and dice the data.
An example is the three master tables ProductCategory, ProductSubCategory and Product
(GrandParent, Parent and Child respectively) in the AdventureWorks database. Because there is a
hierarchy, this may seem to call for a snowflake in the DM. Flattening the relationships involves
the following:
1. Generate a unique natural key (which could be a combination of the natural keys of Product +
ProductCategory + ProductSubCategory)
2. When hierarchies are flattened, the ProductCategory appears duplicated across the products that
belong to it (imagine Country, State and City: if you list the cities belonging to a specific
country, the country name is repeated on every row of the result or of the flattened table). At
the dimensional level, however, the grain is still correct. So flattening makes sense if my
reports always slice Products first by ProductCategory and ProductSubCategory; otherwise it may
introduce new challenges when trying to represent the factual information using a single
dimension key (which in this case stands for a combination of 3 dimensions).
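The two points above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample rows, the key values and the `flatten` helper are hypothetical and are not taken from the AdventureWorks database.

```python
# Hypothetical sample rows standing in for the three source master tables.
categories = {1: "Bikes", 2: "Accessories"}
subcategories = {
    10: {"name": "Mountain Bikes", "category_id": 1},
    11: {"name": "Road Bikes", "category_id": 1},
    20: {"name": "Helmets", "category_id": 2},
}
products = [
    {"product_id": 100, "name": "Mountain-100", "subcategory_id": 10},
    {"product_id": 101, "name": "Road-150", "subcategory_id": 11},
    {"product_id": 102, "name": "Sport Helmet", "subcategory_id": 20},
]


def flatten(products, subcategories, categories):
    """Return one row per product with parent and grandparent names inlined."""
    rows = []
    for surrogate_key, product in enumerate(products, start=1):
        sub = subcategories[product["subcategory_id"]]
        rows.append({
            "product_key_sk": surrogate_key,
            # Composite natural key built from all three levels (point 1).
            "natural_key": (
                sub["category_id"],
                product["subcategory_id"],
                product["product_id"],
            ),
            "product": product["name"],
            "subcategory": sub["name"],
            # The category name repeats for every product under it (point 2).
            "category": categories[sub["category_id"]],
        })
    return rows


dim_product_flat = flatten(products, subcategories, categories)
for row in dim_product_flat:
    print(row)
```

The grain stays at one row per product, while "Bikes" is stored twice because two of the sample products belong to that category.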
A Product is not always represented by a ProductCategory or by a ProductSubCategory. One way to
tackle this problem is to introduce either outrigger dimensions or bridge tables. The other, better
way is to push the relationship between these hierarchical dimensions down into the fact table, where
all three dimension keys (the surrogate keys of the 3 dimensions DimProduct, DimProductCategory and
DimProductSubCategory) are represented as foreign keys.
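As a rough sketch of a fact table that carries all three surrogate keys, the following Python snippet aggregates the same facts at two levels without walking from product to category. The dimension contents, the `fact_sales` rows and the `total_by` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical dimension lookups, keyed by surrogate key.
dim_product_category = {1: "Bikes", 2: "Accessories"}
dim_product_subcategory = {10: "Mountain Bikes", 20: "Helmets"}
dim_product = {100: "Mountain-100", 200: "Sport Helmet"}

# Each fact row holds the surrogate keys of all three dimensions.
fact_sales = [
    {"product_key": 100, "subcategory_key": 10, "category_key": 1, "amount": 3400.0},
    {"product_key": 100, "subcategory_key": 10, "category_key": 1, "amount": 3400.0},
    {"product_key": 200, "subcategory_key": 20, "category_key": 2, "amount": 35.0},
]


def total_by(fact_rows, key_column, dimension):
    """Sum the amount per dimension member, using one foreign key column."""
    totals = defaultdict(float)
    for row in fact_rows:
        totals[dimension[row[key_column]]] += row["amount"]
    return dict(totals)


sales_by_category = total_by(fact_sales, "category_key", dim_product_category)
sales_by_product = total_by(fact_sales, "product_key", dim_product)
print(sales_by_category)
print(sales_by_product)
```

Because every level has its own key in the fact row, a report can slice by category, subcategory or product independently.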
This idea raises a question: how do I navigate across dimensional hierarchies if they are represented
in the fact table only? I don't want to touch the fact table every time I want to list the
ProductSubCategories for a given ProductCategory, or the Products for a given ProductSubCategory;
that may take a long time, since fact tables are going to be huge.
As mentioned above, this problem can be solved by introducing a bridge table: you maintain a bridge
table that serves as a mapping table between these 3 tables, from which you can determine the
hierarchy between the 3 dimensions. At the same time, you also maintain the surrogate keys of these
3 dimensions in the fact table.
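A minimal sketch of such a bridge table, using Python's built-in `sqlite3` module, might look like this. The table name `BridgeProductHierarchy`, the column names other than the dimension table names, the sample rows and both helper functions are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimProductCategory (ProductCategory_Key_SK INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE DimProductSubCategory (ProductSubCategory_Key_SK INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE DimProduct (Product_Key_SK INTEGER PRIMARY KEY, Name TEXT);

-- Hypothetical bridge: one row per product, mapping it to its parents.
CREATE TABLE BridgeProductHierarchy (
    ProductCategory_Key_SK INTEGER
        REFERENCES DimProductCategory (ProductCategory_Key_SK),
    ProductSubCategory_Key_SK INTEGER
        REFERENCES DimProductSubCategory (ProductSubCategory_Key_SK),
    Product_Key_SK INTEGER
        REFERENCES DimProduct (Product_Key_SK)
);

INSERT INTO DimProductCategory VALUES (1, 'Bikes'), (2, 'Accessories');
INSERT INTO DimProductSubCategory VALUES
    (10, 'Mountain Bikes'), (11, 'Road Bikes'), (20, 'Helmets');
INSERT INTO DimProduct VALUES
    (100, 'Mountain-100'), (101, 'Mountain-200'),
    (102, 'Road-150'), (200, 'Sport Helmet');
INSERT INTO BridgeProductHierarchy VALUES
    (1, 10, 100), (1, 10, 101), (1, 11, 102), (2, 20, 200);
""")


def subcategories_for(conn, category_name):
    """List subcategory names under a category, using only the bridge."""
    rows = conn.execute("""
        SELECT DISTINCT s.Name
        FROM BridgeProductHierarchy AS b
        JOIN DimProductCategory AS c
          ON c.ProductCategory_Key_SK = b.ProductCategory_Key_SK
        JOIN DimProductSubCategory AS s
          ON s.ProductSubCategory_Key_SK = b.ProductSubCategory_Key_SK
        WHERE c.Name = ?
        ORDER BY s.Name
    """, (category_name,)).fetchall()
    return [name for (name,) in rows]


def products_for(conn, subcategory_name):
    """List product names under a subcategory, using only the bridge."""
    rows = conn.execute("""
        SELECT p.Name
        FROM BridgeProductHierarchy AS b
        JOIN DimProductSubCategory AS s
          ON s.ProductSubCategory_Key_SK = b.ProductSubCategory_Key_SK
        JOIN DimProduct AS p
          ON p.Product_Key_SK = b.Product_Key_SK
        WHERE s.Name = ?
        ORDER BY p.Name
    """, (subcategory_name,)).fetchall()
    return [name for (name,) in rows]


print(subcategories_for(conn, "Bikes"))
print(products_for(conn, "Mountain Bikes"))
```

Neither query reads a fact table; the bridge has one row per product, so it stays small compared with the facts.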
What is the better way of maintaining historical data when the history is sparsely used in the reporting
system, or used just for operational purposes?
Historical data on the dimensions: it is recommended to maintain the history in a separate table.
For example, consider the case where some of the attributes in the Product table require type-2
changes to be tracked. This could be implemented with two tables, DimProduct for the current rows and
DimProduct_History for earlier versions.
Here is how the ETL could load the data into these tables when an update is triggered by the ETL
subsystem:
1. The first time a product is inserted into DimProduct, a new surrogate key is generated,
represented by Product_Key_SK.
2. An update to the non-key fields of DimProduct performs the following actions:
a. Copy the existing record from DimProduct to DimProduct_History (before performing the
UPDATE on DimProduct)
b. Set the StartDate, EndDate and Version number in the History table
c. Update DimProduct with the new changes
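The load steps above could be sketched as follows with Python's built-in `sqlite3` module. Several details are assumptions rather than part of the original design: the `ProductID` natural key, the `ListPrice` attribute, a `StartDate` column on DimProduct that records when the current version took effect, and the `load_product` helper itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimProduct (
    Product_Key_SK INTEGER PRIMARY KEY AUTOINCREMENT,
    ProductID      INTEGER UNIQUE,  -- assumed natural key from the source
    ListPrice      REAL,            -- hypothetical tracked attribute
    StartDate      TEXT             -- assumed: when the current version began
);
CREATE TABLE DimProduct_History (
    Product_Key_SK INTEGER,
    ProductID      INTEGER,
    ListPrice      REAL,
    StartDate      TEXT,
    EndDate        TEXT,
    Version        INTEGER
);
""")


def load_product(conn, product_id, list_price, load_date):
    """Insert a new product, or archive the old row and update it in place."""
    with conn:  # commit on success, roll back if any statement fails
        current = conn.execute(
            "SELECT Product_Key_SK, ListPrice, StartDate "
            "FROM DimProduct WHERE ProductID = ?",
            (product_id,),
        ).fetchone()
        if current is None:
            # Step 1: the first load generates the surrogate key.
            cursor = conn.execute(
                "INSERT INTO DimProduct (ProductID, ListPrice, StartDate) "
                "VALUES (?, ?, ?)",
                (product_id, list_price, load_date),
            )
            return cursor.lastrowid
        key, old_price, old_start = current
        if old_price == list_price:
            return key  # nothing changed, nothing to archive
        # Steps 2a and 2b: copy the old row to history with dates and version.
        version = conn.execute(
            "SELECT COALESCE(MAX(Version), 0) + 1 "
            "FROM DimProduct_History WHERE Product_Key_SK = ?",
            (key,),
        ).fetchone()[0]
        conn.execute(
            "INSERT INTO DimProduct_History VALUES (?, ?, ?, ?, ?, ?)",
            (key, product_id, old_price, old_start, load_date, version),
        )
        # Step 2c: overwrite the current row; the surrogate key is unchanged.
        conn.execute(
            "UPDATE DimProduct SET ListPrice = ?, StartDate = ? "
            "WHERE Product_Key_SK = ?",
            (list_price, load_date, key),
        )
        return key


first_key = load_product(conn, 1, 3399.99, "2024-01-01")
second_key = load_product(conn, 1, 3499.99, "2024-06-01")
history = conn.execute(
    "SELECT Product_Key_SK, ListPrice, StartDate, EndDate, Version "
    "FROM DimProduct_History"
).fetchall()
print(first_key == second_key)
print(history)
```

Wrapping the copy and the update in one transaction is a design choice of this sketch, so that a failed update does not leave an orphaned history row.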
Advantages:
If the history is maintained in the same dimension table, every update requires a new row to be
added to the dimension table. This means that the surrogate key changes on every update, so any
bridge and fact rows that must point at the current version also have to be updated with the new
key. With the above strategy, we don't have to update any bridge or fact tables, because the
surrogate key in the dimension table never changes.
Drawbacks:
If the historical data has to be pulled frequently, you would need to join the DimProduct and
DimProduct_History tables, which I think won't cause serious performance issues as long as you
maintain proper indexes.
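A small sketch of that join, again with Python's `sqlite3` module, is shown below. The index name, the `ListPrice` column, the sample rows and the `price_history` helper are hypothetical; the index on the join column is one plausible reading of "proper indexes".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimProduct (
    Product_Key_SK INTEGER PRIMARY KEY,
    Name TEXT,
    ListPrice REAL
);
CREATE TABLE DimProduct_History (
    Product_Key_SK INTEGER,
    Name TEXT,
    ListPrice REAL,
    StartDate TEXT,
    EndDate TEXT,
    Version INTEGER
);
-- Index on the join column so a lookup does not scan the whole history.
CREATE INDEX IX_DimProduct_History_Key
    ON DimProduct_History (Product_Key_SK, Version);

INSERT INTO DimProduct VALUES
    (1, 'Mountain-100', 3499.99), (2, 'Sport Helmet', 34.99);
INSERT INTO DimProduct_History VALUES
    (1, 'Mountain-100', 3299.99, '2023-01-01', '2024-01-01', 1),
    (1, 'Mountain-100', 3399.99, '2024-01-01', '2024-06-01', 2);
""")


def price_history(conn, product_key):
    """Return earlier versions of a product next to its current price."""
    return conn.execute("""
        SELECT h.Version, h.ListPrice, h.StartDate, h.EndDate,
               d.ListPrice AS CurrentPrice
        FROM DimProduct AS d
        JOIN DimProduct_History AS h
          ON h.Product_Key_SK = d.Product_Key_SK
        WHERE d.Product_Key_SK = ?
        ORDER BY h.Version
    """, (product_key,)).fetchall()


for row in price_history(conn, 1):
    print(row)
```

A product that has never changed has no history rows, so the inner join returns nothing for it; a LEFT JOIN would be the alternative if such products should still appear.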
