
DW Concepts Dimension Modeling Techniques

www.technologica.com

TechnoLogica DW Projects

Business Management System, National Health Insurance Fund (10.2004 - current)
Customer Data Integration, Allianz Bulgaria Holding (10.2004 - current)
Regulatory Reporting System, BULBANK (2002 - 2003)
Information System Monetary Statistics, Bulgarian National Bank (April 2003 - August 2004)
Management Information System, BULBANK (January 2001 - June 2002)

Agenda

DW Terminology Overview
Dimensional Modeling
Dimension Types
History and Dimensions
Hierarchy in Dimensions

The data warehouse must

Make an organization's information easily accessible.
Present the organization's information consistently.
Be adaptive and resilient to change.
Be a secure bastion that protects our information assets.
Serve as the foundation for improved decision making.
The business community must accept the data warehouse if it is to be deemed successful.

Components of a Data Warehouse


Dimensional Modeling
Dimensional modeling is a new name for an old technique for making databases simple and understandable
Dimensional modeling is quite different from third-normal-form (3NF) modeling
ERM -> The Transaction Processing Model
  One table per entity
  Minimize data redundancy
  Optimize update
DM -> The data warehousing model
  One fact table for a process in the organization
  Maximize understandability
  Optimized for retrieval
  Resilient to change

Star Dimensional Modeling

Figure: star schema.
Fact table: Order (Item_nbr, Item_desc, Quantity, Discnt_price, Unit_price, Order_amount)
Dimension tables: History, Product, Customer, Channel
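The star layout above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the `order_fact` table name, the `*_key` columns, and the sample rows are assumptions, and the History dimension from the figure is left out to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables (hypothetical columns): one row per product, customer, channel.
    CREATE TABLE product_dim  (product_key  INTEGER PRIMARY KEY, item_nbr TEXT, item_desc TEXT);
    CREATE TABLE customer_dim (customer_key INTEGER PRIMARY KEY, customer_name TEXT);
    CREATE TABLE channel_dim  (channel_key  INTEGER PRIMARY KEY, channel_name TEXT);

    -- Fact table: foreign keys to each dimension plus the numeric facts.
    CREATE TABLE order_fact (
        product_key  INTEGER REFERENCES product_dim(product_key),
        customer_key INTEGER REFERENCES customer_dim(customer_key),
        channel_key  INTEGER REFERENCES channel_dim(channel_key),
        quantity     INTEGER,
        unit_price   REAL,
        order_amount REAL
    );

    INSERT INTO product_dim  VALUES (1, 'A-100', 'Widget'), (2, 'B-200', 'Gadget');
    INSERT INTO customer_dim VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO channel_dim  VALUES (1, 'Web'), (2, 'Store');
    INSERT INTO order_fact   VALUES (1, 1, 1, 2, 10.0, 20.0),
                                    (2, 1, 2, 1, 30.0, 30.0),
                                    (1, 2, 1, 5, 10.0, 50.0);
""")

# Slice the facts by one dimension attribute: total order amount per channel.
sales_by_channel = conn.execute("""
    SELECT c.channel_name, SUM(f.order_amount)
    FROM order_fact f
    JOIN channel_dim c ON c.channel_key = f.channel_key
    GROUP BY c.channel_name
    ORDER BY c.channel_name
""").fetchall()
print(sales_by_channel)  # [('Store', 30.0), ('Web', 70.0)]
```

Every query against the star follows the same shape: join the fact table to the dimensions it needs, constrain and group on dimension attributes, and aggregate the facts.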

Four-Step Dimensional Design Process

1. Select the business process to model.
2. Declare the grain of the business process.
3. Choose the dimensions that apply to each fact table row.
4. Identify the numeric facts that will populate each fact table row.

Dimensions
Determine these by the ways you want to slice and dice the data
Small number of rows compared to facts
Usually 5-10 dimensions surrounding a fact table
Time is almost always a dimension used by every fact
Track history
Uses Surrogate Keys
Hierarchies are usually built into them if possible

Date Dimension
The date dimension is the one dimension nearly guaranteed to be in every data mart
Date Dimension = what used to be called the Time Dimension
We can build the date dimension table in advance (5-10 years; even 10 years is only about 3,650 rows)
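Building the date dimension in advance can be sketched in a few lines of Python. The column names and the 2001-2010 range are assumptions chosen for illustration; a real date dimension would also carry fiscal periods, holidays, and similar attributes.

```python
from datetime import date, timedelta

def build_date_dimension(start, end):
    """Return one row (a dict) per calendar day from start to end, inclusive."""
    rows = []
    day = start
    key = 1
    while day <= end:
        rows.append({
            "date_key": key,                       # sequential surrogate key
            "full_date": day.isoformat(),
            "calendar_month": day.strftime("%B"),  # month name, e.g. "January"
            "calendar_year": day.year,
            "day_of_week": day.strftime("%A"),
            "is_weekend": day.weekday() >= 5,      # Saturday is 5, Sunday is 6
        })
        day += timedelta(days=1)
        key += 1
    return rows

# Ten years of dates, including the leap days of 2004 and 2008.
date_dim = build_date_dimension(date(2001, 1, 1), date(2010, 12, 31))
print(len(date_dim))  # 3652
```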


Date Dimension
Data warehouses always need an explicit date dimension table. There are many date attributes not supported by the SQL date function, including fiscal periods, seasons, holidays, and weekends. Rather than attempting to determine these nonstandard calendar calculations in a query, we should look them up in a date dimension table.

select sum(f.amount_sold)
from DATE_DIM d, FACT f
where d.Calendar_Month = 'January'
and d.id = f.date_dim_id;
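The lookup can be tried end to end with sqlite3. This is a sketch: the `fiscal_period` and `is_holiday` columns and all sample rows are assumptions added for illustration, while the first query mirrors the one on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE date_dim (id INTEGER PRIMARY KEY, full_date TEXT,
                           calendar_month TEXT, fiscal_period TEXT, is_holiday INTEGER);
    CREATE TABLE fact (date_dim_id INTEGER REFERENCES date_dim(id), amount_sold REAL);

    INSERT INTO date_dim VALUES
        (1, '2005-01-01', 'January',  'FY2005-P04', 1),
        (2, '2005-01-03', 'January',  'FY2005-P04', 0),
        (3, '2005-02-01', 'February', 'FY2005-P05', 0);
    INSERT INTO fact VALUES (1, 100.0), (2, 250.0), (3, 75.0);
""")

# The slide's query: constrain on a calendar attribute stored in the dimension.
january_total = conn.execute("""
    SELECT SUM(f.amount_sold)
    FROM date_dim d, fact f
    WHERE d.calendar_month = 'January'
      AND d.id = f.date_dim_id
""").fetchone()[0]

# The same pattern for an attribute that no SQL date function can derive.
holiday_total = conn.execute("""
    SELECT SUM(f.amount_sold)
    FROM date_dim d, fact f
    WHERE d.is_holiday = 1
      AND d.id = f.date_dim_id
""").fetchone()[0]

print(january_total, holiday_total)  # 350.0 100.0
```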

Dimension Normalization (Denormalized dimension)


Dimension Normalization (Snowflaking)


The dimension tables should remain as flat tables physically.
Normalized, snowflaked dimension tables penalize cross-attribute browsing and prohibit the use of bit-mapped indexes.
Disk space savings gained by normalizing the dimension tables typically are less than 1 percent of the total disk space needed for the overall schema

Too Many Dimensions


A very large number of dimensions typically is a sign that several dimensions are not completely independent and should be combined into a single dimension.
If our design has 25 or more dimensions, we should look for ways to combine correlated dimensions into a single dimension
It is a dimensional modeling mistake to represent elements of a hierarchy as separate dimensions in the fact table.

Surrogate Keys
Every join between dimension and fact tables in the data warehouse should be based on meaningless integer surrogate keys.
You should avoid using the natural operational production codes.
None of the data warehouse keys should be smart, where you can tell something about the row just by looking at the key.

Surrogate Keys
Surrogate keys are like an immunization for the data warehouse
Buffer the data warehouse environment from operational changes
Performance advantages: the smaller surrogate key translates into smaller fact tables, smaller fact table indices, and more fact table rows per block input-output operation
Surrogate keys are used to record dimension conditions that may not have an operational code
  No Promotion in Effect, Date Not Applicable.
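A minimal sketch of surrogate key handling, assuming a hypothetical `promotion_dim` table: the operational code is kept as an ordinary attribute, the integer key carries no meaning, and a reserved row (key 0 here, an arbitrary choice) represents "No Promotion in Effect".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE promotion_dim (
        promotion_key  INTEGER PRIMARY KEY,  -- meaningless integer surrogate key
        promotion_code TEXT UNIQUE,          -- natural operational code, NULL if none exists
        promotion_name TEXT
    );
    -- Reserved row for a condition that has no operational code.
    INSERT INTO promotion_dim (promotion_key, promotion_code, promotion_name)
    VALUES (0, NULL, 'No Promotion in Effect');
    -- Ordinary rows let the database assign the next integer key.
    INSERT INTO promotion_dim (promotion_code, promotion_name)
    VALUES ('SUMMER05', 'Summer Sale 2005');
""")

def promotion_key_for(code):
    """Translate an operational code into the surrogate key stored in the fact table."""
    if code is None:
        return 0  # the 'No Promotion in Effect' row
    row = conn.execute(
        "SELECT promotion_key FROM promotion_dim WHERE promotion_code = ?", (code,)
    ).fetchone()
    if row is None:
        raise KeyError(f"unknown promotion code: {code}")
    return row[0]

print(promotion_key_for(None))        # 0
print(promotion_key_for("SUMMER05"))  # an assigned integer, not derived from the code
```

Because fact rows store only the integer key, a change to the operational code format touches one dimension column and leaves the fact table alone.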

Surrogate Keys
The date dimension is the one dimension where surrogate keys should be assigned in a meaningful, sequential order
Surrogate keys are needed to support one of the primary techniques for handling changes to dimension table attributes
Don't use concatenated or compound keys for dimension tables

Data Warehouse Bus Architecture


Data Warehouse Bus Matrix


Conformed Dimensions
Most dimensions are defined naturally at the most granular level possible
Conformed dimensions are either identical or strict mathematical subsets of the most granular, detailed dimension
They have consistent dimension keys, consistent attribute column names, consistent attribute definitions, and consistent attribute values
The conformed dimension may be the same physical table within the database or may be duplicated synchronously in each data mart

Conformed Dimensions
Roll-up dimensions conform to the base-level atomic dimension if they are a strict subset of that atomic dimension.
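One way to keep a roll-up dimension a strict subset is to derive it from the atomic dimension rather than load it separately. The sketch below assumes hypothetical `date_dim` and `month_dim` tables and omits the roll-up's own surrogate key for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE date_dim (date_key INTEGER PRIMARY KEY, full_date TEXT,
                           month_name TEXT, year INTEGER);
    INSERT INTO date_dim VALUES
        (1, '2005-01-30', 'January',  2005),
        (2, '2005-01-31', 'January',  2005),
        (3, '2005-02-01', 'February', 2005);

    -- Roll-up dimension: same column names and values, fewer rows and columns.
    CREATE TABLE month_dim AS
        SELECT DISTINCT year, month_name FROM date_dim;
""")

month_rows = conn.execute(
    "SELECT year, month_name FROM month_dim ORDER BY month_name"
).fetchall()
print(month_rows)  # [(2005, 'February'), (2005, 'January')]

# Conformance check: no month_dim row may be absent from the atomic dimension.
orphans = conn.execute("""
    SELECT COUNT(*) FROM month_dim m
    WHERE NOT EXISTS (SELECT 1 FROM date_dim d
                      WHERE d.year = m.year AND d.month_name = m.month_name)
""").fetchone()[0]
print(orphans)  # 0
```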

Conformed Dimensions
They should be built once in the staging area
They must be published prior to staging of the fact data
The dimension authority has responsibility for defining, maintaining, and publishing a particular dimension or its subsets to all the data mart clients who need it

Tracking History in Dimensions


Unchanging Dimensions
Changing, but Original Values are Irrelevant
  A phone number in a customer record
Slowly Changing Dimensions (SCD)
  A customer address, manager
Rapidly Changing Dimensions
  Income range of a customer
Continuously Changing Dimensions
  Customer age

Type 1: Overwrite the Value


The type 1 response is easy to implement, but:
  it does not maintain any history of prior attribute values
  any preexisting aggregations based on the department value will need to be rebuilt
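A type 1 change amounts to a plain UPDATE. The sketch below assumes a hypothetical `product_dim` table and borrows the IntelliKidz product and its department change from the type 2 example later in the deck.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, sku_number TEXT,
                              product_description TEXT, department TEXT);
    INSERT INTO product_dim VALUES (12345, 'ABC922-Z', 'IntelliKidz 1.0', 'Education');
""")

def apply_type1_change(sku, new_department):
    """Type 1: overwrite the attribute in place; the prior value is lost."""
    with conn:  # commit on success, roll back on error
        conn.execute(
            "UPDATE product_dim SET department = ? WHERE sku_number = ?",
            (new_department, sku),
        )

apply_type1_change("ABC922-Z", "Strategy")
rows = conn.execute("SELECT product_key, department FROM product_dim").fetchall()
print(rows)  # [(12345, 'Strategy')]
```

The surrogate key is unchanged, so every existing fact row now reports under the new department, which is why stored aggregates by department go stale.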

Type 2: Add a Dimension Row


The type 2 response is the primary technique for accurately tracking slowly changing dimension attributes. It is extremely powerful because the new dimension row automatically partitions history in the fact table.
It's not suitable for dimension tables that already exceed a million rows

Type 2: Add a Dimension Row


Product dimension with an effective date and a most recent flag:

Product Key  Product Description  Department  SKU Number  Effective Date  Most Recent Flag
12345        IntelliKidz 1.0      Education   ABC922-Z    01.1.1900       N
25984        IntelliKidz 1.0      Strategy    ABC922-Z    23.4.2005       Y

Product dimension with effective and expiration dates:

Product Key  Product Description  Department  SKU Number  Effective Date  Expiration Date
12345        IntelliKidz 1.0      Education   ABC922-Z    01.1.1900       22.4.2005
25984        IntelliKidz 1.0      Strategy    ABC922-Z    23.4.2005       01.1.2500

Fact table:

Product Key  Date Key  Amount Sold
12345        200       100          <--- 20.04.2005
25984        203       200          <--- 23.04.2005
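The row handling in the tables above can be sketched as follows. The function name, the ISO date format, and passing the new surrogate key in by hand are assumptions for illustration; a real load would normally generate the key.

```python
import sqlite3
from datetime import date, timedelta

OPEN_END = "2500-01-01"  # far-future expiration date marking the current row

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product_dim (
        product_key INTEGER PRIMARY KEY, product_description TEXT, department TEXT,
        sku_number TEXT, effective_date TEXT, expiration_date TEXT);
    INSERT INTO product_dim VALUES
        (12345, 'IntelliKidz 1.0', 'Education', 'ABC922-Z', '1900-01-01', '2500-01-01');
""")

def apply_type2_change(sku, new_department, change_date, new_key):
    """Type 2: expire the current row and add a new row under a new surrogate key."""
    day_before = (change_date - timedelta(days=1)).isoformat()
    with conn:  # both statements succeed or neither does
        current = conn.execute(
            "SELECT product_key, product_description FROM product_dim "
            "WHERE sku_number = ? AND expiration_date = ?", (sku, OPEN_END)
        ).fetchone()
        conn.execute(
            "UPDATE product_dim SET expiration_date = ? WHERE product_key = ?",
            (day_before, current[0]),
        )
        conn.execute(
            "INSERT INTO product_dim VALUES (?, ?, ?, ?, ?, ?)",
            (new_key, current[1], new_department, sku, change_date.isoformat(), OPEN_END),
        )

apply_type2_change("ABC922-Z", "Strategy", date(2005, 4, 23), 25984)
history = conn.execute(
    "SELECT product_key, department, effective_date, expiration_date "
    "FROM product_dim ORDER BY effective_date"
).fetchall()
print(history)
# [(12345, 'Education', '1900-01-01', '2005-04-22'),
#  (25984, 'Strategy', '2005-04-23', '2500-01-01')]
```

Facts loaded before the change keep key 12345 and facts loaded afterwards use 25984, which is how the new row partitions history without touching the fact table.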

Type 3: Add a Dimension Column


The type 3 slowly changing dimension technique allows us to see new and historical fact data by either the new or prior attribute values.
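A sketch of the type 3 response, assuming hypothetical `product_dim` and `sales_fact` tables and a `prior_department` column name: the old value moves into the added column, and all facts can then be grouped by either value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY,
                              product_description TEXT, department TEXT);
    CREATE TABLE sales_fact (product_key INTEGER, amount_sold REAL);
    INSERT INTO product_dim VALUES (12345, 'IntelliKidz 1.0', 'Education');
    INSERT INTO sales_fact VALUES (12345, 100.0), (12345, 200.0);
""")

# Type 3: add a column for the prior value, then overwrite the current one.
conn.execute("ALTER TABLE product_dim ADD COLUMN prior_department TEXT")
conn.execute("""
    UPDATE product_dim
    SET prior_department = department,   -- right-hand sides see the pre-update row
        department = 'Strategy'
    WHERE product_key = 12345
""")

by_current = conn.execute("""
    SELECT p.department, SUM(f.amount_sold)
    FROM sales_fact f JOIN product_dim p ON p.product_key = f.product_key
    GROUP BY p.department
""").fetchall()
by_prior = conn.execute("""
    SELECT p.prior_department, SUM(f.amount_sold)
    FROM sales_fact f JOIN product_dim p ON p.product_key = f.product_key
    GROUP BY p.prior_department
""").fetchall()
print(by_current, by_prior)  # [('Strategy', 300.0)] [('Education', 300.0)]
```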

Hybrid SCD Techniques Series of Type 3 Attributes Predictable Changes with Multiple Version Overlays

Report each year's sales using the district map for that year.
Report each year's sales using a district map from an arbitrary different year.
Report an arbitrary span of years' sales using a single district map from any chosen year. The most common version of this requirement would be to report the complete span of fact data using the current district map.

Hybrid SCD Techniques Type 2 with "Current" Overwrite Unpredictable Changes with Single-Version Overlay
Preserves historical accuracy while supporting the ability to report historical data according to the current values

Dimension Table Staging


Junk Dimensions
What to do with flags and indicators
  Leave the flags and indicators unchanged in the fact table row.
  Make each flag and indicator into its own separate dimension
  Strip out all the flags and indicators from the design.
A junk dimension is a convenient grouping of typically low-cardinality flags and indicators

Junk Dimensions

Whether to use junk dimension
  5 indicators, each has 3 values -> 243 (3^5) rows
  5 indicators, each has 100 values -> 10 billion (100^5) rows
When to insert rows in the dimension
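The row counts follow from taking every combination of indicator values. A small sketch, with made-up indicator names and values, builds the full junk dimension in advance for the five-by-three case and computes the size of the five-by-hundred case.

```python
from itertools import product

def build_junk_dimension(indicators):
    """indicators maps name -> list of values; returns one row per combination."""
    names = list(indicators)
    rows = []
    combos = product(*(indicators[name] for name in names))
    for key, combo in enumerate(combos, start=1):
        row = {"junk_key": key}        # surrogate key for the combination
        row.update(zip(names, combo))
        rows.append(row)
    return rows

# Hypothetical flags and indicators, three values each.
indicators = {
    "payment_type": ["Cash", "Credit", "Debit"],
    "order_mode":   ["Web", "Phone", "Store"],
    "gift_wrap":    ["Yes", "No", "Unknown"],
    "priority":     ["Low", "Normal", "High"],
    "shipping":     ["Standard", "Express", "Pickup"],
}

junk_dim = build_junk_dimension(indicators)
print(len(junk_dim))  # 243, i.e. 3 ** 5
print(100 ** 5)       # 10000000000 rows if each indicator had 100 values
```

At 243 rows, prebuilding every combination is cheap. At ten billion it is not, and rows would have to be inserted only as combinations are actually observed, or the indicators split across separate dimensions.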

Multiple Currencies


Customer Dimension
Critical element for effective CRM
The most challenging dimension for any data warehouse
  extremely deep (with millions of rows)
  extremely wide (with dozens or even hundreds of attributes)
  sometimes subject to rather rapid change

Customer Dimension Name and Address Parsing


Customer Dimension Other Common Customer Attributes


Gender
Ethnicity
Age or other life-stage classifications
Income or other lifestyle classifications
Status (for example, new, active, inactive, closed)
Referring source
Business-specific market segment
Scores characterizing the customer, such as purchase behavior, payment behavior, product preferences

Customer Dimension Aggregated Facts as Attributes


These attributes are to be used for constraining and labeling; they are not to be used in numeric calculations
Focus on those which will be used frequently
Minimize the frequency with which these attributes need to be updated
Replace metrics with more meaningful descriptive values, such as High Spender

Dimension Outriggers for a Low-Cardinality Attribute Set


Rapidly Changing Customer Dimensions

Challenges
  It generally takes too long to constrain or browse among the relationships in such a big table
  It is difficult to use previously described techniques for tracking changes in these large dimensions
One solution is to break off frequently analyzed or frequently changing attributes into a separate dimension, referred to as a minidimension
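A minidimension is typically small because continuously varying attributes are stored as bands. The sketch below is one possible arrangement, with invented band boundaries and labels: every band combination gets a key up front, and each fact row looks up the key matching the customer's profile at that time.

```python
from itertools import product

# Hypothetical bands: (exclusive upper bound, label).
AGE_BANDS = [(21, "Under 21"), (41, "21-40"), (float("inf"), "41 and over")]
INCOME_BANDS = [(20000, "Under 20,000"), (60000, "20,000-59,999"),
                (float("inf"), "60,000 and over")]

def band(value, bands):
    """Return the label of the first band whose upper bound exceeds value."""
    for upper, label in bands:
        if value < upper:
            return label
    raise ValueError(f"no band for {value}")

# Demographics minidimension: one surrogate key per combination of bands.
age_labels = [label for _, label in AGE_BANDS]
income_labels = [label for _, label in INCOME_BANDS]
demographics_dim = {
    combo: key
    for key, combo in enumerate(product(age_labels, income_labels), start=1)
}

def demographics_key(age, annual_income):
    """Key to store in the fact row for the customer's current profile."""
    return demographics_dim[(band(age, AGE_BANDS), band(annual_income, INCOME_BANDS))]

print(len(demographics_dim))        # 9
print(demographics_key(35, 45000))  # 5
print(demographics_key(35, 65000))  # 6, after a move to a higher income band
```

When a customer's income band changes, the large customer dimension is not touched; later fact rows simply carry a different demographics key.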

Rapidly Changing Customer Dimensions The Mini Dimension with "Current" Overwrite


Rapidly Changing Customer Dimensions


The minidimension terminology refers to when the demographics key is part of the fact table composite key
If the demographics key is a foreign key in the customer dimension, we refer to it as an outrigger

Rapidly Changing Customer Dimensions Type 2 with Natural Keys in Fact Table
Customer Dimension - Current Attributes (SCD1)
  Customer ID (Natural Key), Customer Name, Customer Address, Customer Date of Birth, Customer Date of 1st Order, Age, Gender, Annual Income, Number of Children, Marital Status
Fact Table
  Customer Key (FK), Customer Demographics Key (FK), More Foreign Keys, Facts
Customer Dimension - "As was" Attributes (SCD2)
  Customer Key (PK), Customer ID (Natural Key), Customer Name, Customer Address, Customer Date of Birth, Customer Date of 1st Order, Age, Gender, Annual Income, Number of Children, Marital Status

Implications of Type 2 Customer Dimension Changes


Be careful to avoid overcounting because we may have multiple rows in the customer dimension for the same individual
  COUNT DISTINCT
  A most recent row indicator
The comparison operators depend on the business rules used to set our effective/expiration dates.
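The overcounting risk and the two remedies named above can be shown on a three-row dimension. Table, column, and customer values are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_dim (customer_key INTEGER PRIMARY KEY, customer_id TEXT,
                               city TEXT, most_recent_flag TEXT);
    -- Customer C-001 moved once, so type 2 tracking left two rows for one person.
    INSERT INTO customer_dim VALUES
        (1, 'C-001', 'Sofia', 'N'),
        (2, 'C-001', 'Varna', 'Y'),
        (3, 'C-002', 'Sofia', 'Y');
""")

def scalar(sql):
    """Run a query that returns a single value."""
    return conn.execute(sql).fetchone()[0]

naive_count    = scalar("SELECT COUNT(*) FROM customer_dim")
distinct_count = scalar("SELECT COUNT(DISTINCT customer_id) FROM customer_dim")
current_count  = scalar("SELECT COUNT(*) FROM customer_dim WHERE most_recent_flag = 'Y'")
print(naive_count, distinct_count, current_count)  # 3 2 2
```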

Customer Behavior Study Groups


Capture the keys of the customers or products whose behavior you are tracking

Commercial Customer Hierarchies


Bridge tables


Be aware of risk of double counting

SELECT 'San Francisco', SUM(F.REVENUE)
FROM FACT F, DATE D
WHERE F.CUSTOMER_KEY IN
  (SELECT B.SUBSIDIARY_KEY
   FROM CUSTOMER C, BRIDGE B
   WHERE C.CUSTOMER_KEY = B.PARENT_KEY
   AND C.CUSTOMER_CITY = 'San Francisco')  -- to sum all SF parents
AND F.DATE_KEY = D.DATE_KEY
AND D.MONTH = 'January 2002'
GROUP BY 'San Francisco'
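The double counting appears when a parent and one of its own subsidiaries both match the constraint. The sketch below uses an invented three-level hierarchy (1 owns 2, 2 owns 3), assumes the bridge holds one row per ancestor and descendant pair including each customer paired with itself, and drops the date constraint to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_key INTEGER PRIMARY KEY,
                           customer_name TEXT, customer_city TEXT);
    CREATE TABLE bridge (parent_key INTEGER, subsidiary_key INTEGER);
    CREATE TABLE fact (customer_key INTEGER, revenue REAL);

    INSERT INTO customer VALUES (1, 'Parent A',     'San Francisco'),
                                (2, 'Child B',      'San Francisco'),
                                (3, 'Grandchild C', 'Oakland');
    INSERT INTO bridge VALUES (1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (3, 3);
    INSERT INTO fact VALUES (1, 100.0), (2, 50.0), (3, 25.0);
""")

# Joining through the bridge counts B and C under both A and B.
joined_total = conn.execute("""
    SELECT SUM(f.revenue)
    FROM fact f
    JOIN bridge b   ON f.customer_key = b.subsidiary_key
    JOIN customer c ON c.customer_key = b.parent_key
    WHERE c.customer_city = 'San Francisco'
""").fetchone()[0]

# The IN subquery from the slide tests membership, so each fact row counts once.
in_total = conn.execute("""
    SELECT SUM(f.revenue)
    FROM fact f
    WHERE f.customer_key IN (SELECT b.subsidiary_key
                             FROM customer c, bridge b
                             WHERE c.customer_key = b.parent_key
                               AND c.customer_city = 'San Francisco')
""").fetchone()[0]

print(joined_total, in_total)  # 250.0 175.0
```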

Heterogeneous Product Schemas


Common Dimensional Modeling Mistakes to Avoid


Mistake 10: Place text attributes used for constraining and grouping in a fact table
Mistake 9: Limit verbose descriptive attributes in dimensions to save space
Mistake 8: Split hierarchies and hierarchy levels into multiple dimensions
Mistake 7: Ignore the need to track dimension attribute changes
Mistake 6: Solve all query performance problems by adding more hardware

Common Dimensional Modeling Mistakes to Avoid


Mistake 5: Use operational or smart keys to join dimension tables to a fact table
Mistake 4: Neglect to declare and then comply with the fact table's grain
Mistake 3: Design the dimensional model based on a specific report
Mistake 2: Expect users to query the lowest-level atomic data in a normalized format
Mistake 1: Fail to conform facts and dimensions across separate fact tables

Questions and Answers

