www.technologica.com
TechnoLogica DW Projects
Agenda
The business community must accept the data warehouse if it is to be deemed successful.
Dimensional Modeling
o Dimensional modeling is a new name for an old technique for making databases simple and understandable
o Dimensional modeling is quite different from third-normal-form (3NF) modeling
o DM -> The data warehousing model
1. Select the business process to model.
2. Declare the grain of the business process.
3. Choose the dimensions that apply to each fact table row.
4. Identify the numeric facts that will populate each fact table row.
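The four steps can be sketched as a small star schema. This is a minimal illustration, assuming a hypothetical retail-sales process whose declared grain is one fact row per product per day; the table and column names (date_dim, product_dim, sales_fact, and so on) are invented for the example, and SQLite stands in for the warehouse database.

```python
import sqlite3

# Hypothetical star schema built by following the four design steps:
# 1. Business process: retail sales (assumed example).
# 2. Grain: one fact row per product per day.
# 3. Dimensions: date and product.
# 4. Facts: quantity sold and amount sold.
DDL = """
CREATE TABLE date_dim (
    date_key       INTEGER PRIMARY KEY,  -- surrogate key
    full_date      TEXT NOT NULL,
    calendar_month TEXT NOT NULL
);
CREATE TABLE product_dim (
    product_key  INTEGER PRIMARY KEY,    -- surrogate key
    product_code TEXT NOT NULL,          -- natural (operational) key
    product_name TEXT NOT NULL
);
CREATE TABLE sales_fact (
    date_key      INTEGER NOT NULL REFERENCES date_dim(date_key),
    product_key   INTEGER NOT NULL REFERENCES product_dim(product_key),
    quantity_sold INTEGER NOT NULL,
    amount_sold   REAL NOT NULL,
    PRIMARY KEY (date_key, product_key)  -- enforces the declared grain
);
"""


def build_schema() -> sqlite3.Connection:
    """Create the illustrative star schema in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    return conn


conn = build_schema()
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)  # ['date_dim', 'product_dim', 'sales_fact']
```

Declaring the grain as the fact table's primary key means a second row for the same product and day is rejected, which keeps later aggregations from double counting.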
Dimensions
o Determine these by the ways you want to slice and dice the data
o Small number of rows compared to facts
o Usually
o Time
o Track
o Uses
o Hierarchies
Date Dimension
o The date dimension is the one dimension nearly guaranteed to be in every data mart
o Date
o We can build the date dimension table in advance (5-10 years of dates is only about 1,825 to 3,650 rows)
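Building the date dimension in advance is a simple loop over calendar days. The sketch below assumes an illustrative column set (date_key, full_date, calendar_month, day_of_week, is_weekend) and a 2020-2029 range; it shows that ten years is only a few thousand rows.

```python
from datetime import date, timedelta


def build_date_rows(start: date, end: date) -> list[dict]:
    """Return one date-dimension row per calendar day, start to end inclusive."""
    rows = []
    day = start
    key = 1
    while day <= end:
        rows.append({
            "date_key": key,                   # sequential surrogate key
            "full_date": day.isoformat(),
            "calendar_month": day.strftime("%B"),
            "day_of_week": day.strftime("%A"),
            "is_weekend": day.weekday() >= 5,  # Saturday=5, Sunday=6
        })
        key += 1
        day += timedelta(days=1)
    return rows


rows = build_date_rows(date(2020, 1, 1), date(2029, 12, 31))
print(len(rows))  # 3653: 3,650 days plus the leap days of 2020, 2024 and 2028
```

Month and weekday names come from strftime and therefore follow the process locale; a production table would typically add fiscal periods, seasons and holidays from business calendars.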
Date Dimension
Data warehouses always need an explicit date dimension table. There are many date attributes not supported by SQL date functions, including fiscal periods, seasons, holidays, and weekends. Rather than attempting to determine these nonstandard calendar calculations in a query, we should look them up in a date dimension table, for example:

    select sum(f.amount_sold)
    from DATE_DIM d
    join FACT f on f.date_dim_id = d.id
    where d.Calendar_Month = 'January';
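The lookup approach can be tried end to end with SQLite. This sketch reuses the slide's DATE_DIM and FACT names; the Fiscal_Period and Holiday_Flag columns and all rows are hypothetical sample data, chosen to show that a nonstandard attribute such as "holiday" becomes an ordinary constraint instead of a calendar calculation inside the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DATE_DIM (
    id             INTEGER PRIMARY KEY,
    full_date      TEXT,
    Calendar_Month TEXT,
    Fiscal_Period  TEXT,    -- hypothetical fiscal calendar label
    Holiday_Flag   INTEGER  -- hypothetical: 1 if the business treats the day as a holiday
);
CREATE TABLE FACT (
    date_dim_id INTEGER REFERENCES DATE_DIM(id),
    amount_sold REAL
);
INSERT INTO DATE_DIM VALUES
    (1, '2024-01-01', 'January',  'FY24-P07', 1),
    (2, '2024-01-02', 'January',  'FY24-P07', 0),
    (3, '2024-02-01', 'February', 'FY24-P08', 0);
INSERT INTO FACT VALUES (1, 100.0), (2, 50.0), (3, 25.0), (2, 10.0);
""")

# The corrected query from the slide: the month name is a string literal.
january_total = conn.execute("""
    SELECT SUM(f.amount_sold)
    FROM DATE_DIM d
    JOIN FACT f ON f.date_dim_id = d.id
    WHERE d.Calendar_Month = 'January'
""").fetchone()[0]

# A holiday constraint is just another lookup in the date dimension.
holiday_total = conn.execute("""
    SELECT SUM(f.amount_sold)
    FROM DATE_DIM d
    JOIN FACT f ON f.date_dim_id = d.id
    WHERE d.Holiday_Flag = 1
""").fetchone()[0]

print(january_total, holiday_total)  # 160.0 100.0
```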
The
Normalized,
Disk
o A very large number of dimensions is typically a sign that several dimensions are not completely independent and should be combined into a single dimension
o If our design has 25 or more dimensions, we should look for ways to combine correlated dimensions into a single dimension
o It is a dimensional modeling mistake to represent elements of a hierarchy as separate dimensions in the fact table
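Instead of separate brand and category dimensions on the fact table, the hierarchy can live as attributes of a single product dimension. The sketch assumes a hypothetical product -> brand -> category hierarchy and sample rows; rolling up to a higher level is then just a GROUP BY on a dimension attribute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical product -> brand -> category hierarchy kept as attributes
-- of ONE dimension; the fact table carries only product_key.
CREATE TABLE product_dim (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT,
    brand        TEXT,
    category     TEXT
);
CREATE TABLE sales_fact (product_key INTEGER, amount_sold REAL);
INSERT INTO product_dim VALUES
    (1, 'Cola 0.5l',   'FizzCo',   'Beverages'),
    (2, 'Lemonade 1l', 'FizzCo',   'Beverages'),
    (3, 'Crisps',      'CrunchCo', 'Snacks');
INSERT INTO sales_fact VALUES (1, 10.0), (2, 5.0), (3, 7.5), (1, 2.5);
""")

# Roll up to the category level through the single product dimension.
by_category = conn.execute("""
    SELECT p.category, SUM(f.amount_sold)
    FROM sales_fact f
    JOIN product_dim p ON p.product_key = f.product_key
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
print(by_category)  # [('Beverages', 17.5), ('Snacks', 7.5)]
```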
Surrogate Keys
o Every join between dimension and fact tables in the data warehouse should be based on meaningless integer surrogate keys
o You should avoid using the natural operational production codes. None of the data warehouse keys should be smart, where you can tell something about the row just by looking at the key.
Surrogate Keys
o Surrogate keys are like an immunization for the data warehouse
o Buffer
o Performance advantages: the smaller surrogate key translates into smaller fact tables, smaller fact table indices, and more fact table rows per block input-output operation
o Surrogate keys are used to record dimension conditions that may not have an operational code, such as No Promotion in Effect or Date Not Applicable
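A surrogate key assignment step can be sketched as a map from natural keys to meaningless integers. SurrogateKeyMap is a hypothetical helper, and reserving key 0 for the "No Promotion in Effect" row is an assumed convention, not a standard; the promotion codes are sample data.

```python
from typing import Optional


class SurrogateKeyMap:
    """Hypothetical helper that hands out meaningless integer surrogate keys.

    Key 0 is reserved (an assumed convention) for the special
    'No Promotion in Effect' row, which has no operational code.
    """

    NOT_APPLICABLE_KEY = 0

    def __init__(self) -> None:
        self._by_natural_key: dict[str, int] = {}
        self._next_key = 1

    def key_for(self, natural_key: Optional[str]) -> int:
        # No operational code -> point at the reserved 'not applicable' row.
        if natural_key is None:
            return self.NOT_APPLICABLE_KEY
        # First time we see a code, assign the next integer; reuse it afterwards.
        if natural_key not in self._by_natural_key:
            self._by_natural_key[natural_key] = self._next_key
            self._next_key += 1
        return self._by_natural_key[natural_key]


promo_keys = SurrogateKeyMap()
incoming_codes = ["SPRING-24", None, "BOGO-7", "SPRING-24"]
fact_promo_keys = [promo_keys.key_for(code) for code in incoming_codes]
print(fact_promo_keys)  # [1, 0, 2, 1]
```

The keys carry no meaning of their own, so a change in the operational coding scheme only changes the mapping, not the fact rows.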
Surrogate Keys
o The date dimension is the one dimension where surrogate keys should be assigned in a meaningful, sequential order
o Surrogate keys are needed to support one of the primary techniques for handling changes to dimension table attributes (the type 2 response)
o Don't use concatenated or compound keys for dimension tables
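Sequential date keys can be derived from the distance to the first day the dimension covers. The epoch below is an assumption for illustration; the point is that because keys follow calendar order, a date range maps to a contiguous key range, which is convenient for range scans or date-based partitioning of fact tables.

```python
from datetime import date

EPOCH = date(2020, 1, 1)  # assumed first day covered by the date dimension


def date_key(d: date) -> int:
    """Sequential surrogate key: 1 for EPOCH, 2 for the next day, and so on."""
    return (d - EPOCH).days + 1


# January 2020 maps to the contiguous key range 1..31.
start, end = date_key(date(2020, 1, 1)), date_key(date(2020, 1, 31))
print(start, end)  # 1 31
```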
Conformed Dimensions
o Most
o Conformed dimensions are either identical or strict mathematical subsets of the most granular, detailed dimension
o They have consistent dimension keys, consistent attribute column names, consistent attribute definitions, and consistent attribute values
o The conformed dimension may be the same physical table within the database or may be duplicated synchronously in each data mart
Conformed Dimensions
o Roll-up dimensions conform to the base-level atomic dimension if they are a strict subset of that atomic dimension
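A month roll-up can be derived directly from the atomic date dimension so that it stays a strict subset. The attribute names (month, quarter) and the three-month range are illustrative assumptions; the roll-up keeps only attributes that are constant within a month and copies their values from the base rows.

```python
from datetime import date, timedelta

# Atomic (base-level) date dimension: one row per day; attributes are illustrative.
days = []
d = date(2024, 1, 1)
while d <= date(2024, 3, 31):
    days.append({
        "full_date": d.isoformat(),
        "month": d.strftime("%Y-%m"),
        "quarter": f"{d.year}-Q{(d.month - 1) // 3 + 1}",
    })
    d += timedelta(days=1)

# Roll-up month dimension: only month-level attributes, with values taken
# from the atomic rows, so it is a strict subset that conforms to them.
months = sorted({(row["month"], row["quarter"]) for row in days})
print(months)  # [('2024-01', '2024-Q1'), ('2024-02', '2024-Q1'), ('2024-03', '2024-Q1')]
```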
Conformed Dimensions
o They
o They must be published prior to staging of the fact data
o The dimension authority has responsibility for defining, maintaining, and publishing a particular dimension or its subsets to all the data mart clients who need it
Dimensions
Changing Dimensions
Changing Dimensions
Customer age
o The type 1 response (overwrite) does not maintain any history of prior attribute values
o Any preexisting aggregations based on the department value will need to be rebuilt
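A type 1 change is an in-place overwrite. In this sketch the dimension is a plain dictionary keyed by surrogate key, and the employee row and its department values are hypothetical; after the update nothing records that the employee was ever in Sales.

```python
# Hypothetical employee dimension, keyed by surrogate key.
employee_dim = {
    101: {"employee_id": "E-7", "name": "Ana", "department": "Sales"},
}


def scd_type1_update(dim: dict, key: int, attribute: str, new_value: str) -> None:
    """Type 1: overwrite the attribute in place; the prior value is lost."""
    dim[key][attribute] = new_value  # no new row, no prior-value column


scd_type1_update(employee_dim, 101, "department", "Marketing")
print(employee_dim[101]["department"])  # Marketing
```

Because existing facts now roll up to Marketing, any stored aggregate by department computed before the change no longer matches and has to be rebuilt.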
o The type 2 response is the primary technique for accurately tracking slowly changing dimension attributes. It is extremely powerful because the new dimension row automatically partitions history in the fact table.
o It's not suitable for dimension tables that already exceed a million rows
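A type 2 change expires the current row and adds a new row with a new surrogate key. The sketch assumes effective/expiration dates where the expiration date equals the next version's effective date and None marks the current row; DimRow and the sample employee are hypothetical.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class DimRow:
    surrogate_key: int
    natural_key: str
    department: str
    effective_date: date
    expiration_date: Optional[date]  # None means "current row"


def scd_type2_update(rows: list, natural_key: str, new_department: str,
                     change_date: date) -> list:
    """Expire the current row for natural_key and append a new version."""
    next_key = max(r.surrogate_key for r in rows) + 1
    result = []
    for r in rows:
        if r.natural_key == natural_key and r.expiration_date is None:
            result.append(replace(r, expiration_date=change_date))  # close old version
            result.append(DimRow(next_key, natural_key, new_department,
                                 change_date, None))                # open new version
        else:
            result.append(r)
    return result


rows = [DimRow(1, "E-42", "Sales", date(2020, 1, 1), None)]
rows = scd_type2_update(rows, "E-42", "Marketing", date(2023, 7, 1))
for r in rows:
    print(r.surrogate_key, r.department, r.effective_date, r.expiration_date)
# 1 Sales 2020-01-01 2023-07-01
# 2 Marketing 2023-07-01 None
```

Facts loaded before the change keep key 1 and facts loaded after it get key 2, which is how the new row partitions history without touching the fact table.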
The type 3 slowly changing dimension technique allows us to see new and historical fact data by either the new or prior attribute values.
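A type 3 change keeps one prior value in an extra column on the same row. The "prior_" column naming and the sample product row are assumptions for illustration; queries can then group facts by either category or prior_category.

```python
def scd_type3_update(row: dict, attribute: str, new_value: str) -> None:
    """Type 3: move the current value into a 'prior_' column, then overwrite it."""
    row[f"prior_{attribute}"] = row[attribute]
    row[attribute] = new_value


product = {"product_key": 7, "category": "Snacks", "prior_category": None}
scd_type3_update(product, "category", "Health Food")
print(product)
# {'product_key': 7, 'category': 'Health Food', 'prior_category': 'Snacks'}
```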
Hybrid SCD Techniques: Series of Type 3 Attributes (Predictable Changes with Multiple Version Overlays)
o Report each year's sales using the district map for that year.
o Report each year's sales using a district map from an arbitrary different year.
o Report an arbitrary span of years' sales using a single district map from any chosen year. The most common version of this requirement would be to report the complete span of fact data using the current district map.
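With a series of type 3 attributes, each dimension row carries one district column per year, so any span of facts can be reported through any year's map. The column names (district_2022 and so on), reps and sales figures are hypothetical sample data.

```python
from collections import defaultdict

# One row per sales rep; one column per year's district assignment (hypothetical).
rep_dim = {
    1: {"district_2022": "North", "district_2023": "North", "district_2024": "East"},
    2: {"district_2022": "South", "district_2023": "East", "district_2024": "East"},
}
# Fact rows: (rep_key, year, amount)
sales = [(1, 2022, 100.0), (2, 2022, 50.0), (1, 2024, 70.0), (2, 2023, 30.0)]


def sales_by_district(map_year: int, years: range) -> dict:
    """Report the given span of years using the district map of map_year."""
    column = f"district_{map_year}"
    totals = defaultdict(float)
    for rep_key, year, amount in sales:
        if year in years:
            totals[rep_dim[rep_key][column]] += amount
    return dict(totals)


# Complete span of fact data, reported with the current (2024) map:
print(sales_by_district(2024, range(2022, 2025)))  # {'East': 250.0}
# Same span, reported with the 2022 map:
print(sales_by_district(2022, range(2022, 2025)))  # {'North': 170.0, 'South': 80.0}
```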
Hybrid SCD Techniques: Type 2 with "Current" Overwrite (Unpredictable Changes with Single-Version Overlay)
o This technique preserves historical accuracy while supporting the ability to report historical data according to the current values
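In the hybrid, each change adds a type 2 row, and a separate "current" column is overwritten type 1 style on every version of that member. The column names (historical_district, current_district, is_current) and the sample rep are illustrative assumptions.

```python
# Each dimension row carries both the as-was value and the current value.
rep_rows = [
    {"rep_key": 1, "rep_id": "R-9", "historical_district": "North",
     "current_district": "North", "is_current": True},
]


def type2_with_current_overwrite(rows: list, rep_id: str, new_district: str) -> None:
    """Add a type 2 row for the change and overwrite current_district on all versions."""
    next_key = max(r["rep_key"] for r in rows) + 1
    for r in rows:
        if r["rep_id"] == rep_id:
            r["is_current"] = False                # older versions are no longer current
            r["current_district"] = new_district   # type 1 overwrite on every version
    rows.append({"rep_key": next_key, "rep_id": rep_id,
                 "historical_district": new_district,
                 "current_district": new_district, "is_current": True})


type2_with_current_overwrite(rep_rows, "R-9", "East")
for r in rep_rows:
    print(r["rep_key"], r["historical_district"], r["current_district"], r["is_current"])
# 1 North East False
# 2 East East True
```

Facts attached to key 1 can be grouped by historical_district (North, as it was) or by current_district (East, as it is now).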
Junk Dimensions
What
o Leave the flags and indicators unchanged in the fact table row.
o Make each flag and indicator into its own separate dimension.
o Strip out all the flags and indicators from the design.
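A junk dimension avoids all three options by collecting the low-cardinality flags into one small dimension, typically one row per combination of flag values, so the fact row stores a single junk key. The flag names and values below are hypothetical.

```python
from itertools import product

# Hypothetical low-cardinality flags that would otherwise clutter the fact table.
FLAGS = {
    "payment_type": ["cash", "card"],
    "is_gift": [False, True],
    "order_channel": ["web", "store", "phone"],
}

# One junk-dimension row per combination of flag values (2 x 2 x 3 = 12 rows).
junk_dim = [
    {"junk_key": key, **dict(zip(FLAGS, combo))}
    for key, combo in enumerate(product(*FLAGS.values()), start=1)
]

# Lookup used while loading facts: flag values -> junk_key.
lookup = {tuple(row[f] for f in FLAGS): row["junk_key"] for row in junk_dim}

print(len(junk_dim))                    # 12
print(lookup[("card", True, "web")])    # 10
```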
Multiple Currencies
Customer Dimension
Critical The
o or other life-stage classifications
o Income or other lifestyle classifications
o Status (for example, new, active, inactive, closed)
o Referring source
o Business-specific market segment
o Scores characterizing the customer, such as purchase behavior, payment behavior, product preferences
o Attributes are to be used for constraining and labeling; they are not to be used in numeric calculations
o Focus on those which will be used frequently
o Minimize
o Replace
Challenges
o It generally takes too long to constrain or browse among the relationships in such a big table
o It is difficult to use previously described techniques for tracking changes in these large dimensions
One solution is to break off frequently analyzed or frequently changing attributes into a separate dimension, referred to as a minidimension
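A minidimension is usually built from banded values, so it stays small no matter how many customers share a profile. The age and income band boundaries below are hypothetical; the demographics key computed at load time is what would be stored on the fact row.

```python
from itertools import product

# Hypothetical band definitions: (low inclusive, high exclusive, label).
AGE_BANDS = [(0, 25, "<25"), (25, 45, "25-44"), (45, 65, "45-64"), (65, 200, "65+")]
INCOME_BANDS = [(0, 30_000, "<30k"), (30_000, 75_000, "30-75k"), (75_000, 10**9, "75k+")]


def band(value: int, bands: list) -> str:
    """Return the label of the band that contains value."""
    for low, high, label in bands:
        if low <= value < high:
            return label
    raise ValueError(f"no band for {value}")


# Minidimension: one row per combination of banded values (4 x 3 = 12 rows).
demographics_dim = {
    combo: key
    for key, combo in enumerate(
        product([b[2] for b in AGE_BANDS], [b[2] for b in INCOME_BANDS]), start=1)
}


def demographics_key(age: int, income: int) -> int:
    """Key looked up while loading a fact row for a customer with these values."""
    return demographics_dim[(band(age, AGE_BANDS), band(income, INCOME_BANDS))]


print(len(demographics_dim))          # 12
print(demographics_key(30, 50_000))   # 5
```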
Rapidly Changing Customer Dimensions: The Mini Dimension with "Current" Overwrite
o The minidimension terminology refers to when the demographics key is part of the fact table composite key
o If the demographics key is a foreign key in the customer dimension, we refer to it as an outrigger
Rapidly Changing Customer Dimensions: Type 2 with Natural Keys in Fact Table
Customer Dimension - Current Attributes (SCD1): Customer ID (Natural Key), Customer Name, Customer Address, Customer Date of Birth, Customer Date of 1st Order, Age, Gender, Annual Income, Number of Children, Marital Status
Fact Table: Customer Key (FK), Customer Demographics Key (FK), More Foreign Keys, Facts
Customer Dimension - "As was" Attributes (SCD2): Customer Key (PK), Customer ID (Natural Key), Customer Name, Customer Address, Customer Date of Birth, Customer Date of 1st Order, Age, Gender, Annual Income, Number of Children, Marital Status
o Be careful to avoid overcounting, because we may have multiple rows in the customer dimension for the same individual
o The comparison operators depend on the business rules used to set our effective/expiration dates
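Both cautions can be seen on a small type 2 customer table. The sketch assumes one possible business rule (effective date inclusive, expiration date exclusive, '9999-12-31' for the current row) and hypothetical sample customers; counting distinct natural keys avoids the overcount, and the as-of filter returns exactly one version per customer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_dim (
    customer_key    INTEGER PRIMARY KEY,  -- surrogate key, one per version
    customer_id     TEXT,                 -- natural key, repeated across versions
    city            TEXT,
    effective_date  TEXT,                 -- assumed: inclusive start
    expiration_date TEXT                  -- assumed: exclusive end, '9999-12-31' if current
);
INSERT INTO customer_dim VALUES
    (1, 'C-1', 'Sofia',   '2020-01-01', '2023-06-01'),
    (2, 'C-1', 'Plovdiv', '2023-06-01', '9999-12-31'),
    (3, 'C-2', 'Varna',   '2021-03-15', '9999-12-31');
""")

# Counting rows overcounts C-1, who has two versions; count natural keys instead.
naive = conn.execute("SELECT COUNT(*) FROM customer_dim").fetchone()[0]
distinct = conn.execute("SELECT COUNT(DISTINCT customer_id) FROM customer_dim").fetchone()[0]

# As-of snapshot: the operators (<= and <) follow the assumed business rule.
as_of_date = "2023-06-01"
as_of = conn.execute("""
    SELECT customer_id, city FROM customer_dim
    WHERE effective_date <= ? AND ? < expiration_date
    ORDER BY customer_id
""", (as_of_date, as_of_date)).fetchall()

print(naive, distinct)  # 3 2
print(as_of)            # [('C-1', 'Plovdiv'), ('C-2', 'Varna')]
```

If the expiration date were instead set to the day before the next version starts, the second comparison would need to become <=, which is why the operators follow the date rules.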
the keys of the customers or products whose behavior you are tracking
Bridge tables
Mistake 10: Place text attributes used for constraining and grouping in a fact table
Mistake 9: Limit verbose descriptive attributes in dimensions to save space
Mistake 8: Split hierarchies and hierarchy levels into multiple dimensions
Mistake 5: Use operational or smart keys to join dimension tables to a fact table
Mistake 4: Neglect to declare and then comply with the fact table's grain
Mistake 3: Design the dimensional model based on a specific report