
Slowly Changing Dimensions

Slowly changing dimensions (SCDs) are a common characteristic of many business
intelligence environments. Typically, dimensional hierarchies are presented as independent
of time, e.g., the New York store is part of the Northeast Region. In reality, however, many
of these dimensional relationships change over time. For example, a company may reorganize
its sales organization annually or recast its product hierarchy for each retail season.
"Slowly" typically means that changes occur every several months or even years. Indeed, if
dimensional relationships change more frequently, it may be better to model separate dimensions.

SCDs are well documented in data warehousing literature. Ralph Kimball has been
particularly influential in describing dimensional modeling techniques for SCDs (see The
Data Warehouse Toolkit, for example). Kimball has also coined terms that distinguish
among the ways to handle SCDs in a dimensional model. For example, a Type I SCD presents
only the current view of a dimensional relationship, a Type II SCD preserves the history of a
dimensional relationship, and so forth.

Example

The discussion below is based on an example sales organization that changes slowly over
time as territories are reorganized, e.g., as Sales Reps switch Districts.

As-is vs. As-was Analysis

One of the capabilities that slowly changing dimensions provide is the ability to perform
either "as-is" analysis or "as-was" analysis.

• As-is analysis presents a current view of the slowly changing relationships, i.e., it
  displays sales by District according to the way Districts are organized today.
• As-was analysis presents a historical view of the slowly changing relationships, i.e., it
  displays sales by District according to the way Districts were organized at the time the
  sales transactions occurred.

The techniques described here provide the flexibility to perform either type of analysis.
They also provide an easy way for end users to specify which type of analysis they would
like to perform.

Case 1: Compound key with effective date/end date

One way to physically store an SCD is to use Effective Date and End Date columns that
capture the period of time during which each element relationship existed. In the example
below, Sales Rep Jones moved from District 37 to District 39 on 1/1/2004, and Kelly moved
from District 38 to District 39 on 7/1/2004:

LU_SALES_REP

sales_rep_id  sales_rep_name  district_id  eff_dt     end_dt
1             Jones           37           1/1/1900   12/31/2003
2             Smith           37           1/1/1900   12/31/2099
3             Kelly           38           1/1/1900   6/30/2004
4             Madison         38           1/1/1900   12/31/2099
1             Jones           39           1/1/2004   12/31/2099
3             Kelly           39           7/1/2004   12/31/2099

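The effective/end date pair turns "which District was this rep in?" into a point-in-time lookup. The Python sketch below is illustrative only: it holds the LU_SALES_REP rows above in memory and uses a hypothetical helper, district_as_of, that applies the same inclusive-bounds test as SQL BETWEEN. Because each rep's date ranges do not overlap, at most one row can match a given date.

```python
from datetime import date

# LU_SALES_REP rows from the example:
# (sales_rep_id, sales_rep_name, district_id, eff_dt, end_dt)
LU_SALES_REP = [
    (1, "Jones", 37, date(1900, 1, 1), date(2003, 12, 31)),
    (2, "Smith", 37, date(1900, 1, 1), date(2099, 12, 31)),
    (3, "Kelly", 38, date(1900, 1, 1), date(2004, 6, 30)),
    (4, "Madison", 38, date(1900, 1, 1), date(2099, 12, 31)),
    (1, "Jones", 39, date(2004, 1, 1), date(2099, 12, 31)),
    (3, "Kelly", 39, date(2004, 7, 1), date(2099, 12, 31)),
]


def district_as_of(sales_rep_id, on):
    """Hypothetical helper: the District a Sales Rep belonged to on a given date.

    Both bounds are inclusive, mirroring `trans_dt between eff_dt and end_dt`.
    Returns None if the rep has no row covering that date.
    """
    for rep_id, _name, district_id, eff_dt, end_dt in LU_SALES_REP:
        if rep_id == sales_rep_id and eff_dt <= on <= end_dt:
            return district_id
    return None


print(district_as_of(1, date(2003, 9, 1)))   # 37: Jones before his move
print(district_as_of(1, date(2004, 3, 1)))   # 39: Jones after his move
print(district_as_of(3, date(2004, 3, 15)))  # 38: Kelly moves only on 7/1/2004
```

The open-ended 12/31/2099 end date acts as a sentinel, so the current row needs no special case in the lookup.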
When using this type of dimensional lookup table, the fact table must include a date field,
such as a transaction date:

FACT_TABLE

sales_rep_id  trans_dt    sales
1             9/1/2003    100
2             9/10/2003   200
3             9/15/2003   150
1             3/1/2004    200
2             3/10/2004   250
3             3/15/2004   300
2             9/5/2004    125
3             9/15/2004   275
4             9/20/2004   150

Specifying the MicroStrategy Schema
Create a logical view to represent only the current District-Sales Rep relationships:

LVW_CURRENT_ORG
select sales_rep_id, district_id
from LU_SALES_REP
where END_DT = '12/31/2099'
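To see what LVW_CURRENT_ORG returns, the sketch below loads the sample LU_SALES_REP rows into an in-memory SQLite database through Python's standard sqlite3 module and runs the view's SQL. Two assumptions: dates are stored as ISO-8601 strings (so the open-ended end date is written '2099-12-31' rather than '12/31/2099'), and an order by clause is added only to make the output deterministic.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table LU_SALES_REP (sales_rep_id int, sales_rep_name text,
                           district_id int, eff_dt text, end_dt text);
insert into LU_SALES_REP values
  (1, 'Jones',   37, '1900-01-01', '2003-12-31'),
  (2, 'Smith',   37, '1900-01-01', '2099-12-31'),
  (3, 'Kelly',   38, '1900-01-01', '2004-06-30'),
  (4, 'Madison', 38, '1900-01-01', '2099-12-31'),
  (1, 'Jones',   39, '2004-01-01', '2099-12-31'),
  (3, 'Kelly',   39, '2004-07-01', '2099-12-31');
""")

# Body of LVW_CURRENT_ORG: keep only rows whose end date is the open-ended sentinel.
current_org = con.execute("""
    select sales_rep_id, district_id
    from LU_SALES_REP
    where end_dt = '2099-12-31'
    order by sales_rep_id
""").fetchall()
print(current_org)  # [(1, 39), (2, 37), (3, 39), (4, 38)]
```

Exactly one row per Sales Rep survives the filter, which is what allows the view to serve as a one-to-one, current-state mapping from Sales Rep to District.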

Create another logical view that performs the "as-was" join between the lookup table and
the fact table, resulting in a fact view at the District level. Note that the resulting view is
an "as-was" or historical view: it captures the Sales Rep-District relationships that existed
at the time the transactions occurred:

LVW_HIST_DISTRICT_SALES
select district_id, trans_dt, sum(sales) sales
from LU_SALES_REP L
join FACT_TABLE F
on (L.sales_rep_id = F.sales_rep_id)
where F.trans_dt between L.EFF_DT and L.END_DT
group by district_id, trans_dt
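A sketch of the historical view, assuming SQLite via Python's sqlite3 module and ISO-8601 date strings, so that BETWEEN compares dates chronologically (a string comparison of M/D/YYYY text would not). The view is created as a real database view purely for convenience; MicroStrategy itself inlines logical views as derived tables, as the generated SQL in this note shows.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table LU_SALES_REP (sales_rep_id int, sales_rep_name text,
                           district_id int, eff_dt text, end_dt text);
insert into LU_SALES_REP values
  (1, 'Jones',   37, '1900-01-01', '2003-12-31'),
  (2, 'Smith',   37, '1900-01-01', '2099-12-31'),
  (3, 'Kelly',   38, '1900-01-01', '2004-06-30'),
  (4, 'Madison', 38, '1900-01-01', '2099-12-31'),
  (1, 'Jones',   39, '2004-01-01', '2099-12-31'),
  (3, 'Kelly',   39, '2004-07-01', '2099-12-31');
create table FACT_TABLE (sales_rep_id int, trans_dt text, sales int);
insert into FACT_TABLE values
  (1, '2003-09-01', 100), (2, '2003-09-10', 200), (3, '2003-09-15', 150),
  (1, '2004-03-01', 200), (2, '2004-03-10', 250), (3, '2004-03-15', 300),
  (2, '2004-09-05', 125), (3, '2004-09-15', 275), (4, '2004-09-20', 150);

-- LVW_HIST_DISTRICT_SALES, created as a database view for this sketch only.
create view LVW_HIST_DISTRICT_SALES as
select district_id, trans_dt, sum(sales) sales
from LU_SALES_REP L
join FACT_TABLE F
  on (L.sales_rep_id = F.sales_rep_id)
where F.trans_dt between L.eff_dt and L.end_dt
group by district_id, trans_dt;
""")

# Roll the District-level fact view up to one as-was total per District.
as_was_totals = dict(con.execute("""
    select district_id, sum(sales)
    from LVW_HIST_DISTRICT_SALES
    group by district_id
    order by district_id
""").fetchall())
print(as_was_totals)  # {37: 675, 38: 600, 39: 475}
```

Each transaction matches exactly one lookup row, so the District totals add up to the fact table's grand total of 1,750 with nothing double-counted or dropped.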

Create a table alias LU_CURRENT_DISTRICT for LU_DISTRICT.

Define the following Attributes:

Sales Rep
    @ID = sales_rep_id; @Desc = sales_rep_name
    Tables: LU_SALES_REP (lookup), LVW_CURRENT_ORG, FACT_TABLE

Current District
    @ID = district_id; @Desc = district_name
    Tables: LU_CURRENT_DISTRICT (lookup), LVW_CURRENT_ORG
    Child: Sales Rep

Historical District
    @ID = district_id; @Desc = district_name
    Tables: LU_DISTRICT (lookup), LU_SALES_REP, LVW_HIST_DISTRICT_SALES
    Child: Sales Rep

Date
    @ID = date_id, trans_dt
    Tables: LU_TIME (lookup), FACT_TABLE, LVW_HIST_DISTRICT_SALES

Month
    @ID = MONTH_ID
    Tables: LU_TIME (lookup)
    Child: Date

Define the Sales fact:

Sales
    Expr: sales
    Tables: FACT_TABLE, LVW_HIST_DISTRICT_SALES

Define metrics as required:

Sales
    SUM(sales)

The result of this is a logical schema that appears as follows:

As-Was Analysis

Users specify as-was analysis by using the Historical District attribute on reports:

Report definition:

Historical District, Month, Sales

Resulting SQL:

select a11.DISTRICT_ID DISTRICT_ID,
       max(a13.DISTRICT_NAME) DISTRICT_NAME,
       a12.MONTH_ID MONTH_ID,
       sum(a11.SALES) WJXBFS1
from (select district_id, trans_dt, sum(sales) sales
      from LU_SALES_REP L
      join FACT_TABLE F
        on (L.sales_rep_id = F.sales_rep_id)
      where F.trans_dt between L.EFF_DT and L.END_DT
      group by district_id, trans_dt
     ) a11
join LU_TIME a12
  on (a11.TRANS_DT = a12.DATE_ID)
join LU_DISTRICT a13
  on (a11.DISTRICT_ID = a13.DISTRICT_ID)
group by a11.DISTRICT_ID,
         a12.MONTH_ID
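The generated as-was SQL can be exercised end to end with the sample rows. This sketch uses Python's sqlite3 module and makes several assumptions that are not in this note: dates are ISO-8601 strings, and LU_TIME and LU_DISTRICT (referenced but never shown) are hypothetical stand-ins, with a YYYYMM month id derived from each transaction date and placeholder District names.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table LU_SALES_REP (sales_rep_id int, sales_rep_name text,
                           district_id int, eff_dt text, end_dt text);
insert into LU_SALES_REP values
  (1, 'Jones',   37, '1900-01-01', '2003-12-31'),
  (2, 'Smith',   37, '1900-01-01', '2099-12-31'),
  (3, 'Kelly',   38, '1900-01-01', '2004-06-30'),
  (4, 'Madison', 38, '1900-01-01', '2099-12-31'),
  (1, 'Jones',   39, '2004-01-01', '2099-12-31'),
  (3, 'Kelly',   39, '2004-07-01', '2099-12-31');
create table FACT_TABLE (sales_rep_id int, trans_dt text, sales int);
insert into FACT_TABLE values
  (1, '2003-09-01', 100), (2, '2003-09-10', 200), (3, '2003-09-15', 150),
  (1, '2004-03-01', 200), (2, '2004-03-10', 250), (3, '2004-03-15', 300),
  (2, '2004-09-05', 125), (3, '2004-09-15', 275), (4, '2004-09-20', 150);
-- Hypothetical lookups: the report SQL uses them, but their contents are not shown.
create table LU_DISTRICT (district_id int, district_name text);
insert into LU_DISTRICT values
  (37, 'District 37'), (38, 'District 38'), (39, 'District 39');
create table LU_TIME (date_id text, month_id int);
-- One row per transaction date, with an assumed YYYYMM month id.
insert into LU_TIME
  select distinct trans_dt,
         cast(substr(trans_dt, 1, 4) || substr(trans_dt, 6, 2) as integer)
  from FACT_TABLE;
""")

# The as-was report SQL, unchanged apart from layout.
rows = con.execute("""
select a11.DISTRICT_ID DISTRICT_ID,
       max(a13.DISTRICT_NAME) DISTRICT_NAME,
       a12.MONTH_ID MONTH_ID,
       sum(a11.SALES) WJXBFS1
from (select district_id, trans_dt, sum(sales) sales
      from LU_SALES_REP L
      join FACT_TABLE F
        on (L.sales_rep_id = F.sales_rep_id)
      where F.trans_dt between L.EFF_DT and L.END_DT
      group by district_id, trans_dt) a11
join LU_TIME a12
  on (a11.TRANS_DT = a12.DATE_ID)
join LU_DISTRICT a13
  on (a11.DISTRICT_ID = a13.DISTRICT_ID)
group by a11.DISTRICT_ID, a12.MONTH_ID
""").fetchall()

# Key the report cells by (District, Month) for easy inspection.
as_was = {(district, month): sales for district, _name, month, sales in rows}
for (district, month), sales in sorted(as_was.items()):
    print(district, month, sales)
```

District 39 has no September 2003 cell in this result, because nobody had moved into it yet; Jones's and Kelly's 2003 sales stay with Districts 37 and 38.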

Report results:
As-Is Analysis

Users specify as-is analysis by using the Current District attribute on reports:

Report definition:

Current District, Month, Sales

Resulting SQL:

select a12.DISTRICT_ID DISTRICT_ID,
       max(a14.DISTRICT_NAME) DISTRICT_NAME,
       a13.MONTH_ID MONTH_ID,
       sum(a11.SALES) WJXBFS1
from FACT_TABLE a11
join (select sales_rep_id, district_id
      from LU_SALES_REP
      where END_DT = '12/31/2099') a12
  on (a11.SALES_REP_ID = a12.SALES_REP_ID)
join LU_TIME a13
  on (a11.TRANS_DT = a13.DATE_ID)
join LU_DISTRICT a14
  on (a12.DISTRICT_ID = a14.DISTRICT_ID)
group by a12.DISTRICT_ID,
         a13.MONTH_ID
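The as-is SQL can be run against the same sample rows. As a sketch under the same assumptions (Python's sqlite3 module, ISO-8601 date strings so the sentinel becomes '2099-12-31', and hypothetical LU_TIME and LU_DISTRICT contents with a YYYYMM month id and placeholder names), it shows every transaction being credited to the rep's current District.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table LU_SALES_REP (sales_rep_id int, sales_rep_name text,
                           district_id int, eff_dt text, end_dt text);
insert into LU_SALES_REP values
  (1, 'Jones',   37, '1900-01-01', '2003-12-31'),
  (2, 'Smith',   37, '1900-01-01', '2099-12-31'),
  (3, 'Kelly',   38, '1900-01-01', '2004-06-30'),
  (4, 'Madison', 38, '1900-01-01', '2099-12-31'),
  (1, 'Jones',   39, '2004-01-01', '2099-12-31'),
  (3, 'Kelly',   39, '2004-07-01', '2099-12-31');
create table FACT_TABLE (sales_rep_id int, trans_dt text, sales int);
insert into FACT_TABLE values
  (1, '2003-09-01', 100), (2, '2003-09-10', 200), (3, '2003-09-15', 150),
  (1, '2004-03-01', 200), (2, '2004-03-10', 250), (3, '2004-03-15', 300),
  (2, '2004-09-05', 125), (3, '2004-09-15', 275), (4, '2004-09-20', 150);
-- Hypothetical lookups: the report SQL uses them, but their contents are not shown.
create table LU_DISTRICT (district_id int, district_name text);
insert into LU_DISTRICT values
  (37, 'District 37'), (38, 'District 38'), (39, 'District 39');
create table LU_TIME (date_id text, month_id int);
-- One row per transaction date, with an assumed YYYYMM month id.
insert into LU_TIME
  select distinct trans_dt,
         cast(substr(trans_dt, 1, 4) || substr(trans_dt, 6, 2) as integer)
  from FACT_TABLE;
""")

# The as-is report SQL, with the end-date sentinel written as an ISO string.
rows = con.execute("""
select a12.DISTRICT_ID DISTRICT_ID,
       max(a14.DISTRICT_NAME) DISTRICT_NAME,
       a13.MONTH_ID MONTH_ID,
       sum(a11.SALES) WJXBFS1
from FACT_TABLE a11
join (select sales_rep_id, district_id
      from LU_SALES_REP
      where END_DT = '2099-12-31') a12
  on (a11.SALES_REP_ID = a12.SALES_REP_ID)
join LU_TIME a13
  on (a11.TRANS_DT = a13.DATE_ID)
join LU_DISTRICT a14
  on (a12.DISTRICT_ID = a14.DISTRICT_ID)
group by a12.DISTRICT_ID, a13.MONTH_ID
""").fetchall()

# Key the report cells by (District, Month) for easy inspection.
as_is = {(district, month): sales for district, _name, month, sales in rows}
for (district, month), sales in sorted(as_is.items()):
    print(district, month, sales)
```

The grand total is still 1,750; only its allocation across Districts changes. All of Jones's and Kelly's history now lands in District 39, including their September 2003 sales.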

Report results:

Case 2: New surrogate key for each changing element

A more flexible way to physically store an SCD is to use surrogate keys and to insert
new rows in the dimension table whenever a dimensional relationship changes. It is also
common to include an indicator field that identifies the records for current relationships.
An example set of records is shown below.

LU_SALES_REP

sales_rep_cd  sales_rep_id  sales_rep_name  district_id  current_flag
1             1             Jones           37           0
2             2             Smith           37           1
3             3             Kelly           38           0
4             4             Madison         38           1
5             1             Jones           39           1
6             3             Kelly           39           1

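The surrogate-key design implies a specific update procedure when a relationship changes: the current row is retired (its flag cleared) and a new row with a fresh surrogate key is inserted, while existing fact rows keep pointing at the old key. The Python sketch below shows that procedure on the sample rows. The helper move_rep, the max-plus-one key generation (a production warehouse would more likely use a sequence or identity column), and the example move of Smith to District 38 are all hypothetical.

```python
# Case 2 LU_SALES_REP rows as mutable lists:
# [sales_rep_cd, sales_rep_id, sales_rep_name, district_id, current_flag]
LU_SALES_REP = [
    [1, 1, "Jones", 37, 0],
    [2, 2, "Smith", 37, 1],
    [3, 3, "Kelly", 38, 0],
    [4, 4, "Madison", 38, 1],
    [5, 1, "Jones", 39, 1],
    [6, 3, "Kelly", 39, 1],
]


def move_rep(table, sales_rep_id, new_district_id):
    """Hypothetical helper: record a District change the Type II way.

    Retires the rep's current row and appends a new row with a fresh surrogate
    key. Returns the surrogate key that new facts for this rep should carry.
    """
    current = next(r for r in table if r[1] == sales_rep_id and r[4] == 1)
    if current[3] == new_district_id:
        return current[0]  # no change, so the existing surrogate key stays valid
    current[4] = 0  # the old relationship becomes history
    new_cd = max(r[0] for r in table) + 1  # assumed key generation: max + 1
    table.append([new_cd, sales_rep_id, current[2], new_district_id, 1])
    return new_cd


# Hypothetical change: Smith moves from District 37 to District 38.
new_cd = move_rep(LU_SALES_REP, 2, 38)
print(new_cd)            # 7
print(LU_SALES_REP[1])   # [2, 2, 'Smith', 37, 0]
print(LU_SALES_REP[-1])  # [7, 2, 'Smith', 38, 1]
```

Fact rows loaded before the change keep sales_rep_cd 2 and so still roll up to District 37 in as-was reports, while as-is reports follow sales_rep_id 2 to the new current row.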
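When using this type of dimensional lookup table, the fact table must include the
surrogate key. A transaction date field may or may not exist; the Date attribute and the
report SQL below assume a trans_dt column, although it is not shown in the sample rows.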

FACT_TABLE

sales_rep_cd  sales
1             100
2             200
3             150
5             200
2             250
3             300
2             125
6             275
4             150

Specifying the MicroStrategy Schema
Create a logical view to represent only the current District-Sales Rep relationship:

LVW_CURRENT_ORG
select sales_rep_id, district_id
from LU_SALES_REP
where current_flag = 1

Create a table alias LU_CURRENT_DISTRICT for LU_DISTRICT.

Define the following Attributes:

Sales Rep Surrogate
    @ID = sales_rep_cd
    Tables: LU_SALES_REP (lookup), FACT_TABLE

Sales Rep
    @ID = sales_rep_id; @Desc = sales_rep_name
    Tables: LU_SALES_REP (lookup), LVW_CURRENT_ORG
    Child: Sales Rep Surrogate

Current District
    @ID = district_id; @Desc = district_name
    Tables: LU_CURRENT_DISTRICT (lookup), LVW_CURRENT_ORG
    Child: Sales Rep

Historical District
    @ID = district_id; @Desc = district_name
    Tables: LU_DISTRICT (lookup), LU_SALES_REP
    Child: Sales Rep

Date
    @ID = date_id, trans_dt
    Tables: LU_TIME (lookup), FACT_TABLE

Month
    @ID = MONTH_ID
    Tables: LU_TIME (lookup)
    Child: Date

Define the Sales fact:

Sales
    Expr: sales
    Tables: FACT_TABLE

Define metrics as required:

Sales
    SUM(sales)

The result of this is a logical schema that appears as follows:

As-Was Analysis

Report definition:

Historical District, Month, Sales

Resulting SQL:

select a12.DISTRICT_ID DISTRICT_ID,
       max(a14.DISTRICT_NAME) DISTRICT_NAME,
       a13.MONTH_ID MONTH_ID,
       sum(a11.SALES) WJXBFS1
from FACT_TABLE a11
join LU_SALES_REP a12
  on (a11.SALES_REP_CD = a12.SALES_REP_CD)
join LU_TIME a13
  on (a11.TRANS_DT = a13.DATE_ID)
join LU_DISTRICT a14
  on (a12.DISTRICT_ID = a14.DISTRICT_ID)
group by a12.DISTRICT_ID,
         a13.MONTH_ID
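Because every fact row already carries the surrogate key of the relationship in force when the transaction was recorded, as-was analysis needs no date logic at all. The sketch below (Python's sqlite3, in-memory database) loads the Case 2 sample rows and runs the core join of the SQL above. Since the sample FACT_TABLE has no trans_dt column, it groups by District only and omits the LU_TIME and LU_DISTRICT joins; that simplification is an assumption of the sketch.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table LU_SALES_REP (sales_rep_cd int, sales_rep_id int,
                           sales_rep_name text, district_id int, current_flag int);
insert into LU_SALES_REP values
  (1, 1, 'Jones',   37, 0), (2, 2, 'Smith',   37, 1),
  (3, 3, 'Kelly',   38, 0), (4, 4, 'Madison', 38, 1),
  (5, 1, 'Jones',   39, 1), (6, 3, 'Kelly',   39, 1);
create table FACT_TABLE (sales_rep_cd int, sales int);
insert into FACT_TABLE values
  (1, 100), (2, 200), (3, 150), (5, 200), (2, 250),
  (3, 300), (2, 125), (6, 275), (4, 150);
""")

# As-was: the surrogate key alone identifies the District in force at the time.
as_was = dict(con.execute("""
    select a12.district_id, sum(a11.sales)
    from FACT_TABLE a11
    join LU_SALES_REP a12
      on (a11.sales_rep_cd = a12.sales_rep_cd)
    group by a12.district_id
    order by a12.district_id
""").fetchall())
print(as_was)  # {37: 675, 38: 600, 39: 475}
```

These totals match the effective/end date approach of Case 1, as they should: both encode the same history, just with different keys.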

Report results:

As-Is Analysis

Report definition:

Current District, Month, Sales


Resulting SQL:

select a13.DISTRICT_ID DISTRICT_ID,
       max(a15.DISTRICT_NAME) DISTRICT_NAME,
       a14.MONTH_ID MONTH_ID,
       sum(a11.SALES) WJXBFS1
from FACT_TABLE a11
join LU_SALES_REP a12
  on (a11.SALES_REP_CD = a12.SALES_REP_CD)
join (select sales_rep_id, district_id
      from LU_SALES_REP
      where current_flag = 1
     ) a13
  on (a12.SALES_REP_ID = a13.SALES_REP_ID)
join LU_TIME a14
  on (a11.TRANS_DT = a14.DATE_ID)
join LU_DISTRICT a15
  on (a13.DISTRICT_ID = a15.DISTRICT_ID)
group by a13.DISTRICT_ID,
         a14.MONTH_ID
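For as-is analysis the query hops from the surrogate key to the natural sales_rep_id, then to that rep's current row. A sketch in Python's sqlite3 using the Case 2 sample rows; as the sample FACT_TABLE has no trans_dt column, it assumes a District-only grouping and leaves out the LU_TIME and LU_DISTRICT joins.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table LU_SALES_REP (sales_rep_cd int, sales_rep_id int,
                           sales_rep_name text, district_id int, current_flag int);
insert into LU_SALES_REP values
  (1, 1, 'Jones',   37, 0), (2, 2, 'Smith',   37, 1),
  (3, 3, 'Kelly',   38, 0), (4, 4, 'Madison', 38, 1),
  (5, 1, 'Jones',   39, 1), (6, 3, 'Kelly',   39, 1);
create table FACT_TABLE (sales_rep_cd int, sales int);
insert into FACT_TABLE values
  (1, 100), (2, 200), (3, 150), (5, 200), (2, 250),
  (3, 300), (2, 125), (6, 275), (4, 150);
""")

# As-is: surrogate key -> natural key -> the rep's current District.
as_is = dict(con.execute("""
    select a13.district_id, sum(a11.sales)
    from FACT_TABLE a11
    join LU_SALES_REP a12
      on (a11.sales_rep_cd = a12.sales_rep_cd)
    join (select sales_rep_id, district_id
          from LU_SALES_REP
          where current_flag = 1) a13
      on (a12.sales_rep_id = a13.sales_rep_id)
    group by a13.district_id
    order by a13.district_id
""").fetchall())
print(as_is)  # {37: 575, 38: 150, 39: 1025}
```

The current_flag filter must leave exactly one row per sales_rep_id; if two rows for the same rep were flagged current, that rep's facts would be counted twice.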

Report results:
