
Sample Deliverable

Velocity
Visio Templates for Slowly Changing Dimension Type 2 Mappings
Document Author:

Document Owner:

Date Created:

Last Updated:

Project:

Company:
Introduction
Data warehouses store historical data in various forms. One of the most common components
is a set of Slowly Changing Dimension (SCD) tables. These contain historical information about
objects to allow analysis of past situations. PowerCenter mappings that load these tables often
follow a common structure, so they lend themselves to generation with a Mapping
Architect for Visio template. This document describes two different SCD Type 2 designs and
provides corresponding template files for each, keeping maximum reusability of
PowerCenter logic in mind. The templates can be used as examples or as a basis for
customization to specific needs. The variety of techniques used allows several variations of
mappings to be generated from a single template.
We start with a short description of the basic SCD Type 2 process that needs to be generated,
followed by the first template and an explanation of its rules. We continue with additional
functionality, including a second, more complex template. Refer to the Mapping Architect for
Visio Guide for more information about specific options.

Basic SCD Type 2 Process


SCD Type 2 tables are recognized by date columns that contain the validity period of each
record. When a change in a record is detected, the current record is marked as no longer valid
and a new record is added to the table. Beyond this basic principle, a variety of
technical fields can be added, depending on customer requirements. Data is stored in the
following tables:
Table Name  Description
S           Contains source data to be processed
H           Contains historic information (the actual SCD Type 2 table)

The example in this sample deliverable uses the following technical columns:
Column Name     Datatype    Description                                              S  H
VLD_FM_DT       Datetime    Start datetime of record validity period                    X
VLD_TO_DT       Datetime    End datetime of record validity period                      X
INL_PPN_TMS     Datetime    Datetime when record was created                         X  X
LAST_PPN_TMS    Datetime    Datetime when record was last updated                       X
LAST_VSN_FL     Number(1)   Boolean indicator if record is the last known version       X
                            (1) or not (0)
SUR_KEY         Integer     Surrogate key of the record                                 X
UNQ_ID_SRC_STM  String(50)  Unique identifier of the object (i.e., concatenation     X  X
                            of the logical key columns into a single string value)
HASH_VALUE      String(32)  MD5 checksum of the record content                       X  X
CHANGE_IND      String(1)   Indicator for type of change that has been detected
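
To illustrate how the UNQ_ID_SRC_STM and HASH_VALUE columns from the table above could be derived, the following Python sketch concatenates the logical key columns and computes an MD5 checksum over the record content. It is not the PowerCenter expression itself. The delimiter, the handling of NULL values, and the sample column names (CUST_NO, REGION, NAME, CITY) are assumptions made for this example.

```python
import hashlib

# Assumption: "|" is used as delimiter; any character that cannot occur in the data works.
DELIMITER = "|"

def unique_id(record: dict, key_columns: list) -> str:
    """Concatenate the logical key columns into a single string (UNQ_ID_SRC_STM)."""
    return DELIMITER.join(str(record[col]) for col in key_columns)

def hash_value(record: dict, content_columns: list) -> str:
    """Return the MD5 checksum of the record content as 32 hex characters (HASH_VALUE)."""
    # Assumption: NULL values are represented as empty strings before hashing.
    content = DELIMITER.join(
        "" if record[col] is None else str(record[col]) for col in content_columns
    )
    return hashlib.md5(content.encode("utf-8")).hexdigest()

# Hypothetical source record with two key columns and two content columns.
row = {"CUST_NO": 42, "REGION": "EU", "NAME": "Acme", "CITY": "Utrecht"}
uid = unique_id(row, ["CUST_NO", "REGION"])
checksum = hash_value(row, ["NAME", "CITY"])
```

A 32-character hexadecimal digest fits the String(32) datatype listed for HASH_VALUE. Any change in one of the content columns produces a different checksum, which is what the comparison step relies on.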

The overall process flow looks like this:

[Figure: process flow diagram. Source Data -> Calculate key and checksum -> S-table -> Perform Comparison -> Apply Changes -> H-table]

The source data is loaded into the S-table. In this step the UNQ_ID_SRC_STM and
HASH_VALUE columns are calculated and stored in the table. The MD5 checksums of the
records in the S-table and the last known version of each object in the H-table are then
compared. This is done using a full outer join on the UNQ_ID_SRC_STM column and an
expression transformation. The CHANGE_IND column is calculated and can have the following
values:
Value  Description
I      New record detected in source data (key does not yet exist in H-table)
U      Updated record detected in source data (checksum differs from last known version)
D      Record no longer occurs in source data (key no longer exists in S-table)
R      Previously deleted record reappeared in source data
X      No change detected
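
The comparison described above can be sketched in Python as a full outer join on the key, followed by a classification of each joined pair. This is an illustration only, not the expression used in the mapping. In particular, the document does not state how a previously deleted record is recognized; the sketch assumes that the last known version in the H-table carries a flag (here h_closed) telling whether its validity period has already been closed.

```python
from typing import Optional

def change_ind(s_hash: Optional[str], h_hash: Optional[str], h_closed: bool = False) -> str:
    """Classify one joined row; None means the key is missing on that side."""
    if h_hash is None:
        return "I"                        # key does not yet exist in H-table
    if s_hash is None:
        return "X" if h_closed else "D"   # already deleted earlier, or deleted now
    if h_closed:
        return "R"                        # previously deleted record reappeared
    return "X" if s_hash == h_hash else "U"

def compare(s_table: dict, h_last: dict) -> dict:
    """Full outer join on UNQ_ID_SRC_STM.

    s_table maps key -> HASH_VALUE; h_last maps key -> (HASH_VALUE, closed flag)
    for the last known version of each object.
    """
    result = {}
    for key in s_table.keys() | h_last.keys():
        h_hash, h_closed = h_last.get(key, (None, False))
        result[key] = change_ind(s_table.get(key), h_hash, h_closed)
    return result

# Hypothetical keys and checksums.
s_rows = {"A": "h1", "B": "h2x", "C": "h3", "E": "h5"}
h_rows = {"B": ("h2", False), "C": ("h3", False), "D": ("h4", False), "E": ("h5", True)}
changes = compare(s_rows, h_rows)
```

With this sample data, key A is classified as I, B as U, C as X, D as D and E as R.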

The physical inserts and updates that are performed in the target table depend on the type of
change that is detected:
Value  H-table
I      Insert new record
U      Insert new record and update current record
D      Update VLD_TO_DT of current record
R      Insert new record and update current record
X      No action
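
The actions in the table above can be sketched in Python against an in-memory list that stands in for the H-table. Several details are assumptions made for this example and are not taken from the mapping: open-ended validity is marked with a high date, "update current record" is interpreted as closing the validity period and resetting LAST_VSN_FL, a deleted record keeps LAST_VSN_FL = 1, and surrogate keys are generated as the current maximum plus one instead of by a sequence generator.

```python
from datetime import datetime

HIGH_DT = datetime(9999, 12, 31)  # assumption: marks an open-ended validity period

def apply_changes(h_table, changes, ref_dt, now):
    """Apply (change_ind, uid, hash_value) tuples to h_table, a list of row dicts."""
    next_key = max((row["SUR_KEY"] for row in h_table), default=0) + 1
    for ind, uid, hash_value in changes:
        if ind == "X":
            continue  # no action
        current = next((row for row in h_table
                        if row["UNQ_ID_SRC_STM"] == uid and row["LAST_VSN_FL"] == 1), None)
        if current is not None and ind in ("U", "D", "R"):
            if current["VLD_TO_DT"] == HIGH_DT:
                current["VLD_TO_DT"] = ref_dt   # close the validity period
            if ind != "D":
                current["LAST_VSN_FL"] = 0      # a newer version follows
            current["LAST_PPN_TMS"] = now
        if ind in ("I", "U", "R"):
            h_table.append({
                "SUR_KEY": next_key, "UNQ_ID_SRC_STM": uid, "HASH_VALUE": hash_value,
                "VLD_FM_DT": ref_dt, "VLD_TO_DT": HIGH_DT, "LAST_VSN_FL": 1,
                "INL_PPN_TMS": now, "LAST_PPN_TMS": now,
            })
            next_key += 1
    return h_table

# Hypothetical H-table with one open version of object "A".
start = datetime(2024, 1, 1)
ref = datetime(2024, 1, 2)
h = [{"SUR_KEY": 1, "UNQ_ID_SRC_STM": "A", "HASH_VALUE": "h1",
      "VLD_FM_DT": start, "VLD_TO_DT": HIGH_DT, "LAST_VSN_FL": 1,
      "INL_PPN_TMS": start, "LAST_PPN_TMS": start}]
apply_changes(h, [("U", "A", "h2"), ("I", "B", "h9")], ref_dt=ref, now=ref)
```

Passing the reference date as a parameter rather than reading the system date inside the function mirrors the option of running the load for a past date.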

Informatica Velocity – Sample Deliverable | 3


PowerCenter Mapping Example
The PowerCenter mapping that executes the process logic described previously looks like this.

The mapping implements the following processing sequence:


1. The new data is selected from the Source table
2. The current records are selected from the History table
3. Both datasets are joined using a full outer join on the UNQ_ID_SRC_STM column
4. The reference date (used in the calculation of the validity period) is set. This can be done in
various ways (for instance, using the system date or a mapping parameter). Using a mapping
parameter makes it possible to run the load for a date in the past.
5. The checksums of the resulting dataset are compared to detect the type of change (if any)
6. The current system date is retrieved, so it can be stored in the INL_PPN_TMS and
LAST_PPN_TMS columns
7. The records are separated into physical inserts and updates (depending on the type of
change)
8. Surrogate key values for new records are generated using a sequence generator
transformation
9. The records are written to the target table

The layout of the mapping is independent of the structure of the data that is being processed.
The transformations are always identical; only the table/column names and data types
differ. This makes it an ideal candidate for mapping generation using Mapping Architect for
Visio. Also, most of the transformations are identical for every mapping, so they can be
defined as reusable transformations to simplify future maintenance. It is even possible to use
shortcuts to reusable transformations. However, for simplicity, this has not been done in the
example.

Mapping Template
Visio Template
The Mapping Architect for Visio template to generate mappings looks like this:

As can be seen above, the structure is identical to the PowerCenter mapping. Only the
components that are different in each mapping are parameterized in the template.

Rule Types
The links between the transformations contain rules that can have different types. This example
template uses the types described below.
Type        Description                                          Where Used (examples)
1. Named    Propagate a port using fixed from and to port        Everywhere
            names.
2. Pattern  Propagate ports using regular expressions to select  Between source qualifiers
            from port names and calculate to port names. In      and joiner, between router
            this example these rules are used to add/remove      and target definitions.
            prefixes and suffixes.
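
The idea behind a Pattern rule can be illustrated with Python regular expressions: a pattern selects the from ports, and a template calculates the to port name. This sketch only mimics the principle. The rule syntax used by Mapping Architect for Visio is not reproduced here, and the s_ and h_ prefixes and port names are hypothetical.

```python
import re

def propagate(ports, from_pattern, to_template):
    """Return {from_port: to_port} for every port that fully matches from_pattern."""
    links = {}
    for port in ports:
        match = re.fullmatch(from_pattern, port)
        if match:
            # expand() fills group references such as \1 in the template
            links[port] = match.expand(to_template)
    return links

# Add a prefix to every port.
add_prefix = propagate(["NAME", "CITY", "HASH_VALUE"], r"(.+)", r"s_\1")

# Remove the prefix again; ports without the s_ prefix are not selected.
remove_prefix = propagate(["s_NAME", "s_CITY", "h_SUR_KEY"], r"s_(.+)", r"\1")
```

Because the rule works on name patterns rather than fixed names, the same template link can serve tables with different column sets.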

Template Parameters
The template uses a single parameter to generate the PowerCenter mappings.
Name             Description                                     Where Used
1. $TABLE_CODE$  Identification of the source/target             Source/target definitions,
                 definitions to be used in the mapping. This     Source Qualifiers,
                 is the part between ‘TB_’ and ‘_S/_H’ in        Sequence generator
                 the table names.
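
As a small illustration of the naming convention described in the table above, the following Python sketch derives the S-table and H-table names from a table code. The value DATASET is assumed to be the table code of the example files; the function name is hypothetical.

```python
def table_names(table_code: str) -> dict:
    """Build the source and history table names from the $TABLE_CODE$ value."""
    return {
        "source": f"TB_{table_code}_S",
        "history": f"TB_{table_code}_H",
    }

names = table_names("DATASET")
```

Since the table code is the only value that varies, one template can generate a mapping for every S/H table pair that follows this convention.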

The values of the template parameter(s) need to be provided when a mapping is generated. This
can be done in the PowerCenter Designer using the Import Mapping Template wizard or using
parameter files. The latter are XML files that can be generated (for instance, using PowerCenter)
and make it possible to quickly generate a large number of mappings. An example parameter
file is included in the next section.

Example Files
The following files can be used to generate the example mapping. For completeness, the mapping
itself is also included for reference.
File                             Description
1. Tabledefs.XML                 Source/Target definitions to be used for mapping generation
2. H_DT_Basic.vsd                Visio template file
3. H_DT_Basic.xml                Published Visio template file
4. H_DT_Basic_param_DATASET.xml  Visio parameter file to generate example mapping
5. TB_DATASET_H_DT_Basic.XML     Example PowerCenter mapping

Extended SCD Type 2 Process


Sometimes more complicated designs are used for SCD Type 2 processes. This second
example adds a target table, the DT-table, that contains only the details of the changes that
were applied. Because the number of changes is normally much smaller than the full dataset,
the DT-table contains only a small number of records. It can serve as an efficient source
for other processes because it removes the need to query the (potentially very large) H-table
itself for all versions that were created or closed during the last run.
The DT-table also contains the surrogate key of the previous version in case a record is
updated. This makes it possible to detect what has actually changed in the record using a fast
lookup on the surrogate key column.
Table Name  Description
DT          Contains the changes that were applied and can be used to quickly
            select these for further processing.

The overall process flow now looks like this:

[Figure: process flow diagram. Source Data -> Calculate key and checksum -> S-table -> Perform Comparison -> Apply Changes -> DT-table and H-table]

Physically, records will only be inserted into the new DT target table. However, the content of
the records depends on the type of change (indicated by the CHANGE_IND value):
Value  DT-table
I      Values of all columns of new record
U      Values of all columns of new record, surrogate key value of previous version
D      Primary key values of the ‘deleted’ record
R      Values of all columns of new record, surrogate key value of previous version
X      No action
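
The content rules in the table above can be sketched in Python as a function that builds one DT record per detected change. The column name PREV_SUR_KEY and the decision to store CHANGE_IND in the DT record are assumptions made for this example; the document does not name these columns.

```python
def dt_record(change_ind, s_row=None, h_row=None, pk_columns=()):
    """Build the DT-table record for one change, or None when nothing is written."""
    if change_ind == "X":
        return None                                   # no action
    if change_ind == "D":
        # Only the primary key values are available, taken from the H-table.
        record = {col: h_row[col] for col in pk_columns}
    else:
        record = dict(s_row)                          # all columns of the new record
        if change_ind in ("U", "R"):
            record["PREV_SUR_KEY"] = h_row["SUR_KEY"]  # hypothetical column name
    record["CHANGE_IND"] = change_ind                 # assumption: stored in DT
    return record

# Hypothetical rows.
new_row = {"CUST_NO": 42, "NAME": "Acme"}
old_row = {"SUR_KEY": 7, "CUST_NO": 42, "NAME": "Acme Ltd"}
inserted = dt_record("I", s_row=new_row)
updated = dt_record("U", s_row=new_row, h_row=old_row)
deleted = dt_record("D", h_row=old_row, pk_columns=["CUST_NO"])
```

The deleted case draws its values from the H-table row while all other cases use the S-table row, which is why the two kinds of records take different paths to the target.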

PowerCenter Mapping Example


The PowerCenter mapping that executes the process logic described above looks like this:



The main differences when compared to the previous example are:
• Two additional target instances for the new DT-table:
  o One to load the Inserts, Updates and Reinserts into.
  o One to load the ‘Deleted’ records into. This has to be a separate target because for
    this type of change the primary key values of the H-table have to be connected to the
    target, while the other changes use the values from the S-table.
• A Transaction Control transformation is added to make sure that the commits to the H and DT
  tables are done at the same moment. This is essential to keep the data in both tables
  consistent in case of session failure.
• Changes not directly visible are:
  o An additional group DELETE_DT has been added to the Router to propagate the
    ‘Deleted’ records to the new DT target. This has to be a separate group because for
    this type of change the ports that are connected to the target are different (see
    above).
  o The primary key columns of the record are now also selected from the H-table, while
    the previous example only extracted the technical columns that are required for the
    join and comparison.
  o The UNQ_ID_SRC_STM column of the H-table is now also propagated to the Router
    and DT_Delete target instance.
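
The reason for committing to the H and DT tables at the same moment can be illustrated outside PowerCenter with a Python sketch that uses the standard sqlite3 module. It is an analogy for the consistency requirement and not a description of how the Transaction Control transformation works. The table name TB_DATASET_DT, the reduced column lists, and the NOT NULL constraint used to provoke a failure are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TB_DATASET_H (SUR_KEY INTEGER PRIMARY KEY, NAME TEXT)")
conn.execute("CREATE TABLE TB_DATASET_DT (SUR_KEY INTEGER, CHANGE_IND TEXT NOT NULL)")
conn.commit()

def load(conn, h_rows, dt_rows):
    """Write to both tables in one transaction; return True when committed."""
    try:
        with conn:  # commits on success, rolls back when an exception is raised
            conn.executemany("INSERT INTO TB_DATASET_H VALUES (?, ?)", h_rows)
            conn.executemany("INSERT INTO TB_DATASET_DT VALUES (?, ?)", dt_rows)
        return True
    except sqlite3.Error:
        return False

# First load succeeds: both tables receive one row.
ok = load(conn, [(1, "Acme")], [(1, "I")])

# Second load fails on the DT insert (NULL in a NOT NULL column),
# so the H insert of the same load is rolled back as well.
failed = load(conn, [(2, "Globex")], [(2, None)])

h_count = conn.execute("SELECT COUNT(*) FROM TB_DATASET_H").fetchone()[0]
dt_count = conn.execute("SELECT COUNT(*) FROM TB_DATASET_DT").fetchone()[0]
```

After the failed load both tables still contain only the row from the first load, so the DT-table never describes a change that is missing from the H-table, or the other way around.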

The overall layout of the mapping is still independent from the structure of the data that is being
processed, so it remains an ideal candidate for mapping generation using Mapping Architect for
Visio.

Mapping Template
Visio Template
The Mapping Architect for Visio template to generate mappings looks like this:

As can be seen, the structure is identical to the PowerCenter mapping. Only the components
that are different in each mapping are parameterized in the template.

Rule Types
The links between the transformations contain rules that can have different types. This example
template uses one additional type of rule when compared to the basic example:
Type       Description                                           Where Used (examples)
Parameter  Use a template parameter to specify the actual rules  Between H-source
           to be applied during mapping generation. In this      qualifier and joiner,
           example these rules are used to propagate a           between router and DT_D
           variable number of named ports.                       target definition.

Template Parameters
The template uses parameters to generate the PowerCenter mappings. The additional
parameters used by this example are described in the table below.
Name                 Description                                  Where Used
1. $PK_COLUMNS_IN$   Contains names of primary key columns        Rule_SQ_H_Joiner
                     that need to be selected from the H-table.
2. $PK_COLUMNS_OUT$  Contains names of primary key columns        Rule_Delete_DT
                     that need to be loaded to the DT-table for
                     deleted records.



These rules are used to correctly propagate the primary key columns from the source to the
target table. These columns are stored in the DT-table when a deleted record is detected. The
H-table does not have a primary key defined other than the surrogate key column; therefore the
columns need to be selected explicitly. An example of the value of these parameters is:
Named:PK_COLUMN_1 (TO) h_PK_COLUMN_1;Named:PK_COLUMN_2 (TO) h_PK_COLUMN_2;
This example causes port PK_COLUMN_1 to be propagated to h_PK_COLUMN_1, while
PK_COLUMN_2 is propagated to h_PK_COLUMN_2. Besides named ports, the parameter rule
can also be used to specify other types of link rules, including different types in a single
parameter. This makes it possible to create templates that are very flexible in the ports that are
being propagated.
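
When many mappings are generated, parameter values of this form are easier to produce with a small script than by hand. The following Python sketch builds and parses a value in the format of the example above. It handles only the Type:FROM (TO) TO; form shown in this document; the function names and the h_ prefix default are assumptions made for the example.

```python
def build_rules(columns, prefix="h_", rule_type="Named"):
    """Build a parameter value that propagates each column to a prefixed port."""
    return "".join(f"{rule_type}:{col} (TO) {prefix}{col};" for col in columns)

def parse_rules(value):
    """Split a parameter value into (rule type, from port, to port) tuples."""
    rules = []
    for part in value.split(";"):
        part = part.strip()
        if not part:
            continue  # skip the empty piece after the trailing semicolon
        rule_type, _, mapping = part.partition(":")
        from_port, _, to_port = mapping.partition("(TO)")
        rules.append((rule_type.strip(), from_port.strip(), to_port.strip()))
    return rules

value = build_rules(["PK_COLUMN_1", "PK_COLUMN_2"])
rules = parse_rules(value)
```

Here value equals the example string given above, and rules lists one tuple per propagated primary key column.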

Example Files
The following files can be used to generate the example mapping based on the table definitions
of the basic design. The mapping itself is also included for reference.
File                                Description
1. H_DT_Extended.vsd                Visio template file
2. H_DT_Extended.xml                Published Visio template file
3. H_DT_Extended_param_DATASET.xml  Visio parameter file to generate example mapping
4. TB_DATASET_H_DT_Extended.XML     Example PowerCenter mapping

