
Kimball University: Three ETL Compromises to Avoid

Why neglecting slowly changing dimensions, failing to capture metadata and overlooking scope
creep can be the undoing of a dimensional data warehousing initiative.
Whether you are developing a new dimensional data warehouse or replacing an existing environment, the
ETL (extract, transform, load) implementation effort is inevitably on the critical path. Difficult data sources,
unclear requirements, data quality problems, changing scope, and other unforeseen problems often
conspire to put the squeeze on the ETL development team. It simply may not be possible to fully deliver
on the project team's original commitments; compromises will need to be made. In the end, these
compromises, if not carefully considered, may create long-term headaches.

Bob Becker

In my last article on "Six Key Decisions for ETL Architectures," I described the decisions ETL teams face when implementing a
dimensional data warehouse. This article focuses on three common ETL development compromises that cause most of the long-term problems around dimensional data warehouses. Avoiding these compromises will not only improve the effectiveness of your
ETL implementation, but will also increase the likelihood of overall DW/BI success.
Compromise 1: Neglecting Slowly Changing Dimension Requirements
Kimball Group has written extensively on slowly changing dimension (SCD) strategies and complementary implementation
alternatives. The ETL team needs to embrace SCDs as a core strategy early in the initial implementation process. A
common compromise is to put off to the future the effort required to properly support SCDs, especially Type 2 SCDs where
dimension changes are tracked by adding new rows to the dimension table. The result is often a total rework disaster.
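
To make the difference concrete, here is a minimal sketch of the two update styles, not a definitive implementation. It assumes a customer dimension held as a list of dictionaries. The column names (customer_key, customer_id, row_start, row_end), the 9999-12-31 "still current" date, and both function names are hypothetical choices made for this illustration.

```python
from datetime import date

# Assumed convention: the current version of a row ends on a far-future date.
HIGH_DATE = date(9999, 12, 31)


def apply_type1(dim_rows, natural_key, changes):
    """Type 1: overwrite attributes in place, so prior values are lost."""
    for row in dim_rows:
        if row["customer_id"] == natural_key:
            row.update(changes)


def apply_type2(dim_rows, natural_key, changes, effective):
    """Type 2: expire the current row and append a new row with a new surrogate key."""
    current = next(
        r for r in dim_rows
        if r["customer_id"] == natural_key and r["row_end"] == HIGH_DATE
    )
    # Close out the old version; rows are valid for row_start <= d < row_end.
    current["row_end"] = effective
    new_row = {
        **current,
        **changes,
        "customer_key": max(r["customer_key"] for r in dim_rows) + 1,
        "row_start": effective,
        "row_end": HIGH_DATE,
    }
    dim_rows.append(new_row)
    return new_row


dim = [{"customer_key": 1, "customer_id": "C100", "city": "Boston",
        "row_start": date(2020, 1, 1), "row_end": HIGH_DATE}]
apply_type2(dim, "C100", {"city": "Denver"}, date(2023, 6, 1))
```

After the Type 2 call, the dimension holds two rows for customer C100: the original Boston row, now closed out, and a new Denver row with its own surrogate key. A Type 1 update would have left a single row showing only Denver.
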
Deferring the implementation of proper SCD strategies does save ETL development time in the immediate phase. But as a result,
the implementation embraces only Type 1 SCDs, where all history in the data warehouse is associated with current dimension
values. Initially, this seems to be a reasonable compromise. However, it's almost always more difficult to "do it right" when you have
to circle back in a later phase. The unfortunate realities are that:

Following a successful initial implementation, the team faces pressure to roll out new capabilities and additional phases
without time to revisit prior deliverables and add the required change-tracking capabilities. Thus, the rework ultimately required to
support SCD requirements continues to expand.

Once the ETL team finally has the bandwidth to address SCD, the ugly truth becomes apparent. Adding SCD Type 2
capabilities into the historical data requires rebuilding every dimension that contains Type 2 attributes; each of these dimensions must be
rekeyed so that its primary key reflects the new, historically appropriate Type 2 rows. Rebuilding and rekeying even one core
conformed dimension will unavoidably require reloading all impacted fact tables due to the new dimension key structures.

Facing a possible rebuild of much of the data warehouse environment, many organizations will back away from the effort.
Rather than reworking the existing historical data to restate the dimension and fact tables in their correct historical context, they
implement the proper SCD strategies from a point-in-time forward. By compromising the implementation of proper SCD techniques
in the initial development process, the organization has lost possibly years of important historic context.
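
The reload described above can be sketched as follows, under stated assumptions. The sketch assumes each fact row still carries (or can recover from the source) the natural key and the transaction date, and that the rebuilt dimension has effective-date columns. All names here (customer_key, customer_id, order_date, row_start, row_end, lookup_surrogate_key, rekey_facts) are hypothetical and chosen only for illustration.

```python
from datetime import date

HIGH_DATE = date(9999, 12, 31)  # assumed marker for the current row

# Rebuilt dimension: one natural key, two historical versions.
dim_customer = [
    {"customer_key": 1, "customer_id": "C100", "city": "Boston",
     "row_start": date(2020, 1, 1), "row_end": date(2023, 6, 1)},
    {"customer_key": 2, "customer_id": "C100", "city": "Denver",
     "row_start": date(2023, 6, 1), "row_end": HIGH_DATE},
]


def lookup_surrogate_key(dim_rows, natural_key, as_of):
    """Return the surrogate key of the version in effect on the as_of date."""
    for row in dim_rows:
        if (row["customer_id"] == natural_key
                and row["row_start"] <= as_of < row["row_end"]):
            return row["customer_key"]
    raise LookupError(f"no dimension row for {natural_key} on {as_of}")


def rekey_facts(fact_rows, dim_rows):
    """Return copies of the facts pointing at the historically correct rows."""
    return [
        {**fact, "customer_key": lookup_surrogate_key(
            dim_rows, fact["customer_id"], fact["order_date"])}
        for fact in fact_rows
    ]


# Under the Type 1 design, both facts pointed at the single customer row.
facts = [
    {"customer_id": "C100", "order_date": date(2021, 3, 15), "customer_key": 1},
    {"customer_id": "C100", "order_date": date(2024, 1, 10), "customer_key": 1},
]
rekeyed = rekey_facts(facts, dim_customer)
```

Every fact row has to pass through this lookup, which is why rekeying a single conformed dimension forces a reload of every fact table that references it.
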

Compromise 2: Failing to Embrace a Metadata Strategy
DW/BI environments spin off copious amounts of metadata. There is business metadata, process metadata, and technical
infrastructure metadata, all of which needs to be vetted, captured, and made available. The ETL processes alone generate significant
amounts of metadata.
Unfortunately, many ETL implementation teams do not embrace metadata early in the development process, putting off its capture to
a future phase. This compromise typically is made because the ETL team does not "own" the overall metadata strategy. In fact, in
the early stages of many new implementation efforts, it's not uncommon for there to be no designated owner of the metadata
strategy.
Lack of ownership and leadership makes it easy to defer dealing with metadata, but that's a short-sighted mistake. Much of the
critical business metadata is identified and captured, often in spreadsheet form, during the dimensional-modeling and source-to-target mapping phases. What's more, most organizations use ETL tools to develop their environment, and these tools have
capabilities to capture the most pertinent business metadata. Thus, the ETL development phase presents an opportune moment, often squandered, to capture richly described metadata. Instead, the ETL development team only captures the information required
for their development purposes, leaving valuable descriptive information on the cutting room floor. Ultimately, in a later phase, much
of this effort ends up being redone in order to capture the required information.
At a minimum, the ETL team should strive to capture the business metadata created during the data-modeling and source-to-target
mapping processes. Most organizations find it valuable to focus initially on capturing, integrating, flowing, and, ultimately, surfacing
the business metadata through their BI tool; other metadata can be integrated over time.
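
As one illustration of keeping the descriptive information rather than discarding it, the sketch below reads a source-to-target mapping spreadsheet (exported as CSV) and retains the business definition alongside the technical mapping. It is a sketch under assumptions, not a feature of any particular ETL tool: the column layout, the ColumnMapping class, and the load_mappings and data_dictionary functions are all hypothetical.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnMapping:
    """One row of a source-to-target mapping, including its business metadata."""
    source_table: str
    source_column: str
    target_table: str
    target_column: str
    transformation: str
    business_definition: str


def load_mappings(csv_text):
    """Parse mapping rows; the CSV header must match the field names above."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [ColumnMapping(**row) for row in reader]


def data_dictionary(mappings):
    """Build a lookup a BI layer could surface: target column -> definition."""
    return {
        f"{m.target_table}.{m.target_column}": m.business_definition
        for m in mappings
    }


# Hypothetical sample of what the modeling spreadsheet might contain.
SAMPLE = """source_table,source_column,target_table,target_column,transformation,business_definition
crm.customers,cust_city,dim_customer,city,trim and title-case,City of the customer's primary billing address
"""

mappings = load_mappings(SAMPLE)
definitions = data_dictionary(mappings)
```

The design point is simply that the business definition travels with the mapping from the start, so it does not have to be reconstructed in a later phase.
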
Compromise 3: Not Delivering a Meaningful Scope
The ETL team is often under the gun to deliver results under tight time constraints. Compromises must be made. Reducing the
scope of the initial project can be an acceptable compromise. If, for example, a large number of schemas was included in the initial
scope, one time-honored solution is to break that effort up into several phases. It's a reasonable, considered compromise assuming
the DW/BI project team and sponsors are all fully, if grudgingly, on board.
But it's a problem when the ETL team makes scope compromises without proactively communicating with the DW/BI project team
and sponsors. Clearly, this is a recipe for failure and an unacceptable compromise.
This situation is often a symptom of deeper organizational challenges. It can start innocently enough, with shortcuts taken under
pressure in the heat of the moment. In retrospect, however, these compromises would never have been made in the full light of day.
In an effort to achieve overly ambitious deadlines, the ETL team might fail to handle data quality errors uncovered during the
development process, fail to properly support late arriving data, neglect to fully test all ETL processes, or perform only cursory
quality assurance checks on loaded data. These compromises lead to inconsistent reporting, an inability to tie into existing
environments, and erroneous data, and they often lead to a total loss of confidence among business sponsors and users. The outcome
can be total project chaos and failure.
Make Compromises Openly and Honestly
Compromises may be necessary. The most common concession is to scale back an overly ambitious project scope, but key
stakeholders need to be included in this decision. Other, less intrusive changes can be considered, such as reducing the number of
years of back history used to seed a new environment, reducing the number of dimension attributes or number of metrics required in
the initial phase (while being careful about SCD Type 2 requirements), or reducing the number of source systems integrated in the
initial phase. Just keep everyone informed and on the same page. The key is to compromise in areas that do not put the long-term
viability of the project at risk.
