
ETL Design Questionnaire

Matthias Urech
ETL Developer

January 28, 2005

About the author


Matthias Urech is an ETL developer and owner of interface-development.com, a website devoted
to providing solutions to the interface developer community since 2003. He writes primarily on data
integration topics and issues, with specific information for Informatica PowerCenter developers.
Matthias has a master's degree in business information management, and his expertise is shaped by
years of practical experience with Informatica PowerCenter and databases.

Abstract
This article provides a questionnaire that can be useful when you are involved in the design
process of an interface.

Introduction
Do not design a bridge by counting the number of people who swim across the river today. That is
also true for ETL projects. Depending on your role in the ETL project, your work sometimes starts
only when the data flow needs to be built. Fine, someone will tell you what you have to do. But
sometimes you will be involved earlier in the project, to design the interface. You are then
faced with the challenge of gathering requirements. That is where the ETL Design
Questionnaire comes in. Asking the right questions is not only essential, it also puts you
in a position to control the design process. As a side effect, you will be recognized as a
professional ETL developer who has a plan.

The questionnaire sounds like an exciting and useful tool to work with. But how is it different
from other methodologies or frameworks (e.g. Informatica Velocity)? Each of the provided
questions is supported by a table or a graphical element (let's call them diagrams). It is neither
about creating comprehensive documentation nor about strictly answering all questions. Rather,
consider this set of questions as a presentation of multiple views that help you understand the
interface. The main goal is to gather as much information as possible with a simple approach. The
key word here is "tailoring". Decide for yourself what you want to use!

Just as everybody has a scheme for getting rich that will not work, using the questions proposed in
this article will not spare you from going through the design process, nor is the list of questions
complete. At the end of the day, your goal should be to provide the requirements for developing a
working interface, to respond to changes, and to maintain good customer communication and
collaboration. I hope nonetheless that the ETL Design Questionnaire will be useful to
you and your challenges.

ETL Design Questionnaire
In this section we will go through the following set of questions:

• Question 1: Who is involved?
• Question 2: Which event triggers the interface?
• Question 3: How is the target layout?
• Question 4: How is the logical mapping?
• Question 5: Where are data quality issues addressed?
• Question 6: How is the data life cycle?
• Question 7: How is the interface data flow?
• Question 8: What are the operational tasks?
• Question 9: What level of documentation should be provided?

Question 1: Who is involved?


Basically, the ETL team is responsible for extracting, transforming and loading data. On closer
inspection, there are further tasks to consider within and outside the ETL team:

• Perform data analysis
• Define the data quality strategy
• Gather business rules
• Develop the interface
• Establish test plans
• Perform reconciliation
• Execute tests and provide sign-off
• Implement the system into production
• Communicate and enable change management
• Document the interface and support cases

These are just some of the tasks, and the list is far from complete. However, all of these tasks
have to be done by someone. The objective of the role/task diagram is to identify the people
involved and their responsibilities. In short, you can simply ask "who does what?". For example:
the subject matter expert (who) provides the business rules (what).

Figure 1: Role/Task Diagram


Question 2: Which event triggers the interface?
Consider the interface as a black box for the moment. We want to understand the overall
context before building the interface. Specifically, you want to know which event provides
input for the interface and what the output is. For example: once time and expense data has
been posted after month end (event), the interface reads the daily charged hours for each
employee (input) and delivers aggregated hours for the financial period (output).
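
The time and expense example can be sketched in a few lines of Python. This is only an illustration under assumptions that are not part of the article: the row layout, the employee IDs, and the idea that a financial period equals a calendar month are all hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical input rows: (employee_id, work_date, charged_hours)
daily_hours = [
    ("E001", date(2005, 1, 3), 8.0),
    ("E001", date(2005, 1, 4), 7.5),
    ("E002", date(2005, 1, 3), 6.0),
    ("E001", date(2005, 2, 1), 8.0),
]


def financial_period(day):
    """Assumption: the financial period is the calendar month (YYYY-MM)."""
    return f"{day.year:04d}-{day.month:02d}"


def aggregate_hours(rows):
    """Sum charged hours per (employee, financial period)."""
    totals = defaultdict(float)
    for employee_id, work_date, hours in rows:
        totals[(employee_id, financial_period(work_date))] += hours
    return dict(totals)


# Output of the black box: one total per employee and period
period_totals = aggregate_hours(daily_hours)
```

Naming the event, the input and the output this explicitly is the point of the context diagram. How the aggregation is actually implemented can stay open at this stage.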

Figure 2: Context Diagram


Question 3: How is the target layout?
All that matters is the result of the solution, that is, the target. Therefore, the earlier you
know what you have to deliver, the earlier you can begin development.

Table 1: Target Definition Table


Question 4: How is the logical mapping?
We presume that source and target are known. The logical mapping table helps you define the
links between source and target fields and document business rules. The logical mapping is like
water: it is easier to build something on it once the links and business rules are frozen.
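
A logical mapping can also be kept as simple structured data, which makes it easy to check against the target layout. The following is a minimal sketch, not the author's template: the `MappingRule` class, the field names and the `unmapped_targets` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MappingRule:
    source_field: str
    target_field: str
    business_rule: str  # free-text rule, agreed with the subject matter expert


# Hypothetical source and target field names
logical_mapping = [
    MappingRule("EMP_NO", "employee_id", "Copy as is"),
    MappingRule("WORK_DT", "financial_period", "Derive period from work date"),
    MappingRule("HRS", "charged_hours", "Sum per employee and period"),
]


def unmapped_targets(target_fields, mapping):
    """Return target fields that no mapping rule populates yet."""
    mapped = {rule.target_field for rule in mapping}
    return [field for field in target_fields if field not in mapped]


target_layout = ["employee_id", "financial_period", "charged_hours", "cost_center"]
missing = unmapped_targets(target_layout, logical_mapping)
```

Here `missing` lists the target fields that still need a rule, which is a useful open-items list to take back to the people who own the business rules.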

Table 2: Logical Mapping Table


Question 5: Where are data quality issues addressed?
Here, the point is to define whether and where the interface should deal with data quality. You
should address as many data quality issues as possible at the source, since future interface
development initiatives would otherwise have to deal with them again. However, some issues, such
as incomplete data, might be best addressed in the interface.
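
As an illustration of an issue handled in the interface, the sketch below separates incomplete rows from complete ones so that they can be reported back to the source owner. The required field names and the reject structure are assumptions made for this example only.

```python
# Hypothetical list of fields that must be filled for a row to be loaded
REQUIRED_FIELDS = ("employee_id", "work_date", "charged_hours")


def split_by_completeness(rows, required=REQUIRED_FIELDS):
    """Separate complete rows from rows with missing or empty required fields."""
    accepted, rejected = [], []
    for row in rows:
        missing = [name for name in required if row.get(name) in (None, "")]
        if missing:
            # Keep the row and the reason so the issue can be fixed at the source
            rejected.append({"row": row, "missing": missing})
        else:
            accepted.append(row)
    return accepted, rejected


sample_rows = [
    {"employee_id": "E001", "work_date": "2005-01-03", "charged_hours": 8.0},
    {"employee_id": "E002", "work_date": "", "charged_hours": 6.0},
    {"employee_id": "E003", "work_date": "2005-01-03"},
]
accepted, rejected = split_by_completeness(sample_rows)
```

Whether rejected rows are dropped, parked or loaded with defaults is exactly the kind of decision the data quality assignment is meant to record.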

Table 3: Data Quality Assignment Table


Question 6: How is the data life cycle?
This is the most important question of all. We are discussing not only the data flow during the
life cycle but also the relationship between the systems. Let's have a quick look at the different
types of relationships before we continue with the data life cycle diagram. Figure 3 illustrates
the three types of system relationships:

Figure 3: System Relationships


• Master / Slave
  This is the most common relationship. Data is maintained in system 1 and provided
  to system 2.
• Master / Master (one direction)
  In this relationship, data is maintained in both systems, but only system 1 is able to
  update data in system 2. Therefore, additional effort (either manual or automatic) is
  required to prevent data inconsistency and loss of data quality.
• Master / Master (both directions)
  As in the previous relationship, data is maintained in both systems. Here, however,
  both systems are able to update data in each other.

Knowing the types of relationships, you are now able to draw the data flow in the data life cycle
diagram. For example, the data flow arrow points from system 1 to system 2 in the case of a
Master/Slave relationship. What is left is to move the data flow arrow horizontally to define at
which point in time an action (e.g. create) in system 1 causes a certain action in system 2.

Please note that the given actions in system 1 are just examples. Some systems only allow
data to be flagged as inactive instead of deleted. And the road does not end here, since some
systems are connected to more than one other system. You can therefore add further systems to the
diagram. In such situations it is worth giving some thought to prioritizing the order of the data
flows and to whether the data food chain makes sense at all.
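
The relationship types and the action pairs of a data life cycle can be written down as plain data before any diagram is drawn. The sketch below is one possible encoding under stated assumptions: the names `Relationship`, `flow_directions`, `ACTION_MAP` and `target_action` are hypothetical, and mapping a delete to a deactivation is just an example for a target system that only allows flagging data as inactive.

```python
from enum import Enum


class Relationship(Enum):
    MASTER_SLAVE = "master/slave"
    MASTER_MASTER_ONE_WAY = "master/master (one direction)"
    MASTER_MASTER_TWO_WAY = "master/master (both directions)"


def flow_directions(relationship):
    """Return the data flow arrows as (source, target) pairs."""
    if relationship is Relationship.MASTER_MASTER_TWO_WAY:
        return [("system 1", "system 2"), ("system 2", "system 1")]
    # Master/Slave and one-directional Master/Master both flow 1 -> 2
    return [("system 1", "system 2")]


# Assumption: which action in system 2 follows an action in system 1
ACTION_MAP = {
    "create": "create",
    "update": "update",
    "delete": "deactivate",  # target only supports flagging rows inactive
}


def target_action(source_action):
    """Look up the action triggered in system 2 by an action in system 1."""
    return ACTION_MAP[source_action]
```

For example, `flow_directions(Relationship.MASTER_SLAVE)` yields a single arrow from system 1 to system 2, and `target_action("delete")` returns `"deactivate"`.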

Figure 4: Data Life Cycle Diagram


Question 7: How is the interface data flow?
The interface data flow diagram is mostly used to outline the extract, transform and load (ETL)
process. The goal is to reach a common understanding of the data flow and of the applications
and actions involved in delivering data between the systems.

Figure 5: Interface Data Flow Diagram


Question 8: What are the operational tasks?
Some operational tasks are overlooked during development. As a result, you have to put your
hands on the interface again. Thinking about operational steps from the beginning will help you
identify hidden requirements and make accurate effort estimates.

Table 4: Operational Task Table


Question 9: What level of documentation should be provided?
Everything is built and your job is done. Everything? Right, documentation has to be
provided as well. Opinions about the scope of documentation often differ. Table 5
helps you define the documentation scope.

Table 5: Documentation Decision Table
