
DW Chap 5: Data Modeling

In this chapter, we will design the data stores for the AE case study. We will use the NDS+DDS architecture.

The game plan is as follows:

a. Begin by looking at the business requirements, and then design the dimensional data store (DDS) accordingly.

b. Define the meaning of the facts and dimensional attributes.

c. Define the data hierarchies.

d. Map the data in the DDS to the source systems; that is, define the source of the columns in the fact and dimension tables. In some instances one column in the DDS is populated from several tables in the source system (for example, from column 1 of table A and column 1 of table B), and sometimes from more than one source system.

e. Define the transformations (formulas, calculation logic, or lookups) required to populate the target columns.

f. Design the normalized data store (NDS) by normalizing the dimensional data store and by examining the data from the source systems we have mapped. The normalization rules and the first, second, and third normal forms are described in the appendix; we will use these rules to design the NDS.

Designing the Dimensional Data Store

The users will use the DW to do analysis in six business areas:

a. Product Sales
b. Subscription Sales
c. Subscriber Profitability
d. Supplier Performance
e. CRM campaign segmentation
f. CRM campaign results

We need to analyze each business area one by one, modeling the business process in order to create the data model. The first business area we will look at is Product Sales.

1. Product Sales: an order-item data mart in the retail industry is a classic example of data
warehousing.

2. A product sales event happens when a customer buys a product, rather than subscribing to a package. The roles (the who, what, and where) in this event are the customer, the product, and the store. The levels (or, in dimensional modeling terms, the measures) are quantity, unit price, value, direct unit cost, and indirect unit cost. We get these measures from the business requirements in Chapter 4; in this case they are what users need in order to perform their analysis.

3. We put the measures in the fact tables and the roles in the dimension tables.
4. The business event now becomes a fact table row. The quantity, unit price, and unit cost measures are derived from the source system, but the other three measures (sales value, sales cost, and margin) are calculated. They are defined as follows:

Sales value = unit price × quantity
Sales cost = unit cost × quantity
Margin = sales value - sales cost
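The three calculated measures can be sketched in code. This is a minimal illustration of the formulas above; the function and column names are illustrative, not from the case study's source systems.

```python
# A minimal sketch of the three calculated Product Sales measures.
# Function and column names are illustrative.

def derive_measures(quantity, unit_price, unit_cost):
    """Compute the calculated measures for one fact table row."""
    sales_value = unit_price * quantity
    sales_cost = unit_cost * quantity
    margin = sales_value - sales_cost
    return {"sales_value": sales_value, "sales_cost": sales_cost, "margin": margin}

row = derive_measures(quantity=3, unit_price=10.0, unit_cost=6.0)
print(row)  # {'sales_value': 30.0, 'sales_cost': 18.0, 'margin': 12.0}
```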

5. The four keys in the Product Sales fact table link the fact table with the four dimensions. According to Ralph Kimball, it is important to declare the grain of the fact table. The grain is the smallest unit of occurrence of the business event at which the event is measured. In other words, the grain completes this sentence: "One row in the fact table corresponds to ..." In this case, the grain is each item sold: one row in the Product Sales fact table corresponds to one item sold.

6. The general rule for dealing with complex events and exceptions is to always look at the source system. We have to replicate or mimic the source system logic, because the output of the DW must agree with the source system.
7. It is important to get the data model and business logic correct to ensure that the output of the DW reflects the correct business conditions.

8. Customer profitability is where you calculate the profit you make from each customer over a certain period.

9. Note that order ID and line number are called degenerate dimensions. A degenerate dimension is a dimension with only one attribute; because there is only one attribute, it is put in the fact table instead of a separate dimension table. Order ID and line number are the identifiers of the order line in the source system.

10. It is also a good idea to put a timestamp column in the fact table recording when the record was loaded, and another recording when the fact row was last modified. These two timestamp columns are in addition to the transactional date/timestamp columns.
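The two warehouse timestamps can be sketched as follows. This is an illustrative sketch, assuming the column names `created_timestamp` and `last_updated_timestamp`; the row contents are sample data, not from the case study.

```python
# A sketch of stamping a fact row with load and last-modified timestamps,
# in addition to the transactional timestamp. Column names are assumed.
from datetime import datetime, timezone

def stamp_for_load(fact_row):
    """Add warehouse timestamps when a fact row is loaded."""
    now = datetime.now(timezone.utc)
    fact_row["created_timestamp"] = now       # when the row was loaded into the DW
    fact_row["last_updated_timestamp"] = now  # refreshed on every later change
    return fact_row

row = stamp_for_load({"order_id": 101, "line_number": 1,
                      "order_timestamp": "2024-03-01T10:15:00Z"})
print(sorted(row.keys()))
```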

11. The transaction timestamp tells us when the order (or order line) was created, shipped, canceled, or returned, but it does not tell us when the record was loaded into the DW or when it was last modified.
12. The next step in the fact table design is to determine which column combination uniquely identifies a fact table row.

This is important because it is required, in both logical and physical database design, to determine the primary key(s).
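Checking whether a column combination uniquely identifies a row can be sketched as below. The fact rows and column names are illustrative sample data.

```python
# A sketch of testing a candidate key for a fact table.
# Rows and column names are illustrative sample data.

def is_unique_key(rows, key_columns):
    """Return True if no two rows share the same values for key_columns."""
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in key_columns)
        if key in seen:
            return False
        seen.add(key)
    return True

fact_rows = [
    {"order_id": 101, "line_number": 1, "quantity": 2},
    {"order_id": 101, "line_number": 2, "quantity": 1},
    {"order_id": 102, "line_number": 1, "quantity": 5},
]
print(is_unique_key(fact_rows, ["order_id", "line_number"]))  # True
print(is_unique_key(fact_rows, ["order_id"]))                 # False
```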

The Concept of a Data Mart

A collection of a fact table and its dimension tables is called a data mart. Remember, this concept of a data mart is applicable only if the DW uses a dimensional model; in normalized data stores there are no data marts. A data mart is a group of related fact tables and their corresponding dimension tables containing the measurements of business events, categorized by their dimensions. Data marts exist in dimensional data stores.

1. We then define the data type for each column.
2. All key columns have integer data types because they are surrogate keys, that is, simple integer values incremented by one.
3. The three timestamp columns have datetime data types. The source system code is an integer because it contains only the code; the description is stored in the source system table in the metadata database.
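Surrogate key assignment can be sketched as a simple counter that maps each natural key to the next integer. The class and the customer identifiers are illustrative, not part of the case study.

```python
# A minimal sketch of surrogate keys: sequential integers incremented by
# one, assigned per natural key. Names and values are illustrative.
import itertools

class SurrogateKeyGenerator:
    """Hands out sequential integer surrogate keys, one per natural key."""

    def __init__(self, start=1):
        self._counter = itertools.count(start)
        self._keys = {}  # natural key -> surrogate key

    def key_for(self, natural_key):
        if natural_key not in self._keys:
            self._keys[natural_key] = next(self._counter)
        return self._keys[natural_key]

gen = SurrogateKeyGenerator()
print(gen.key_for("CUST-0001"))  # 1
print(gen.key_for("CUST-0002"))  # 2
print(gen.key_for("CUST-0001"))  # 1 (same customer, same key)
```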

Dimension Tables
4. Now that we have discussed the fact table, let's discuss the dimension tables. A dimension table is a table that contains various attributes explaining the dimension key in the fact table. As explained earlier, the fact table stores business events; the attributes explain the conditions of the entity at the time the business event happened.
5. Note that the customer dimension table is linked to the fact table using the customer_key column. The customer_key column is the primary key in the customer dimension table and a foreign key in the fact table. This is known in the database world as referential integrity.

6. Referential integrity is the establishment of a parent-child relationship between two tables, with the purpose of ensuring that every row in the child table has a corresponding row in the parent table.
7. The customer dimension contains columns that describe the condition of the customer who made the purchase, including data about the customer's name, address, telephone number, date of birth, e-mail, gender, interest, occupation, and so on.

Source System Mapping


In this section we will discuss why and how to do source system mapping: the exercise of mapping the dimensional data store to the source systems.

1. Now that we have completed the DDS design, the next step is to map every column in the DDS to
the source systems so that we know where to get the data from when populating those columns.

2. When doing this, we need to determine the transformations or calculations required to get the
source columns into the target.

3. This is necessary to understand the functionality that the ETL logic must perform when populating each column of the DDS tables. Bear in mind that a DDS column may come from more than one table in the source system, or even from more than one source system, because the ODS integrates data from multiple source systems. This is where the source_system_code column becomes useful: it enables us to tell which system the data is coming from. An example using the Product Sales data mart follows.

Key Points in Doing Source System Mapping

a. The first step is to find out from which tables the columns are coming.

b. When we designed the DDS, we created the DDS columns (fact table measures and
dimensional attributes) to fulfill the business requirements. Now we need to find out where we can get the data from by looking at the source systems tables.

c. We write the source tables and their abbreviations in brackets so we can use them later in the mapping table.
d. Then we write the join conditions between these source tables so that we know how to write the query later, when we develop the ETL. The following list specifies the target table, the source tables, and the join conditions. It shows where the Product Sales fact table is populated from and how the source tables are joined.

a. Target table in DDS: Product Sales fact table
b. Source: WebTower9 sales_order_header [woh], WebTower9 sales_order_detail [wod], Jade order_header [joh], Jade order_detail [jod], Jupiter item_master [jim], Jupiter currency_rate [jcr]
c. Join conditions/criteria: woh.order_id = wod.order_id, joh.order_id = jod.order_id, wod.product_code = jim.product_code, jod.product_code = jim.product_code

In this case study, the inventory is managed in Jupiter, and the sales transactions are stored in the two front-office systems, WebTower9 and Jade, which is why we have a link between the Jupiter inventory master table and the order detail tables in WebTower9 and Jade.
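The mapping above can also be recorded as data, which makes it easy to validate before writing ETL queries. This is an illustrative sketch; the dictionary structure is an assumption, while the table names, aliases, and join conditions come from the list above.

```python
# A sketch of holding the Product Sales source-to-target mapping as data.
# The structure is illustrative; tables and joins follow the list above.

product_sales_mapping = {
    "target": "Product Sales fact table",
    "sources": {
        "woh": "WebTower9 sales_order_header",
        "wod": "WebTower9 sales_order_detail",
        "joh": "Jade order_header",
        "jod": "Jade order_detail",
        "jim": "Jupiter item_master",
        "jcr": "Jupiter currency_rate",
    },
    "joins": [
        ("woh.order_id", "wod.order_id"),
        ("joh.order_id", "jod.order_id"),
        ("wod.product_code", "jim.product_code"),
        ("jod.product_code", "jim.product_code"),
    ],
}

# Sanity check: every alias used in a join must be a declared source table.
aliases = set(product_sales_mapping["sources"])
for left, right in product_sales_mapping["joins"]:
    assert left.split(".")[0] in aliases and right.split(".")[0] in aliases
print("all join aliases declared")
```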

The key thing to take out of this whole section is knowing where every single column will be sourced from, including the transformations required.

Designing the Normalized Data Store


Now that we have designed the dimensional data store and mapped every column in every table to the source systems, we are ready to design the normalized data store: a normalized database that sits between the stage and the DDS. The NDS is a master data store containing the complete data sets, including all historical transaction data and all historical versions of master data. The NDS contains master tables and transaction tables. A transaction table is a table that contains a business transaction or business event; a master table is a table that contains the persons or objects involved in the business event. In this section we will learn how to design the data model for the NDS.

a. First we list all the entities, based on the source tables and on the fact and dimension attributes in the DDS.
b. We list all the tables in the source systems that we identified during the source system mapping exercise in the previous section.
c. We then normalize the DDS fact and dimension tables into a list of separate normalized tables.

d. We then arrange the entities according to their relationships, to enable us to establish the referential integrity between the entities. We do this by connecting each parent table to its child tables; a child table has a column containing the primary key of the parent table. The DDS fact tables become the child (transaction) tables in the NDS, and the DDS dimension tables become the parent (master) tables in the NDS.
