Agenda
Review of various Data Warehouse models and their place in modern data warehousing methods.
The Data Vault as a preferred flavor of the Enterprise Data Warehouse for different businesses.
Overview of the Data Vault concepts and objects.
Real-world example where the Data Vault was chosen to replace a more traditional architecture for the EDW.
"facts": numeric transaction data "dimensions: reference information that gives meaning to the facts.
Conclusion on DM
Taken together, these characteristics make the DM best suited to data marts and access/presentation layers. A DW can also be built with this approach when data volumes are small and business structures are stable.
Conclusion on DV Model
[Diagram labels: Analytic data feed back to source systems; EDW; Pre-aggregates; 3NF: tendency]
A Bit of Chemistry
Atom = Clear Definitions of the Data -- Usually 3NF
Water Molecule = 2-1/2 normalized DV: Hubs/Links/Sats
Sugar Molecule = Tables/Views with Pre-aggregated Data
Sugar Cube = Rapid BI Product -- Usually DM
change over time, any descriptive data about a business key (HUB key).
Hubs
Identifiable business element.
Very low chance of changing (generally, not editable in source systems).
Same semantic meaning and granularity across the enterprise.
Hubs Examples
Key: Nissan-ABC/123-456
Line of Business: NAICS 2007 45A
Organization: Empire State College
Model Number: 33777185JN
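A Hub can be sketched as a table that holds nothing but the business key plus load metadata. The example below is a minimal illustration using SQLite; the table name `hub_model`, its column names, and the source names "erp" and "crm" are assumptions made for the sketch, not structures from the presentation. The loader inserts only business keys the Hub has not seen before, which reflects the low chance of a Hub row ever changing.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE hub_model (
        model_key     INTEGER PRIMARY KEY,   -- surrogate key (assumed design)
        model_number  TEXT NOT NULL UNIQUE,  -- business key
        load_date     TEXT NOT NULL,         -- when the key was first seen
        record_source TEXT NOT NULL          -- where the key was first seen
    )
""")


def load_hub(conn, business_keys, record_source):
    """Insert only business keys that are new to the hub; return how many were added."""
    now = datetime.now(timezone.utc).isoformat()
    before = conn.execute("SELECT COUNT(*) FROM hub_model").fetchone()[0]
    conn.executemany(
        "INSERT OR IGNORE INTO hub_model (model_number, load_date, record_source) "
        "VALUES (?, ?, ?)",
        [(key, now, record_source) for key in business_keys],
    )
    after = conn.execute("SELECT COUNT(*) FROM hub_model").fetchone()[0]
    return after - before


first_load = load_hub(conn, ["33777185JN", "A-100"], "erp")
second_load = load_hub(conn, ["33777185JN", "B-200"], "crm")  # one key already known
```

The second load adds only the unseen key, and the existing row keeps the record source of the feed that delivered it first.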
Hubs Quiz
A HUB represents an Event or Transaction (True or False)
A HUB may contain record source as part of business key (True or False)
A HUB always has an end-date (True or False)
A HUB business key can be comprised of multiple columns (True or False)
A HUB can be dependent on another HUB (True or False)
Links
Intersection of two or more Business Keys (Hubs)
A Unit of Work (e.g. Product by Supplier Link, Customer by Category Link)
Identifiable business element relationships
Business event
Transaction between business keys (Hubs)
Hierarchy
Same As (data cleansing)
Includes Hub Keys as Foreign Keys
Links Examples:
Invoice Header (Buyer, Seller, Invoice Date, Receive Date)
Orders (Employee, Shipper, Customer, Order Date)
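The Orders example can be sketched as a Link table that carries the keys of its parent Hubs as foreign keys. This is a minimal illustration in SQLite; all table and column names, and the sample rows, are assumptions made for the sketch. With foreign key enforcement switched on, the Link accepts only combinations of keys that already exist in the Hubs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
    CREATE TABLE hub_employee (employee_key INTEGER PRIMARY KEY, employee_no TEXT UNIQUE NOT NULL);
    CREATE TABLE hub_shipper  (shipper_key  INTEGER PRIMARY KEY, shipper_code TEXT UNIQUE NOT NULL);
    CREATE TABLE hub_customer (customer_key INTEGER PRIMARY KEY, customer_no TEXT UNIQUE NOT NULL);
    CREATE TABLE link_order (
        order_key     INTEGER PRIMARY KEY,
        employee_key  INTEGER NOT NULL REFERENCES hub_employee(employee_key),
        shipper_key   INTEGER NOT NULL REFERENCES hub_shipper(shipper_key),
        customer_key  INTEGER NOT NULL REFERENCES hub_customer(customer_key),
        order_date    TEXT NOT NULL,
        load_date     TEXT NOT NULL,
        record_source TEXT NOT NULL,
        UNIQUE (employee_key, shipper_key, customer_key, order_date)
    );
    INSERT INTO hub_employee VALUES (1, 'E-7');
    INSERT INTO hub_shipper  VALUES (1, 'SHIP-A');
    INSERT INTO hub_customer VALUES (1, 'C-42');
""")

INSERT_LINK = (
    "INSERT INTO link_order "
    "(employee_key, shipper_key, customer_key, order_date, load_date, record_source) "
    "VALUES (?, ?, ?, ?, ?, ?)"
)


def try_insert_link(conn, employee_key, shipper_key, customer_key, order_date):
    """Return True if the link row was stored, False if a hub key is unknown or the row repeats."""
    try:
        conn.execute(
            INSERT_LINK,
            (employee_key, shipper_key, customer_key, order_date, "2010-03-02", "orders_feed"),
        )
        return True
    except sqlite3.IntegrityError:
        return False


stored = try_insert_link(conn, 1, 1, 1, "2010-03-01")    # all three hub keys exist
rejected = try_insert_link(conn, 99, 1, 1, "2010-03-01")  # employee 99 is not in the hub
```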
Links Quiz
A transaction is always represented by a Link (True or False)
A Link can contain business keys (True or False)
A unit of work is always represented by a Link (True or False)
A Link must contain a unit of work (True or False)
Satellite
Time-dimensional table about a Hub or Link
Has one migrated foreign key (either from Hub or Link)
Date
Satellite Notes
Non-identifying business elements
Descriptive of Business Key from Hub or Link
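Because a Satellite tracks descriptive data over time, a load typically appends a new row only when the descriptive attributes actually change. The sketch below shows one way to do that; the row layout, the `hash_diff` fingerprint, and the sample attributes are assumptions made for illustration, not the method used in the presentation. Dates are ISO strings so that plain string comparison orders them correctly.

```python
import hashlib


def hash_diff(attributes):
    """Fingerprint the descriptive attributes, independent of dictionary order."""
    payload = "|".join(f"{name}={attributes[name]}" for name in sorted(attributes))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def load_satellite(satellite, hub_key, attributes, load_date):
    """Append a row only if the attributes differ from the latest row for hub_key."""
    history = [row for row in satellite if row["hub_key"] == hub_key]
    latest = max(history, key=lambda row: row["load_date"]) if history else None
    new_hash = hash_diff(attributes)
    if latest is not None and latest["hash_diff"] == new_hash:
        return False  # nothing changed, so no new history row
    satellite.append(
        {"hub_key": hub_key, "load_date": load_date, "hash_diff": new_hash, **attributes}
    )
    return True


sat_model = []
r1 = load_satellite(sat_model, 1, {"color": "red", "price": 100}, "2010-01-01")
r2 = load_satellite(sat_model, 1, {"price": 100, "color": "red"}, "2010-01-02")   # unchanged
r3 = load_satellite(sat_model, 1, {"color": "blue", "price": 100}, "2010-01-03")  # changed
```

After the three loads the Satellite holds two rows for the key: the original description and the changed one.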
Satellites Quiz
Can a Satellite be dependent on 1 or more parent tables?
A Satellite can export its Key (True or False)
A Satellite can be snowflaked (True or False)
A Satellite is not impacted by Delta Processing (True or False)
NON-Core DV Structures
A PIT (Point-in-Time) is a specialized SATELLITE derivative that is used to get the latest row AS OF a specific date WITHOUT the use of nested sub-queries in the main satellite query.
A MEASURE SATELLITE is a SATELLITE dedicated to holding descriptive data on which calculations or aggregations can be performed for analytical purposes.
A REFERENCE is a hybrid (a flat table instead of Hub/Sat) in which the decoding info is truly static, usually de-normalized, with no history.
A BRIDGE is similar to a PIT and is also designed for performance, but it is created from many Hubs and Links and allows computing by columns.
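The idea behind a PIT can be sketched as a precomputed lookup: for each Hub key and snapshot date, store the load date of the Satellite row that was in effect, so that a later query becomes a simple equality match instead of a nested "latest row" sub-query. The row layout, the function name `build_pit`, and the sample dates below are assumptions made for illustration.

```python
from bisect import bisect_right


def build_pit(satellite_rows, snapshot_dates):
    """Map (hub_key, snapshot_date) to the load_date of the satellite row in effect then."""
    load_dates_by_key = {}
    for row in satellite_rows:
        load_dates_by_key.setdefault(row["hub_key"], []).append(row["load_date"])

    pit = {}
    for hub_key, load_dates in load_dates_by_key.items():
        load_dates.sort()
        for snapshot in snapshot_dates:
            # Number of rows loaded on or before the snapshot date.
            position = bisect_right(load_dates, snapshot)
            if position:  # skip keys that did not exist yet at the snapshot
                pit[(hub_key, snapshot)] = load_dates[position - 1]
    return pit


# Hypothetical satellite history; ISO date strings sort chronologically.
sat_rows = [
    {"hub_key": 1, "load_date": "2010-01-01"},
    {"hub_key": 1, "load_date": "2010-02-15"},
    {"hub_key": 2, "load_date": "2010-02-01"},
]
pit = build_pit(sat_rows, ["2010-01-31", "2010-02-28"])
```

Hub key 2 has no entry for the January snapshot because its first Satellite row was loaded afterwards.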
Few Lines
In 2008 W.H. (Bill) Inmon stated that the Data Vault is the optimal approach for modeling the EDW in the DW2.0 framework (DW2.0).
The Data Vault is a detail-oriented, historical tracking and uniquely linked set of normalized tables that support one or more functional areas of business. It is a hybrid approach encompassing the best of breed between 3rd normal form (3NF) and star schema. The design is flexible, scalable, consistent and adaptable to the needs of the enterprise (http://www.tdan.com/view-articles/5054/).
The number of Data Vault users surpassed 500 in 2010 and is growing rapidly (http://danlinstedt.com/about/dvcustomers/).
The Story
DS Feeds: Daily, weekly, monthly and ad-hoc from RDBMSs and flat files, some UD
EDW Platform: SQL Server 2005+. Projected size of the EDW for 2010 is 45TB, growing 10-15% annually
Data Warehouse Builder: WhereScape RED 6
BI: Balanced Insight Consensus/MicroStrategy 9
Phase I of the Data Vault EDW is completed (approx. 500 objects), along with the Data Mart and BI reports (6 weeks). The subsequent phases are being developed now.
The re-platforming of the Data Vault to Teradata 13 is also underway.
Conclusion
Each data warehousing flavor is applicable, depending on the phase and purpose of the DW:
Third Normal Form: Normalization Rules
Data Vault: Structure, Golden Copy
Tables/Views with pre-aggregated data: Reusable Components
Dimensional Model: Interpretation of Data by Users
Specifically, the Data Vault Model is currently an optimal approach for building an Enterprise Data Warehouse.
Questions?
raphael_ws learndatavault.com