Data Warehouse
Name of the author: Kanchan Yadav
Date created: 23rd Feb 2009
CONTENTS
Why DW Systems?
Why Now?
Status of DW Systems
DW Architecture
Operations System vs. DW Systems
Data Quality
ETL
Audit Requirements
E.g. the same person is both promoted and asked to leave.
E.g. a customer with a Savings A/C and the same customer with a Loan are treated as different customers.
Why Now?
Data is being produced
ERP provides clean data
The computing power is available
The computing power is affordable
The competitive pressures are strong
Commercial products are available
[Figure: architecture diagram. Surviving labels: OLTP 2 (VSAM), OLTP 3 (ERP), Staging Area, ETL, Cube II]
Data Warehouse Architecture
Operational Database: Loans, Credit Card, Trust, Savings
Data Warehouse: Customer, Vendor, Product, Activity
Integration
Integration can take place along several dimensions: consistent naming conventions, consistent measurement of variables, consistent encoding structures, consistent physical attributes of data, etc. Integration is done at the data staging level, without changing the operational application systems.
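The integration dimensions above can be illustrated with a minimal sketch of a staging-level mapping. The field names, the gender codes, and the unit table are assumptions made for illustration; they are not taken from any particular source system.

```python
# Hypothetical source encodings and units; real systems will differ.
GENDER_CODES = {"m": "M", "male": "M", "1": "M", "f": "F", "female": "F", "0": "F"}
CM_PER_UNIT = {"cm": 1.0, "inch": 2.54, "m": 100.0}


def integrate_record(record):
    """Map one source record onto a single warehouse convention."""
    gender_raw = str(record["gender"]).strip().lower()
    unit = record["length_unit"].strip().lower()
    return {
        # consistent naming and formatting
        "customer_name": record["name"].strip().title(),
        # consistent encoding structure ("U" stands for unknown)
        "gender": GENDER_CODES.get(gender_raw, "U"),
        # consistent measurement of variables (everything in centimetres)
        "length_cm": round(record["length"] * CM_PER_UNIT[unit], 2),
    }


# Two operational systems describing the same facts in different ways.
source_a = {"name": "asha rao", "gender": "female", "length": 10, "length_unit": "inch"}
source_b = {"name": "ASHA RAO ", "gender": 0, "length": 25.4, "length_unit": "cm"}

print(integrate_record(source_a))
print(integrate_record(source_b))
```

Both records come out identical after the mapping, and neither source system had to change, which is the point of integrating at the staging level.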
Time Orientation
Data warehouse data are snapshot data.
They cover a longer time horizon than operational data.
The key structure contains an element of time.
Non Volatility
Data are loaded into the warehouse and accessed there, but once a snapshot of the data is made, the data in the warehouse do not change. Data are updated only according to a pre-announced calendar or programme.
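One way to picture non-volatility together with a time element in the key is an append-only store. The sketch below is only an illustration; the class name `SnapshotStore` and the account identifiers are hypothetical.

```python
from datetime import date


class SnapshotStore:
    """Append-only store keyed by (business key, snapshot date)."""

    def __init__(self):
        self._rows = {}

    def load(self, account_id, snapshot_date, balance):
        key = (account_id, snapshot_date)  # the key contains an element of time
        if key in self._rows:
            # Non-volatility: a snapshot that has been loaded is never overwritten.
            raise ValueError(f"snapshot {key} already loaded")
        self._rows[key] = balance

    def history(self, account_id):
        """Return all snapshots for one account, oldest first."""
        return sorted((d, b) for (a, d), b in self._rows.items() if a == account_id)


store = SnapshotStore()
store.load("SB-1", date(2009, 1, 31), 1000.0)
store.load("SB-1", date(2009, 2, 28), 1250.0)
print(store.history("SB-1"))
```

A new month adds a new row instead of changing the old one, so the history of the account stays queryable.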
Metadata
Metadata explains what data exists, where it is located and how to access it. Metadata is the core of a data logistics system, the infrastructure for the DW and, ultimately, the intelligence system.
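As a rough illustration of the three questions metadata answers (what, where, how to access), here is a tiny catalogue sketch. All entry names, table names, and the query text are hypothetical.

```python
# Hypothetical catalogue entry; table and column names are illustrative only.
METADATA = {
    "customer_balance": {
        "description": "End-of-day balance per customer account",
        "location": "dw.account_snapshot.balance",
        "access": "SELECT balance FROM account_snapshot WHERE account_id = ?",
    },
}


def describe(element):
    """Answer: what is it, where is it, and how do I get it?"""
    entry = METADATA.get(element)
    if entry is None:
        return f"{element}: not catalogued"
    return (
        f"{element}: {entry['description']} | "
        f"located at {entry['location']} | access via: {entry['access']}"
    )


print(describe("customer_balance"))
print(describe("vendor_rating"))
```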
To summarize ...
OLTP Systems are used to run a business
Data Quality
50% of BI projects fail or meet with a lack of acceptance because of data quality problems (Gartner)
Data quality problems will cost US businesses USD 600 billion per year (TDWI)
The Sarbanes-Oxley (SOX) Act will enforce a higher priority for data quality
Different Units
90% of persons were born on November 11, 1911
80% of robberies took place in Ghatkopar, Chowky # 1
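Figures like these usually point to a default or dummy value entered in place of real data. A simple profiling check can flag such columns; the sketch below is one possible approach, and the 50% threshold and the sample values are assumptions.

```python
from collections import Counter


def dominant_values(values, threshold=0.5):
    """Return each value whose share of the column exceeds the threshold."""
    counts = Counter(values)
    total = len(values)
    return {v: c / total for v, c in counts.items() if c / total > threshold}


# Illustrative column: nine records carry the same suspicious birth date.
birth_dates = ["1911-11-11"] * 9 + ["1975-03-02"]
print(dominant_values(birth_dates))
```

A value reported by this check is only a candidate for review; some columns legitimately have one dominant value.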
Loads
After extracting, scrubbing, cleaning, validating, etc., the data must be loaded into the warehouse.
Issues:
  huge volumes of data to be loaded
  small time window available when the warehouse can be taken offline (usually nights)
  when to build indexes and summary tables
  allow system administrators to monitor, cancel, resume, and change load rates
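The load issues above can be sketched with Python's built-in `sqlite3` module: rows go in by batch, and the index is built once after the load instead of being maintained row by row. The `sales` table, its columns, and the batch size are assumptions for illustration; a production warehouse would normally use its own bulk loader.

```python
import sqlite3


def bulk_load(conn, rows, batch_size=1000):
    """Load rows in batches, then build the index once at the end."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (sale_date TEXT, product TEXT, amount REAL)"
    )
    loaded = 0
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        with conn:  # one transaction per batch: commit on success, roll back on error
            conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", batch)
        loaded += len(batch)  # a batch boundary is a natural point to report progress
    with conn:
        # Deferring the index keeps the inserts cheap during the load window.
        conn.execute("CREATE INDEX IF NOT EXISTS idx_sales_date ON sales (sale_date)")
    return loaded


conn = sqlite3.connect(":memory:")
rows = [("2009-02-23", f"P{i % 5}", float(i)) for i in range(2500)]
print(bulk_load(conn, rows))
```

Because each batch commits separately, a cancelled load leaves only whole batches behind, which makes resuming from a known offset simpler.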
When to Refresh?
periodically (e.g., every night, every week) or after significant events
on every update: not warranted unless the warehouse requires current data (e.g., up-to-the-minute stock quotes)
refresh policy set by the administrator based on user needs and traffic
possibly different policies for different sources
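A per-source refresh policy of this kind could be expressed as a small lookup table. The source names and intervals below are hypothetical; in practice the administrator would set them.

```python
from datetime import datetime, timedelta

# Hypothetical policies, one per source.
REFRESH_POLICY = {
    "savings": timedelta(days=1),   # every night
    "loans": timedelta(weeks=1),    # every week
    "stock_quotes": timedelta(0),   # on every update
}


def needs_refresh(source, last_refresh, now, significant_event=False):
    """Decide whether a source is due for a refresh under its policy."""
    if significant_event:
        return True
    return now - last_refresh >= REFRESH_POLICY[source]


now = datetime(2009, 2, 23, 2, 0)
last = datetime(2009, 2, 22, 2, 0)
for source in REFRESH_POLICY:
    print(source, needs_refresh(source, last, now))
```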
Extraction Techniques
Full Extract from base tables
read entire source table: too expensive
may be the only choice for legacy systems
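A full extract can be sketched as a plain scan of the source table, fetched in chunks so that the whole table never has to sit in memory. The example uses Python's built-in `sqlite3` module as a stand-in source; the `accounts` table and its contents are made up for illustration.

```python
import sqlite3


def full_extract(conn, table, chunk_size=500):
    """Yield every row of the source table, chunk by chunk."""
    # The table name is interpolated into the SQL, so it must come from
    # trusted configuration, never from user input.
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    while True:
        chunk = cursor.fetchmany(chunk_size)
        if not chunk:
            break
        yield from chunk


# Stand-in source system with a small illustrative table.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE accounts (account_id TEXT, balance REAL)")
source.executemany(
    "INSERT INTO accounts VALUES (?, ?)",
    [("SB-1", 1000.0), ("SB-2", 250.0), ("LN-1", -5000.0)],
)

extracted = list(full_extract(source, "accounts", chunk_size=2))
print(len(extracted))
```

Every run reads every row regardless of whether it changed, which is why this technique is expensive on large tables.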