
Training Material

Data Warehouse
Name of the author: Kanchan Yadav
Date Created: 23rd Feb 2009

CONTENTS
Why DW Systems?
Why Now?
Status of DW Systems
DW Architecture
Operational Systems vs. DW Systems
Data Quality
ETL
Audit Requirements

Data Warehouse Architecture

What is a Data Warehouse?


A single, complete and consistent store of data obtained from a variety of different sources made available to end users in a way they can understand and use in a business context. [Barry Devlin]


E.g. the same person is recorded as both promoted and asked to leave
E.g. a customer with a Savings A/C and the same customer with a Loan are treated as different customers

Data Warehousing - It is a process


A technique for assembling and managing data from various sources for the purpose of answering business questions, thus enabling decisions that were not previously possible
A decision support database maintained separately from the organization's operational database


Why Now?
Data is being produced
ERP provides clean data
The computing power is available
The computing power is affordable
The competitive pressures are strong
Commercial products are available


Status of Data Warehousing Systems


Just begun in India
One of the hottest and fastest-growing areas, in India and abroad
Many IT organizations have established separate groups for domestic and export work
Data Warehousing is already being carried out in various Indian organizations, including stock exchanges, banks, telecom and insurance companies
Data Mining is not far behind

Data Warehouse Architecture


[Figure: Data Warehouse Architecture. Components shown: source systems OLTP 1 (RDBMS), OLTP 2 (VSAM) and OLTP 3 (ERP); ETL; Staging Area; Data Warehouse / Data Mart; Cube I and Cube II; Query / Reporting Tool; OLAP Tool (slicing / dicing).]


What are Operational Systems?


They are mainly OLTP systems
Run mission-critical applications
Need to work with stringent performance requirements for routine tasks
Used to run a business!


Data Environment in a business Organization


Current scenario
Data are captured through different applications
Applications may be on different platforms, developed at different times with specific objectives
Getting data out of applications is a difficult task

In other words, data are said to be "in jail"



Operational Data vs. DW Data


Operational Data             DW Data
Application oriented         Subject oriented
Detailed                     Summarized
Accurate as of the moment    Snapshot
Updated frequently           No update


Data in Data Warehouse


Used to manage and control business
Data is historical or point-in-time
Optimized for inquiry rather than update
Use of the system is loosely defined and can be ad hoc
Used by managers and end-users to understand the business and make judgements


Application-Orientation vs. Subject-Orientation


Application-Orientation (Operational Database): Loans, Credit Card, Trust, Savings
Subject-Orientation (Data Warehouse): Customer, Vendor, Product, Activity

Integration

Integration can take place along several dimensions, such as consistent naming conventions, consistent measurement of variables, consistent encoding structures and consistent physical attributes of data. Integration is done at the data staging level, without changing the operational application systems.


Time Orientation

Data warehouse data are snapshot data
They have a longer time horizon
They have a key structure containing an element of time


Non Volatility

Data are loaded into the warehouse and accessed there, but once the snapshot of data is made, the data in the warehouse do not change.
Data can be updated according to a pre-announced calendar or programme.
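The time-keyed, non-volatile behaviour described on the Time Orientation and Non Volatility slides can be sketched as follows. This is a minimal illustration, assuming a hypothetical balance table keyed by (account_id, snapshot_date); the table, function names and sample values are not from the original material.

```python
from datetime import date

# Hypothetical warehouse table: the key is (account_id, snapshot_date),
# so each load adds new rows instead of overwriting earlier ones.
balance_snapshots: dict[tuple[str, date], float] = {}


def load_snapshot(account_id: str, snapshot_date: date, balance: float) -> None:
    """Store one point-in-time balance; an existing snapshot is never changed."""
    key = (account_id, snapshot_date)
    if key in balance_snapshots:
        raise ValueError(f"snapshot already loaded for {key}")
    balance_snapshots[key] = balance


def history(account_id: str) -> list[tuple[date, float]]:
    """Return all snapshots for one account, oldest first."""
    rows = [(d, bal) for (acc, d), bal in balance_snapshots.items() if acc == account_id]
    return sorted(rows)


# Two month-end loads for the same (hypothetical) account.
load_snapshot("ACC-1", date(2009, 1, 31), 1200.0)
load_snapshot("ACC-1", date(2009, 2, 28), 950.0)
```

Because the date is part of the key, the January value stays available after the February load, which is what gives the warehouse its longer time horizon.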


Metadata

Metadata explains what data exists, where it is located and how to access it. Metadata is the core of a data logistics system, the infrastructure for the DW and ultimately the intelligence system.


To summarize ...
OLTP Systems are used to run a business

The Data Warehouse helps to optimize the business



Loading the Warehouse

Cleaning the data before it is loaded

Data Quality
50% of BI projects fail or suffer from lack of acceptance due to data quality problems (Gartner)
Data quality problems will cost US businesses USD 600 billion per year (TDWI)
The Sarbanes-Oxley (SOX) Act will enforce a higher priority on data quality


Data Quality - The Reality


It is tempting to think that creating a data warehouse is simply extracting operational data and entering it into a data warehouse
Nothing could be farther from the truth
Warehouse data comes from disparate, questionable sources
Data profiling before and during ETL
Batch totals
Handling missing values / outliers / duplicates / non-quality data
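A few of the checks listed above (batch totals, duplicates, missing values) might look like the sketch below. The row layout, field names and control totals are assumptions made up for illustration; real profiling would depend on the source systems.

```python
from collections import Counter

# Hypothetical extracted rows; None marks a missing value.
rows = [
    {"txn_id": 1, "amount": 100.0},
    {"txn_id": 2, "amount": 250.5},
    {"txn_id": 2, "amount": 250.5},  # duplicate of the previous row
    {"txn_id": 3, "amount": None},   # missing amount
]


def batch_total_matches(rows, expected_count, expected_sum):
    """Compare row count and amount sum with control totals supplied by the source."""
    actual_sum = sum(r["amount"] for r in rows if r["amount"] is not None)
    return len(rows) == expected_count and abs(actual_sum - expected_sum) < 0.005


def duplicate_ids(rows):
    """Return transaction ids that occur more than once."""
    counts = Counter(r["txn_id"] for r in rows)
    return sorted(txn_id for txn_id, n in counts.items() if n > 1)


def missing_amount_ids(rows):
    """Return transaction ids whose amount is missing."""
    return [r["txn_id"] for r in rows if r["amount"] is None]
```

Whether a failed check rejects the whole batch or only the offending rows is a policy decision that this sketch leaves open.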

Data Integration Across Sources


Sources: Savings, Loans, Trust, Credit Card
Same data, different name
Different data, same name
Different type, length
Different units


Data Transformation Terms


Transformation: the conversion of data types from the source to the target data store (warehouse), always a relational database


E.g. 90% of persons were born on November 11, 1911
E.g. 80% of robberies took place in Ghatkopar, Chowky # 1
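One plausible reading of these examples is that a convenient or default value was entered instead of the real one. Under that assumption, a simple profiling check can flag any field in which a single value dominates. The function name, threshold and sample data below are hypothetical.

```python
from collections import Counter


def dominant_value(values, threshold=0.5):
    """Return (value, share) if one value exceeds `threshold` of all rows, else None."""
    if not values:
        return None
    value, count = Counter(values).most_common(1)[0]
    share = count / len(values)
    return (value, share) if share > threshold else None


# Made-up sample: nine of ten records carry the same birth date.
birth_dates = ["1911-11-11"] * 9 + ["1975-03-02"]
suspicious = dominant_value(birth_dates)
```

A flagged value is only a hint for a human reviewer; some fields legitimately have one dominant value.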


Data Transformation Example


Differences across source applications to be reconciled in the Data Warehouse:

            appl A          appl B          appl C            appl D
encoding    m, f            1, 0            x, y              male, female
unit        pipeline - cm   pipeline - in   pipeline - feet   pipeline - yds
field       balance         bal             currbal           balcurr
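The three kinds of difference in this example (encoding, unit, field name) could be reconciled with per-application lookup tables, as in the sketch below. The target conventions ("m"/"f", centimetres, a field called balance) and the meaning assigned to the codes 1/0 and x/y are assumptions for illustration; the slide does not state them.

```python
# Per-application lookup tables. Target conventions are assumed, not from the slide.
GENDER_CODES = {
    "A": {"m": "m", "f": "f"},
    "B": {"1": "m", "0": "f"},  # assumed meaning of 1 / 0
    "C": {"x": "m", "y": "f"},  # assumed meaning of x / y
    "D": {"male": "m", "female": "f"},
}
# Conversion factors to centimetres for cm, inches, feet and yards.
TO_CM = {"A": 1.0, "B": 2.54, "C": 30.48, "D": 91.44}
# Name of the balance field in each source application.
BALANCE_FIELD = {"A": "balance", "B": "bal", "C": "currbal", "D": "balcurr"}


def transform(app: str, record: dict) -> dict:
    """Map one source record from application `app` to the common warehouse layout."""
    return {
        "gender": GENDER_CODES[app][str(record["gender"]).lower()],
        "pipeline_cm": record["pipeline"] * TO_CM[app],
        "balance": record[BALANCE_FIELD[app]],
    }


# A record as application B might deliver it (made-up values).
row = transform("B", {"gender": 1, "pipeline": 10, "bal": 500.0})
```

Keeping the rules in tables rather than in code paths means a new source application only needs new table entries.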

Data Integrity Problems


Same person, different spellings: Agarwal, Agrawal, Aggarwal etc.
Multiple ways to denote a company name: Persistent Systems, PSPL, Persistent Pvt. LTD.
Use of different names: mumbai, bombay
Different account numbers generated by different applications for the same customer


Data Integrity Problems (Cont)


Required fields left blank
Invalid product codes collected at point of sale
  manual entry leads to mistakes
  "in case of a problem use 9999999"
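Checks for these problems can be expressed as a small validation step before loading. In the sketch below the list of required fields and the set of valid product codes are hypothetical; only the placeholder 9999999 comes from the slide.

```python
REQUIRED_FIELDS = ("customer_id", "product_code")  # hypothetical required fields
SENTINEL_CODES = {"9999999"}                       # placeholder code from the slide
VALID_PRODUCT_CODES = {"1000001", "1000002"}       # hypothetical reference list


def validate(record: dict) -> list[str]:
    """Return a list of data integrity problems found in one point-of-sale record."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not str(record.get(field) or "").strip():
            problems.append(f"{field}: required field left blank")
    code = str(record.get("product_code") or "").strip()
    if code in SENTINEL_CODES:
        problems.append("product_code: placeholder value")
    elif code and code not in VALID_PRODUCT_CODES:
        problems.append("product_code: invalid code")
    return problems
```

Returning a list of problems, rather than raising on the first one, lets the load process log every issue for a record in one pass.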


Loads
After extracting, scrubbing, cleaning, validating etc., the data need to be loaded into the warehouse
Issues:
  huge volumes of data to be loaded
  small time window available when the warehouse can be taken offline (usually nights)
  when to build index and summary tables
  allow system administrators to monitor, cancel, resume, change load rates
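One common way to address the volume and indexing issues is to load in fixed-size batches and build indexes only after the bulk load. The sketch below uses an in-memory SQLite database purely as a stand-in for the warehouse; the sales table, batch size and function name are assumptions.

```python
import sqlite3
from itertools import islice


def load_in_batches(conn, rows, batch_size=1000):
    """Insert rows in fixed-size batches, committing after each batch."""
    iterator = iter(rows)
    loaded = 0
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return loaded
        with conn:  # commits on success, rolls back the batch on error
            conn.executemany("INSERT INTO sales (id, amount) VALUES (?, ?)", batch)
        loaded += len(batch)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, amount REAL)")
loaded = load_in_batches(conn, ((i, i * 1.5) for i in range(2500)))
# Index built only after the bulk load, so inserts do not pay for index upkeep.
conn.execute("CREATE INDEX idx_sales_id ON sales (id)")
```

Committing per batch gives natural checkpoints: a cancelled load loses at most one batch, and the running count could be exposed to administrators for monitoring.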


When to Refresh?
periodically (e.g., every night, every week) or after significant events
on every update: not warranted unless warehouse users require current data (e.g., up-to-the-minute stock quotes)
refresh policy set by administrator based on user needs and traffic
possibly different policies for different sources
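A per-source refresh policy can be as simple as a table of intervals set by the administrator. The source names and intervals below are invented for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical policy table: each source has its own refresh interval.
REFRESH_POLICY = {
    "savings": timedelta(days=1),          # every night
    "loans": timedelta(weeks=1),           # every week
    "stock_quotes": timedelta(minutes=1),  # near real time
}


def is_refresh_due(source: str, last_refresh: datetime, now: datetime) -> bool:
    """Return True if the source's interval has elapsed since its last refresh."""
    return now - last_refresh >= REFRESH_POLICY[source]


# Example: both sources were last refreshed 24 hours before `now`.
now = datetime(2009, 2, 23, 2, 0)
last = datetime(2009, 2, 22, 2, 0)
savings_due = is_refresh_due("savings", last, now)
loans_due = is_refresh_due("loans", last, now)
```

Refreshing after significant events, also mentioned above, would need an event hook in addition to this time-based check.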


Extraction Techniques
Full Extract from base tables
read entire source table: too expensive
may be the only choice for legacy systems


How To Detect Changes


Create a snapshot log table to record the ids of updated rows of source data, with a timestamp
Detect changes by:
  defining after-row triggers to update the snapshot log when the source table changes
  using the regular transaction log to detect changes to source data
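The trigger-based approach can be sketched with SQLite, used here only because it ships with Python; the account table, trigger name and column names are assumptions, and trigger syntax differs between database products.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL);
    CREATE TABLE snapshot_log (
        row_id INTEGER,
        changed_at TEXT DEFAULT CURRENT_TIMESTAMP
    );
    -- After-row trigger: record the id of every updated source row.
    CREATE TRIGGER account_after_update
    AFTER UPDATE ON account
    FOR EACH ROW
    BEGIN
        INSERT INTO snapshot_log (row_id) VALUES (NEW.id);
    END;
    """
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100.0), (2, 200.0)])
conn.execute("UPDATE account SET balance = 150.0 WHERE id = 1")


def changed_rows(conn):
    """Return the current values of source rows recorded in the snapshot log."""
    return conn.execute(
        "SELECT id, balance FROM account "
        "WHERE id IN (SELECT row_id FROM snapshot_log) ORDER BY id"
    ).fetchall()
```

Only updates are captured here; inserts and deletes would need their own triggers, and the extraction job would clear or timestamp-filter the log after each run.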
