
DATA WAREHOUSING

Agenda

 Introduction
 Process
 DSS
 Information processing
 Dimensions
 OLAP
 Architecture Types
 Best Practice
 Case
Shilpa Surve

Introduction
Definition

 A data warehouse is a
• Subject-oriented
• Integrated
• Time-variant
• Non-volatile
collection of data in support of management’s decisions.

12/04/08 4
What are Data Warehouses?
 Data warehouses store large volumes of data that are frequently used by DSS
 They are maintained separately from the organization’s operational databases
 Data warehouses are relatively static, with only infrequent updates
 A data warehouse is a stand-alone repository of information, integrated from several, possibly heterogeneous, operational databases

Steps in Building a Data Warehouse
 Identify key business drivers, sponsorship, risks, and ROI
 Survey information needs, identify desired functionality, and define functional requirements for the initial subject area
 Architect the long-term data warehousing architecture
 Evaluate and finalize DW tools and technology
 Conduct a proof of concept

Steps in Building a Data Warehouse (continued)
 Design the target database schema
 Build data mapping, extraction, transformation, cleansing, and aggregation/summarization rules
 Build the initial data mart using an exact subset of the enterprise data warehousing architecture, and expand to the enterprise architecture over subsequent phases
 Maintain and administer the data warehouse

The Three Views of Data Warehousing
 Strategic or Business view
• Define the key business drivers of the data warehouse
• How can a business-driven approach achieve a high ROI?
 Architectural or Technology view
• Alternative data warehousing architectures
• How can the right architecture achieve a high ROI?
 Methodology or Implementation view
• Development and implementation methodology
• How can the right methodology achieve a rapid ROI?

Swathi Velisetty

Process
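DW Components
[Figure: DW components. Legacy and feed systems (FS1, FS2, ... FSn) transmit data over the network to the staging area. Extraction, cleansing, transformation, aggregation and summarization feed the ODS and the DW. Data mart population fills the data marts (DM1, DM2, ... DMn), which serve OLAP analysis. A metadata layer spans all components.]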
Cleansing process

[Figure: Raw data (staging area) passes through the cleansing process, which is driven by process metadata and control metadata. Outputs: clean data marked good or bad, and cleansing reports.]
• Clean the raw data
• Mark it good/bad
• Generate the cleansing reports and mail them to the DWA and feed system representatives
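
The cleansing steps above can be sketched in Python. This is only an illustration under stated assumptions: the record layout (`customer_id`, `amount`), the two validation rules and the function name `cleanse` are hypothetical and do not come from the deck.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]


def cleanse(raw: List[Record]) -> Tuple[List[Record], List[Record], str]:
    """Split raw staging records into good and bad, and build a short report."""
    good: List[Record] = []
    bad: List[Record] = []
    for rec in raw:
        problems = []
        # Hypothetical rule 1: a customer id must be present.
        if not rec.get("customer_id", "").strip():
            problems.append("missing customer_id")
        # Hypothetical rule 2: the amount must parse as a number.
        try:
            float(rec.get("amount", ""))
        except ValueError:
            problems.append("amount is not numeric")
        if problems:
            # Mark the record bad and keep the reasons for the report.
            bad.append({**rec, "problems": "; ".join(problems)})
        else:
            # Clean: trim surrounding whitespace from every field.
            good.append({k: v.strip() for k, v in rec.items()})
    report = f"{len(good)} good, {len(bad)} bad out of {len(raw)} records"
    return good, bad, report


raw_data = [
    {"customer_id": " C001 ", "amount": "120.50"},
    {"customer_id": "", "amount": "75"},
    {"customer_id": "C003", "amount": "n/a"},
]
good, bad, report = cleanse(raw_data)
print(report)
```

In a real pipeline the report would be mailed to the DWA and the feed system representatives; that step is left out here.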
Transformation Process

[Figure: Clean operational data passes through the transformation process into the operational data store. Process metadata (mapping detail, transformation rules) and control metadata drive the process.]
• Transform the cleaned operational data into DSS data
• Load the DSS data into the ODS
• The ODS contains the current DSS data at the lowest level of granularity
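
A minimal sketch of a metadata-driven transformation, assuming the process metadata is held as two Python dictionaries. The field names, the rules and the `transform` function are hypothetical illustrations, not the deck's design.

```python
from typing import Callable, Dict

# Hypothetical mapping detail: source field -> target (DSS) field.
MAPPING: Dict[str, str] = {
    "cust_no": "customer_id",
    "amt": "amount",
    "txn_dt": "txn_date",
}

# Hypothetical transformation rules, keyed by target field.
RULES: Dict[str, Callable[[str], object]] = {
    "customer_id": str.upper,
    "amount": float,
    # yymmdd -> ISO date; assumes every year is 20yy.
    "txn_date": lambda s: f"20{s[0:2]}-{s[2:4]}-{s[4:6]}",
}


def transform(record: Dict[str, str]) -> Dict[str, object]:
    """Rename fields per MAPPING and apply the rule for each target field."""
    out: Dict[str, object] = {}
    for source_field, target_field in MAPPING.items():
        rule = RULES.get(target_field, lambda value: value)
        out[target_field] = rule(record[source_field])
    return out


# The list stands in for the ODS, which holds the lowest-granularity DSS data.
ods = [transform({"cust_no": "c001", "amt": "120.50", "txn_dt": "081204"})]
print(ods)
```

Keeping the mapping and the rules as data rather than code mirrors the idea that the process is driven by metadata.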

Summarization Process

[Figure: The summarization process, driven by control metadata, reads the ODS and populates weekly, monthly and yearly summaries in the DW.]
• Summarize and aggregate ODS data and populate it to the warehouse
• The periodicity of the summarization process depends on the level of summarization in the warehouse (weekly, monthly, daily)
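
The summarization step can be sketched as one grouping function applied at several periodicities. The ODS rows and the `summarize` helper below are made up for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical ODS rows: (transaction date, product, amount).
ods_rows = [
    (date(2008, 12, 1), "Bond", 100.0),
    (date(2008, 12, 3), "Bond", 50.0),
    (date(2008, 12, 9), "Bond", 25.0),
    (date(2009, 1, 5), "Bond", 10.0),
]


def summarize(rows, period_of):
    """Total the amounts per (period, product); period_of maps a date to a period key."""
    totals = defaultdict(float)
    for day, product, amount in rows:
        totals[(period_of(day), product)] += amount
    return dict(totals)


# ISO (year, week) for weekly, (year, month) for monthly, year for yearly.
weekly = summarize(ods_rows, lambda d: tuple(d.isocalendar())[:2])
monthly = summarize(ods_rows, lambda d: (d.year, d.month))
yearly = summarize(ods_rows, lambda d: d.year)

print(monthly)
print(yearly)
```

Each result would be loaded into the corresponding summary table of the warehouse, and the run frequency would follow the finest level that is kept.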

Enterprise Data Warehouse

[Figure: Enterprise data warehouse. Operational systems (legacy, client/server, OLTP, external) feed data preparation (select, extract, transform, integrate, maintain), which loads an enterprise-wide data warehouse with a metadata repository. Users reach it through APIs.]
Distributed Data Marts

[Figure: Distributed data marts. Operational systems (legacy, client/server, OLTP, external) feed data preparation (select, extract, transform, integrate, maintain), which loads several separate data marts. Users reach them through APIs.]
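Multi-tiered Data Warehouse
[Figure: Multi-tiered data warehouse. Operational systems (legacy, client/server, OLTP, external) feed data preparation (select, extract, transform, integrate, maintain), which loads an enterprise-wide data warehouse with a metadata repository. The warehouse in turn feeds several data marts. Users reach them through APIs.]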
Example
[Figure: Levels of data held in the warehouse, described by metadata:
• Monthly sales by region for 1991-; monthly sales by product for 1991-94
• Weekly sales by region; weekly sales by product/sub-product for 1991-94
• Sales detail for 1991-
• Sales detail for 1985-90]

Atul zade

Decision support system


What is DSS?
Decision Support Systems (DSS) are interactive computer-based systems intended to help decision makers utilize data and models to identify and solve problems and make decisions.
The data warehouse is the foundation of the DSS process. It is a strategy and a process for staging corporate data.

➭ Enable users to get a “Business View” of the data
➭ Facilitate data-based decision making that drives and improves the business
➭ Discover “Hidden Trends”

Driving Forces for DSS

➭ Changes in the Business Environment
[Figure: Business speed, reform, customers and technology act as driving forces. RESULT: COMPETITION]
How to answer these Business Queries?
• What is the sales distribution region wise?
• How did my revenue improve in the past 5 years?
• What are the slow movers in my product line?
• Which channel costs me more and pays less?
• Which of my Sales Agents are doing better?
• Strategic Planning / Budgeting
• What is Defaulter’s Profile?
• Currency Risk, Interest Rate Risk, Liquidity Risk
• Who are my profitable customers?

OLTP v/s DSS Environment

 OLTP Environment
• get data IN
• large volumes of simple transaction queries
• continuous data changes
• low processing time
• mode of processing
• transaction details
• data inconsistency
• mostly current data
• high concurrent usage
• highly normalized data structure
• static applications
• automates routines

 DSS Environment
• get information OUT
• small number of diverse queries
• periodic updates only
• high processing time
• mode of discovery
• subject oriented - summaries
• data consistency
• historical data is relevant
• low concurrent usage
• fewer tables, but more columns per table
• dynamic applications
• facilitates creativity

Benefits for Business User

• Flexible Information Access


• High Availability
• Ease of Use
• Quality & Completeness of Data
• Focus on Information Processing
• Information Base for Knowledge Discovery

Classification of Business Users
• Executives/Managers
  – Multi-dimensional analysis, reporting tools
• Knowledge Worker
  – Ad hoc queries, detail & summary data, application focus
• Power-Analyst
  – Ad hoc queries, Data Analysis & Data Mining
• Customer Contacts
  – Detail Data at specific levels
Prem Sequera

Information processing
Data Processing to Information Processing
[Figure: Management decision value chain, running from data elements to business elements. Feed systems and external sources (heterogeneous data sources) feed the operational data store (ODS) and the enterprise data warehouse, which feeds data marts A, B, ... N. Query processing, report generation, OLAP/query tools, OLAP applications, data mining applications and application-specific analysis build on these stores and support knowledge discovery and knowledge management, in service of application domains, business functions, and business objectives and goals. The stages are labelled data processing, information processing and knowledge processing. The figure also carries two TRACE labels.]
Subject Oriented Analysis
• Process oriented (transactional storage): entry records holding Sales Rep, Quantity Sold, Part Number, Date, Customer Name, Product Description, Unit Price, Mail Address
• Subject oriented (data warehouse storage): Sales, Customers, Products

Integration of Data
Integration maps each application's representation (transactional storage) to a single representation (data warehouse storage):
• Encoding: Appl. A - M, F; Appl. B - 1, 0; Appl. C - X, Y → M, F
• Unit of attributes: Appl. A - pipeline cm; Appl. B - pipeline inches; Appl. C - pipeline mcf → pipeline cm
• Physical attributes: Appl. A - balance dec(13,2); Appl. B - balance PIC 9(9)V99; Appl. C - balance float → balance dec(13,2)
• Naming conventions: Appl. A - bal-on-hand; Appl. B - current_balance; Appl. C - balance → balance
• Data consistency: Appl. A - date (Julian); Appl. B - date (yymmdd); Appl. C - date (absolute) → date (Julian)
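
The integration rules above can be sketched as small per-application lookups. The function names are hypothetical, and which source code corresponds to M or F in applications B and C is an assumption, since the slide does not say.

```python
from decimal import Decimal

INCHES_TO_CM = 2.54

# Source encodings per application, mapped to the warehouse's M/F.
# Assumption: 1 and X mean M, 0 and Y mean F.
GENDER_MAP = {
    "A": {"M": "M", "F": "F"},
    "B": {"1": "M", "0": "F"},
    "C": {"X": "M", "Y": "F"},
}

# Source name of the balance field per application; the warehouse calls it "balance".
BALANCE_FIELD = {"A": "bal-on-hand", "B": "current_balance", "C": "balance"}


def integrate(app: str, record: dict) -> dict:
    """Convert one source record to the warehouse encoding, name and precision."""
    gender = GENDER_MAP[app][record["gender"]]
    # dec(13,2): store the balance with exactly two decimal places.
    balance = Decimal(str(record[BALANCE_FIELD[app]])).quantize(Decimal("0.01"))
    return {"gender": gender, "balance": balance}


def pipeline_to_cm(app: str, value: float) -> float:
    """Convert a pipeline measurement to centimetres where a conversion exists."""
    if app == "A":
        return value
    if app == "B":
        return value * INCHES_TO_CM
    # Appl. C reports mcf, a volume unit; no length conversion is attempted here.
    raise ValueError(f"no conversion to cm defined for application {app!r}")


row_a = integrate("A", {"gender": "F", "bal-on-hand": "10"})
row_b = integrate("B", {"gender": "1", "current_balance": 1234.5})
print(row_a, row_b)
```

Date harmonization is omitted because the slide does not define the three date formats precisely enough to convert between them.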

Volatility of Data
• Volatile (transactional storage): insert, change, delete and access; record-by-record data manipulation
• Non-volatile (data warehouse storage): load and access; mass load / access of data

Time Variant Data Analysis
• Transactional storage: current data
• Data warehouse storage: historical data

Kairav Parikh

Dimension
What is a Dimension?
A data warehouse is a
• Subject-oriented
• Integrated
• Time-variant
• Non-volatile
collection of data in support of management’s decisions.

A subject corresponds to a dimension.

Dimensional Hierarchy
Geography Dimension (levels linked by a parent relation)
• World level: World
• Continent level: America, Europe, Asia
• Country level: USA, Canada, Argentina
• State level: FL, GA, VA, CA, WA
• City level: Miami, Tampa, Orlando, Naples
Each value at a level is a dimension member / business entity. Level attributes (shown at the city level): Population, Tourist’s Place
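
The parent relation in the geography hierarchy can be sketched as a child-to-parent map, with roll-up done by walking to the ancestor at the requested level. The member names are from the slide; the sales figures and function names are hypothetical.

```python
# Child -> parent relation for the geography dimension.
PARENT = {
    "America": "World", "Europe": "World", "Asia": "World",
    "USA": "America", "Canada": "America", "Argentina": "America",
    "FL": "USA", "GA": "USA", "VA": "USA", "CA": "USA", "WA": "USA",
    "Miami": "FL", "Tampa": "FL", "Orlando": "FL", "Naples": "FL",
}

# Levels from the lowest to the highest.
LEVELS = ["City", "State", "Country", "Continent", "World"]


def path_to_root(member: str) -> list:
    """Return the member followed by all of its ancestors up to World."""
    path = [member]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def roll_up(city_values: dict, level: str) -> dict:
    """Aggregate city-level values to the given level of the hierarchy."""
    index = LEVELS.index(level)
    totals = {}
    for city, value in city_values.items():
        ancestor = path_to_root(city)[index]
        totals[ancestor] = totals.get(ancestor, 0) + value
    return totals


# Made-up sales per city, for illustration only.
city_sales = {"Miami": 10, "Tampa": 5, "Orlando": 7, "Naples": 3}
print(roll_up(city_sales, "State"))
print(roll_up(city_sales, "Continent"))
```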
Types of Dimensions
• Simple Dimensions (e.g. Time)
• Related Dimensions (e.g. Gender of a Customer)
• Spool Dimensions (e.g. Account as an interaction between Customer and Product)
• Bucket Dimensions (e.g. Income Ranges of a Customer)
• Slowly Changing Dimensions (e.g. changes in Organization)
• Fast Varying Dimensions (e.g. changes in Retail Customers’ attributes)
• Unused Dimensions (e.g. Order No., Invoice No.)
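
As one illustration from the list above, a bucket dimension can be sketched as a lookup from a continuous value to a range label. The boundaries and labels below are made up.

```python
from bisect import bisect_right

# Hypothetical income boundaries; each bucket includes its lower bound.
BOUNDS = [25_000, 50_000, 100_000]
LABELS = ["< 25k", "25k - 50k", "50k - 100k", ">= 100k"]


def income_bucket(income: float) -> str:
    """Return the label of the income range that contains the given income."""
    return LABELS[bisect_right(BOUNDS, income)]


print(income_bucket(24_999))
print(income_bucket(25_000))
print(income_bucket(100_000))
```

Facts then reference the bucket member rather than the exact income, which keeps the dimension small and stable.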

Dimensional Modeling
STEP 1
• Identify Subjects (Dimensions)
• Identify Hierarchies of a Dimension
• Identify Attributes of levels in Hierarchies
• Define Grain

[Figure: Customer dimension with its hierarchies: Country > State > City; Industry Segment > Industry Type; Fin. Class]

Dimensional Modeling
STEP 2
• Use KPIs to identify the Facts
• Group the Facts in a logical set

Financial Transactions: Trans. Amount, No. of Bonds, No. of Transactions, Service Cost, ...
Non-Financial Transactions: No. of Cheques Cleared, No. of Visits to a Branch, No. of DEMAT Transactions, ...

Dimensional Modeling
STEP 3
• Link the Group of Facts to the Dimensions that participate in the Facts

[Figure: Financial Transactions facts at the centre, linked to the Customer, Product, Time, Organization and Channel dimensions]

Dimensional Modeling
STEP 4
• Define Granularity for each Group of Facts

[Figure: Financial Transactions facts with the grain of each dimension: Customer (Customer), Product (Scheme), Time (Day-Hour), Organization (Branch), Channel (Channel)]

Data Warehouse Schemas
Star Schema
• A Group of Facts connected to Multiple Dimensions

[Figure: Financial Transactions facts at the centre of a star, connected to the Channel, Time, Organization, Customer and Product dimensions]
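
A star schema like the one described above can be sketched with Python's built-in `sqlite3` module. The table and column names and the rows are hypothetical, and the Time and Organization dimensions are left out to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: one row per member.
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_product  (product_id  INTEGER PRIMARY KEY, scheme TEXT);
CREATE TABLE dim_channel  (channel_id  INTEGER PRIMARY KEY, name TEXT);

-- Fact table: one row per transaction, with a key into each dimension.
CREATE TABLE fact_financial_txn (
    customer_id  INTEGER REFERENCES dim_customer(customer_id),
    product_id   INTEGER REFERENCES dim_product(product_id),
    channel_id   INTEGER REFERENCES dim_channel(channel_id),
    trans_amount REAL
);

-- Made-up sample rows.
INSERT INTO dim_customer VALUES (1, 'Customer 1'), (2, 'Customer 2');
INSERT INTO dim_product  VALUES (1, 'Savings'), (2, 'Bond');
INSERT INTO dim_channel  VALUES (1, 'Branch'), (2, 'ATM');
INSERT INTO fact_financial_txn VALUES
    (1, 1, 1, 100.0), (2, 1, 1, 50.0), (1, 2, 2, 25.0), (2, 1, 2, 10.0);
""")

# A typical star query: join the facts to two dimensions and aggregate.
rows = conn.execute("""
    SELECT p.scheme, c.name, SUM(f.trans_amount)
    FROM fact_financial_txn AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_channel AS c ON c.channel_id = f.channel_id
    GROUP BY p.scheme, c.name
    ORDER BY p.scheme, c.name
""").fetchall()
print(rows)
```

Every dimension is one join away from the fact table, which is what gives the schema its star shape.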

Data Warehouse Schemas
Snow-flake Schema (= Extended Star Schema)
• A Group of Facts connected to Dimensions, which are split across multiple hierarchies and attributes

[Figure: Financial Transactions facts connected to the Time, Product, Organization, Channel and Customer dimensions; Customer is further split into Segment and Geography]
Data Warehouse Schemas
Galaxy Schema
• Multiple Groups of Facts linked by a few common Dimensions

[Figure: Three fact groups (Fact1, Fact2, Fact3) surrounded by dimensions, some of which are shared between fact groups]

Akshay Shiveshwarkar

OLAP
On-Line Analytical Processing
 OLAP can be defined as a technology that lets users view aggregate data across measurements (like Maturity Amount, Interest Rate, etc.) along with a set of related parameters called dimensions (like Product, Organization, Customer, etc.)

What is MDDB?

A multidimensional database is a computer software system designed to allow for efficient and convenient storage and retrieval of data that is
• intimately related and
• stored, viewed and analyzed from different perspectives (Dimensions).

RDBMS v/s MDDB

MODEL          COLOR   SALES VOL.
MINI VAN       BLUE    6
MINI VAN       RED     5
MINI VAN       WHITE   4
SPORTS COUPE   BLUE    3
SPORTS COUPE   RED     5
SPORTS COUPE   WHITE   5
SEDAN          BLUE    4
SEDAN          RED     3
SEDAN          WHITE   2

Relational table: 9 rows x 3 columns = 27 cells
Multidimensional array: 3 models x 3 colors = 9 cells
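
The cell counts can be checked by pivoting the relational rows from the slide into a model-by-color array. The variable names are illustrative.

```python
# The nine relational rows from the slide: (model, color, sales volume).
rows = [
    ("MINI VAN", "BLUE", 6), ("MINI VAN", "RED", 5), ("MINI VAN", "WHITE", 4),
    ("SPORTS COUPE", "BLUE", 3), ("SPORTS COUPE", "RED", 5), ("SPORTS COUPE", "WHITE", 5),
    ("SEDAN", "BLUE", 4), ("SEDAN", "RED", 3), ("SEDAN", "WHITE", 2),
]

# Dimension members, in order of first appearance.
models = list(dict.fromkeys(model for model, _, _ in rows))
colors = list(dict.fromkeys(color for _, color, _ in rows))

# Multidimensional form: one cell per (model, color) combination.
lookup = {(model, color): volume for model, color, volume in rows}
matrix = [[lookup[(model, color)] for color in colors] for model in models]

relational_cells = len(rows) * len(rows[0])  # 9 rows x 3 columns
matrix_cells = len(models) * len(colors)     # 3 models x 3 colors

for model, line in zip(models, matrix):
    print(f"{model:<13}", line)
print(relational_cells, matrix_cells)
```

The member names are stored once per axis instead of once per row, which is where the saving comes from.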

Benefits of MDDB over RDBMS
 Ease of Data Presentation & Navigation
Intuitive, spreadsheet / crosstab-like data views
 Storage Space
Very low space consumption compared to a relational DB
 Performance
Gives much better query performance at moderate data volumes. A relational DB may give comparable results only through database tuning (indexing, keys, etc.), which may not be possible for ad hoc queries.
 Ease of Maintenance
No overhead, as data is stored in the same way it is viewed. A relational DB uses indexes, sophisticated joins, etc., which require considerable storage and maintenance.

Issues with MDDB
• Sparsity
– Controlled Sparsity
– Random Sparsity
• Data Explosion
– Due to Sparsity
– Due to Summarization
• Performance
– Doesn’t perform better than RDBMS at high data volumes (>20-30 GB)
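
Sparsity can be made concrete with a small calculation: a cube has one potential cell per combination of dimension members, and usually only a fraction of them hold data. The dimension sizes and fact count below are made up.

```python
from math import prod


def sparsity(dimension_sizes, populated_cells):
    """Fraction of the cube's potential cells that hold no data."""
    potential_cells = prod(dimension_sizes)
    return 1 - populated_cells / potential_cells


# Hypothetical cube: 1,000 customers x 100 products x 365 days,
# with 500,000 cells actually populated.
example = sparsity([1_000, 100, 365], 500_000)
print(f"{example:.1%} of the cells are empty")
```

Adding summary levels to each dimension multiplies the number of potential cells further, which is the data explosion the slide refers to.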

OLAP Features

 Subject-oriented approach to decision support
 Calculations applied across dimensions, through hierarchies and/or across members
 Trend analysis over sequential time periods
 What-if scenarios
 Slicing / dicing subsets for on-screen viewing
 Rotation to new dimensional comparisons in the viewing area
 Drill-down/up along the hierarchy
 Reach-through / drill-through to underlying detail data
Features of OLAP - Drill Down / Up

Sales at Region / District / Dealership level

• Moving up and moving down in a hierarchy is referred to as “drill-up” / “roll-up” and “drill-down”, respectively
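
Roll-up and drill-down along the region / district / dealership hierarchy can be sketched as aggregation at different depths of the same path. The sales rows and the `aggregate` helper are hypothetical.

```python
from collections import defaultdict

# Made-up sales facts: (region, district, dealership, units sold).
SALES = [
    ("East", "D1", "Dealer A", 10),
    ("East", "D1", "Dealer B", 5),
    ("East", "D2", "Dealer C", 7),
    ("West", "D3", "Dealer D", 4),
]
HIERARCHY = ["region", "district", "dealership"]


def aggregate(level, within=()):
    """Total units at `level`, restricted to the parent members given in `within`."""
    depth = HIERARCHY.index(level) + 1
    totals = defaultdict(int)
    for *path, units in SALES:
        if tuple(path[:len(within)]) == tuple(within):
            totals[tuple(path[:depth])] += units
    return dict(totals)


by_region = aggregate("region")                           # rolled-up view
east_districts = aggregate("district", within=("East",))  # drill down into East
d1_dealers = aggregate("dealership", within=("East", "D1"))

print(by_region)
print(east_districts)
print(d1_dealers)
```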

Ritesh Raushan

Architecture Types
Implementation Techniques - OLAP Architectures
• MOLAP - Multidimensional OLAP
  – Multidimensional databases for the database and application logic layers
• ROLAP - Relational OLAP
  – Accesses data stored in a relational data warehouse for OLAP analysis
  – Database and application logic provided as separate layers
• HOLAP - Hybrid OLAP
  – OLAP server routes queries first to the MDDB, then to the RDBMS, and the result is processed on the fly in the server
• DOLAP - Desktop OLAP
  – Personal MDDB server and application on the desktop
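
The HOLAP routing idea can be sketched as a lookup that tries pre-aggregated cube cells first and falls back to detail rows. The two in-memory structures stand in for the MDDB and the relational warehouse; all names and figures are hypothetical.

```python
# Pre-aggregated cells, standing in for the MDDB. Keys are (product,) tuples.
CUBE = {("Bond",): 175.0}

# Detail rows, standing in for the relational data warehouse.
DETAIL = [("Bond", 100.0), ("Bond", 75.0), ("Savings", 40.0)]


def query_total(product):
    """Return (total, source): answer from the cube if possible, else from detail."""
    key = (product,)
    if key in CUBE:
        return CUBE[key], "MDDB"
    # Not pre-aggregated: compute the answer on the fly from the detail rows.
    total = sum(amount for name, amount in DETAIL if name == product)
    return total, "RDBMS"


print(query_total("Bond"))
print(query_total("Savings"))
```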

MOLAP - MDDB storage

[Figure: MOLAP. OLAP cube (MDDB storage) → OLAP calculation engine → web browser, OLAP tools and OLAP applications]

ROLAP - Standard SQL storage

[Figure: ROLAP. Relational DW → SQL → OLAP calculation engine with MDDB-relational mapping → web browser, OLAP tools and OLAP applications]

HOLAP - Combination of RDBMS and MDDB

[Figure: HOLAP. OLAP cube and relational DW (via SQL) → OLAP calculation engine → any client: web browser, OLAP tools and OLAP applications]

Architecture Comparison

Kiran Naik

Case
Thank you