Agenda
Introduction
Process
DSS
Information processing
Dimensions
OLAP
Architecture Types
Best Practice
Case
Shilpa Surve
Introduction
Definition
A data warehouse is a
• Subject-Oriented
• Integrated
• Time-Variant
• Non-volatile
collection of data in support of management’s decisions.
12/04/08 4
What are Data Warehouses?
• Data warehouses store large volumes of data that are frequently used by decision support systems (DSS).
• They are maintained separately from the organization’s operational databases.
• Data warehouses are relatively static, with only infrequent updates.
• A data warehouse is a stand-alone repository of information, integrated from several, possibly heterogeneous, operational databases.
Steps in Building a Data Warehouse
• Identify key business drivers, sponsorship, risks, and ROI.
• Survey information needs, identify the desired functionality, and define functional requirements for the initial subject area.
• Architect the long-term data warehousing architecture.
• Evaluate and finalize the DW tools and technology.
• Conduct a proof of concept.
Steps in Building a Data Warehouse (continued)
• Design the target database schema.
• Build the data mapping, extraction, transformation, cleansing, and aggregation/summarization rules.
• Build the initial data mart using an exact subset of the enterprise data warehousing architecture, and expand to the enterprise architecture over subsequent phases.
• Maintain and administer the data warehouse.
The Three Views of Data Warehousing
Strategic or Business view
• Define the key business drivers of the data warehouse
• How can a business-driven approach achieve a high ROI?
Architectural or Technology view
• Alternative data warehousing architectures
• How can the right architecture achieve a high ROI?
Methodology or Implementation view
• Development and implementation methodology
• How can the right methodology achieve a rapid ROI?
Swathi Velisetty
Process
DW Components
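[Figure: DW components. Feed systems (FS1, FS2, ..., FSn) and legacy systems connect over the network to the staging area. The processes shown are extraction, cleansing, transformation, aggregation, summarization, transmission, and data mart population. The stores shown are the ODS, the DW, and data marts DM1, DM2, ..., DMn, which serve OLAP analysis. A metadata layer spans the components.]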
Cleansing process
[Figure: Raw data from the staging area enters the cleansing process, which is driven by process metadata and control metadata. Its outputs are clean data marked Good or Bad, plus cleansing reports.]
• Clean the raw data
• Mark it Good/Bad
• Generate the cleansing reports and mail them to the DWA and the feed system representatives
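The three cleansing steps above can be sketched as follows. This is only an illustration, not the presenters' implementation: the record fields (`customer`, `amount`), the validation rules, and the function names are assumptions made for the example, and mailing the report is left out.

```python
from collections import Counter


def validate(record):
    """Return the rule violations for one record (hypothetical rules)."""
    problems = []
    if not str(record.get("customer", "")).strip():
        problems.append("missing customer")
    try:
        if float(record.get("amount")) < 0:
            problems.append("negative amount")
    except (TypeError, ValueError):
        problems.append("amount is not numeric")
    return problems


def cleanse(raw_records):
    """Clean each raw record and mark it Good or Bad."""
    good, bad = [], []
    for record in raw_records:
        # Cleaning step: trim stray whitespace from text fields.
        cleaned = {k: v.strip() if isinstance(v, str) else v
                   for k, v in record.items()}
        problems = validate(cleaned)
        if problems:
            bad.append({**cleaned, "status": "Bad", "problems": problems})
        else:
            good.append({**cleaned, "status": "Good"})
    return good, bad


def cleansing_report(good, bad):
    """Summarize the run as plain text for the DWA and feed system contacts."""
    counts = Counter(p for r in bad for p in r["problems"])
    lines = [f"good records: {len(good)}", f"bad records: {len(bad)}"]
    lines += [f"  {problem}: {n}" for problem, n in sorted(counts.items())]
    return "\n".join(lines)


raw = [
    {"customer": " Acme ", "amount": "120.50"},
    {"customer": "", "amount": "10"},
    {"customer": "Globex", "amount": "abc"},
]
good, bad = cleanse(raw)
report = cleansing_report(good, bad)
```

Bad records are kept rather than dropped, so the feed system owners can see exactly what was rejected and why.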
Transformation Process
[Figure: Clean operational data passes through the transformation process into the operational data store. The process is driven by process metadata (mapping details, transformation rules) and control metadata.]
Summarization Process
[Figure: The summarization process, governed by control metadata, sits between the ODS and the DW and produces weekly, monthly, and yearly summaries.]
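A minimal sketch of weekly, monthly, and yearly summarization, assuming the detail data is a list of `(date, amount)` rows. The row layout and the function name `summarize` are assumptions for illustration; weeks are ISO weeks.

```python
from collections import defaultdict
from datetime import date


def summarize(detail_rows):
    """Roll (day, amount) detail rows up into weekly, monthly, and yearly totals."""
    weekly = defaultdict(float)
    monthly = defaultdict(float)
    yearly = defaultdict(float)
    for day, amount in detail_rows:
        iso_year, iso_week, _ = day.isocalendar()
        weekly[(iso_year, iso_week)] += amount
        monthly[(day.year, day.month)] += amount
        yearly[day.year] += amount
    return dict(weekly), dict(monthly), dict(yearly)


rows = [
    (date(2008, 3, 3), 100.0),   # Monday of ISO week 10
    (date(2008, 3, 7), 50.0),    # Friday of the same week
    (date(2008, 3, 10), 25.0),   # Monday of ISO week 11
    (date(2008, 4, 1), 10.0),
]
weekly, monthly, yearly = summarize(rows)
```

Each summary level is computed from the detail rows in one pass, so all three stay consistent with the same source data.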
Enterprise Data Warehouse
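[Figure: Enterprise-wide data warehouse. Operational systems (legacy, client/server, OLTP) and external sources pass through data preparation (select, extract, transform, integrate, maintain) into the data warehouse, which has a metadata repository and is reached by users through an API.]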
Distributed Data Marts
[Figure: Operational systems and external data pass through data preparation (maintain) into separate data marts.]
Multi-tiered Data Warehouse
[Figure: Enterprise-wide, multi-tiered arrangement. Elements shown: legacy, external, and operational systems; select and maintain steps; data marts.]
Example
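[Figure: Sales data held at several levels of summarization, together with metadata: monthly sales by region (1991-, end year cut off in the source), monthly sales by product (1991-94), weekly sales by region (years cut off in the source), weekly sales by product/sub-product (1991-94), sales detail (1991-, end year cut off in the source), and sales detail (1985-90).]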
Atul zade
Driving Forces for DSS
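[Figure: Reform, customers, competition, and technology shown as the forces driving DSS. The text following the "RESULT:" label is missing from the source.]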
How to answer these Business Queries?
• What is the region-wise sales distribution?
• How did my revenue improve in the past 5 years?
• Which of my sales agents are doing better?
• Strategic planning / budgeting
OLTP v/s DSS Environment
Classification of Business Users
• Executives/Managers
– Multi-dimensional analysis, reporting tools
• Knowledge Workers
– Ad hoc queries, detail and summary data, application focus
• Power Analysts
– Ad hoc queries, data analysis, and data mining
• Customer Contacts
– Detail data at specific levels
Prem Sequera
Information processing
Data Processing to Information Processing
[Figure: Business objectives and goals, application domains, and business functions form the business elements. Below them, entry, query processing, and report generation applications appear alongside the Operational Data Store (ODS), the Enterprise Data Warehouse, and data marts A, B, ..., N. Application-specific analysis, OLAP/query tools, OLAP, and data mining analysis applications lead to knowledge discovery and knowledge management. The label "TRACE" appears twice.]
[Figure: Operational fields (Sales Rep, Quantity Sold, Part Number, Date, Customer Name, Product Description, Unit Price, Mail Address) grouped into the subjects Sales, Customers, and Products.]
Integration of Data
Aspect               Appl. A            Appl. B              Appl. C        After integration
Encoding             M, F               1, 0                 X, Y           M, F
Physical attributes  balance dec(13,2)  balance PIC 9(9)V99  balance float  balance dec(13,2)
Naming conventions   bal-on-hand        current_balance      balance        balance
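The integration rules on this slide can be sketched in code. The sketch below rests on assumptions: that the encoded field is a gender code stored under a hypothetical key `gender`, that codes map positionally (1 and X to M, 0 and Y to F), and that the COBOL `PIC 9(9)V99` value has already been decoded to a number or numeric string.

```python
from decimal import Decimal, ROUND_HALF_UP

# Assumed positional mapping of each application's codes onto M/F.
GENDER_CODES = {
    "A": {"M": "M", "F": "F"},
    "B": {"1": "M", "0": "F"},
    "C": {"X": "M", "Y": "F"},
}

# Source field that carries the balance in each application.
BALANCE_FIELD = {"A": "bal-on-hand", "B": "current_balance", "C": "balance"}


def integrate(app, record):
    """Convert one source record to the warehouse's common representation."""
    gender = GENDER_CODES[app][str(record["gender"])]
    # Going through str() avoids carrying binary float noise into the Decimal.
    raw_balance = Decimal(str(record[BALANCE_FIELD[app]]))
    balance = raw_balance.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return {"gender": gender, "balance": balance}


row_a = integrate("A", {"gender": "F", "bal-on-hand": "10"})
row_b = integrate("B", {"gender": 1, "current_balance": "1500.5"})
row_c = integrate("C", {"gender": "Y", "balance": 99.999})
```

Whatever the source application, the output has one name (`balance`), one encoding (M/F), and one physical format (two decimal places).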
Volatility of Data
Volatile: record-by-record data manipulation
• Insert, Change, Delete, Access
Non-Volatile: mass load / access of data
• Load, Access
Time Variant Data Analysis
[Figure: Current data contrasted with historical data.]
Kairav Parikh
Dimension
What is a Dimension?
A data warehouse is a
• Subject-Oriented
• Integrated
• Time-Variant
• Non-volatile
collection of data in support of management’s decisions.
Each subject corresponds to a dimension.
Dimensional Hierarchy
Geography Dimension (levels linked by a parent relation)
• World Level: World
• Continent Level: America, Europe, Asia
• State Level: FL, GA, VA, CA, WA
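The parent relation in this hierarchy can be sketched as a simple lookup table. This is an illustration only: the slide text does not say which continent each state sits under, so placing the five US states under America is an inference, and the function names are hypothetical.

```python
# Child -> parent links for the Geography dimension.
PARENT = {
    "America": "World", "Europe": "World", "Asia": "World",
    "FL": "America", "GA": "America", "VA": "America",
    "CA": "America", "WA": "America",
}


def ancestors(member):
    """Walk the parent relation upward and return the path to the top."""
    path = []
    while member in PARENT:
        member = PARENT[member]
        path.append(member)
    return path


def children(member):
    """Return the members one level below the given member."""
    return sorted(child for child, parent in PARENT.items() if parent == member)


fl_path = ancestors("FL")        # roll-up path for Florida
continents = children("World")   # drill-down from the top level
```

Rolling up follows `ancestors`; drilling down follows `children`.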
Dimensional Modeling
STEP 1
• Identify Subjects (Dimensions)
• Identify Hierarchies of a Dimension
• Identify Attributes of levels in Hierarchies
• Define Grain
[Figure: Hierarchies of the Customer dimension: Country > State > City > Customer; Industry Segment > Industry Type > Customer; Fin. Class > Customer.]
Dimensional Modeling
STEP 2
• Use KPIs to identify the Facts
• Group the Facts in a logical set
Financial Transactions    Non-Financial Transactions
Trans. Amount             No. of Cheques Cleared
No. of Bonds              No. of Visits to a Branch
No. of Transactions       No. of DEMAT Transactions
Service Cost              ...
...
Dimensional Modeling
STEP 3
• Link the Group of Facts to the Dimensions that participate in the Facts
[Figure: A group of facts linked to the Customer, Product, and Channel dimensions.]
Dimensional Modeling
STEP 4
• Define Granularity for each Group of Facts
[Figure: Grain per dimension: Customer (Customer), Product (Scheme), Channel (Channel).]
Data Warehouse Schemas
Star Schema
• A Group of Facts connected to Multiple Dimensions
[Figure: A fact group connected to the Channel, Customer, and Product dimensions.]
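A star schema of this shape can be sketched with Python's built-in `sqlite3` module. The table names, column names, and sample rows are assumptions for illustration; only the Customer, Product, and Channel dimensions come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_product  (product_id  INTEGER PRIMARY KEY, scheme TEXT);
CREATE TABLE dim_channel  (channel_id  INTEGER PRIMARY KEY, channel TEXT);

-- One fact table whose foreign keys point at every dimension.
CREATE TABLE fact_transactions (
    customer_id  INTEGER REFERENCES dim_customer(customer_id),
    product_id   INTEGER REFERENCES dim_product(product_id),
    channel_id   INTEGER REFERENCES dim_channel(channel_id),
    trans_amount REAL
);

INSERT INTO dim_customer VALUES (1, 'Asha'), (2, 'Ravi');
INSERT INTO dim_product  VALUES (1, 'Savings'), (2, 'Fixed Deposit');
INSERT INTO dim_channel  VALUES (1, 'Branch'), (2, 'ATM');
INSERT INTO fact_transactions VALUES
    (1, 1, 1, 100.0), (1, 2, 2, 250.0), (2, 1, 2, 50.0);
""")

# Total transaction amount per channel: join the facts to one dimension.
rows = conn.execute("""
    SELECT c.channel, SUM(f.trans_amount)
    FROM fact_transactions AS f
    JOIN dim_channel AS c ON c.channel_id = f.channel_id
    GROUP BY c.channel
    ORDER BY c.channel
""").fetchall()
```

Every analytical query follows the same pattern: start from the fact table and join only the dimensions the question needs.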
Data Warehouse Schemas
Snowflake Schema (= Extended Star Schema)
• A Group of Facts connected to Dimensions, which are split across multiple hierarchies and attributes
[Figure: Elements shown: Financial Transactions (facts), Time, Product, Organization, Channel, Customer, Segment, Geography.]
Data Warehouse Schemas
Galaxy Schema
• Multiple Groups of Facts linked by a few common dimensions
[Figure: Fact group Fact1 surrounded by six Dimension boxes.]
Akshay Shiveshwarkar
OLAP
On-Line Analytical Processing
OLAP is a technology that lets users view aggregated measures (such as Maturity Amount or Interest Rate) along a set of related parameters called dimensions (such as Product, Organization, or Customer).
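A minimal sketch of this idea: one measure aggregated along whichever dimensions the user picks. The fact rows, the branch and customer names, and the function name `aggregate` are made up for illustration.

```python
from collections import defaultdict

# Hypothetical fact rows: three dimensions and one measure.
FACTS = [
    {"product": "Fixed Deposit", "organization": "North Branch",
     "customer": "C1", "maturity_amount": 1000.0},
    {"product": "Fixed Deposit", "organization": "South Branch",
     "customer": "C2", "maturity_amount": 500.0},
    {"product": "Bond", "organization": "North Branch",
     "customer": "C1", "maturity_amount": 250.0},
]


def aggregate(facts, dimensions, measure):
    """Sum one measure for every combination of the chosen dimensions."""
    totals = defaultdict(float)
    for row in facts:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)


by_product = aggregate(FACTS, ["product"], "maturity_amount")
by_org_customer = aggregate(FACTS, ["organization", "customer"], "maturity_amount")
grand_total = aggregate(FACTS, [], "maturity_amount")
```

Changing the list of dimensions changes the view of the same facts, which is the core of what an OLAP tool offers interactively.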
What is MDDB?
RDBMS v/s MDDB
RDBMS: 9 x 3 = 27 cells
MDDB: 3 x 3 = 9 cells
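The cell counts can be reproduced with a small sketch. The slide's figure is not in the text, so this assumes the usual comparison: nine relational rows of (product, region, sales) against a 3 x 3 array indexed by product and region. The product and region names and the sales values are made up.

```python
products = ["P1", "P2", "P3"]
regions = ["East", "West", "North"]

# Relational form: one row per (product, region) pair, three columns each.
relational_rows = [
    (p, r, 10 * (i + 1) + j)
    for i, p in enumerate(products)
    for j, r in enumerate(regions)
]
relational_cells = len(relational_rows) * len(relational_rows[0])

# Multidimensional form: the dimension values become array positions,
# so only the sales figures need to be stored as cells.
cube = [[0] * len(regions) for _ in products]
for p, r, sales in relational_rows:
    cube[products.index(p)][regions.index(r)] = sales
md_cells = len(cube) * len(cube[0])
```

The relational table repeats each product and region name in every row, while the array stores each name once as an axis label.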
Benefits of MDDB over RDBMS
• Ease of data presentation and navigation
– Intuitive, spreadsheet- or crosstab-like data views
• Storage space
– Very low space consumption compared to a relational DB
• Performance
– Gives much better performance. A relational DB may give comparable results only through database tuning (indexing, keys, etc.), which may not be possible for ad hoc queries.
• Ease of maintenance
– No overhead, as data is stored in the same way it is viewed. A relational DB relies on indexes, sophisticated joins, etc., which require considerable storage and maintenance.
Issues with MDDB
• Sparsity
– Controlled Sparsity
– Random Sparsity
• Data Explosion
– Due to Sparsity
– Due to Summarization
• Performance
– Doesn’t perform better than RDBMS at high data
volumes (>20-30 GB)
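Sparsity and data explosion can be put into numbers with a short sketch. The dimension sizes and the populated cells are made up, and adding a single "All" member per dimension is only the simplest assumed form of pre-computed summarization.

```python
from math import prod


def density(populated, dimension_sizes):
    """Fraction of the cube's cells that actually hold a value."""
    return len(populated) / prod(dimension_sizes)


def cells_with_totals(dimension_sizes):
    """Cell count after adding one pre-computed 'All' member per dimension."""
    return prod(size + 1 for size in dimension_sizes)


# Hypothetical cube: 3 products x 4 regions x 12 months = 144 cells.
sizes = (3, 4, 12)

# Only 18 coordinates carry data; the rest of the cube is empty.
populated = {(p, r, m) for p in range(3) for r in range(2) for m in range(3)}

fill = density(populated, sizes)
expanded = cells_with_totals(sizes)
```

Here 18 stored values sit in a cube of 144 cells, and pre-computing totals grows the cube to 260 cells, which illustrates why both effects compound at high data volumes.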
OLAP Features
[Figure: Sales at Region/District/Dealership level.]
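Viewing sales at region, district, or dealership level amounts to rolling up or drilling down one hierarchy. A minimal sketch, with made-up rows and a hypothetical function name `sales_at`:

```python
from collections import defaultdict

# (region, district, dealership, sales): made-up rows for illustration.
SALES = [
    ("West", "Pune", "Dealer A", 120),
    ("West", "Pune", "Dealer B", 80),
    ("West", "Mumbai", "Dealer C", 200),
    ("South", "Chennai", "Dealer D", 150),
]
LEVELS = ("region", "district", "dealership")


def sales_at(level):
    """Total sales keyed by the hierarchy path down to the requested level."""
    depth = LEVELS.index(level) + 1
    totals = defaultdict(int)
    for *path, amount in SALES:
        totals[tuple(path[:depth])] += amount
    return dict(totals)


by_region = sales_at("region")          # most summarized view
by_district = sales_at("district")      # drill down one level
by_dealership = sales_at("dealership")  # finest grain
```

Moving from `by_region` to `by_dealership` is a drill-down; the reverse is a roll-up over the same facts.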
Ritesh Raushan
Architecture Types
Implementation Techniques - OLAP Architectures
• MOLAP - Multidimensional OLAP
– Multidimensional databases for the database and application logic layers
• ROLAP - Relational OLAP
– Accesses data stored in a relational data warehouse for OLAP analysis
– Database and application logic provided as separate layers
• HOLAP - Hybrid OLAP
– The OLAP server routes queries first to the MDDB, then to the RDBMS, and processes the result on the fly in the server
• DOLAP - Desktop OLAP
– Personal MDDB server and application on the desktop
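The HOLAP routing described above (MDDB first, then RDBMS) can be sketched as follows. This is a toy model under stated assumptions: a dict of pre-aggregated totals stands in for the MDDB, a list of detail rows stands in for the relational warehouse, and the class name `HolapServer` is hypothetical.

```python
class HolapServer:
    """Toy OLAP server that tries the cube first and falls back to detail rows."""

    def __init__(self, cube, detail_rows):
        # (product, region) -> pre-aggregated total; stands in for the MDDB.
        self.cube = cube
        # (product, region, amount) rows; stands in for the relational DW.
        self.detail_rows = detail_rows

    def total(self, product, region):
        """Return (value, source) for one product/region combination."""
        key = (product, region)
        if key in self.cube:
            return self.cube[key], "MDDB"
        # Not pre-aggregated: compute the answer on the fly from detail data.
        value = sum(amount for p, r, amount in self.detail_rows
                    if (p, r) == key)
        return value, "RDBMS"


server = HolapServer(
    cube={("P1", "East"): 300},
    detail_rows=[
        ("P1", "East", 100), ("P1", "East", 200),
        ("P2", "West", 40), ("P2", "West", 2),
    ],
)
hit = server.total("P1", "East")
miss = server.total("P2", "West")
```

Frequently requested aggregates are answered from the cube, while everything else is still reachable through the relational store.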
MOLAP - MDDB storage
[Figure: An OLAP cube feeds the OLAP calculation engine, which serves web browsers, OLAP tools, and OLAP applications.]
ROLAP - Standard SQL storage
[Figure: A relational DW is accessed through SQL and an MDDB-relational mapping by the OLAP calculation engine, which serves web browsers, OLAP tools, and OLAP applications.]
HOLAP - Combination of RDBMS and MDDB
[Figure: The OLAP calculation engine draws on both an OLAP cube and, through SQL, a relational DW, and serves any client: web browsers, OLAP tools, and OLAP applications.]
Architecture Comparison
Kiran Naik
Case
Thank you