
Data warehousing concepts

1
Agenda
OLTP Vs OLAP
Modeling Techniques
User Profile
Top down approach
Bottom up approach

2
Traditional OLTP systems
OLTP systems are highly structured sets of information that support the ongoing, day-to-day operation of an organization.

These databases usually hold information about small subsets of the organization, split on the basis of
Business functions, e.g. sales, purchase, travel
Geographical locations, e.g. Northern region, Eastern region
Logical units, e.g. REUD, BCMD, IHLD, EISA

3
OLTP (Contd)

Transactional databases require a highly normalized database design to achieve performance goals and to optimize storage space
These databases need to record, in real time, every transaction that the organization enters into

4
What is OLAP?

An organization's success also depends on its ability to analyze data (through views and reports) and make intelligent decisions that potentially affect its future. Systems that facilitate such analyses are called Online Analytical Processing (OLAP) systems.

5
Why not OLTP for OLAP?

OLTP databases typically do not retain historical data
OLTP databases contain small subsets of organizational data
OLTP databases are heterogeneous in nature and geographically distributed

6
In other words...

OLTP systems are
Fragmented
Not integrated
Difficult to access
Drawn from disparate sources
Hosted on disparate platforms
Of poor data quality
Full of redundant data
Difficult to understand

7
Data warehouse
A data warehouse is a copy of the enterprise's operational data, suitably modified to support the needs of analytical processes and stored outside the operational database.
According to Bill Inmon, known as the father of data warehousing, a data warehouse is a subject-oriented, integrated, time-variant, nonvolatile collection of data in support of management decisions.

8
OLAP Vs OLTP
Data warehouse database
Designed for analysis of business measures by categories and attributes
Optimized for bulk loads and large, complex, unpredictable queries that access many rows per table
Loaded with consistent, valid data; requires no real-time validation
Supports few concurrent users relative to OLTP

OLTP database
Designed for real-time business operations
Optimized for a common set of transactions, usually adding or retrieving a single row at a time per table
Optimized for validation of incoming data during transactions; uses validation data tables
Supports thousands of concurrent users
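
The difference between the two workloads can be sketched with Python's built-in sqlite3 module. The `sales` table, its columns and the sample rows are assumptions invented for this illustration; a real OLTP system and a real warehouse would be separate databases with very different designs.

```python
import sqlite3

# Hypothetical table and data, for illustration only
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales ("
    "sale_id INTEGER PRIMARY KEY, region TEXT, product TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        (1, "North", "Phone", 100.0),
        (2, "North", "Modem", 50.0),
        (3, "East", "Phone", 120.0),
        (4, "East", "Phone", 80.0),
    ],
)

# OLTP-style access: fetch a single row by its key
oltp_row = conn.execute(
    "SELECT amount FROM sales WHERE sale_id = ?", (3,)
).fetchone()

# OLAP-style access: aggregate a measure over many rows, by category
olap_rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

print(oltp_row)   # (120.0,)
print(olap_rows)  # [('East', 200.0), ('North', 150.0)]
```

The first query touches one row through the primary key; the second scans the whole table to summarize a business measure by region.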

9
Data warehouse architecture

[Figure: three-tier data warehouse architecture. Operational DBs and semistructured sources feed the data warehouse and data marts through extract, transform, load and refresh steps (Tier 1: data warehouse server). The warehouse serves OLAP servers, e.g. MOLAP and ROLAP (Tier 2), which serve client tools for analysis, query/reporting and data mining (Tier 3).]

10
D/W Architecture Goals

Deliver a great user experience; user acceptance is the measure of success
Function without interfering with OLTP systems
Provide a central repository of consistent data
Answer complex queries quickly
Provide a variety of powerful analytical tools, such as OLAP and data mining

11
Characteristics of D/W
Are based on a dimensional model
Contain historical data
Include both detailed and summarized data
Consolidate disparate data from multiple sources while retaining consistency
Focus on a single subject, such as sales, inventory, or finance

12
User Profile
Statisticians (2%)
Knowledge workers (15%)
Information Consumers (83%)

13
Steps in implementing D/W

Identify and gather requirements
Design the dimensional model
Develop the architecture, including the Operational Data Store (ODS)
Design the relational database and OLAP cubes
Develop the data maintenance applications
Develop analysis applications
Test and deploy the system

14
Identify and gather requirements
Identify the Sponsor
Meet the Business Users
Meet Data experts
Communicate with users often and thoroughly

15
Identify The Business Areas
For Telecom D/W
Customer Behavior
Corporate Customer
Customer Service
Accounts
Settlements
Partner
Supplier
Competitor
Marketing

16
Sources and Targets
Sources
Telephone call detail recording
Customer service, such as ordering service and disconnecting lines
Customer payment processing
Targets
Studies of minutes of call use by customer group
Segmentation of customers by minutes of call use
Product bundling analysis
Customer payment analysis

17
Design the dimensional model
Identify the dimensions
These should match the business needs
Identify the grain of the detail
Decide on the schema type
Star schema
Snowflake schema
Starflake schema (a hybrid of the two)
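
As a rough illustration of these design steps, the sketch below builds a tiny star schema with Python's sqlite3 module: one fact table surrounded by denormalized dimension tables. The table names, columns, chosen grain (one row per customer per day) and sample data are all assumptions made for this example, loosely following the telecom scenario.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive attributes, one surrogate key each
CREATE TABLE dim_customer (
    customer_key   INTEGER PRIMARY KEY,
    customer_name  TEXT,
    customer_group TEXT
);
CREATE TABLE dim_date (
    date_key  INTEGER PRIMARY KEY,
    full_date TEXT,
    month     TEXT
);
-- Fact table. Assumed grain: one row per customer per day
CREATE TABLE fact_call_usage (
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    date_key     INTEGER REFERENCES dim_date(date_key),
    minutes_used REAL
);
""")

conn.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)", [
    (1, "Asha", "Residential"),
    (2, "Acme Corp", "Corporate"),
])
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)", [
    (20240101, "2024-01-01", "2024-01"),
    (20240102, "2024-01-02", "2024-01"),
])
conn.executemany("INSERT INTO fact_call_usage VALUES (?, ?, ?)", [
    (1, 20240101, 30.0),
    (1, 20240102, 12.5),
    (2, 20240101, 200.0),
])

# Minutes of call use by customer group: one join per dimension used
minutes_by_group = conn.execute("""
    SELECT c.customer_group, SUM(f.minutes_used)
    FROM fact_call_usage AS f
    JOIN dim_customer AS c ON c.customer_key = f.customer_key
    GROUP BY c.customer_group
    ORDER BY c.customer_group
""").fetchall()

print(minutes_by_group)  # [('Corporate', 200.0), ('Residential', 42.5)]
```

Stating the grain first matters because every measure in the fact table must be valid at that level of detail.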

18
Star Schema

19
Snowflake Schema
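
A snowflake schema normalizes one or more dimensions into sub-tables. The sketch below is a minimal example under assumed names: the hypothetical `dim_customer_group` table is split out of `dim_customer`, so a query by group needs one extra join compared with a star schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The customer group is normalized into its own table
CREATE TABLE dim_customer_group (
    group_key  INTEGER PRIMARY KEY,
    group_name TEXT
);
CREATE TABLE dim_customer (
    customer_key  INTEGER PRIMARY KEY,
    customer_name TEXT,
    group_key     INTEGER REFERENCES dim_customer_group(group_key)
);
CREATE TABLE fact_call_usage (
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    minutes_used REAL
);
INSERT INTO dim_customer_group VALUES (1, 'Residential'), (2, 'Corporate');
INSERT INTO dim_customer VALUES (10, 'Asha', 1), (11, 'Acme Corp', 2), (12, 'Globex', 2);
INSERT INTO fact_call_usage VALUES (10, 30.0), (11, 200.0), (12, 50.0);
""")

# Fact -> customer -> customer group: two joins instead of one
minutes_by_group = conn.execute("""
    SELECT g.group_name, SUM(f.minutes_used)
    FROM fact_call_usage AS f
    JOIN dim_customer AS c ON c.customer_key = f.customer_key
    JOIN dim_customer_group AS g ON g.group_key = c.group_key
    GROUP BY g.group_name
    ORDER BY g.group_name
""").fetchall()

print(minutes_by_group)  # [('Corporate', 250.0), ('Residential', 30.0)]
```

The trade-off is less redundancy in the dimension data in exchange for more joins at query time.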

21
Design considerations for dimension tables

Levels of hierarchy
Surrogate key
Star or snowflake
Date and time
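
Two of these considerations, the surrogate key and the date dimension, can be sketched together. The function name `build_date_dimension` and the chosen columns are assumptions for illustration; real date dimensions usually carry many more attributes (fiscal periods, holidays and so on).

```python
from datetime import date, timedelta


def build_date_dimension(start, end):
    """Return one row per calendar day from start to end, inclusive."""
    rows = []
    current = start
    surrogate_key = 1  # meaningless integer key, independent of source data
    while current <= end:
        rows.append({
            "date_key": surrogate_key,
            "full_date": current.isoformat(),
            # Hierarchy levels: year > quarter > month > day
            "year": current.year,
            "quarter": (current.month - 1) // 3 + 1,
            "month": current.month,
            "day_of_week": current.isoweekday(),  # 1 = Monday ... 7 = Sunday
        })
        current += timedelta(days=1)
        surrogate_key += 1
    return rows


date_dim = build_date_dimension(date(2024, 1, 1), date(2024, 12, 31))
print(len(date_dim))   # 366, since 2024 is a leap year
print(date_dim[0])
```

Because the rows are generated rather than extracted, the date dimension can be loaded once, well ahead of the facts that will reference it.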

24
Slowly changing Dimension
Type 1: Overwrite the dimension record.
Type 2: Add a new dimension record.
Type 3: Create new fields in the dimension record.
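
The three types can be compared on a single change, a customer moving from one city to another. This is a minimal in-memory sketch; the row layout and field names (`customer_key`, `valid_from`, `valid_to`, `is_current`, `previous_city`) are assumptions, not a prescribed design.

```python
import copy


def scd_type1(rows, customer_id, new_city):
    """Type 1: overwrite the attribute in place; history is lost."""
    for row in rows:
        if row["customer_id"] == customer_id:
            row["city"] = new_city


def scd_type2(rows, customer_id, new_city, change_date):
    """Type 2: expire the current row and add a new row with a new surrogate key."""
    next_key = max(row["customer_key"] for row in rows) + 1
    for row in rows:
        if row["customer_id"] == customer_id and row["is_current"]:
            row["is_current"] = False
            row["valid_to"] = change_date
    rows.append({
        "customer_key": next_key,
        "customer_id": customer_id,
        "city": new_city,
        "valid_from": change_date,
        "valid_to": None,
        "is_current": True,
    })


def scd_type3(rows, customer_id, new_city):
    """Type 3: keep the prior value in an extra field on the same row."""
    for row in rows:
        if row["customer_id"] == customer_id:
            row["previous_city"] = row["city"]
            row["city"] = new_city


base = [{
    "customer_key": 1, "customer_id": "C001", "city": "Pune",
    "valid_from": "2023-01-01", "valid_to": None, "is_current": True,
}]

t1 = copy.deepcopy(base)
scd_type1(t1, "C001", "Mumbai")

t2 = copy.deepcopy(base)
scd_type2(t2, "C001", "Mumbai", "2024-06-01")

t3 = copy.deepcopy(base)
scd_type3(t3, "C001", "Mumbai")

print(len(t1), t1[0]["city"])                    # 1 Mumbai
print(len(t2), [r["city"] for r in t2])          # 2 ['Pune', 'Mumbai']
print(t3[0]["previous_city"], t3[0]["city"])     # Pune Mumbai
```

Only Type 2 preserves full history, at the cost of extra rows; Type 3 keeps exactly one prior value.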

Tracking bands can reduce the number of updates to some extent
It becomes a nightmare if the source and the report are not in sync

25
Rapidly changing dimensions

Breaking out the offending dimension attributes
Factless fact tables
Conformed dimensions

26
Fact tables
Multiple Fact tables
Additive measures
Non-additive and semi-additive measures
Calculated Measures
Granularity
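
The difference between additive and semi-additive measures is easiest to see with numbers. In this sketch the snapshot rows are invented: `deposit` is treated as an additive measure and `balance` as a semi-additive one, since balances can be summed across accounts but not across dates.

```python
# Hypothetical daily snapshots: (date, account, balance, deposit)
snapshots = [
    ("2024-01-01", "A", 100.0, 100.0),
    ("2024-01-01", "B", 50.0, 50.0),
    ("2024-01-02", "A", 130.0, 30.0),
    ("2024-01-02", "B", 50.0, 0.0),
]

# Additive measure: deposits can be summed across every dimension
total_deposits = sum(dep for _, _, _, dep in snapshots)

# Semi-additive measure: balances add up across accounts for one date...
balance_on_jan2 = sum(bal for d, _, bal, _ in snapshots if d == "2024-01-02")

# ...but summing them across dates double counts the same money
naive_balance_sum = sum(bal for _, _, bal, _ in snapshots)

# Calculated measure: derived from stored measures at query time
balances_a = [bal for _, acct, bal, _ in snapshots if acct == "A"]
avg_balance_a = sum(balances_a) / len(balances_a)

print(total_deposits)     # 180.0
print(balance_on_jan2)    # 180.0
print(naive_balance_sum)  # 330.0, not a meaningful figure
print(avg_balance_a)      # 115.0
```

Across the time dimension, a semi-additive measure is usually reported as an average or as the value at the end of the period rather than as a sum.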

27
ETL

The Extract, Transform and Load (ETL) process may be described as the process of selecting, migrating, transforming, cleansing and converting mapped data from the legacy environment to the data warehouse environment.

28
Extraction

Push strategy: the source system sends data to the warehouse
Pull strategy: the warehouse extracts data from the source system

29
Transformation
Transformation involves applying complex filters, removing inconsistencies between data from different sources, conditional transforms, complex calculations to create derived data, etc. Cleansing of data can be an important part of the transformation process.
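
A small sketch of such a transformation step follows. The input rows, the `REGION_CODES` mapping and the `transform` function are all hypothetical; they stand in for filtering, reconciling inconsistent codes from different sources, cleansing text, and computing a derived value.

```python
# Hypothetical raw rows, as they might arrive from two different sources
raw_rows = [
    {"cust": " asha ", "region": "N", "minutes": "30", "rate": "0.5"},
    {"cust": "ACME CORP", "region": "north", "minutes": "200", "rate": "0.4"},
    {"cust": "Globex", "region": "E", "minutes": "", "rate": "0.5"},       # missing measure
    {"cust": "Initech", "region": "East", "minutes": "-5", "rate": "0.5"},  # invalid measure
]

# Reconcile inconsistent region codes into one standard value
REGION_CODES = {"n": "North", "north": "North", "e": "East", "east": "East"}


def transform(rows):
    """Return (clean_rows, rejected_rows)."""
    clean, rejected = [], []
    for row in rows:
        try:
            minutes = float(row["minutes"])
            rate = float(row["rate"])
        except ValueError:
            rejected.append(row)  # filter: value is not numeric
            continue
        region = REGION_CODES.get(row["region"].strip().lower())
        if minutes < 0 or region is None:
            rejected.append(row)  # filter: invalid value or unknown code
            continue
        clean.append({
            "customer": row["cust"].strip().title(),  # cleansing
            "region": region,                         # consistency fix
            "minutes": minutes,
            "charge": round(minutes * rate, 2),       # derived data
        })
    return clean, rejected


clean, rejected = transform(raw_rows)
print(clean)
print(len(rejected))  # 2
```

Keeping the rejected rows, instead of silently dropping them, makes it possible to report data quality problems back to the source systems.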

30
Loading

Loading involves the insertion of data into the target system, that is, the data warehouse. Loading is the last step before the users see the data. It involves populating the fact and dimension tables as well as aggregation tables that are part of the physical data model.
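
The order of operations in a load can be sketched as follows: dimension rows first (so that surrogate keys exist), then fact rows, then aggregates. The schema, the `load` function and the full rebuild of the aggregate table are assumptions chosen to keep the example short; production loads often use bulk loaders and incremental aggregate maintenance instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (
    customer_key  INTEGER PRIMARY KEY AUTOINCREMENT,
    customer_name TEXT UNIQUE
);
CREATE TABLE fact_call_usage (
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    minutes      REAL
);
CREATE TABLE agg_minutes_by_customer (
    customer_key  INTEGER PRIMARY KEY,
    total_minutes REAL
);
""")

# Hypothetical output of the transformation step: (customer name, minutes)
transformed = [("Asha", 30.0), ("Acme Corp", 200.0), ("Asha", 12.5)]


def load(conn, rows):
    with conn:  # one transaction: commit on success, roll back on error
        for name, minutes in rows:
            # 1. Dimension: add the member if it is new, then look up its key
            conn.execute(
                "INSERT OR IGNORE INTO dim_customer (customer_name) VALUES (?)",
                (name,),
            )
            (key,) = conn.execute(
                "SELECT customer_key FROM dim_customer WHERE customer_name = ?",
                (name,),
            ).fetchone()
            # 2. Fact: store the measure against the surrogate key
            conn.execute("INSERT INTO fact_call_usage VALUES (?, ?)", (key, minutes))
        # 3. Aggregate: rebuilt from the facts (simplest possible strategy)
        conn.execute("DELETE FROM agg_minutes_by_customer")
        conn.execute(
            "INSERT INTO agg_minutes_by_customer "
            "SELECT customer_key, SUM(minutes) FROM fact_call_usage "
            "GROUP BY customer_key"
        )


load(conn, transformed)
totals = conn.execute("""
    SELECT c.customer_name, a.total_minutes
    FROM agg_minutes_by_customer AS a
    JOIN dim_customer AS c ON c.customer_key = a.customer_key
    ORDER BY c.customer_name
""").fetchall()
print(totals)  # [('Acme Corp', 200.0), ('Asha', 42.5)]
```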

31
Loading approaches

Transform and Load
Load and Transform
Transform while Loading
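
One possible reading of the three approaches is sketched below; the slide gives only their names, so the interpretation, the function names and the table names are assumptions. All three produce the same target rows and differ only in where the transformation runs.

```python
import sqlite3

# Hypothetical raw extract: (customer name, minutes as text)
raw = [(" asha ", "30"), ("ACME CORP", "200")]


def transform_then_load(conn, rows):
    """Transform in application code first, then load the clean rows."""
    clean = [(name.strip().upper(), float(minutes)) for name, minutes in rows]
    conn.execute("CREATE TABLE usage_a (customer TEXT, minutes REAL)")
    conn.executemany("INSERT INTO usage_a VALUES (?, ?)", clean)


def load_then_transform(conn, rows):
    """Load raw rows into a staging table, then transform inside the database."""
    conn.execute("CREATE TABLE staging (customer TEXT, minutes TEXT)")
    conn.executemany("INSERT INTO staging VALUES (?, ?)", rows)
    conn.execute("CREATE TABLE usage_b (customer TEXT, minutes REAL)")
    conn.execute(
        "INSERT INTO usage_b "
        "SELECT UPPER(TRIM(customer)), CAST(minutes AS REAL) FROM staging"
    )


def transform_while_loading(conn, rows):
    """Transform each row on its way into the target table."""
    conn.execute("CREATE TABLE usage_c (customer TEXT, minutes REAL)")
    for name, minutes in rows:
        conn.execute(
            "INSERT INTO usage_c VALUES (?, ?)",
            (name.strip().upper(), float(minutes)),
        )


conn = sqlite3.connect(":memory:")
transform_then_load(conn, raw)
load_then_transform(conn, raw)
transform_while_loading(conn, raw)

results = [
    conn.execute(f"SELECT customer, minutes FROM {table} ORDER BY customer").fetchall()
    for table in ("usage_a", "usage_b", "usage_c")
]
print(results[0])  # [('ACME CORP', 200.0), ('ASHA', 30.0)]
print(results[0] == results[1] == results[2])  # True
```

Load-then-transform needs extra disk space for the staging table, which ties in with the loading issues of volume, disk space and scheduling.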

32
Issues in Loading

Volume and frequency of loading
Disk space
Scheduling

33
Data Marts
A data mart is a repository of data gathered from operational data and other sources that is designed to serve a particular community of knowledge workers. In scope, the data may derive from an enterprise-wide database or data warehouse or be more specialized. The emphasis of a data mart is on meeting the specific demands of a particular group of knowledge users in terms of analysis, content, presentation, and ease of use.

34
OLAP
ROLAP (Relational OLAP)
MOLAP (Multidimensional OLAP)
HOLAP (Hybrid OLAP)

35
Few Popular tools

ETL
DataStage
Data Junction
Microsoft DTS (available with SQL Server 7.0 and above)
Oracle Warehouse Builder
Informatica PowerCenter
IBM Data Warehouse Manager
Ab Initio

36
Few Popular tools
OLAP
Cognos
Business Objects
Power Analyzer
Microsoft Analysis Services
MicroStrategy
DB2 OLAP Server
Hyperion OLAP Server

37
Few Popular tools
Data Mining
Intelligent Miner
DARWIN
SAS

38
Thank You

40