Sie sind auf Seite 1von 20

DATA WAREHOUSING

Data Warehousing Concepts


 What is a Data Warehouse?
 A Data warehouse is a subject-oriented, integrated, time-varying,
non-volatile collection of data in support of the management's
decision –making process (OR)
Data warehouse is a relational database for Query and Analysis
Rather than for transaction processing.

 Subject-oriented(customer,products,sales,etc)
 Non-volatile
 Time-varying
 Integrated
(William.Inmon –1993)
Subject-oriented
 A Data Warehouse is organized
around major subjects, such as
customer, supplier, product and
sales. Rather than concentrating on
day-to-day operations and
transaction processing of an
organization.
Integrated
 A Data Warehouse is usually
constructed by integrating multiple
heterogeneous sources, such as
relational databases, flat files, and
on-line transaction records. data
cleaning and data integration
techniques are applied to ensure
consistency in naming conventions,
encoding structures, attribute
measures and so on.
Time Variant

 Data are stored to provide


information from a historical
perspective (example: the past 5 -10
years). Every key structure in the
data warehouse contains, either
implicitly or explicitly an element of
time
Non-Volatile
 A data warehouse is always a physically
separate store of data transformed from
the application data found in the
operational environment. due to this
separation, a data warehouse does not
require transaction
processing,recovery,and concurrency
control mechanisms. it usually requires
only two operations in data accessing
:initial loading loading of data and access
of data.
Comparison of OLTP systems and
data warehousing systems
OLTP Systems Data warehousing systems

Holds Current data Holds Historical data


stores detailed,lightly,and highly
Stores detailed data summarized data

Data is dynamic Data is largely static

Transaction-driven Analysis driven

Application -oriented Subject -oriented

Supports day-to-day decisions Supports strategic decisions


Serves large number of clerical/operational Serves relatively low number of managerial
users users
Dimension Modeling
Dimension: Dimension is a structure
which consists of levels, and
hierarchies is defined on each level.
Example:
SEX

MALE FEMALE
Dimension Modeling
Example:
Profession Level 0

Secretary Teacher Level 1


Engineer

Chemical Civil Executive Junior Elementary High School

Level 2
FACTS
 Fact: Fact consists of whole data with
primary key, foreign key relation ship
with dimensions and also consists of
measures.
 There are Three types of facts
1.ADDITIVE FACTS
2.SEMI ADDITIVE FACTS
3.NON ADDITIVE FACTS
Fact less Fact & Conform dim
 Fact less fact is a fact it does not
contain
Measures.
 A Dimension which can share more
than one Fact is called Conform
Dimension
 Collection of Star Schemas and
Snowflake
Schemas is called Galaxy.
Star schema & Snow flake Schema
 Star schema:
A Centralized fact table surrounded with
dimension tables having Primary, Foreign key
relation ship between them is called star schema.
 Snow flake Schema:
A normalized star schema Is called Snow
flake Schema
 Galaxy:
Collection of Star schemas and snow flake
Schema is called Galaxy.
Star Schema
Sex-Dim Date-Dim

Sex key Date key


Sex Current year
Current month
Fact Table Current week

Profession Key
-----------------------------
Sex key Conform Dim
----------------------------
Address key
-----------------------------
Date key
-----------------------------
-
Address Dim
Profession-Dim ---- Address key
---- Country
Profession-key ----- State
Profession-class Measures (Numeric) City
Title
Level
discipline
Snow Flake Schema
Item Dim Fact Table
Time Dim
Item key Time key
Item name Item key Time key
Type Location key Year
Supplier key ------ Quarter
------ Month
-------------------- day
--------
Rupees sold
Units sold

Location Dim City Dim


Supplier Dim
Location key City key
Supplier key City name
Supplier name Street
City key State
Supplier address Country
Supplier type Pin code
TYPES OF MAPPINGS
History
 Simple pass through (None)
 Slowly growing target (Full)
 Slowly changing dimension
(depends)
Types of SCD’S
 Slowly changing dimension—1
 Slowly changing dimension—2

4. Time stamping
5. versioning
6. Flaging
 Slowly changing dimension—3
Slowly changing Dimension-1
 SCD-1:When you does not want
History
use this kind of mapping (Only insert
else
Update takes place) it inserts the
new row or Update the existing
dimensions.
Slowly changing Dimension-2
 SCD -2 (Time stamp): When you want
maintain full history use this kind of mapping.
Inserts new and changed dimensions. Creates an
effective date range to track changes.
 SCD -2 (Versioning): Inserts new and
changed dimensions. Creates a version number
and increments the primary key to track changes.
 SCD -2 (Flaging): Inserts new and changed
dimensions. Flags the current version and
increments the primary key to track changes.
Slowly changing Dimension-3
SCD -3 : when you want partial history use
this kind of mapping. It Inserts new
dimensions. Updates changed values in
existing dimensions. Optionally uses the
load date to track changes.
Data Warehouse Execution
Architecture
 Architecture

Source Systems ETL DATA WAREHOUSE

Reporting( DSS) Servers


DB2/400

ODBC
Oracle NCR
ODBC Tera Data
Informatica (ETL)
ODBC UNIX-HP Warehouse

SQL Server Native Native

FTP

Flat files