WAREHOUSING
Basics
Concepts
People Making Technology Work
Agenda
Evolution of DWH
Why should we consider Data Warehousing solutions?
Definition of Data Warehouse
Characteristics of DWH
Difference between DWs and OLTP
DWH Life Cycle
DWH Architecture
Dimensional Data Modeling
Star Schema Design
Fact Table
Fact Granularity
Dimension Tables
Snowflake Schema Design
Important aspects of Star Schema & Snowflake Schema
Data Acquisition (ETL)
ETL Concepts
Evolution of DWH
Characteristics of DWH
Subject Oriented
Non Volatile
Integrated
Time Variant
Differences between a DWH database (OLAP) and an OLTP database
DWH database (OLAP): multidimensional database structures, many indexes, few joins
OLTP database: normalized data structures, few indexes, many joins, more data modification
Business Analyst
Data Modeler
ETL Developer
Report Developer
Testing
DWH Architecture
Three common architectures are:
DWH Architecture (Basic)
DWH Architecture (With a staging area)
DWH Architecture (With a staging area and data marts)
Conceptual modeling
Logical Modeling
Physical Modeling
Fact Table
Contains numeric measures of the business
Contains facts and is connected to dimensions
Has two types of columns:
facts or measures
foreign keys to dimension tables
May contain date-stamped data
A fact table might contain either detail-level facts or facts that have been aggregated
In the example, the sales fact table is connected to the location, product, time, and organization dimensions. The measure "Sales Dollar" in the sales fact table can be added up across all dimensions, independently or in combination, for example:
Sales Dollar value for a particular product
Sales Dollar value for a product in a location
Sales Dollar value for a product in a year within a location
Sales Dollar value for a product in a year within a location, sold or serviced by an employee
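The roll-ups listed above can be sketched with a small in-memory star schema. This is a minimal illustration only: the table names, column names, and sample rows are hypothetical and do not come from the slides, and the organization (employee) dimension is left out to keep the sketch short.

```python
import sqlite3


def build_star_schema(conn):
    """Create a tiny, hypothetical star schema: one fact table, three dimensions."""
    conn.executescript("""
        CREATE TABLE product_dim  (product_id  INTEGER PRIMARY KEY, product_name TEXT);
        CREATE TABLE location_dim (location_id INTEGER PRIMARY KEY, city_name TEXT);
        CREATE TABLE time_dim     (time_id     INTEGER PRIMARY KEY, year INTEGER);
        -- The fact table holds only foreign keys and the measure.
        CREATE TABLE sales_fact (
            product_id   INTEGER REFERENCES product_dim(product_id),
            location_id  INTEGER REFERENCES location_dim(location_id),
            time_id      INTEGER REFERENCES time_dim(time_id),
            sales_dollar REAL
        );
        INSERT INTO product_dim  VALUES (1, 'Widget'), (2, 'Gadget');
        INSERT INTO location_dim VALUES (1, 'Jersey City'), (2, 'Panama City');
        INSERT INTO time_dim     VALUES (1, 2004), (2, 2005);
        INSERT INTO sales_fact VALUES
            (1, 1, 1, 100.0),
            (1, 1, 2, 150.0),
            (1, 2, 2,  50.0),
            (2, 1, 2, 200.0);
    """)


def sales_by(conn, *dimension_columns):
    """Sum the Sales Dollar measure grouped by the chosen dimension columns."""
    allowed = {"product_name", "city_name", "year"}
    if not dimension_columns or not set(dimension_columns) <= allowed:
        raise ValueError("choose one or more of: " + ", ".join(sorted(allowed)))
    cols = ", ".join(dimension_columns)  # safe: validated against the whitelist
    query = f"""
        SELECT {cols}, SUM(sales_dollar)
        FROM sales_fact
        JOIN product_dim  USING (product_id)
        JOIN location_dim USING (location_id)
        JOIN time_dim     USING (time_id)
        GROUP BY {cols}
        ORDER BY {cols}
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_star_schema(conn)
    print(sales_by(conn, "product_name"))                       # per product
    print(sales_by(conn, "product_name", "city_name"))          # per product and location
    print(sales_by(conn, "product_name", "city_name", "year"))  # per product, location, year
```

Because "Sales Dollar" is additive, the same fact rows answer each question; only the set of grouping columns changes.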
Fact Granularity
A fact table holds numerical information.
Fact granularity is defined as the level of detail at which fact information is stored.
The level is determined by the dimension tables.
Year?
Quarter?
Month?
Week?
Day?
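To make the choice of grain concrete, the sketch below stores hypothetical facts at day level and rolls them up to coarser levels of the time dimension. The sample data and the function names `grain_key` and `roll_up` are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical facts stored at day grain: (sale date, sales dollar).
DAILY_SALES = [
    (date(2005, 1, 1), 100.0),
    (date(2005, 1, 15), 50.0),
    (date(2005, 2, 1), 25.0),
    (date(2005, 4, 10), 75.0),
    (date(2006, 1, 5), 10.0),
]


def grain_key(day, grain):
    """Return the grouping key for one date at the requested grain."""
    if grain == "year":
        return (day.year,)
    if grain == "quarter":
        return (day.year, (day.month - 1) // 3 + 1)
    if grain == "month":
        return (day.year, day.month)
    if grain == "week":
        iso = day.isocalendar()  # (ISO year, ISO week number, ISO weekday)
        return (iso[0], iso[1])
    if grain == "day":
        return (day.year, day.month, day.day)
    raise ValueError(f"unknown grain: {grain}")


def roll_up(facts, grain):
    """Aggregate day-level facts to a coarser (or equal) grain."""
    totals = defaultdict(float)
    for day, amount in facts:
        totals[grain_key(day, grain)] += amount
    return dict(totals)


if __name__ == "__main__":
    for grain in ("year", "quarter", "month", "week", "day"):
        print(grain, roll_up(DAILY_SALES, grain))
```

Facts stored at a fine grain can always be rolled up to a coarser one, but the reverse is not possible: if only monthly totals are stored, daily values cannot be recovered.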
Dimension Tables
Location Dimension
Columns: Location Dimension Id, Country Name, State Name, County Name, City Name
Sample rows (Country Name, State Name, County Name, City Name, date stamp):
USA, New York, Shelby, Manhattan, 1/1/2005 11:23:31 AM
USA, Florida, Jefferson, Panama City, 1/1/2005 11:23:31 AM
USA, California, Montgomery, San Jose, 1/1/2005 11:23:31 AM
USA, New Jersey, Hudson, Jersey City, 1/1/2005 11:23:31 AM
Data Acquisition
It is the process of extracting the relevant business information from the different source systems, transforming the data from one format into another, integrating the data into a homogeneous format, and loading the data into a warehouse database.
Data Extraction (E)
Data Transformation (T)
Data Loading (L)
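As a rough illustration of integrating data into a homogeneous format, the sketch below takes rows from two hypothetical source systems that record the same kind of sale differently, converts both to one common layout, and loads them into a list standing in for the warehouse. Every source name, field name, and value here is an assumption.

```python
from datetime import datetime

# Hypothetical source systems with different field names, date formats, and units.
STORE_SYSTEM = [
    {"sold_on": "01/15/2005", "product": "widget", "amount_usd": 100.0},
    {"sold_on": "02/01/2005", "product": "gadget", "amount_usd": 40.0},
]
WEB_SYSTEM = [
    {"date": "2005-01-20", "item": "Widget", "amount_cents": 2500},
]


def transform_store_row(row):
    """Convert a store-system row to the common warehouse layout."""
    return {
        "sale_date": datetime.strptime(row["sold_on"], "%m/%d/%Y").date(),
        "product": row["product"].title(),
        "sales_dollar": row["amount_usd"],
    }


def transform_web_row(row):
    """Convert a web-system row to the common warehouse layout."""
    return {
        "sale_date": datetime.strptime(row["date"], "%Y-%m-%d").date(),
        "product": row["item"].title(),
        "sales_dollar": row["amount_cents"] / 100,  # cents to dollars
    }


def acquire(warehouse):
    """Extract from both sources, transform to one format, load into the warehouse."""
    for row in STORE_SYSTEM:                              # extract
        warehouse.append(transform_store_row(row))       # transform, then load
    for row in WEB_SYSTEM:
        warehouse.append(transform_web_row(row))
    return warehouse


if __name__ == "__main__":
    for loaded_row in acquire([]):
        print(loaded_row)
```

After loading, every row has the same three fields, so downstream queries no longer need to know which source system a sale came from.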
ETL Process
The ETL process has the following basic steps:
Mapping the data between the source systems and the target database
Cleansing the source data in the staging area
Transforming the cleansed source data and then loading it into the target system
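The three steps above can be sketched as a small pipeline. This is one possible arrangement under stated assumptions, not a prescribed design: the column mapping, the cleansing rules, the state lookup, and the sample rows are all hypothetical.

```python
# Hypothetical mapping: source column name -> target column name.
MAPPING = {"cust_nm": "customer_name", "st": "state_name", "amt": "sales_dollar"}

# Hypothetical source rows, including two that should fail cleansing.
SOURCE_ROWS = [
    {"cust_nm": "  alice ", "st": "ny", "amt": "100.50"},
    {"cust_nm": "Bob", "st": "NJ", "amt": "n/a"},   # amount is not a number
    {"cust_nm": "", "st": "FL", "amt": "20"},       # customer name is missing
]

STATE_NAMES = {"NY": "New York", "NJ": "New Jersey", "FL": "Florida"}


def apply_mapping(row):
    """Step 1: rename source columns to target columns."""
    return {MAPPING[key]: value for key, value in row.items() if key in MAPPING}


def cleanse(staged_rows):
    """Step 2: fix or reject inconsistent rows while they sit in the staging area."""
    clean = []
    for row in staged_rows:
        name = row["customer_name"].strip()
        try:
            amount = float(row["sales_dollar"])
        except ValueError:
            continue  # reject rows whose amount cannot be parsed
        if not name:
            continue  # reject rows without a customer name
        clean.append({
            "customer_name": name,
            "state_name": row["state_name"].strip().upper(),
            "sales_dollar": amount,
        })
    return clean


def transform(clean_rows):
    """Step 3a: convert cleansed values to the target representation."""
    return [
        {
            **row,
            "customer_name": row["customer_name"].title(),
            "state_name": STATE_NAMES.get(row["state_name"], row["state_name"]),
        }
        for row in clean_rows
    ]


def run_etl(source_rows, target):
    """Map into staging, cleanse, transform, then load (step 3b) into the target."""
    staging = [apply_mapping(row) for row in source_rows]
    target.extend(transform(cleanse(staging)))
    return target


if __name__ == "__main__":
    print(run_etl(SOURCE_ROWS, []))
```

Rejected rows are simply dropped here; a real pipeline would more likely route them to an error table for review.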
Source System
A database, application, file, or other storage facility from
which the data in a data warehouse is derived.
Mapping
The definition of the relationship and data flow between
source and target objects.
Staging Area
A place where data is processed before entering the
warehouse.
Cleansing
The process of resolving inconsistencies and fixing the
anomalies in source data, typically as part of the ETL
process.
Transformation
The process of manipulating data. Any manipulation beyond
copying is a transformation. Examples include cleansing,
aggregating, and integrating data from multiple sources.
Transportation
The process of moving copied or transformed data from a
source to a data warehouse.
Target System
A database, application, file, or other storage facility to which the
"transformed source data" is loaded in a data warehouse.