
DATA WAREHOUSING

W. H. Inmon and Ralph Kimball are regarded as the fathers of Data Warehousing.
Def1: A copy of transactional data specially structured for reporting and analysis.
Def2: A Data Warehouse is a subject-oriented, integrated, non-volatile and
time-variant collection of data in support of effective decision making in the enterprise.

Subject Oriented

Data warehouses are designed to help you analyze data. For example, to learn more about
your company's sales data, you can build a warehouse that concentrates on sales. Using
this warehouse, you can answer questions like "Who was our best customer for this item
last year?" This ability to define a data warehouse by subject matter, sales in this case,
makes the data warehouse subject oriented.

Integrated

Integration is closely related to subject orientation. Data warehouses must put data from
disparate sources into a consistent format. They must resolve such problems as naming
conflicts and inconsistencies among units of measure. When they achieve this, they are
said to be integrated.

Nonvolatile

Nonvolatile means that, once entered into the warehouse, data should not change. This is
logical because the purpose of a warehouse is to enable you to analyze what has occurred.

Time Variant

In order to discover trends in business, analysts need large amounts of data. This is very
much in contrast to online transaction processing (OLTP) systems, where performance
requirements demand that historical data be moved to an archive. A data warehouse's
focus on change over time is what is meant by the term time variant.

OLTP SYSTEMS VERSUS OLAP SYSTEMS

OLTP SYSTEMS                               | OLAP SYSTEMS
-------------------------------------------+-------------------------------------------
OLTP stands for Online Transaction         | OLAP stands for Online Analytical
Processing.                                | Processing.
These systems are transaction oriented.    | These systems are used for reporting and
                                           | analysis.
Clerical users are the users of OLTP       | Business users and Management Information
systems.                                   | Systems are the users of OLAP systems.
INSERT, UPDATE and DELETE operations are   | The SELECT operation is used for reporting
performed by OLTP systems.                 | and analysis.
These systems contain current data.        | These systems contain current and
                                           | historical data.
The current data period is 1 year.         | The historical data period is 5 to 30 years.
Data is volatile in nature.                | Data is non-volatile in nature.
Response time is very fast.                | Availability of data is important.
OLTP systems contain detail data.          | These systems contain summarized and
                                           | aggregated data.
Data is normalized.                        | Data is denormalized.
Data is stored in relational databases.    | Data is stored in multidimensional
                                           | databases called cubes.
Data can be classified into master data    | Data can be classified into fact tables
and transactional data.                    | and dimension tables.

Granularity:
Granularity specifies the level of detail at which information is represented in the
Data Warehouse.
When the data is stored at the summarized level, drilling down into the data is not
possible. When the data is stored at the detail level, it leads to duplicate data
(redundancy).
Granularity is decided by three major factors:
i) Business Users Requirements.
ii) Storage capability of the Data Warehouse.
iii) Existing data in the OLTP systems.
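
The trade-off between detail and summary can be sketched in a few lines of Python. This is only an illustration: the row layout (day, product, quantity) and the function name `summarize_by_day` are assumptions made for the example, not part of any particular warehouse.

```python
from collections import defaultdict

# Hypothetical detail-level rows: (day, product, quantity sold).
detail_rows = [
    ("2024-01-01", "Pen", 2),
    ("2024-01-01", "Pen", 3),
    ("2024-01-01", "Book", 1),
    ("2024-01-02", "Pen", 4),
]


def summarize_by_day(rows):
    """Roll detail rows up to one total per day (a coarser grain)."""
    totals = defaultdict(int)
    for day, _product, quantity in rows:
        totals[day] += quantity
    return dict(totals)


daily_summary = summarize_by_day(detail_rows)
```

The summary has fewer rows than the detail data, but the per-product breakdown can no longer be recovered from it, which is why drilling down is only possible when the detail grain is stored.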

Aging Process (Backup & Recovery Process):

As time passes, current data becomes historical data and historical data becomes the
most historical data. To accommodate incoming data from the OLTP systems, the most
historical data is transferred from online storage to offline storage. This process is
called the Aging Process or the Backup and Recovery Process.

Architecture of the Data Warehouse:

The architecture of the Data Warehouse specifies the collection of interrelated
components that defines the flow of data from the OLTP systems to the business users.
The warehouse architecture contains the following components.

1. OLTP systems (Source Systems)
2. Staging Layer
3. Enterprise Data Warehouse
4. Data Marts
5. Operational Data Store (ODS)
6. Reports
7. Business Users

OLTP systems:
OLTP stands for online transaction processing. These systems are the source systems and
are also called legacy systems. The main priorities of the source systems are uptime and
availability. They contain little historical data, and reporting is a burden to them.
Source systems contain master tables and transaction tables. They hold only
entity-relationship tables, so data drilling is not possible in OLTP systems. A GUI is
also not supported by OLTP systems. Data from the different source systems is loaded
into the staging layer.

Staging Layer (Data Staging Area):
A staging layer is a storage area and a set of processes that clean, transform, combine,
de-duplicate, household, archive and prepare source data for use in the data warehouse.
The staging layer receives the data from the OLTP systems, transforms it from an
inconsistent form into a consistent form and reduces redundant data. The consistent
data is then sent to the enterprise data warehouse.
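
The staging steps described above can be sketched as follows. This is a minimal illustration, not a production ETL job: the two source layouts, the field names (`cust_name`, `amount_usd`, `customer`, `amount_cents`) and the helper names are all assumptions invented for the example.

```python
# Hypothetical rows from two source systems with inconsistent naming and units.
source_a = [
    {"cust_name": " Alice ", "amount_usd": 10.0},
    {"cust_name": "Bob", "amount_usd": 5.5},
]
source_b = [
    {"customer": "alice", "amount_cents": 1000},  # same sale as in source_a
    {"customer": "Carol", "amount_cents": 250},
]


def to_common_format(row):
    """Map either source layout onto one layout: name plus amount in USD."""
    if "cust_name" in row:
        name, amount = row["cust_name"], row["amount_usd"]
    else:
        name, amount = row["customer"], row["amount_cents"] / 100
    return {"customer": name.strip().title(), "amount_usd": round(amount, 2)}


def stage(*sources):
    """Clean, combine and de-duplicate rows before loading the warehouse."""
    seen = set()
    staged = []
    for source in sources:
        for row in source:
            clean = to_common_format(row)
            key = (clean["customer"], clean["amount_usd"])
            if key not in seen:  # drop redundant copies
                seen.add(key)
                staged.append(clean)
    return staged


staged_rows = stage(source_a, source_b)
```

Here the naming conflict (`cust_name` versus `customer`) and the unit conflict (dollars versus cents) are resolved first, and only then are duplicates removed, since two rows can only be recognized as the same once they share a format.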

Enterprise Data Warehouse:


The enterprise data warehouse is the queryable source of data in the enterprise. The data
warehouse is nothing but the combination of all data marts. The enterprise data warehouse
maintains the total organizational data (current data and total history). It takes its
data from the staging layer.

Data Marts:
A Data Mart is a subset of the Data Warehouse, used to satisfy the requirements of a
specific functional department. Data Marts are of two types:
1. Independent Data Marts
2. Dependent Data Marts
Data Marts constructed directly from the OLTP systems are called Independent Data Marts.
Data Marts constructed from the Data Warehouse are called Dependent Data Marts.

Independent Data Marts are constructed as prototypes to demonstrate the benefits of the
Data Warehouse to the business users.
Characteristics of the Data Marts:
The data in a Data Mart is organized according to a specific functional department. The
data is denormalized in nature and contains summarized data. Multidimensional modeling
is used to model the structure of the Data Marts.

Modeling the Data Warehouse:

Modeling the Data Warehouse is the process of converting the functional specifications
of the business users into the technical specifications of the database.

Modeling the Data Marts:

Multidimensional modeling is used to model the structures of the Data Marts. It is based
on the concept of the Star Schema (or Star Join).
A Star Schema contains two components:
1. Dimension Tables
2. Fact Tables
A Star Schema contains multiple dimension tables and a single fact table.
Dimension tables contain master data and fact tables contain transactional data.

The information present in the fact table can be reported and analyzed with respect to the
data in the dimension tables.
ODS (operational data store):
The ODS is a hybrid component that exists on the OLAP side but has OLTP characteristics.
It is used to satisfy the organization's reporting requirements on current data at the
detailed level. The ODS contains volatile, detailed and current data.
Reports:
After the data has been loaded into the data warehouse and the data marts, reports and
analyses are generated using different types of OLAP and reporting tools. These reports
are relational or multidimensional in form, depending on the user requirement.
Business Users:
The business users should be able to create their own queries to generate reports. They
need not know the technical details or the platform of the database. Once a report is
produced, the business users have the option of analyzing the data in multidimensional
format.

Components of Fact Tables:


Fact tables contain two sections:
1. Key Section
2. Measure Section

All measures generated by business transactions should be recorded in the measure
section of the fact table.
All the master data used to implement business transactions should be recorded in a
separate type of table called a dimension table.
Among the tables of the star schema, the dimension tables are independent and the fact
table is dependent on all the dimension tables.
To relate the dimension tables to the fact table, the primary keys of the dimension
tables are migrated as foreign keys into the key section of the fact table.
Because the resulting schema looks like a star, it is called a star schema.
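
A small star schema can be sketched with Python's built-in `sqlite3` module. The table and column names (`dim_product`, `dim_customer`, `fact_sales`, and so on) and the sample rows are assumptions chosen for illustration; only the shape matters: dimension primary keys appear as foreign keys in the key section of the fact table, next to the measures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product  (product_key  INTEGER PRIMARY KEY, product_name  TEXT);
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, customer_name TEXT);

-- Fact table: key section (foreign keys) followed by the measure section.
CREATE TABLE fact_sales (
    product_key  INTEGER REFERENCES dim_product(product_key),
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    quantity     INTEGER,
    amount       REAL
);

INSERT INTO dim_product  VALUES (1, 'Pen'), (2, 'Book');
INSERT INTO dim_customer VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO fact_sales   VALUES (1, 1, 10, 15.0), (2, 1, 1, 20.0), (1, 2, 4, 6.0);
""")

# Analyze the measures in the fact table with respect to one dimension.
sales_by_product = conn.execute("""
    SELECT p.product_name, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    GROUP BY p.product_name
    ORDER BY p.product_name
""").fetchall()
```

The same fact rows could be grouped by customer instead by joining `dim_customer`, which is what reporting "with respect to" a dimension amounts to.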
Measures:
Measures are columns generated automatically when a business transaction occurs. OLAP
supports three types of measures.
1. Additive measures
2. Non-additive measures
3. Semi-additive measures
Additive measures can take part in arithmetic aggregation (for example, summing) across
all dimensions to derive new measures. A sales amount is a typical example.
Non-additive measures cannot meaningfully be summed across any dimension. Ratios and
percentages are typical examples.
Semi-additive measures take part in arithmetic aggregation depending on the context:
they can be summed across some dimensions but not across others. An account balance,
for example, can be summed across accounts but not across time.
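
A semi-additive measure can be illustrated with account balances. The sketch below uses made-up data and hypothetical helper names; it only shows that summing is meaningful across accounts for one day, while across time a different rule (here, the latest value) is needed.

```python
# Hypothetical daily account balances: (day, account, balance).
balances = [
    ("2024-01-01", "A", 100.0),
    ("2024-01-01", "B", 50.0),
    ("2024-01-02", "A", 120.0),
    ("2024-01-02", "B", 40.0),
]


def total_balance_on(day):
    """Summing across accounts for a single day is meaningful."""
    return sum(balance for d, _account, balance in balances if d == day)


def closing_balance(account):
    """Across time, take the latest value instead of summing."""
    rows = [(d, balance) for d, acc, balance in balances if acc == account]
    return max(rows)[1]  # ISO dates sort chronologically as strings


total_day_2 = total_balance_on("2024-01-02")
closing_a = closing_balance("A")
```

Adding the two daily balances of account A (100.0 + 120.0) would give a number with no business meaning, which is what makes the measure semi-additive rather than additive.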

Types of Star schemas:


OLAP supports two types of star schemas:
1. Star flake star schema
2. Snowflake star schema
In the star flake star schema the dimension tables are denormalized and the fact table
is normalized.
In the snowflake star schema the dimension tables are normalized and the fact table is
also normalized.
A table in which the non-key columns depend only on the primary key is called a
normalized table.
A table in which non-key columns depend on other non-key columns is called a
denormalized table.

Types of Extractions:
Migration of data from OLTP systems to OLAP systems can be classified
into the following types:
1. Initial Extract
2. Delta Extract
3. Incremental Extract
4. Refresh Strategy

Initial Extract: data extracted from the OLTP systems to the OLAP systems for the first
time.

Delta Extract: delta specifies the changes in the form of inserts and updates. In the
delta extract only the newly inserted records and the changed records are loaded into
the OLAP tables. OLAP tables do not depend on the primary keys of the OLTP tables for
identifying records. The warehouse generates its own keys, called warehouse keys or
surrogate keys, and the primary keys of the OLTP systems are treated as natural keys in
OLAP. Surrogate keys are only for internal identification of records; they should not be
used in reporting and analysis.

Incremental Extract: once a set of data has participated in the extract process, it does
not participate in the next extraction. Incremental extract should be implemented for
transaction grain fact tables.

Refresh Strategy: when the data is populated into the OLAP table, the existing data is
truncated and the incoming data is loaded as fresh data.

Business processes and reporting requirements vary from one enterprise to another, so
the choice of extraction type depends on the enterprise.
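
The delta extract and the role of surrogate keys can be sketched as below. This is a simplified illustration under stated assumptions: the in-memory `warehouse` dictionary stands in for an OLAP table, `customer_id` is a hypothetical natural key, and changed records simply overwrite the stored values.

```python
import itertools

_next_key = itertools.count(1)  # source of surrogate (warehouse) keys
warehouse = {}                  # natural key -> warehouse row


def delta_load(changed_rows):
    """Apply only new and changed source rows; return (inserts, updates)."""
    inserts = updates = 0
    for row in changed_rows:
        natural_key = row["customer_id"]  # primary key in the OLTP system
        existing = warehouse.get(natural_key)
        if existing is None:
            warehouse[natural_key] = {"sk": next(_next_key), **row}
            inserts += 1
        else:
            existing.update(row)  # the surrogate key "sk" is kept
            updates += 1
    return inserts, updates


# Initial extract: everything is an insert.
initial_counts = delta_load([
    {"customer_id": "C10", "city": "Pune"},
    {"customer_id": "C11", "city": "Delhi"},
])

# Delta extract: only changed and newly inserted records arrive.
delta_counts = delta_load([
    {"customer_id": "C11", "city": "Mumbai"},   # changed record
    {"customer_id": "C12", "city": "Chennai"},  # newly inserted record
])
```

The surrogate key `sk` is assigned once by the warehouse and never shown to report users; the natural key is used only to match incoming rows to existing ones.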
Fact tables:
Fact tables of OLAP systems can be classified into:
1. Transaction grain fact tables
2. Periodic snapshot fact tables
3. Accumulating snapshot fact tables
4. Factless fact tables

Periodic snapshot fact tables:
Periodic snapshot fact tables contain only summarized data. The data is extracted from
the OLTP systems as soon as a particular period has passed. Each time the data is
extracted, the detail data is converted into summarized form and loaded into the periodic
snapshot fact table. The user can estimate the number of records that will be loaded
into a periodic snapshot fact table.
These tables always represent current data.
Accumulating snapshot fact tables:
The transactions in accumulating snapshot fact tables have a definite starting period and
ending period. Not all the values are known at once; the information keeps accumulating
as time passes and the events occur. Both inserts and updates are possible in
accumulating snapshot fact tables. The data is always kept at the same level of detail
and has to be extracted daily.
When data is captured at the detailed level, more records exist; when it is captured at
the summarized level, fewer records exist.
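
The way an accumulating snapshot row fills in over time can be sketched as follows. The milestones (`ordered`, `shipped`, `delivered`) and the function name are hypothetical; the point is only that the first event inserts a row and later events update that same row.

```python
orders = {}  # order_id -> one row per order, updated as milestones occur


def record_milestone(order_id, milestone, day):
    """Insert the row on the first event, update it on later events."""
    row = orders.setdefault(
        order_id, {"ordered": None, "shipped": None, "delivered": None}
    )
    row[milestone] = day
    return row


record_milestone("O1", "ordered", "2024-01-01")  # insert
record_milestone("O1", "shipped", "2024-01-03")  # update of the same row
```

After these two calls the row for order `O1` has its first two dates filled in and `delivered` still empty, reflecting that not all values are known at once.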
Transaction grain fact tables:
In transaction grain fact tables the level of detail is the same in the OLTP systems and
the OLAP systems. Data is extracted from OLTP to OLAP on a daily basis.
Only inserts are expected; updates are very rare.
Factless fact tables:
Using a product star schema, only the products that have been sold on a specific date can
be derived. If the user would like to know which products were on promotion but did not
sell, the result cannot be derived from the regular star schema. We need one more star
schema containing a fact table with the key section but without measures.
Fact tables of this type are called factless fact tables.
A factless fact table is a table that has no facts at all; it may consist of nothing but
keys. There are two types of factless fact tables, commonly described as event-tracking
tables and coverage tables.
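
The promotion question above can be sketched with `sqlite3`. All table names, column names and rows are assumptions made for the example; `fact_promotion` plays the role of the factless fact table because it holds keys only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);

-- Factless fact table: key section only, no measure columns.
CREATE TABLE fact_promotion (product_key INTEGER, date_key TEXT);

-- Regular fact table with a measure.
CREATE TABLE fact_sales (product_key INTEGER, date_key TEXT, amount REAL);

INSERT INTO dim_product    VALUES (1, 'Pen'), (2, 'Book'), (3, 'Lamp');
INSERT INTO fact_promotion VALUES (1, '2024-01-01'), (2, '2024-01-01');
INSERT INTO fact_sales     VALUES (1, '2024-01-01', 15.0), (3, '2024-01-01', 30.0);
""")

# Products that were on promotion on a date but have no sale on that date.
promoted_not_sold = conn.execute("""
    SELECT p.product_name
    FROM fact_promotion pr
    JOIN dim_product p ON p.product_key = pr.product_key
    LEFT JOIN fact_sales s
           ON s.product_key = pr.product_key AND s.date_key = pr.date_key
    WHERE s.product_key IS NULL
    ORDER BY p.product_name
""").fetchall()
```

The sales fact table alone could never answer this, because a product that did not sell leaves no row there; the keys-only promotion table records that the promotion happened.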

Dimension tables:
Dimension tables are also called Slowly Changing Dimensions (SCD).
Slowly changing dimensions are of three types:
1. SCD type I
2. SCD type II
3. SCD type III

Master data loaded into the dimension tables is not expected to change. But when changes
do occur in the OLTP systems, those changes should be reflected in the OLAP tables. Since
the changes are occasional, those tables are treated as Slowly Changing Dimension tables.
SCD type I:
These tables maintain only current data. In the initial extract, inserts in the OLTP
systems are treated as inserts in OLAP. In the delta extract, inserts are treated as
inserts and updates are treated as updates.
SCD type II:
SCD type II maintains the current data together with the total history of the
organization. Old and updated versions of a record are told apart by using a flag, a
version number or an effective date range.
With a flag, whenever a record is updated in the OLTP system, the update is treated as an
insert and an update: the new version is inserted as the current record, and the flag of
the previous version is changed from new to old.
With version number mapping, all the updates of the OLTP systems are treated as inserts.
With an effective date range, every record is stored with an effective starting date and
ending date, giving the period in which that particular record is valid.
Inserts are treated as inserts, and updates are treated as inserts and updates.
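
SCD type II handling can be sketched as below. The example combines a current flag, a version number and an effective date range in one row purely for illustration; the column names, the `HIGH_DATE` convention for "still valid" and the single tracked attribute `city` are assumptions.

```python
from datetime import date

HIGH_DATE = date(9999, 12, 31)  # assumed convention for an open-ended range


def scd2_apply(dimension, row, load_date):
    """Close the current version if the data changed, then insert a new one."""
    current = next(
        (r for r in dimension
         if r["customer_id"] == row["customer_id"] and r["is_current"]),
        None,
    )
    if current is not None:
        if current["city"] == row["city"]:
            return dimension               # nothing changed, nothing to do
        current["is_current"] = False      # flag: new -> old (the update)
        current["end_date"] = load_date    # close the effective date range
    dimension.append({                     # the insert of the new version
        "sk": len(dimension) + 1,          # surrogate key
        "customer_id": row["customer_id"],
        "city": row["city"],
        "version": current["version"] + 1 if current else 1,
        "start_date": load_date,
        "end_date": HIGH_DATE,
        "is_current": True,
    })
    return dimension


dim = []
scd2_apply(dim, {"customer_id": "C1", "city": "Pune"}, date(2024, 1, 1))
scd2_apply(dim, {"customer_id": "C1", "city": "Mumbai"}, date(2024, 6, 1))
```

After the second call the dimension holds two rows for customer `C1`: the closed "Pune" version and the current "Mumbai" version, so the full history is preserved.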
SCD type III:
It maintains the current data and one level of history only.
When a record is inserted into an SCD type III table for the first time, the old-version
columns are populated with null and the new-version columns are populated with the
incoming values. Whenever an updated record arrives, the new values are moved into the
old-version columns and the new-version columns take the incoming values. At any point
in time only two versions are maintained in SCD type III.
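
The two-version behaviour of SCD type III can be sketched as follows. The column names `old_city` and `new_city` and the function name are hypothetical; a real table would carry one such old/new pair per tracked attribute.

```python
def scd3_apply(dimension, customer_id, city):
    """Keep the current value plus exactly one previous value."""
    row = dimension.get(customer_id)
    if row is None:
        # First insert: old version is null, new version takes the value.
        dimension[customer_id] = {"old_city": None, "new_city": city}
    elif row["new_city"] != city:
        row["old_city"] = row["new_city"]  # current value becomes the old one
        row["new_city"] = city
    return dimension


dim = {}
scd3_apply(dim, "C1", "Pune")
scd3_apply(dim, "C1", "Mumbai")
scd3_apply(dim, "C1", "Delhi")  # "Pune" is no longer kept
```

After the third call only "Mumbai" (old) and "Delhi" (new) remain, showing that anything older than one change is lost.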
Conformed dimensions:
A separate star schema is required for every process within the company. The set of all
related star schemas becomes a data mart. There may be cases where a single dimension
table has to be reused across multiple star schemas.
Dimension tables that are created once and reused across multiple star schemas are
called conformed dimensions.
Degenerate dimensions:
A column in the key section of the fact table that has no associated dimension table but
is used for reporting and analysis is called a degenerate dimension.
Data warehousing versus Data marts:

Data warehouse:
1. A data warehouse is enterprise oriented.
2. It contains detail data.
3. It is in normalized form.
4. Entity-relationship modeling is used in data warehousing.
5. The data exists in the form of relational databases.
6. The structures are called tables.
7. Tables are not the appropriate structure for producing multidimensional reports.
Data marts:
1. A data mart is department oriented.
2. It contains summarized data.
3. It is in denormalized form.
4. Multidimensional modeling is used in data marts.
5. The data exists in the form of relational as well as multidimensional databases.
6. The structures are called tables or cubes.
7. Multidimensional reports can be produced from a specific set of structures called
cubes.
Types of OLAP:
OLAP (online analytical processing) is classified into four types.
1. MOLAP (MULTIDIMENSIONAL OLAP)
2. DOLAP (DESKTOP OLAP)
3. ROLAP (RELATIONAL OLAP)
4. HOLAP (HYBRID OLAP)
MOLAP:
In MOLAP the data is stored in specially structured multidimensional databases. Since
the data is stored directly in multidimensional format, reporting is extremely fast.
The data first has to be migrated from the OLTP systems to the data warehouse and then
from the data warehouse to the multidimensional database, so current data does not
participate in the reporting process. Drilling down to the detailed level of data is
not possible.
ROLAP:
In ROLAP the data is stored in the relational databases of the enterprise data warehouse,
and cubes have to be created at the time of reporting, so reporting is a slow process.
Since all the data (current and historical) is present in the same database, any type of
analysis is possible in ROLAP.
DOLAP:
Whenever it is not possible to carry the complete OLAP information, the specific
information required for presentations is stored in desktop databases or files such as
MS Access, Excel sheets and PowerPoint presentations. Such databases or flat files are
called DOLAP databases.
HOLAP:
HOLAP is the combination of ROLAP and MOLAP.
It supports both relational and multidimensional databases for reporting and analysis.
