
DATA WAREHOUSES

Lecture 2
2
Relational Database Theory
In relational database normalization,
relations are decomposed into smaller
relations to a point where all attributes in a
relation are very tightly coupled with the
primary key of the relation.
First normal form: data items are atomic.
Second normal form: every non-key attribute fully
depends on the whole primary key.
Third normal form: non-key attributes do not depend
on other non-key attributes (no transitive dependencies).
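
As a minimal sketch of the decomposition described above, the following splits one flat relation into two relations in third normal form. The record layout (order_id, customer_id, customer_city, amount) and the function name decompose are hypothetical and chosen only for illustration.

```python
# Hypothetical flat order records. customer_city depends on customer_id,
# not on the key order_id: a transitive dependency that violates 3NF.
flat_orders = [
    {"order_id": 1, "customer_id": "C1", "customer_city": "Berlin", "amount": 250},
    {"order_id": 2, "customer_id": "C1", "customer_city": "Berlin", "amount": 100},
    {"order_id": 3, "customer_id": "C2", "customer_city": "Paris", "amount": 75},
]


def decompose(rows):
    """Split the flat relation into a customer relation and an order relation."""
    customers = {}  # customer_id -> city, stored once per customer
    orders = []     # order rows keep only attributes that depend on order_id
    for row in rows:
        customers[row["customer_id"]] = row["customer_city"]
        orders.append({
            "order_id": row["order_id"],
            "customer_id": row["customer_id"],  # foreign key to customers
            "amount": row["amount"],
        })
    return customers, orders


customers, orders = decompose(flat_orders)
```

After the split, a customer's city is stored once, so an update touches a single row. That is the property that makes normalized designs attractive for update-heavy operational systems.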
3
Relational Database Theory, cont'd
The process of normalization generally
breaks a table into many independent
tables.
A relational database system is
effective and efficient for operational
databases with a lot of updates (it aims at
optimizing update performance).
4
Problems
A fully normalized data model can
perform very inefficiently for queries.
The many joins needed to reassemble the data
may take an unacceptably long time.
Historical data are diverse
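
To illustrate the join cost named on this slide, here is a small sketch using Python's built-in sqlite3 module. The three tables (customer, product, orders) and their columns are assumptions made up for the example. Even a simple analytical question needs two joins once the data is normalized.

```python
import sqlite3

# In-memory database with a small, hypothetical normalized schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE product  (product_id  INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE orders   (order_id    INTEGER PRIMARY KEY,
                           customer_id INTEGER,
                           product_id  INTEGER,
                           amount      REAL);
    INSERT INTO customer VALUES (1, 'North'), (2, 'South');
    INSERT INTO product  VALUES (10, 'Books'), (20, 'Games');
    INSERT INTO orders   VALUES (100, 1, 10, 30.0),
                                (101, 1, 20, 50.0),
                                (102, 2, 10, 20.0);
""")

# "Sales by region and category" touches all three tables.
sales_by_region_and_category = conn.execute("""
    SELECT c.region, p.category, SUM(o.amount)
    FROM orders o
    JOIN customer c ON c.customer_id = o.customer_id
    JOIN product  p ON p.product_id  = o.product_id
    GROUP BY c.region, p.category
    ORDER BY c.region, p.category
""").fetchall()
```

With three rows the joins are instant. The concern on the slide is what happens when the same pattern runs over many large tables.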
5
Not Either-Or Decision
Query-driven approach still better for:
Rapidly changing information
Rapidly changing information sources
Clients with unpredictable needs
6
The Desired Features of the New
Type of System Environment
Database designed for analytical tasks
Data from multiple applications
Read-intensive data usage
Direct interaction with the system by the users
without IT assistance
Content updated periodically and stable
Content to include current and historical data
Ability for users to run queries and get results
online

7
Processing Requirements in the New
Environment
Four levels of analytical processing requirements:
Running of simple queries and reports against
current and historical data
Ability to perform what-if analysis in many
different ways
Ability to query, step back, analyze, and then
continue the process to any desired length
Spot historical trends and apply them for future
results
8
Data Warehouse
A Definition
A data warehouse is a
subject-oriented
integrated
time-variant
non-volatile
collection of data that is used
primarily in organizational decision
making.
-- Bill Inmon, Building the Data Warehouse, 1996
9
Data Warehousing


The process of constructing and
using a data warehouse
10
Data Warehouse - Subject-Oriented
In the data warehouse, data is stored by
subjects, not by applications.
For an insurance company, the applications
may be auto, health, life, and casualty.
The major subject areas of the insurance
corporation might be customer, policy,
premium, and claim.
12
Data Warehouse - Integrated
The data in the data warehouse comes from several
operational systems.
Source data are in different databases, files, and
libraries. These are disparate applications, so the
operational platforms and operating systems could be
different. The file layouts, character code
representations, and field naming conventions all could
be different.
In addition to data from internal operational systems,
for many enterprises, data from outside sources is
likely to be very important.
13
Data Warehouse - Integrated
Before the data from various disparate sources can
be usefully stored in a data warehouse, you have to
remove the inconsistencies. You have to
standardize the various data elements and make
sure of the meanings of data names in each source
application.
Before moving the data into the data warehouse,
you have to go through a process of
transformation, consolidation, and integration of
the source data.
Some of the items that would need standardization:
Naming conventions
Codes
Data attributes
Measurements
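
The standardization items above can be sketched as a small transformation step. Everything in this example is hypothetical: the two source layouts, the field names, the gender code table, and the choice of dollars as the common unit are assumptions for illustration, not part of any particular tool.

```python
# Hypothetical records from two source applications with different
# field names, code values, and units of measurement.
auto_policies = [{"cust_nm": "Ann Lee", "sex": "F", "premium_cents": 125000}]
life_policies = [{"customerName": "Bob Ray", "gender": "male", "premium_usd": 300.0}]

# Codes: map every source spelling onto one warehouse code.
GENDER_CODES = {"f": "F", "female": "F", "m": "M", "male": "M"}


def from_auto(rec):
    """Map an auto-insurance record onto the common warehouse layout."""
    return {
        "customer_name": rec["cust_nm"],                # naming convention
        "gender": GENDER_CODES[rec["sex"].lower()],     # code standardization
        "premium_usd": rec["premium_cents"] / 100,      # measurement: cents -> dollars
    }


def from_life(rec):
    """Map a life-insurance record onto the common warehouse layout."""
    return {
        "customer_name": rec["customerName"],
        "gender": GENDER_CODES[rec["gender"].lower()],
        "premium_usd": float(rec["premium_usd"]),       # attribute type made uniform
    }


integrated = [from_auto(r) for r in auto_policies] + [from_life(r) for r in life_policies]
```

After this step, both sources use the same field names, the same codes, and the same unit, so records can be stored and queried together.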
15
Data Warehouse - Time-Variant
A data warehouse, because of the very nature of
its purpose, has to contain historical data, not
just current values. Data is stored as snapshots
over past and current periods. Every data
structure in the data warehouse contains the
time element.
If a user is looking at the buying pattern of a
specific customer, the user needs data not only
about the current purchase, but also about past
purchases.
16
Data Warehouse - Time-Variant
The time-variant nature of the data in a
data warehouse
Allows for analysis of the past
Relates information to the present
Enables forecasts for the future
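
A minimal sketch of snapshot storage with a time element, assuming a hypothetical layout of (snapshot_date, customer_id, monthly_purchase_total). The helper names history and as_of are illustrative only.

```python
from datetime import date

# Each row carries its own time element: the date the snapshot was taken.
snapshots = [
    (date(2024, 1, 31), "C1", 100.0),
    (date(2024, 2, 29), "C1", 140.0),
    (date(2024, 3, 31), "C1", 180.0),
]


def history(rows, customer_id):
    """Return all (date, value) snapshots for one customer, oldest first."""
    return sorted((d, v) for d, cid, v in rows if cid == customer_id)


def as_of(rows, customer_id, when):
    """Return the most recent value known on or before the given date."""
    past = [(d, v) for d, v in history(rows, customer_id) if d <= when]
    return past[-1][1] if past else None


mid_march_value = as_of(snapshots, "C1", date(2024, 3, 15))
```

Because older snapshots are kept rather than overwritten, the same data answers both "what is the value now" and "what was it at any earlier point".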
18
Data Warehouse - Non-Volatile
Once the data is captured in the
data warehouse, you do not run
individual transactions to change the
data there.
Data updates are commonplace in
an operational database; not so in a
data warehouse.
The data in a data warehouse is not
as volatile as the data in an
operational database is. The data in
a data warehouse is primarily for
query and analysis.
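
One way to picture the non-volatile property is a table that accepts batch loads and queries but offers no operation for changing a stored row. The class below is a hypothetical sketch of that idea, not how any real warehouse product is implemented.

```python
class WarehouseTable:
    """Append-only table: rows arrive in batch loads and are never edited in place."""

    def __init__(self):
        self._rows = []

    def load(self, batch):
        """Append a batch of rows; stored rows are immutable tuples."""
        self._rows.extend(tuple(row) for row in batch)

    def query(self, predicate):
        """Return all rows for which predicate(row) is true."""
        return [row for row in self._rows if predicate(row)]

    def __len__(self):
        return len(self._rows)


table = WarehouseTable()
table.load([("2024-01", "P1", 10), ("2024-01", "P2", 4)])  # first periodic load
table.load([("2024-02", "P1", 12)])                        # next periodic load
p1_rows = table.query(lambda row: row[1] == "P1")
```

There is deliberately no update or delete method: data changes only by adding a new load, and everything else is read access for query and analysis.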
21
Data Granularity
In an operational system, data is usually
kept at the lowest level of detail.
If you are looking for units of a product
ordered this month, you read all the
orders entered for the entire month for
that product and add up. You do not
usually keep summary data in an
operational system.
Data granularity in a data warehouse
refers to the level of detail. The lower the
level (that is, the more detailed the data),
the finer the data granularity.
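
The units-ordered example above can be sketched as a roll-up from fine to coarse granularity. The order-line layout (ISO date string, product, units) and the function roll_up are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical order lines at the lowest level of detail: (date, product, units).
order_lines = [
    ("2024-03-02", "P1", 5),
    ("2024-03-02", "P1", 3),
    ("2024-03-17", "P1", 2),
    ("2024-04-01", "P1", 4),
]


def roll_up(lines, key_length):
    """Sum units per (period, product); the period is a prefix of the ISO date.

    key_length=10 keeps the full date (daily grain);
    key_length=7 keeps 'YYYY-MM' (monthly grain).
    """
    totals = defaultdict(int)
    for day, product, units in lines:
        totals[(day[:key_length], product)] += units
    return dict(totals)


daily = roll_up(order_lines, 10)    # finer granularity, more rows
monthly = roll_up(order_lines, 7)   # coarser granularity, fewer rows
```

An operational system would recompute the monthly figure from the order lines each time. A data warehouse may store one or more of these summary levels so that such queries do not have to read every detail row.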
