
Data Warehousing

The Data Warehouse Definition


B. Inmon:
A data warehouse is a subject-oriented, integrated, nonvolatile, and time-variant collection of data in support of
management's decisions.
S. Chaudhuri & U. Dayal:
Data warehousing is a collection of decision support
technologies, aimed at enabling the knowledge worker
(executive, manager, analyst) to make better and
faster decisions.

Data Warehouse Subject-Oriented


Organized around major subjects, such as

customer, product, sales.


Focusing on the modeling and analysis of data
for decision makers, not on daily operations or
transaction processing.
Provide a simple and concise view around
particular subject issues by excluding data
that are not useful in the decision support
process.

Data Warehouse Integrated


Constructed by integrating multiple, heterogeneous
data sources:
relational or other databases, flat files, external data.
Data cleaning and data integration techniques are
applied.
Ensure consistency in naming conventions, encoding
structures, attribute measures, etc. among different
data sources.
When data is moved to the warehouse, it is converted.
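As an illustration of the conversion step, here is a minimal Python sketch. The attribute names, the gender codes, and the length units are assumptions made up for this example; they do not come from the slides. It shows two sources that encode the same attributes differently being converted to one warehouse convention.

```python
# Hypothetical source encodings for the same attribute (assumed for illustration).
GENDER_MAP = {"m": "M", "male": "M", "1": "M", "f": "F", "female": "F", "0": "F"}

# Conversion factors to the assumed warehouse unit (centimetres).
TO_CM = {"cm": 1.0, "inch": 2.54, "m": 100.0}

def to_warehouse_row(source_row, length_unit):
    """Convert one source record to the warehouse convention (gender M/F, length in cm)."""
    return {
        "gender": GENDER_MAP[str(source_row["gender"]).strip().lower()],
        "length_cm": round(float(source_row["length"]) * TO_CM[length_unit], 2),
    }

rows = [
    to_warehouse_row({"gender": "male", "length": "10"}, "inch"),  # source A
    to_warehouse_row({"gender": 0, "length": 0.5}, "m"),           # source B
]
```

After conversion both records use the same encoding and the same unit, so they can be compared and aggregated together.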

Data Warehouse Time Variant


The time horizon for the data warehouse is
significantly longer than that of operational systems.
Operational database: current value data.
Data warehouse data: provide information from a
historical perspective (e.g., past 5-10 years)
Every key structure in the data warehouse
contains an element of time,
but the key of operational data may or may not contain
a time element.
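The difference in key structure can be sketched in a few lines of Python. The two dictionaries and the `record_balance` helper are hypothetical stand-ins for an operational table and a warehouse table: the operational store is keyed by customer only and keeps the current value, while the warehouse key includes a snapshot date and therefore keeps history.

```python
from datetime import date

operational = {}  # key: customer_id -> current value only
warehouse = {}    # key: (customer_id, snapshot_date) -> value as of that date

def record_balance(customer_id, snapshot_date, balance):
    """Apply the same event to both stores (hypothetical helper)."""
    operational[customer_id] = balance                 # overwrites the old value
    warehouse[(customer_id, snapshot_date)] = balance  # adds a new time-stamped entry

record_balance(42, date(2023, 1, 31), 100.0)
record_balance(42, date(2023, 2, 28), 80.0)

# The warehouse can answer "what was the balance over time?"; the operational store cannot.
history = sorted((d, b) for (cid, d), b in warehouse.items() if cid == 42)
```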

Data Warehouse Non-Volatile


A physically separate store of data
transformed from the operational
environment.
Operational update of data does not occur in
the data warehouse environment.
Does not require transaction processing,
recovery, and concurrency control mechanisms
Requires only two operations: loading and access of data.

Decision Support and OLAP (Navathe)


Information technology to help the knowledge
worker (executive, manager, analyst) make faster
and better decisions.
Will a 10% discount increase sales volume sufficiently?
Which of two new medications will result in the
best outcome: higher recovery rate & shorter hospital
stay?
How did the share price of computer manufacturers
correlate with quarterly profits over the past 10 years?
On-Line Analytical Processing (OLAP) is an element
of a decision support system (DSS).

Data Warehouse (Navathe)


A decision support database that is maintained
separately from the organisation's operational
databases.
A data warehouse is a
subject oriented,
integrated,
time-varying,
non-volatile
collection of data that is used primarily in
organisational decision making.

Why separate data warehouse?


Performance
The operational DBs are tuned to support known OLTP
workloads
Supporting OLAP requires special data organisations,
access methods and implementation methods
Function
The decision support requires data that may be missing
from the operational DBs
Decision support usually requires consolidating data from
many heterogeneous sources

OLTP vs. OLAP

Goals of a Data Warehouse


The data warehouse must make an organisation's
information easily accessible
The data warehouse must present the organisation's
information consistently
The data warehouse must be adaptive and resilient to
change
The data warehouse must be a secure bastion that
protects our information assets
The data warehouse must serve as the foundation for improved
decision making
The business community must accept the data
warehouse if it is to be deemed successful.

Data Warehouse Architecture

Another View of the DW Architecture

[Figure: four components, connected left to right by Extract, Load, and Access arrows]
Operational Source Systems
Extract into the staging area
Data Staging Area
Services: clean, combine, and standardize; conform dimensions
NO USER QUERY SERVICES
Data store: flat files and relational tables
Processing: sorting and sequential processing
Load into the presentation area
Data Presentation Area
Data Mart #1: dimensional; atomic and summary data; based on a single business process
Data Mart #2 (similarly designed)
DW Bus: conformed facts & dimensions
Access by the data access tools
Data Access Tools
Ad hoc query tools
Report writers
Analytic applications
Modeling: forecasting, scoring, data mining

Data Warehouse vs. Data Mart


Enterprise warehouse: collects all information
about subjects (customers, products, sales, assets,
personnel) that span the entire organisation
Requires extensive business modelling
May take years to design and build
Data mart: departmental subsets that focus on
selected subjects, e.g. marketing data mart: customer,
product, sales
Faster roll-out
Complex integration in the long term

To Meet the Requirements within DW


The data is organised differently, i.e.
multidimensional
star-join schemas
snowflake schemas
The data is viewed differently
The data is stored differently
vector (array) storage
The data is indexed differently
bitmap indexes
join indexes
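To make the idea of a bitmap index concrete, here is a minimal Python sketch. It is an illustration under simplifying assumptions, not how any particular database implements it: each distinct column value gets one Python integer used as a bit vector, where bit i is set when row i holds that value. The column names and sample data are made up.

```python
def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is set when row i has that value."""
    index = {}
    for row_id, value in enumerate(values):
        index[value] = index.get(value, 0) | (1 << row_id)
    return index

def matching_rows(bitmap):
    """Decode a bitmap back into a list of row ids."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

# Two hypothetical low-cardinality columns of the same five-row table.
region = ["north", "south", "north", "east", "south"]
product = ["tea", "tea", "coffee", "tea", "coffee"]

region_idx = build_bitmap_index(region)
product_idx = build_bitmap_index(product)

# Predicates on several columns become bitwise operations on the bitmaps.
north_tea = matching_rows(region_idx["north"] & product_idx["tea"])
south_or_east = matching_rows(region_idx["south"] | region_idx["east"])
```

Because AND and OR on bit vectors are cheap, this kind of index suits the multi-attribute selections that are typical of decision support queries, especially on columns with few distinct values.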

Dimensional Modelling Basic Concepts


Fact
something not known in advance,
an observation
many facts (but not all) have numerical,
continuous values
e.g., the price of a product, quantity
Attribute
describes a characteristic of a tangible thing
we do not measure them, we usually know them
usually text fields, with discrete values
e.g., the flavour of a product, the size of a product

DM Basic Concepts 2
Dimension
a business perspective from which data is
looked upon
a collection of text-like attributes that are
highly correlated
e.g. Product, Store, Time
Granularity
the level of detail of data contained in the data
warehouse
e.g. Daily item totals by product, by store
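The concepts of fact, attribute, dimension, and granularity can be tied together in a small star schema. The sketch below uses Python's built-in `sqlite3` module; all table names, column names, and rows are assumptions invented for the example. The fact table holds numeric facts at the grain "daily item totals by product, by store", and the query constrains and groups by dimension attributes while summing the facts.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Dimension tables: text-like attributes (hypothetical names and data).
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, name TEXT, flavour TEXT);
CREATE TABLE store_dim   (store_key   INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE time_dim    (time_key    INTEGER PRIMARY KEY, day TEXT, month TEXT);

-- Fact table: one row per day, product, and store (the grain).
CREATE TABLE sales_fact (
    time_key    INTEGER REFERENCES time_dim,
    product_key INTEGER REFERENCES product_dim,
    store_key   INTEGER REFERENCES store_dim,
    quantity    INTEGER,
    amount      REAL
);

INSERT INTO product_dim VALUES (1, 'Cola', 'cherry'), (2, 'Cola', 'lemon');
INSERT INTO store_dim   VALUES (1, 'Oslo'), (2, 'Bergen');
INSERT INTO time_dim    VALUES (1, '2024-01-01', '2024-01'), (2, '2024-01-02', '2024-01');
INSERT INTO sales_fact  VALUES
    (1, 1, 1, 10, 15.0),
    (1, 2, 1,  4,  6.0),
    (2, 1, 1,  7, 10.5),
    (2, 1, 2,  3,  4.5);
""")

# Join the fact table to its dimensions, filter on a dimension attribute,
# group by dimension attributes, and aggregate the numeric facts.
rows = con.execute("""
    SELECT p.flavour, s.city, SUM(f.quantity), SUM(f.amount)
    FROM sales_fact f
    JOIN product_dim p ON p.product_key = f.product_key
    JOIN store_dim   s ON s.store_key   = f.store_key
    JOIN time_dim    t ON t.time_key    = f.time_key
    WHERE t.month = '2024-01'
    GROUP BY p.flavour, s.city
    ORDER BY p.flavour, s.city
""").fetchall()
```

Because the facts are stored at daily grain, the same table can be rolled up to monthly totals, but it could not be drilled down below one day per product per store.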

Example of a Dimensional Model

The Standard Template Query

The Time Dimension

Multidimensional Data

A Sample Data Cube

Facts

Facts and Additive Property

Semiadditive fact Example

Numeric Measures of Intensity

End
