Designing A Data Warehouse: Issues in DW Design

Data Warehouse
A read-only database for decision analysis, consisting of time-stamped operational and external data. It is:
Subject-Oriented
Integrated
Time-Variant
Nonvolatile
Data Warehouse vs Operational Databases
Operational Databases | Data Warehouse
Highly tuned | Flexible access
Real-time data | Consistent timing
Detailed records | Summarized as appropriate
Current values | Historical
Access small amounts of data in a predictable manner | Access large amounts of data in unexpected ways
Data Warehouse Purpose
Identify problems in time to avoid them
Locate opportunities you might otherwise miss
Data Warehouse: New Approach
An old idea attracting renewed interest because of:
Cheap Computing Power
Special-Purpose Hardware
New Data Structures
Intelligent Software
Warehousing Problems
Business Issues: Data Quantity, Data Accuracy, Maintenance, Ownership, Cost
Database Issues: DBMS Software, Technology, Complexity
Analysis Issues: User Interface, Intelligent Processing
Three Approaches
Classical Enterprise Database: contains operational data from all areas of the organization.
Data Mart: extracted data and managerial support data designed for departmental or end-user computing (EUC) applications.
Data Package: data required for a specific application.
Attribute | Classical Warehouse | Data Mart | Data Package
Source | Archived data | Deposit or external sources | Data mart
Extraction | Batch extraction programs | Batch summary | Sample and summary
Data | Atomic transaction data | Designed departmental database | Problem-specific dataset
Tool | VLDB technology | OLAP, ROLAP, MDBMS | PC tools
Analysis | IT-driven software | IT-driven or trained user | Trained user
Three Fundamental Processes
Data Acquisition
Data Storage
Data Access
Data Acquisition
Handles acquisition of data from legacy systems and outside sources.
Data is identified, copied, formatted, and prepared for loading into the warehouse.
Acquisition steps
Catalog the data: develop an inventory of where it is and what it means.
Clean and prepare the data: extract from legacy files and reformat to make it usable.
Transport the data from one location to another.
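The three acquisition steps can be sketched as three small functions. Everything here is an assumption for illustration: the fixed-width record layout, the field names in `CATALOG`, the cleaning rules, and the use of in-memory CSV text as a stand-in for the load area are hypothetical, not a prescribed format.

```python
import csv
import io
from typing import Dict, List, Tuple

# Step 1, catalog the data: an inventory of where each field sits in a
# hypothetical fixed-width legacy record (start, end) and what it means.
CATALOG: Dict[str, Tuple[int, int, str]] = {
    "cust_id": (0, 6, "customer number"),
    "region": (6, 10, "sales region code"),
    "amount": (10, 20, "order amount in cents"),
}


def clean_and_prepare(lines: List[str]) -> List[Dict[str, object]]:
    """Step 2: extract fields from legacy lines and reformat them into usable rows."""
    rows: List[Dict[str, object]] = []
    for line in lines:
        raw = {name: line[start:end].strip() for name, (start, end, _) in CATALOG.items()}
        if not raw["cust_id"] or not raw["amount"].isdigit():
            continue  # drop records that cannot be repaired automatically
        rows.append({
            "cust_id": raw["cust_id"],
            "region": raw["region"].upper(),     # normalize code casing
            "amount": int(raw["amount"]) / 100,  # cents -> currency units
        })
    return rows


def transport(rows: List[Dict[str, object]]) -> str:
    """Step 3: move prepared rows to the load area (here, CSV text in memory)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(CATALOG))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Keeping the catalog as data rather than hard-coding offsets in the parser means the inventory from step 1 is the single place that records where each field lives and what it means.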
Storage
The storage component holds the data so that the many different data mining, executive information, and decision support systems can make use of it effectively.
The Storage Area
Managed by relational databases like those from Oracle Corp. or Informix Software Inc.
Specialized hardware: symmetric multiprocessor (SMP) or massively parallel processor (MPP) machines
Storage
The majority of warehouse storage today is being managed by relational databases running on Unix platforms.
Oracle, Sybase Inc., IBM Corp., and Informix control 65 percent of the warehouse storage market (Meta Group Inc., 1996).
Access
Different end-user PCs and workstations draw data from the warehouse with the help of multidimensional analysis products, neural networks, data discovery tools, or analysis tools.
These powerful, "smart" software products are the real driving force behind the viability of data warehousing.
Access Tools
Intelligent Agents and Agencies
Query Facilities and Managed Query Environments
Statistical Analysis
Data Discovery (decision support, artificial intelligence, and expert systems)
OLAP
Data Visualization
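To show what an OLAP-style access tool does with warehouse data, here is a minimal sketch of two common operations, a roll-up (aggregate away some dimensions) and a slice (fix one dimension to a single member). The fact layout and the function names `roll_up` and `slice_cube` are hypothetical; real OLAP products expose these operations through their own interfaces.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical fact rows: (region, product, quarter, sales).
Fact = Tuple[str, str, str, float]
REGION, PRODUCT, QUARTER, SALES = 0, 1, 2, 3


def roll_up(facts: List[Fact], keep: Tuple[int, ...]) -> Dict[Tuple[str, ...], float]:
    """Sum SALES over every dimension not listed in `keep` (an OLAP roll-up)."""
    totals: Dict[Tuple[str, ...], float] = defaultdict(float)
    for row in facts:
        totals[tuple(row[d] for d in keep)] += row[SALES]
    return dict(totals)


def slice_cube(facts: List[Fact], dim: int, member: str) -> List[Fact]:
    """Fix one dimension to a single member (an OLAP slice)."""
    return [row for row in facts if row[dim] == member]
```

Operations compose: `roll_up(slice_cube(facts, QUARTER, "Q1"), (PRODUCT,))` gives first-quarter sales per product across all regions.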
Hardware Budget
A typical startup warehouse project allocates more than 60 percent of its hardware and software budget to the creation of a powerful storage component, spending just 30 percent on data mining and user access technologies.
Systems Analysis Budget
Budgeting for systems analysis and development, however, follows a very different pattern:
More than 50 percent of development dollars are spent on building acquisition capabilities,
30 percent fund the development of user solutions, and
20 percent are dedicated to the creation of databases in the storage component.
Design Issues
Relational and Multidimensional Models
Denormalized and indexed relational models are more flexible.
Multidimensional models are simpler to use and more efficient.
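One way to picture the trade-off is to hold the same facts both ways. This sketch assumes two hypothetical dimensions (`REGIONS`, `QUARTERS`): the relational form is a list of keyed rows that must be searched (or indexed), while the multidimensional form is a dense array whose cell position is computed directly from the dimension members.

```python
from typing import Dict, List, Tuple

# Hypothetical dimension members; their positions become array coordinates.
REGIONS = ["east", "west"]
QUARTERS = ["Q1", "Q2", "Q3", "Q4"]

Row = Tuple[str, str, float]  # relational form: (region, quarter, sales)


def lookup_relational(rows: List[Row], region: str, quarter: str) -> float:
    """Relational access: scan (or, in an RDBMS, index) rows for a matching key."""
    return sum((v for r, q, v in rows if r == region and q == quarter), 0.0)


def to_cube(rows: List[Row]) -> List[List[float]]:
    """Multidimensional form: a dense array with one cell per member combination."""
    region_pos: Dict[str, int] = {m: i for i, m in enumerate(REGIONS)}
    quarter_pos: Dict[str, int] = {m: i for i, m in enumerate(QUARTERS)}
    cube = [[0.0] * len(QUARTERS) for _ in REGIONS]
    for r, q, v in rows:
        cube[region_pos[r]][quarter_pos[q]] += v
    return cube


def lookup_cube(cube: List[List[float]], region: str, quarter: str) -> float:
    """Cube access: compute the cell position directly, no search over rows."""
    return cube[REGIONS.index(region)][QUARTERS.index(quarter)]
```

The dense array stores every member combination (eight cells for two rows of data here) and must be rebuilt if a dimension is added, whereas the relational table only stores rows that exist and can gain a column. That is one reading of why the relational model is called more flexible and the multidimensional one simpler and more direct.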
Star Schemas in an RDBMS
In most companies doing ROLAP, the DBAs have created countless indexes and summary tables in order to avoid I/O-intensive table scans against large fact tables. As the indexes and summary tables proliferate to optimize performance for the known queries and aggregations that the users perform, the build time and disk space needed to create them have grown enormously, often requiring more time than is allotted and more space than the original data!
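The summary-table technique described above can be sketched with Python's built-in `sqlite3` module. The table names (`sales_fact`, `sales_by_month`) and columns are hypothetical. The summary answers one known aggregation without scanning the fact table, but it has to be rebuilt after every load, which is where the build time and extra space come from.

```python
import sqlite3
from typing import Iterable, Tuple

# Hypothetical fact table; in a ROLAP warehouse it would be very large.
FACT_DDL = """
CREATE TABLE sales_fact (
    sale_id    INTEGER PRIMARY KEY,
    month      TEXT    NOT NULL,
    product_id INTEGER NOT NULL,
    amount     REAL    NOT NULL
);
CREATE INDEX idx_fact_month ON sales_fact (month);
"""

# Summary table for one *known* aggregation: totals per month and product.
SUMMARY_DDL = """
DROP TABLE IF EXISTS sales_by_month;
CREATE TABLE sales_by_month AS
    SELECT month, product_id, SUM(amount) AS total_amount, COUNT(*) AS n_sales
    FROM sales_fact
    GROUP BY month, product_id;
CREATE INDEX idx_summary_month ON sales_by_month (month, product_id);
"""


def refresh_summary(conn: sqlite3.Connection) -> None:
    # Every summary table and its index must be rebuilt after each load.
    conn.executescript(SUMMARY_DDL)


def load(rows: Iterable[Tuple[str, int, float]]) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(FACT_DDL)
    conn.executemany(
        "INSERT INTO sales_fact (month, product_id, amount) VALUES (?, ?, ?)", rows)
    refresh_summary(conn)
    return conn


def monthly_total(conn: sqlite3.Connection, month: str) -> float:
    # Answered from the small summary table instead of scanning sales_fact.
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(total_amount), 0) FROM sales_by_month WHERE month = ?",
        (month,),
    ).fetchone()
    return total
```

Each additional known query pattern would need its own summary table and index, all refreshed on every load; multiplying this sketch by dozens of patterns reproduces the problem the slide describes.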
Building a Data Warehouse from a Normalized Database
The steps:
Develop a normalized entity-relationship business model of the data warehouse.
Translate this into a dimensional model. This step reflects the information and analytical characteristics of the data warehouse.
Translate this into the physical model. This reflects the changes necessary to reach the stated performance objectives.
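The three steps can be sketched end to end in SQLite, again through Python's `sqlite3` module. The schema below (`category`, `product`, `orders`, `order_line`) is a hypothetical example, and building the dimensional tables with `CREATE TABLE ... AS SELECT` is only one possible way to translate the model; the physical step here consists solely of indexes added for performance.

```python
import sqlite3
from typing import List, Tuple

# Step 1: normalized (3NF) business model. Table names are hypothetical.
NORMALIZED = """
CREATE TABLE category   (category_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE product    (product_id  INTEGER PRIMARY KEY, name TEXT NOT NULL,
                         category_id INTEGER NOT NULL REFERENCES category);
CREATE TABLE orders     (order_id    INTEGER PRIMARY KEY, order_date TEXT NOT NULL);
CREATE TABLE order_line (order_id    INTEGER NOT NULL REFERENCES orders,
                         product_id  INTEGER NOT NULL REFERENCES product,
                         qty         INTEGER NOT NULL,
                         PRIMARY KEY (order_id, product_id));
"""

# Step 2: dimensional model. Product and category collapse into one dimension;
# the intersection entity order_line becomes the fact table.
DIMENSIONAL = """
CREATE TABLE dim_product AS
    SELECT p.product_id, p.name AS product_name, c.name AS category_name
    FROM product p JOIN category c ON c.category_id = p.category_id;
CREATE TABLE fact_sales AS
    SELECT o.order_date, ol.product_id, ol.qty
    FROM order_line ol JOIN orders o ON o.order_id = ol.order_id;
"""

# Step 3: physical model. Changes made only to meet performance objectives.
PHYSICAL = """
CREATE INDEX idx_fact_product ON fact_sales (product_id);
CREATE INDEX idx_fact_date ON fact_sales (order_date);
CREATE UNIQUE INDEX idx_dim_product ON dim_product (product_id);
"""


def build_warehouse(conn: sqlite3.Connection) -> None:
    conn.executescript(DIMENSIONAL)
    conn.executescript(PHYSICAL)


def qty_by_category(conn: sqlite3.Connection) -> List[Tuple[str, int]]:
    # Star query: one join from the fact table to a denormalized dimension.
    return conn.execute(
        """SELECT d.category_name, SUM(f.qty)
           FROM fact_sales f JOIN dim_product d ON d.product_id = f.product_id
           GROUP BY d.category_name ORDER BY d.category_name"""
    ).fetchall()
```

The same question asked of the normalized model would need joins across `order_line`, `product` and `category`; in the star form it needs a single join.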
The Business Model
Identify the data structure, attributes, and constraints for the client's data warehousing environment.
Stable
Optimized for update
Flexible
Business Model
As always in life, there are some disadvantages to 3NF:
Performance can be truly awful. Most of the work that is performed on denormalizing a data model is an attempt to reach performance objectives.
The structure can be overwhelmingly complex. We may wind up creating many small relations which the user might think of as a single relation or group of data.
Structural Dimensions
The first step is the development of the structural dimensions. This step corresponds very closely to what we normally do in a relational database.
The star architecture that we will develop here depends upon taking the central intersection entities as the fact tables and building the foreign key => primary key relations as dimensions.
(Figure: Simple DW pattern.)
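The rule "intersection entities become fact tables, their foreign key => primary key relations become dimensions" can be applied mechanically to schema metadata. This sketch assumes a hypothetical metadata shape (table name mapped to its foreign-key columns and the tables they reference) and a hypothetical threshold: a table referencing at least two other tables is treated as an intersection entity. That threshold is a heuristic, not a rule from the text.

```python
from typing import Dict, List

# Hypothetical schema metadata: table -> {foreign-key column: referenced table}.
ForeignKeys = Dict[str, Dict[str, str]]


def star_layout(fks: ForeignKeys, min_refs: int = 2) -> Dict[str, List[str]]:
    """Heuristic: a table that references at least `min_refs` other tables is a
    central intersection entity, so it becomes a fact table; the tables its
    foreign keys point to (via their primary keys) become its dimensions."""
    layout: Dict[str, List[str]] = {}
    for table, refs in fks.items():
        targets = sorted(set(refs.values()))
        if len(targets) >= min_refs:
            layout[table] = targets
    return layout
```

Only first-level references are used: a chain such as product => category is not followed, so deciding whether to flatten it into the product dimension remains a design choice.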
Other Dimensions
Categorical dimensions: generated groups (additional key components)
Partitioning dimensions: subtypes (planned vs. actual)
Informational dimensions: generate different types of data (messy)
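Categorical and partitioning dimensions can be illustrated together. In this sketch, which rests on several assumptions, `size_band` is a hypothetical generated group derived from customer revenue and used as an extra key component (the thresholds are invented), and `scenario` is a partitioning dimension whose members are the subtypes "planned" and "actual". The function names and row layout are illustrative only.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical fact rows: (customer, month, scenario, amount). "scenario" is a
# partitioning dimension whose members are the subtypes "planned" and "actual".
Fact = Tuple[str, str, str, float]


def size_band(annual_revenue: float) -> str:
    """Categorical dimension: a generated group used as an extra key component.
    The thresholds are illustrative assumptions."""
    if annual_revenue < 1_000_000:
        return "small"
    if annual_revenue < 10_000_000:
        return "medium"
    return "large"


def variance_by_band(facts: List[Fact], revenue: Dict[str, float]) -> Dict[str, float]:
    """Actual minus planned amount for each generated size band."""
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for customer, _month, scenario, amount in facts:
        totals[(size_band(revenue[customer]), scenario)] += amount
    bands = sorted({band for band, _ in totals})
    return {b: totals[(b, "actual")] - totals[(b, "planned")] for b in bands}
```

Because planned and actual figures share the same keys and differ only in the partitioning member, comparing them is a simple subtraction per group rather than a join between two differently shaped tables.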
