
Designing the data warehouse / data marts

Methodologies and Techniques

Basic principles

Life cycle of the DW


[Diagram: operational databases populate the warehouse database through a first-time load followed by periodic refreshes; warehouse data is eventually purged or archived.]

Oracle Warehouse Components: Any Source, Any Data, Any Access

[Diagram labels: operational data; external data; relational / multidimensional; text, image; spatial; audio, video; relational tools; OLAP tools; Web; applications / Web; Oracle Medi]

Oracle Intelligence Tools

Users            Views      Tool
IS developers    Current    Oracle Reports
Business users   Tactical   Oracle Discoverer
Analysts         Strategic  Oracle Express

Oracle Data Mart Suite

Data Modeling: Oracle Data Mart Designer
Data Extraction: Oracle Data Mart Builder
Data Management: Oracle Enterprise Manager
Data Access & Analysis: Discoverer & Oracle Reports

[Diagram: data moves from OLTP databases (OLTP engines) to the Oracle8 data mart database (warehousing engines); SQL*PLUS also appears as a label.]

Big Bang Approach: Advantages and Disadvantages

Advantages:
The warehouse is built as part of a major project (e.g. BPR).
A big picture of the data warehouse exists before the data warehousing project starts.

Disadvantages:
Involves high risk and takes longer.
Runs the risk of having to change requirements partway through.

Incremental Approach to Warehouse Development


Phases: Strategy, Definition, Analysis, Design, Build, Production

Multiple iterations
Shorter implementations
Validation of each phase

Benefits of an Incremental Approach


Delivers a strategic data warehouse solution through incremental development efforts
Provides an extensible, scalable architecture
Quickly provides business benefits and ensures a much earlier return on investment
Allows a data warehouse to be built one subject or application area at a time
Allows the construction of an integrated data mart environment

Data Mart
A subset of a data warehouse that supports the requirements of a particular department or business function.

Characteristics include:
Unlike a data warehouse, a data mart does not normally contain detailed operational data.
A data mart may contain certain levels of aggregation.
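
As a rough illustration of these characteristics, the sketch below derives a small mart from a warehouse table by keeping one department's rows and storing them only in aggregated form. The table names, columns, and rows are hypothetical, and SQLite stands in for whatever database is actually used.

```python
import sqlite3

# Hypothetical warehouse table; names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE warehouse_sales (dept TEXT, region TEXT, month TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?, ?)",
    [
        ("Marketing", "North", "1999-01", 100),
        ("Marketing", "North", "1999-01", 150),
        ("Marketing", "South", "1999-02", 200),
        ("Finance", "North", "1999-01", 999),
    ],
)

# The mart keeps only one department's data, aggregated by region and month,
# so the detailed rows stay in the warehouse.
conn.execute(
    """
    CREATE TABLE marketing_mart AS
    SELECT region, month, SUM(amount) AS total_amount
    FROM warehouse_sales
    WHERE dept = 'Marketing'
    GROUP BY region, month
    """
)

mart_rows = conn.execute(
    "SELECT region, month, total_amount FROM marketing_mart ORDER BY region, month"
).fetchall()
print(mart_rows)
```

The mart ends up with fewer rows than the warehouse and no Finance data, which is the "subset plus aggregation" idea in miniature.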

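Dependent Data Mart

[Diagram: flat files, operational systems, and external data feed the data warehouse (Marketing, Sales, Finance, Human Resources), which in turn supplies the Marketing, Sales, and Finance data marts.]

Independent Data Mart

[Diagram: operational systems, flat files, and external data feed a Sales or Marketing data mart directly, with no data warehouse in between.]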

Reasons for Creating a Data Mart


To give users more flexible access to the data they need to analyse most often
To provide data in a form that matches the collective view of a group of users
To improve end-user response time
Potential users of a data mart are clearly defined and can be targeted for support
To provide appropriately structured data as dictated by the requirements of the end-user access tools
Building a data mart is simpler than establishing a corporate data warehouse
The cost of implementing data marts is far less than that required to establish a data warehouse

Data Marts Issues


Data mart functionality
Data mart size
Data mart load performance
User access to data in multiple data marts
Data mart Internet / intranet access
Data mart administration
Data mart installation

Example of a DW Tool: OLAP

OLAP tools let users:
Rotate and drill down to successive levels of detail
Create and examine calculated data interactively on large volumes of data
Determine comparative or relative differences
Perform exception and trend analysis
Perform advanced analytical functions such as forecasting, modeling, and regression analysis
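
The first few operations can be sketched in plain Python. This is only a toy illustration: the records, the tuple layout, and the `aggregate` helper are assumptions made for the example and are not part of any OLAP product.

```python
from collections import defaultdict

# Hypothetical sales records: (region, store, month, units).
records = [
    ("North", "Store 1", "Jan", 10),
    ("North", "Store 2", "Jan", 5),
    ("North", "Store 1", "Feb", 7),
    ("South", "Store 3", "Jan", 4),
]

def aggregate(rows, key_fn):
    """Sum units for each key produced by key_fn."""
    totals = defaultdict(int)
    for row in rows:
        totals[key_fn(row)] += row[3]
    return dict(totals)

# Top level: units by region.
by_region = aggregate(records, lambda r: r[0])

# Drill down: units by store within the North region.
north_by_store = aggregate([r for r in records if r[0] == "North"], lambda r: r[1])

# Rotate (pivot): the same data viewed by month instead of by region.
by_month = aggregate(records, lambda r: r[2])

# Comparative difference: January versus February.
jan_minus_feb = by_month["Jan"] - by_month["Feb"]

print(by_region, north_by_store, by_month, jan_minus_feb)
```

A real OLAP engine performs the same kind of regrouping, but against precomputed or indexed structures rather than by rescanning every record.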

Original OLAP Rules


1. Multidimensional conceptual view
2. Transparency
3. Accessibility
4. Consistent reporting performance
5. Client-server architecture
6. Multiuser support
7. Unrestricted cross-dimensional operations
8. Intuitive data manipulation
9. Flexible reporting
10. Unlimited dimensions and aggregation levels

Relational Database Model


Attribute 1 (Name): Anderson, Green, Lee, Ramos
Attribute 2 (Age): 31, 42, 22, 32
Attribute 3 (Gender): F, M, M
Attribute 4 (Emp No.): 1001, 1007, 1010, 1020

The listing above illustrates the employee relation, with one employee per row (Row 1 to Row 4).

Multidimensional Database Model

[Figure: SALES and FINANCE cubes, shown with two dimensions and with three dimensions. Dimension labels: Customer, Store, Time, Product, GL_Line.]

The data is found at the intersection of dimensions.
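
One way to picture "data at the intersection of dimensions" is a mapping keyed by one member from each dimension. The sketch below is a toy model with made-up members and values, not how a multidimensional database stores data internally.

```python
# A tiny cube: each cell is addressed by one member from every dimension
# (product, store, month). Members and values are hypothetical.
sales_cube = {
    ("Widget", "Store 1", "Jan"): 120,
    ("Widget", "Store 2", "Jan"): 80,
    ("Gadget", "Store 1", "Jan"): 50,
    ("Widget", "Store 1", "Feb"): 90,
}

def cell(product, store, month):
    """Return the value at the intersection of three dimension members."""
    return sales_cube.get((product, store, month), 0)

def slice_total(product=None, store=None, month=None):
    """Sum all cells matching the given members; None means 'all members'."""
    wanted = (product, store, month)
    return sum(
        value
        for key, value in sales_cube.items()
        if all(w is None or w == k for w, k in zip(wanted, key))
    )

print(cell("Widget", "Store 1", "Jan"))
print(slice_total(product="Widget"))
print(slice_total(month="Jan"))
```

Fixing all three members addresses a single cell, while leaving a member open sums along that dimension.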

Specialised Multidimensional tool


Benefits:
Quick access to very large volumes of data
Extensive and comprehensive libraries of complex analysis functions
Strong modeling and forecasting capabilities
Can access multidimensional and relational database structures
Caters for calculated fields

Disadvantages:
Difficulty of changing the model
Lack of support for very large volumes of data
May require significant processing power

MOLAP Server
The application layer stores data in a multidimensional structure.
The presentation layer provides the multidimensional view.
Efficient storage and processing
Complexity hidden from the user
Analysis using preaggregated summaries and precalculated measures
[Diagram: DSS client, MOLAP engine (application layer), warehouse.]

ROLAP Server
The warehouse stores atomic data.
The application layer generates SQL for the three-dimensional view.
The presentation layer provides the multidimensional view.
[Diagram: DSS client, ROLAP engine (application layer) issuing multiple SQL statements, warehouse server.]
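
To make the ROLAP idea of "generating SQL" concrete, here is a minimal sketch in which a helper turns a list of dimensions and a measure into a GROUP BY query and runs it against atomic rows. The function `build_rollup_sql`, the `sales` table, and its columns are all hypothetical, and SQLite stands in for the warehouse server.

```python
import sqlite3

def build_rollup_sql(table, dimensions, measure):
    """Build a GROUP BY query. Identifiers are assumed to come from trusted
    metadata, never from end-user input, because they are interpolated."""
    dims = ", ".join(dimensions)
    return (
        f"SELECT {dims}, SUM({measure}) AS total "
        f"FROM {table} GROUP BY {dims} ORDER BY {dims}"
    )

# Atomic (unaggregated) rows, as a ROLAP warehouse would hold them.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, month TEXT, units INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("North", "Jan", 3), ("North", "Jan", 4), ("South", "Feb", 5)],
)

sql = build_rollup_sql("sales", ["region", "month"], "units")
result = conn.execute(sql).fetchall()
print(sql)
print(result)
```

Aggregation happens at query time here, whereas a MOLAP server would answer the same question from preaggregated cells.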

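MOLAP
[Diagram: warehouse data is loaded periodically into a multidimensional database (MDDB) on the Express Server; Express user queries are answered from the MDDB.]

ROLAP
[Diagram: Express user queries trigger live fetches from the warehouse through the Express Server, which holds the fetched data in a cache.]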

Also Hybrid (HOLAP)

Choosing a Reporting Architecture


Business needs
Potential for growth
Query interface
Enterprise architecture
Network architecture
Speed of access
Openness

[Chart: MOLAP and ROLAP positioned by performance (OK to Good) against analysis (Simple to Complex).]

Data Acquisition
Identify, extract, transform, and transport source data
Consider internal and external data
Perform gap analysis between source data and target database objects
Plan the move of data between sources and target
Define the first-time load and refresh strategy
Define tool requirements
Build, test, and execute data acquisition modules
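
The difference between a first-time load and a refresh can be sketched as follows. This is one simple strategy under stated assumptions: rows are only ever appended to the source and carry an increasing `order_id`. The table names and the `load_new_rows` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE source_orders (order_id INTEGER PRIMARY KEY, amount INTEGER);
    CREATE TABLE warehouse_orders (order_id INTEGER PRIMARY KEY, amount INTEGER);
    """
)

def count_rows(table):
    """Return the number of rows in a table named in this script."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

def load_new_rows():
    """Copy source rows whose order_id is above the highest id already loaded.
    Returns the number of rows copied."""
    before = count_rows("warehouse_orders")
    last_id = conn.execute(
        "SELECT COALESCE(MAX(order_id), 0) FROM warehouse_orders"
    ).fetchone()[0]
    conn.execute(
        "INSERT INTO warehouse_orders "
        "SELECT order_id, amount FROM source_orders WHERE order_id > ?",
        (last_id,),
    )
    return count_rows("warehouse_orders") - before

conn.executemany("INSERT INTO source_orders VALUES (?, ?)", [(1, 10), (2, 20), (3, 30)])
first_load = load_new_rows()  # empty warehouse, so every source row is copied

conn.executemany("INSERT INTO source_orders VALUES (?, ?)", [(4, 40), (5, 50)])
refresh = load_new_rows()  # only the rows added since the first load

print(first_load, refresh)
```

Sources that update or delete rows need a different change-detection approach, such as timestamps or log-based capture, which this sketch does not attempt.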

Modeling
Warehouses differ from operational structures:
Analytical requirements
Subject orientation

Data must map to subject-oriented information:
Identify business subjects
Define relationships between subjects
Name the attributes of each subject

Modeling is iterative.
Modeling tools are available.

Modeling the Data Warehouse


1. Defining the business model
2. Creating the dimensional model
3. Modeling summaries
4. Creating the physical model

[Diagram: step 1 starts with selecting a business process; steps 2 and 3 follow; step 4 produces the physical model.]

Identifying Business Rules


Location
Geographic proximity: 0 - 1 miles, 1 - 5 miles, > 5 miles

Product
Type: PC, Server
Monitor: 15 inch, 17 inch, 19 inch, None
Status: New, Rebuilt, Custom

Time
Month > Quarter > Year

Store
Store > District > Region
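
Rules like these eventually become small, testable mappings. The sketch below encodes the geographic proximity bands and the Month > Quarter step of the Time hierarchy. The boundary handling (exactly 1 and exactly 5 miles fall into the lower band) and the calendar-quarter definition are assumptions, since the rules above do not state them.

```python
def proximity_band(miles):
    """Map a distance in miles to one of the three bands.
    Assumption: boundary values 1 and 5 belong to the lower band."""
    if miles < 0:
        raise ValueError("distance cannot be negative")
    if miles <= 1:
        return "0 - 1 miles"
    if miles <= 5:
        return "1 - 5 miles"
    return "> 5 miles"

def quarter_of(month):
    """Roll a month number (1-12) up to its calendar quarter (1-4).
    Assumption: quarters follow the calendar year, not a fiscal year."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    return (month - 1) // 3 + 1

print(proximity_band(0.5), proximity_band(3), proximity_band(7))
print(quarter_of(1), quarter_of(4), quarter_of(12))
```

Writing the rule down as code forces decisions such as boundary handling to be made explicitly with the business users.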

Creating the Dimensional Model


Identify fact tables:
Translate business measures into fact tables
Analyze source system information for additional measures
Identify base and derived measures
Document additivity of measures

Identify dimension tables
Link fact tables to the dimension tables
Create views for users

Dimension Tables
Dimension tables have the following characteristics:
Contain textual information that represents the attributes of the business
Contain relatively static data
Are joined to a fact table through a foreign key reference

[Diagram: the Product, Channel, Customer, and Time dimension tables joined to a central fact table, Facts (units, price).]

Fact Tables
Fact tables have the following characteristics:
Contain numeric measures (metrics) of the business
May contain summarized (aggregated) data
May contain date-stamped data
Are typically additive
Have a key that is typically a concatenated key composed of the primary keys of the dimensions
Are joined to dimension tables through foreign keys that reference primary keys in the dimension tables

Dimensional Model (Star Schema)


[Diagram: a central fact table, Facts (units, price), surrounded by the dimension tables Product, Customer, Time, and Channel.]

Star Schema Model


Central fact table
Radiating dimensions
Denormalized model

Sales Fact Table: Product_id, Store_id, Item_id, Day_id, Sales_dollars, Sales_units, ...
Product Table: Product_id, Product_desc
Store Table: Store_id, District_id, ...
Time Table: Day_id, Month_id, Period_id, Year_id
Item Table: Item_id, Item_desc, ...
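
The star layout above can be tried out in SQLite. This sketch uses only the columns that are named on the slide; the table name `time_dim`, the sample rows, and the integer encodings of the ids are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE product (product_id INTEGER PRIMARY KEY, product_desc TEXT);
    CREATE TABLE store (store_id INTEGER PRIMARY KEY, district_id INTEGER);
    CREATE TABLE item (item_id INTEGER PRIMARY KEY, item_desc TEXT);
    CREATE TABLE time_dim (day_id INTEGER PRIMARY KEY, month_id INTEGER,
                           period_id INTEGER, year_id INTEGER);
    -- The fact key is the concatenation of the dimension keys.
    CREATE TABLE sales_fact (
        product_id INTEGER REFERENCES product(product_id),
        store_id INTEGER REFERENCES store(store_id),
        item_id INTEGER REFERENCES item(item_id),
        day_id INTEGER REFERENCES time_dim(day_id),
        sales_dollars INTEGER,
        sales_units INTEGER,
        PRIMARY KEY (product_id, store_id, item_id, day_id)
    );
    """
)

# Hypothetical sample rows.
conn.executemany("INSERT INTO product VALUES (?, ?)", [(1, "Widget"), (2, "Gadget")])
conn.execute("INSERT INTO store VALUES (10, 100)")
conn.execute("INSERT INTO item VALUES (1000, 'Blue item')")
conn.executemany(
    "INSERT INTO time_dim VALUES (?, ?, ?, ?)",
    [(19990101, 199901, 1, 1999), (19990102, 199901, 1, 1999)],
)
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 10, 1000, 19990101, 50, 5),
        (1, 10, 1000, 19990102, 20, 2),
        (2, 10, 1000, 19990101, 30, 3),
    ],
)

# A typical star query: one join per dimension used, then aggregate.
star_rows = conn.execute(
    """
    SELECT p.product_desc, t.month_id,
           SUM(f.sales_dollars), SUM(f.sales_units)
    FROM sales_fact f
    JOIN product p ON p.product_id = f.product_id
    JOIN time_dim t ON t.day_id = f.day_id
    GROUP BY p.product_desc, t.month_id
    ORDER BY p.product_desc
    """
).fetchall()
print(star_rows)
```

Each dimension is reached with a single join from the fact table, which is the property that keeps star queries simple.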

Star Schema Model


Easy for users to understand
Fast response to queries
Simple metadata
Supported by many front-end tools
Less robust to change
Slower to build
Does not support history

Snowflake Schema Model


Sales Fact Table: Item_id, Store_id, Sales_dollars, Sales_units
Product Table: Product_id, Product_desc
Store Table: Store_id, Store_desc, District_id
District Table: District_id, District_desc
Time Table: Week_id, Period_id, Year_id
Item Table: Item_id, Item_desc, Dept_id
Dept Table: Dept_id, Dept_desc, Mgr_id
Mgr Table: Dept_id, Mgr_id, Mgr_name
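
The practical effect of normalizing a dimension shows up in the joins. The sketch below builds only the Store and District branch of the snowflake, with hypothetical rows, and reports sales by district. Reaching the district description takes two joins from the fact table instead of one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE district (district_id INTEGER PRIMARY KEY, district_desc TEXT);
    CREATE TABLE store (
        store_id INTEGER PRIMARY KEY,
        store_desc TEXT,
        district_id INTEGER REFERENCES district(district_id)
    );
    CREATE TABLE sales_fact (
        item_id INTEGER,
        store_id INTEGER REFERENCES store(store_id),
        sales_dollars INTEGER,
        sales_units INTEGER
    );
    """
)

# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO district VALUES (?, ?)",
    [(100, "East District"), (200, "West District")],
)
conn.executemany(
    "INSERT INTO store VALUES (?, ?, ?)",
    [(1, "Store A", 100), (2, "Store B", 100), (3, "Store C", 200)],
)
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?, ?)",
    [(1000, 1, 10, 1), (1000, 2, 20, 2), (1000, 3, 5, 1)],
)

# Fact -> store -> district: the extra hop is the cost of normalization.
district_rows = conn.execute(
    """
    SELECT d.district_desc, SUM(f.sales_dollars)
    FROM sales_fact f
    JOIN store s ON s.store_id = f.store_id
    JOIN district d ON d.district_id = s.district_id
    GROUP BY d.district_desc
    ORDER BY d.district_desc
    """
).fetchall()
print(district_rows)
```

In exchange, a district description is stored once rather than repeated on every store row.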

Snowflake Schema Model


Direct use by some tools
More flexible to change
Provides for speedier data loading
May become large and unmanageable
Degrades query performance
More complex metadata

Using Summary Data


Phase 3: Modeling summaries

Summary data:
Provides fast access to precomputed data
Reduces use of I/O, CPU, and memory
Is distilled from source systems and precalculated summaries
Usually exists in summary fact tables

Designing Summary Tables


[Figure: candidate summary calculations (average, maximum, total, percentage) applied to units and sales, by product (Product A, Product B, and Product C totals) and by store.]

Summary Tables Example


SALES FACTS
Sales    Region  Month
10,000   North   Jan 99
12,000   South   Feb 99
11,000   North   Jan 99
15,000   West    Mar 99
18,000   South   Feb 99
20,000   North   Jan 99
10,000   East    Jan 99
 2,000   West    Mar 99

SALES BY MONTH/REGION
Month    Region  Tot_Sales$
Jan 99   North   41,000
Jan 99   East    10,000
Feb 99   South   30,000
Mar 99   West    17,000

SALES BY MONTH
Month    Tot_Sales
Jan 99   51,000
Feb 99   30,000
Mar 99   17,000
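
As a minimal sketch of how such summary tables are derived, the code below sums the eight rows listed under SALES FACTS, first by month and region and then by month alone. The variable and function names are illustrative only.

```python
from collections import defaultdict

# The eight fact rows from SALES FACTS: (sales, region, month).
sales_facts = [
    (10_000, "North", "Jan 99"),
    (12_000, "South", "Feb 99"),
    (11_000, "North", "Jan 99"),
    (15_000, "West", "Mar 99"),
    (18_000, "South", "Feb 99"),
    (20_000, "North", "Jan 99"),
    (10_000, "East", "Jan 99"),
    (2_000, "West", "Mar 99"),
]

def summarize(rows, key_fn):
    """Total the sales amount for each key produced by key_fn."""
    totals = defaultdict(int)
    for sales, region, month in rows:
        totals[key_fn(region, month)] += sales
    return dict(totals)

# First-level summary: one row per (month, region).
by_month_region = summarize(sales_facts, lambda region, month: (month, region))

# Second-level summary: one row per month.
by_month = summarize(sales_facts, lambda region, month: month)

for key, total in sorted(by_month_region.items()):
    print(key, total)
print(by_month)
```

Because sales is an additive measure, the monthly totals could equally be computed from the month/region summary instead of from the detailed facts.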

Summary Management in Oracle8i


[Diagram: a sales summary built over the Region, State, City, Product, and Time levels of the sales data.]

Summary advisor:
Summary usage
Summary recommendations
Space requirements

The Time Dimension


Time is critical to the data warehouse.
A consistent representation of time is required for extensibility.

[Diagram: the sales fact table linked to the time dimension.]

How and where should it be stored?
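
One common answer, offered here only as a sketch, is to store time as its own dimension table with one row per calendar day and precomputed roll-up columns. The column names, the integer encodings, and the calendar-quarter rule below are assumptions for illustration.

```python
from datetime import date, timedelta

def build_time_dimension(start, end):
    """Return one row per calendar day from start to end, inclusive."""
    rows = []
    current = start
    while current <= end:
        rows.append(
            {
                # Keys encoded as integers, e.g. 19990101 and 199901.
                "day_id": current.year * 10000 + current.month * 100 + current.day,
                "month_id": current.year * 100 + current.month,
                "quarter": (current.month - 1) // 3 + 1,
                "year_id": current.year,
            }
        )
        current += timedelta(days=1)
    return rows

time_rows = build_time_dimension(date(1999, 1, 1), date(1999, 12, 31))
print(len(time_rows), time_rows[0], time_rows[-1])
```

Generating the rows once, ahead of the fact loads, gives every fact table the same representation of time to join against.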