Data Warehousing Fundamentals
50102GC20
Production 2.0
May 1999
M08761
Authors Copyright Oracle Corporation, 1999. All rights reserved.
All other products or company names are used for identification purposes only
and may be trademarks of their respective owners.
Contents
Preface
Profile
Related Publications
Typographic Conventions
Lesson 1: Introduction
Course Objectives
Agenda
Questions About You
Summary
Practice 7-1
Glossary
Preface
Profile
Before You Begin This Course
This course is the entry-level course in the Data Warehousing curriculum. Therefore,
there are no prerequisites to this course.
Prerequisites
There are no prerequisites for this course.
Lesson Aim
Lesson 6: Analyzing User Query Needs
This lesson describes the analysis required to identify and categorize users who may
need to access data from the warehouse, and how their requirements differ. Data access
and reporting tools are considered.
Lesson 7: Modeling the Data Warehouse
This lesson examines the role of data modeling in a data warehousing environment. The
lesson presents a very high-level overview of warehouse modeling steps. You consider
the different types of models that can be employed, such as the star schema. Tools
available for warehouse modeling are introduced.
Lesson 8: Choosing a Computing Architecture
This lesson examines the computer architectures that commonly support data
warehouses. The benefits of each hardware architecture and reasons for using
distributed warehouses are examined. Students examine the technology requirements of
a database server for warehousing.
Lesson 9: Planning Warehouse Storage
This lesson examines database setup and management issues, such as partitioning,
indexing, and ways to protect your database.
Lesson 10: Building the Warehouse
In this lesson, you explore the sources of data for the data warehouse. You consider
how the extraction and transformation processes take data from source systems and
change it into data that is acceptable to the users of the data warehouse. The lesson
also describes typical data anomalies and looks at ways to eliminate them.
Lesson 11: Transforming Data
In this lesson, you explore how the transformation process transforms data from source
systems into data suitable for end user query and analysis applications.
Lesson 12: Transportation: Loading Warehouse Data
In this lesson, you examine how the extracted and transformed data is transported into
the warehouse.
Lesson 13: Transportation: Refreshing Warehouse Data
In this lesson, you examine methods for updating the warehouse with changed data,
after the first-time load.
Lesson 14: Leaving a Metadata Trail
This lesson focuses on the concept of warehouse metadata, and the role it plays in a
well-developed and managed warehousing environment.
Lesson 15: Supporting End-User Access
This lesson investigates the ways that users may access the data in the data warehouse.
Students are introduced to the concept of business intelligence. The lesson discusses
the discovery model used by mining tools, and the reasons enterprises are looking at
data mining solutions for discovery of information.
Lesson 16: Web-Enabling the Warehouse
This lesson discusses how to take advantage of the Web to deploy data warehouse
information. It addresses internal and external access, as well as the advantages of
Web-enabling a data warehouse. The lesson outlines the steps involved in deploying a
Web-enabled data warehouse. Challenges in deploying a Web-enabled data warehouse are
also discussed.
Lesson 17: Managing the Data Warehouse
This lesson explores the management issues, critical success factors, and challenges to
successful data warehouse implementation. The lesson addresses issues pertaining to
the management of the entire warehouse life cycle.
Related Publications
Oracle Publications
Title: Oracle8i for Data Warehousing: Fast and Simple for More Data and More Users
(Nov 1998)
URL: http://websight.us.oracle.com
Title: Large Scale Data Warehousing with Oracle8i, Winter Corporation Sponsored
Research Program
URL: http://websight.us.oracle.com
Title: DWM Handbook V1.0.0
Additional Publications
• Oracle DBA Handbook, Loney, Kevin, Osborne McGraw-Hill; ISBN: 007882406.
• Oracle: The Complete Reference, Koch, George and Kevin Loney; Oracle Press;
ISBN: 007882396X.
• The Data Warehouse Toolkit, Kimball, Ralph; John Wiley & Sons; ISBN:
0471153370.
• Building the Data Warehouse, Inmon, W.; John Wiley & Sons; ISBN:
0471141615.
• Oracle8 Data Warehousing, Dodge, Gary and Gorman, T.; John Wiley & Sons;
ISBN: 0471199524.
• The Data Warehouse Lifecycle Toolkit: Expert Methods for Designing,
Developing, and Deploying Data Warehouses, Kimball, Ralph and others; John
Wiley & Sons, 1998; ISBN: 0471255475.
• Data Warehouse Design Solutions, Adamson, C. and Venerable, M.; John Wiley &
Sons, 1998; ISBN 0-471-25195-X.
• Data Warehousing: Architecture and Implementation, Humphries, M. et al.;
Prentice Hall PTR, 1999; ISBN: 0-13-080902-0.
Web Sites
• Data Warehouse Institute Web site, at http://www.dw-institute.com/index.htm
• The Data Warehouse Information Center Web site, at
http://pwp.starnetinc.com/larryg/index.html
• The Data Warehouse.com Web site, at http://data-warehouse.com/
• The Data Warehouse Knowledge Center Web site, at http://www.datawarehouse.org
Typographic Conventions
Typographic Conventions in Text
Convention: Bold italic
Element: Glossary term (if there is a glossary)
Example: The algorithm inserts the new key.
Convention: Caps and lowercase
Elements: Buttons, check boxes, triggers, windows
Examples:
Click the Executable button.
Select the Can’t Delete Card check box.
Assign a When-Validate-Item trigger . . .
Open the Master Schedule window.
Convention: Courier new, case sensitive (default is lowercase)
Elements: Code output, directory names, filenames, passwords, pathnames, URLs, user
input, usernames
Examples:
Code output: debug.seti(’I’,300);
Directory: bin (DOS), $FMHOME (UNIX)
Filename: Locate the init.ora file.
Password: Use tiger as your password.
Pathname: Open c:\my_docs\projects
URL: Go to http://www.oracle.com
User input: Enter 300
Username: Log on as scott
Convention: Initial cap
Element: Graphics labels (unless the term is a proper noun)
Example: Customer address (but Oracle Payables)
Convention: Italic
Elements: Emphasized words and phrases, titles of books and courses, variables
Examples:
Do not save changes to the database.
For further information, see Oracle7 Server SQL Language Reference Manual.
Enter user_id@us.oracle.com, where user_id is the name of the user.
Convention: Quotation marks
Elements: Interface elements with long names that have only initial caps; lesson and
chapter titles in cross-references
Examples:
Select “Include a reusable module component” and click Finish.
This subject is covered in Unit II, Lesson 3, “Working with Objects.”
Convention: Uppercase
Elements: SQL column names, commands, functions, schemas, table names
Example: Use the SELECT command to view information stored in the LAST_NAME
column of the EMP table.
Lesson 1: Introduction
Course Objectives
After completing this course, you should be able to do the following:
• Explain why data warehousing is a popular solution in today’s information
technology environment
• Describe the terminology used with data warehousing
• Identify the standard components of a data warehouse implementation
• Explain the importance of using a methodology for development, and specifically
identify the phases of the Oracle Data Warehouse Method
• Identify and use data warehouse modeling concepts
• Identify the different processes required to manage and maintain the warehouse
• Identify the hardware platforms that can be employed with a data warehouse
• Identify the features required of a database server for a warehouse implementation
• Identify the tools that can be used at each phase during the data warehouse
development cycle
• Describe user profiles and the techniques users may employ for querying the
warehouse
• Identify data warehousing implementation issues and challenges
• Position the products for the Oracle warehouse
Agenda
Day 1
Lesson 1: Introduction
Lesson 2: Meeting a Business Need
Lesson 3: Defining Data Warehouse Concepts and Terminology
Lesson 4: Driving Implementation Through a Methodology
Lesson 5: Planning for a Successful Warehouse
Lesson 6: Analyzing User Query Needs
Day 2
Lesson 7: Modeling the Data Warehouse
Lesson 8: Choosing a Computing Architecture
Lesson 9: Planning Warehouse Storage
Lesson 10: Building the Warehouse
Lesson 11: Transforming Data
Lesson 12: Transportation: Loading Warehouse Data
Day 3
Lesson 13: Transportation: Refreshing Warehouse Data
Lesson 14: Leaving a Metadata Trail
Lesson 15: Supporting End-User Access
Lesson 16: Web-Enabling the Warehouse
Lesson 17: Managing the Data Warehouse
Questions About You
Lesson 2: Meeting a Business Need
[Slide: Overview. Course road map with the boxes Meeting a Business Need (highlighted),
Planning for a Successful Warehouse, Analyzing User Query Needs, Modeling the Data
Warehouse, ETT (Building the Warehouse), Supporting End User Access, Managing the
Data Warehouse, and Project Management (Methodology, Maintaining Metadata)]
Overview
The top slide on the facing page is a road map representing the flow of the course. The
vertical box entitled “Meeting a Business Need” emphasizes that the warehouse is
business driven. The determination of the warehouse architecture, data model, and
user query needs all stem from business requirements. The horizontal box running
across the bottom represents the ongoing project management throughout the
warehouse lifecycle.
This lesson examines how data warehousing has evolved from early management
information systems to today’s decision support systems. It explores the primary
motivating factors for creating a data warehouse and considers the types of industries
that employ data warehouses.
Objectives
After completing this lesson, you should be able to do the following:
• Describe why an online transaction processing (OLTP) system is not suitable for
complex analysis
• Describe how extract processing for decision support querying led to data
warehouse solutions employed today
• Explain why businesses are driven to employ data warehouse technology
• Identify some of the industries that employ data warehouses
Characteristic OLTP
Screens Unchanging
Orientation Records
Unsuitability of OLTP Systems for Complex Analysis
[Slide: production platforms, operational reports, ad hoc access]
Management Information Systems and Decision Support
Personal Computing
With the advent of personal computing and 4GL programming techniques, MIS
evolved into decision support systems (DSS). DSS was judged to support business
users better because it gave them direct access to the operational data for ad hoc
querying, which allowed more flexible reporting as the information was needed.
Management Issues
Extract explosion
Data Extract Processing
Productivity Issues
• Duplicated effort
• Multiple technologies
• Obsolete reports
• No metadata
Data Quality Issues with Extract Processing
The data quality issues in an extract processing environment are listed below:
• The data has no time basis, so users cannot compare query results with
confidence. Different data extracts may have been taken at different points in time.
• Each data extract may use a different algorithm for calculating derived and
computed values. This makes the data difficult to evaluate, compare, and
communicate for managers who may not know the methods or algorithms used to
create the data extract or reports.
• Data extract programs may use different levels of extraction.
• Access to external data may not be consistent, and the granularity of the external
data may not be well defined.
• Data sources may be difficult to identify, and data elements may be repeated on
many extracts.
• The data field names and values may have different meanings in the various
systems in the enterprise (lack of semantic integrity).
• There are no data correction rules to ensure that the extracted data is correct and
clean.
• The reports provide data rather than information, and offer no drill-down capability.
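The first two issues in the list above can be illustrated with a small sketch. Everything in it is hypothetical: the order rows, the two extract functions, and the snapshot dates are invented for illustration and do not come from any Oracle product. It shows how two extracts over the same operational data can report different "revenue" totals because they differ in both the calculation and the point in time.

```python
from datetime import date

# Hypothetical operational rows: (order_date, gross_amount, discount)
ORDERS = [
    (date(1999, 1, 10), 100.0, 10.0),
    (date(1999, 2, 5), 200.0, 0.0),
    (date(1999, 3, 20), 50.0, 5.0),
]

def sales_extract(orders, as_of):
    """Hypothetical sales extract: revenue means the gross amount."""
    return sum(gross for day, gross, _ in orders if day <= as_of)

def finance_extract(orders, as_of):
    """Hypothetical finance extract: revenue means gross minus discount."""
    return sum(gross - disc for day, gross, disc in orders if day <= as_of)

# The two extracts run at different times and use different formulas,
# so the two "revenue" figures cannot be compared with confidence.
sales_total = sales_extract(ORDERS, date(1999, 2, 28))
finance_total = finance_extract(ORDERS, date(1999, 3, 31))

# With one agreed formula and one agreed snapshot date, the figure is repeatable.
agreed_total = finance_extract(ORDERS, date(1999, 3, 31))

print(sales_total, finance_total, agreed_total)
```

A reader of the two reports has no way to tell whether the difference comes from the business, the formula, or the extract date, which is the situation a single warehouse load with documented rules is meant to avoid.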
Advantages of Warehouse Processing Environment
• No duplication of effort
• No need for tools to support many technologies
• No disparity in data, meaning, or representation
• No time period conflict
• No algorithm confusion
• No drill-down restrictions
Business Motivators
Business Drivers for Data Warehouses
Technological Advances
• Parallelism
– Hardware
– Operating system
– Database
– Query
– Index
– Applications
• Large databases
• 64-bit architectures
• Indexing techniques
• Affordable, cost-effective open systems
• Robust warehouse tools
• Sophisticated end user tools
Other Factors
• Very large volumes of data can be managed for warehouses greater than one
terabyte in size.
• Recently introduced 64-bit architectures are increasing server capacity and speed.
• Improved indexing techniques (bitmap index, hash index, star join) provide rapid
access to data.
• Warehouse tools are becoming more robust and less expensive.
• Licensing strategies are more effective and affordable.
• Open systems are available.
• Sophisticated, user-friendly, and intuitive tools are available to the user community
for all types of data warehouse access.
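To give a feel for why a bitmap index, one of the indexing techniques named above, suits warehouse queries, here is a minimal in-memory sketch. It is an illustration of the general idea only, not a description of how any database server implements it; the column values and the helper names build_bitmap_index and matching_rows are hypothetical.

```python
from collections import defaultdict

def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is set when row i holds it."""
    index = defaultdict(int)
    for row_id, value in enumerate(values):
        index[value] |= 1 << row_id
    return index

def matching_rows(bitmap):
    """Decode a bitmap back into a list of row numbers."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

# Hypothetical low-cardinality columns of a five-row sales table.
region = ["East", "West", "East", "North", "West"]
product = ["Printer", "Printer", "Scanner", "Printer", "Scanner"]

region_idx = build_bitmap_index(region)
product_idx = build_bitmap_index(product)

# "region = 'East' AND product = 'Printer'" becomes a single bitwise AND.
hits = matching_rows(region_idx["East"] & product_idx["Printer"])

# "region = 'West' OR region = 'North'" becomes a single bitwise OR.
west_or_north = matching_rows(region_idx["West"] | region_idx["North"])

print(hits, west_or_north)
```

Because each predicate is answered by combining compact bit strings instead of scanning rows, this approach works well for columns with few distinct values, which are common in warehouse dimensions.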
[Charts: data warehouse revenues, 1996 and 2001; implementations by region (USA,
Europe, APAC, Other)]
• Successful implementations
• Decreased risk
• Robust extraction software
• Improving price to performance ratios
• Improved staff training
Current Situation and Growth of Data Warehousing
Revenues
A recent report showed that in 1996 data warehouse revenues (which include
hardware, software, and people-provided services) totaled $8 billion (U.S.). This
figure is forecast to rise to $23 billion (U.S.) in 2001, assuming a compound annual
growth rate of around 20%.
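The relationship between the two revenue figures and the growth rate can be checked with a short calculation. This is a sketch that assumes five annual compounding periods between 1996 and 2001; the function names are hypothetical. It computes the rate implied by the two endpoints and, separately, what a flat 20% rate would produce.

```python
def implied_cagr(start, end, years):
    """Compound annual growth rate that takes `start` to `end` in `years` periods."""
    return (end / start) ** (1 / years) - 1

def project(start, rate, years):
    """Value after compounding `start` at `rate` for `years` periods."""
    return start * (1 + rate) ** years

START_1996 = 8.0    # $ billion, from the text
END_2001 = 23.0     # $ billion, from the text
YEARS = 5           # assumption: 1996 to 2001 is five compounding periods

rate = implied_cagr(START_1996, END_2001, YEARS)
at_20_percent = project(START_1996, 0.20, YEARS)

print(f"Implied growth rate: {rate:.1%}")
print(f"1996 revenue grown at 20% for 5 years: ${at_20_percent:.1f} billion")
```

Under these assumptions the endpoints imply a rate somewhat above 20%, which is consistent with the text's wording of "around 20%" as a rounded figure.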
Geography
Most data warehouse implementations exist in the U.S., with Europe following close
behind, and then Asia Pacific.
Growth Motivators
These include:
• Increased successful implementations
• Decreased risk with vendors supplying a total solution
• More robust and functional extraction software
• Improved (and improving) price-to-performance equipment ratios
• Improved training for IT staff
Growth Inhibitors
These may include:
• Year 2000 compliance
• Shortage of skills in specific areas of data warehousing
• The lack of integrated metadata components
• The labor-intensive commitment to the data cleaning function and its
corresponding dollar and time cost
[Chart: percentage market coverage by industry]
• Airline
• Banking
• Health care
• Investment
• Insurance
• Retail
• Telecommunications
• Manufacturing
• Credit card suppliers
• Clothing distributors
Typical Uses of a Data Warehouse
Summary
This lesson covered the following topics:
• Describing why an online transaction processing (OLTP) system is not suitable for
complex analysis
• Describing how extract processing for decision support querying led to data
warehouse solutions employed today
• Explaining why businesses are driven to employ data warehouse technology
• Identifying some of the industries that employ data warehouses
Practice 2-1
1 OLTP databases hold up-to-the-minute information and are most commonly
designed as read-only databases.
True
False
2 In the scenario below, state whether it refers to an operational system or an
analytical processing system.
“Show me how a specific brand of printer is selling throughout different parts of
the United States and how this specific brand of printer is selling since it was first
introduced into my stores.”
This scenario refers to:
a An operational system
b An analytical processing system
3 Who is the target audience for the data warehouse?
a The business community in the organization
b IT professionals
c Data-entry clerks
d None of the above
e All of the above
4 Are the following statements true or false?
a Operational systems display the following qualities:
Good performance _____
Static data contents _____
High availability _____
Unpredictable CPU use _____
b Identify the reasons why business analysis is not easy with operational
systems.
Data is not structured for drill-down capability. _____
The system is not designed for querying. _____
Data analysis can be CPU-intensive. _____
Data is not integrated between systems. _____
5 In groups of three or four, discuss the questions below and present your points to
the class at the end of the discussion.
a List some of the reasons that your company is considering implementing a data
warehouse or data mart.
b What are some of the business problems that your company is trying to
answer?
c Why is the business community in your organization unable to find the
answers to their business questions based on the existing information systems?
[Slide: Overview. Course road map with the following blocks: Meeting a Business Need;
Defining DW Concepts & Terminology; Planning for a Successful Warehouse; Choosing a
Computing Architecture; Planning Warehouse Storage; Modeling the Data Warehouse;
ETT (Building the Warehouse); Managing the Data Warehouse; Analyzing User Query
Needs; Supporting End User Access; Project Management (Methodology, Maintaining
Metadata).]
Overview
The previous lesson covered how data warehousing has evolved from early
management information systems to today’s decision support systems that meet a
business need. This lesson defines data warehouse concepts and terminology. Note
that the “Defining Data Warehouse Concepts and Terminology” block is highlighted in
the course road map.
Specifically, this lesson introduces the Oracle definition of a data warehouse. The
lesson offers a general description of the properties of a data warehouse. The standard
components and tools required to build, operate, and use a data warehouse are
identified.
Objectives
After completing this lesson, you should be able to do the following:
• Identify a common, broadly accepted definition of a data warehouse
• Recognize some of the operational properties of a data warehouse
• Recognize common data warehousing terminology
• Identify the functionality associated with each component required for a successful
data warehouse implementation
• Identify and position the Oracle Warehouse vision, products, and services
Lesson 3: Defining Data Warehouse Concepts and Terminology
Data Warehouse Definition
Subject-Oriented
While the data in an OLTP system is stored to support a specific business process (for
example, order entry, campaign management, and so on) as efficiently as possible,
data in a data warehouse is stored based on common subject areas (for example,
customer, product, and so on) for ease of access. That is because the complete set of
questions to be posed to a data warehouse is never known in advance. Every question the
data warehouse answers spawns new questions. Thus, the design of a data warehouse
focuses on giving users easy access to the data so that current and future questions
can be answered.
Time-Variant
The data warehouse contains slices of data across different periods of time. With these
data slices, the user can view reports for both current and past periods.
Historical
A data warehouse typically contains several years’ worth of data. This is necessary to
support trending, forecasting, and time-based performance reporting (for example,
current year versus previous year).
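The current-year-versus-previous-year comparison described above can be sketched in a few lines of Python. This is only an illustration: the sales figures and the names yearly_sales and year_over_year_change are invented for the example and are not part of the course material.

```python
# Hypothetical yearly sales totals held in the warehouse (illustrative values only).
yearly_sales = {1996: 1_200_000.0, 1997: 1_350_000.0, 1998: 1_485_000.0}


def year_over_year_change(sales, year):
    """Return the fractional change in sales from the previous year to `year`."""
    previous = sales[year - 1]
    return (sales[year] - previous) / previous


for year in (1997, 1998):
    change = year_over_year_change(yearly_sales, year)
    print(f"{year} versus {year - 1}: {change:+.1%}")
```

The comparison is possible only because the warehouse still holds the earlier years. A system that keeps a few months of data could not answer it.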
[Slide: Data Warehouse. Properties shown include subject-oriented and integrated.]
[Slide: Subject-Oriented. Equity plans, shares, insurance, loans, and savings data
organized as customer financial information.]
Data Warehouse Properties
Subject-Oriented
Subject-oriented data is organized around major subject areas of an enterprise, and is
useful for an enterprise-wide understanding of those subjects. For example, a banking
operational system keeps independent records of customer savings, loans, and other
transactions. A warehouse pulls this independent data together to provide financial
information. You can access subject-oriented data related to any major subject area of
an enterprise:
• Customer financial information
• Toll calls made in the telecommunications industry
• Airline passenger booking information
• Insurance claim data
The data is transformed so that it is consistent and meaningful for the warehouse.
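As a minimal sketch of the banking example above, the following Python code groups records from independent operational systems under a single customer subject. The record layouts, field names, and values are assumptions made for illustration. A real warehouse would do this in its load process, not in application code.

```python
from collections import defaultdict

# Hypothetical extracts from three independent operational systems.
savings = [{"customer_id": 101, "balance": 2500.0},
           {"customer_id": 102, "balance": 400.0}]
loans = [{"customer_id": 101, "outstanding": 12000.0}]
insurance = [{"customer_id": 102, "annual_premium": 650.0}]


def build_customer_subject(savings, loans, insurance):
    """Group records from separate systems under one customer subject."""
    subject = defaultdict(dict)
    for row in savings:
        subject[row["customer_id"]]["savings_balance"] = row["balance"]
    for row in loans:
        subject[row["customer_id"]]["loan_outstanding"] = row["outstanding"]
    for row in insurance:
        subject[row["customer_id"]]["insurance_premium"] = row["annual_premium"]
    return dict(subject)


customer_financials = build_customer_subject(savings, loans, insurance)
print(customer_financials[101])
```

Each operational system knows only its own slice of the customer. The subject-oriented view answers questions about the customer as a whole.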
[Slide: Integrated. Savings, current accounts, and loans data integrated around the
customer.]
[Slide: Time-Variant. Data warehouse slices by month: 01/97 (January), 02/97 (February),
03/97 (March).]
Integrated
In many organizations, data resides in diverse independent systems, making it difficult
to integrate into one set of meaningful information for analysis. A key characteristic of
a warehouse is that data is completely integrated. Data is stored in a globally
acceptable manner, even when the underlying source data is stored differently. The
transformation and integration process can be time-consuming and costly. It requires
commitment from every part of the organization, particularly top-level managers who
make the decisions and allocate resources and funds.
Data Consistency You must deal with data inconsistencies and anomalies before the
data is loaded into the warehouse. Consistency is applied to naming conventions,
measurements, encoding structures, and physical attributes of the data.
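To make the consistency step more concrete, here is a hedged Python sketch that maps source records to one naming convention, one encoding, and one unit of measurement before loading. The source field names, the account-type codes, and the cents-versus-dollars difference are all hypothetical. They stand in for whatever inconsistencies the real source systems contain.

```python
# Hypothetical encodings of the same attribute in different source systems.
ACCOUNT_TYPE_CODES = {
    "s": "SAVINGS", "sav": "SAVINGS", "01": "SAVINGS",
    "l": "LOAN", "ln": "LOAN", "02": "LOAN",
}


def standardize(record):
    """Convert one source record to the warehouse's agreed names, codes, and units."""
    raw_type = str(record["acct_type"]).strip().lower()
    if raw_type not in ACCOUNT_TYPE_CODES:
        # Anomalies are rejected before loading rather than loaded as-is.
        raise ValueError(f"unrecognized account type: {record['acct_type']!r}")

    unit = record["balance_unit"]
    if unit == "cents":
        balance = record["balance"] / 100
    elif unit == "dollars":
        balance = float(record["balance"])
    else:
        raise ValueError(f"unrecognized balance unit: {unit!r}")

    return {"account_type": ACCOUNT_TYPE_CODES[raw_type],
            "balance_dollars": round(balance, 2)}


source_rows = [
    {"acct_type": "SAV", "balance": 250000, "balance_unit": "cents"},
    {"acct_type": "02", "balance": 12000, "balance_unit": "dollars"},
]
clean_rows = [standardize(row) for row in source_rows]
print(clean_rows)
```

Rejecting unknown codes, instead of passing them through, follows the guidance above that inconsistencies are dealt with before the data reaches the warehouse.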
Time-Variant
Warehouse data is by nature historical; it does not usually contain the current
transactional data. Data is represented over a long time horizon, from two to ten years,
compared with one to three months of data for a typical operational system. The data
allows for analysis of past and present trends, and for forecasting using “what-if”
scenarios.
Time Element The data warehouse always contains a key element of time, such as
quarter, month, week, or day, that determines when the data was loaded. The date may
be a single snapshot date, such as 10-JAN-97, or a range, such as 01-JAN-97 to
31-JAN-97.
Special Dates A time dimension usually contains all the dates required for analysis,
including special dates like holidays and events.
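A time dimension of the kind described above can be sketched as one row per day carrying the quarter, month, and week, plus a marker for special dates. The following Python example is an assumption-laden illustration: the column names and the single entry in SPECIAL_DATES are invented, and a real list of special dates would come from the business.

```python
from datetime import date, timedelta

# Hypothetical special dates; a real list would be supplied by the business.
SPECIAL_DATES = {date(1997, 1, 1): "New Year's Day"}


def build_time_dimension(start, end):
    """Return one row per calendar day from start to end, inclusive."""
    rows = []
    day = start
    while day <= end:
        rows.append({
            "date": day,
            "year": day.year,
            "quarter": (day.month - 1) // 3 + 1,
            "month": day.month,
            "week": day.isocalendar()[1],  # ISO week number
            "special": SPECIAL_DATES.get(day),  # None for ordinary days
        })
        day += timedelta(days=1)
    return rows


time_dimension = build_time_dimension(date(1997, 1, 1), date(1997, 3, 31))
print(len(time_dimension), time_dimension[0])
```

Storing these attributes once per date lets queries group by quarter, month, or week without recomputing them.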
[Slide: Nonvolatile. Operational system: insert, update, delete, read. Warehouse: load,
read.]
[Slide: Changing Data. Refresh; purge or archive.]
Nonvolatile
Typically, data in the data warehouse is read-only. Data is loaded into the data
warehouse once, in a first-time load, and is then refreshed regularly. Warehouse data is
accessed by the business users. Warehouse operations typically involve:
• Loading the initial set of warehouse data (often called the first-time load)
• Refreshing the data regularly (called the refresh cycle)
Accessing the Data Once a snapshot of data is loaded into the warehouse, it rarely
changes. Therefore, data manipulation is not a consideration at the physical design
level. The physical warehouse is optimized for data retrieval and analysis.
Refresh Cycle The data in the warehouse is refreshed; that is, snapshots are added.
The refresh cycle is determined by the business users. A refresh cycle need not be the
same as the grain (level at which the data is stored) of the data for that cycle. For
example, you may choose to refresh the warehouse weekly, but the grain of the data
may be daily.
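The difference between refresh cycle and grain can be illustrated with a short Python sketch in which each weekly refresh appends seven daily-grain rows and never modifies rows that are already loaded. The function name weekly_refresh, the row layout, and the totals are assumptions for this example only.

```python
from datetime import date, timedelta


def weekly_refresh(warehouse, week_start, daily_totals):
    """Append one week of daily-grain snapshot rows; existing rows are left untouched."""
    if len(daily_totals) != 7:
        raise ValueError("expected one total per day of the week")
    for offset, total in enumerate(daily_totals):
        warehouse.append({"sales_date": week_start + timedelta(days=offset),
                          "total": total})


warehouse = []  # nonvolatile: rows are only ever added
weekly_refresh(warehouse, date(1997, 1, 6), [10, 12, 9, 14, 20, 25, 7])
weekly_refresh(warehouse, date(1997, 1, 13), [11, 13, 8, 15, 21, 24, 6])
print(len(warehouse), warehouse[0], warehouse[-1])
```

Here the refresh cycle is weekly while the grain is daily, matching the example in the text.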
Response Time and Data Operations Data warehouses are constructed for very
different reasons than online transaction processing (OLTP) systems. OLTP systems
are optimized for getting data in, that is, for storing data as a transaction occurs. Data
warehouses are optimized for getting data out, that is, for providing quick response for
analysis purposes.
Because the OLTP environment tends to have a high volume of activity, rapid
response is critical. Data warehouse applications, by contrast, are analytical rather than
operational, so slower performance is acceptable.
Nature of Data The data stored in each database varies in nature. The data
warehouse contains snapshots of data over time to support time-series analysis,
whereas the OLTP system stores very detailed data for a short time, such as 30 to 60
days.
Data Organization The data warehouse is subject-specific and supports analysis, so
data is arranged accordingly. For the OLTP system to support subsecond
response, the data must be arranged to optimize the application. For example, an order
entry system may have tables that hold each of the elements of the order, whereas a
data warehouse may hold the same data but arrange it by subject, such as customer,
product, and so on.
Data Sources Since the data warehouse is created to support analytical activities,
data from a variety of sources can be integrated. The operational data store of the
OLTP system holds only internal data or data necessary to capture the operation or
transaction.
Usage Curves
Operational systems and data warehouses have different usage curves.
An operational system has a more predictable usage curve; the warehouse has a less
predictable, more varied, and more random usage curve.
Access to the warehouse varies not just on a daily basis, but may even be affected by
forces such as seasonal variations. For this reason, you cannot expect the operational
system to handle heavy analytical (DSS) queries and continue to give good transaction
rates for the minute-by-minute processing required.
[Slide: User Expectations. Control expectations; set achievable targets for query
response; set SLAs; educate; growth and use are exponential.]
[Slide: Enterprisewide Warehouse.]
User Expectations
The difference in response time may be significant between a data warehouse and a
client-server environment fronted by personal computers. You must control the user’s
expectations regarding response. Set reasonable and achievable targets for query
response time, which can be assessed and proved in the first increment of
development. You can then define, specify, and agree on service level agreements (SLAs).
If users are accustomed to fast PC-based systems, they may find the warehouse
excessively slow. However, it is up to those educating the users to ensure that they are
aware of just how big the warehouse is, how much data it holds, and what benefit the
information brings to both the user and the business.
[Slide: Data warehouse and data mart.]
Definition A data mart is a subset of data warehouse fact and summary data that
provides users with information specific to their requirements.
Scope A data warehouse deals with multiple subject areas and is typically
implemented and controlled by a central organizational unit such as the Corporate
Information Technology group. It is often called a central or enterprise data
warehouse.
Subjects A data mart is a simpler form of a data warehouse designed for a single
line of business (LOB) or functional area such as sales, finance, or marketing.
Data Source A data warehouse typically assembles data from multiple source
systems. A data mart typically assembles data from fewer sources.
Size Data marts are differentiated from data warehouses not by size, but by
use and management.
Implementation Time Data marts are typically smaller and less complex than data
warehouses and therefore are typically easier to build and maintain.
A data mart can be built as a “proof of concept” step toward the creation of an
enterprisewide warehouse.
[Slide: Data warehouse and data marts (marketing, sales, finance, human resources);
external data.]
[Slide: Sales or marketing data mart; external data.]
Dependent Data Mart Dependent data marts have the following characteristics:
• The source is the warehouse. Dependent data marts rely on the data warehouse for
content.
• The extraction, transformation, and transportation (ETT) process is easy.
Dependent data marts draw data from a central data warehouse that has already
been created. Thus, the main effort in building a mart, the data cleansing and
extraction, has already been performed. The dependent data mart simply requires
data to be moved from one database to another.
• The data mart is part of the enterprise plan. Dependent data marts are usually built
to achieve improved performance and availability, better control, and lower
telecommunication costs resulting from local access to data relevant to a specific
department.
Independent Data Mart Independent data marts are stand-alone systems built from
scratch that draw data directly from operational and/or external sources of data.
Independent data marts have the following characteristics:
• The sources are operational systems and external sources.
• The ETT process is difficult. Because independent data marts draw data from
unclean or inconsistent data sources, efforts are directed toward error processing
and integration of data.
• The data mart is built to satisfy analytical needs. The creation of independent data
marts is often driven by the need for a quick solution to analysis demands.
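The dependent case can be sketched as a simple copy of already-cleansed warehouse rows for one line of business, with a summary added. The following Python example is illustrative only: the fact rows, the lob and region fields, and the function name build_dependent_mart are assumptions, and no cleansing logic appears because that work is assumed to have been done when the warehouse was loaded.

```python
from collections import defaultdict

# Hypothetical, already-cleansed fact rows held in the central warehouse.
warehouse_facts = [
    {"lob": "sales", "region": "East", "amount": 100.0},
    {"lob": "sales", "region": "West", "amount": 250.0},
    {"lob": "finance", "region": "East", "amount": 75.0},
    {"lob": "sales", "region": "East", "amount": 50.0},
]


def build_dependent_mart(facts, lob):
    """Copy the fact rows for one line of business and add a summary by region."""
    detail = [dict(row) for row in facts if row["lob"] == lob]
    summary = defaultdict(float)
    for row in detail:
        summary[row["region"]] += row["amount"]
    return {"detail": detail, "summary_by_region": dict(summary)}


sales_mart = build_dependent_mart(warehouse_facts, "sales")
print(sales_mart["summary_by_region"])
```

An independent data mart would need the error processing and integration steps in front of this, because its sources are operational and external systems rather than the warehouse.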
Data Warehouse Terminology
Metadata
Information about data, derived directly from the business owners and users, is
maintained to support operations and use of the data warehouse.
[Slide: Architecture. Source data, data integration, business area warehouse,
enterprise data warehouse.]
Architecture
A set of rules or structures providing a framework for the overall design of a system or
product.
Technical Infrastructure
The technologies, platforms, databases, gateways, and other components necessary to
make the architecture functional within the corporation.
Components of a Data Warehouse
Methodology
Employing a methodology is important in the development of any system, and even
more so in a warehouse environment. The warehouse is such a big investment, in
every resource you can think of, that its success is essential.
To avoid failure of the warehouse implementation, you must employ a methodology
and keep to it. Failure generally takes one of two forms: the warehouse is not
delivered on time, or the warehouse fails to deliver what the business users need. A
good method helps to manage expectations by identifying clear deliverables.
Modeling
The warehouse may be modeled from scratch or using an existing operational model
that defines the operational systems. It is more common (and recommended) to model
from scratch, referencing the source systems available and identifying any gaps in data
needs.
The data warehouse is modeled in a different way from an operational system. First,
the structure needs to take into account the way data is analyzed, and the schema is
created accordingly. Second, the warehouse is based upon subjects (not functions), and
it is these subject areas that form the basis of the model.
Subject areas are modeled and implemented one at a time.
Modeling Tools You can use specific modeling tools, such as Oracle Designer/2000,
to model the warehouse initially and facilitate iterative development.
ETT Tools Specialized tools make these tasks comparatively easy to set up,
maintain, and manage, compared with programs developed in-house. Specialized tools
are available from Oracle with the Data Mart Suite.
Specialized tools can be an expensive option, which is why many warehouse projects
employ customized ETT programs written in COBOL, C++, PL/SQL, or other
programming languages or application development tools.
Data Management
The heart of the warehouse is the database management system (or Server, in the case
of Oracle), which must be:
• Productive
• Flexible
• Robust
• Scalable
• Efficient
The server must possess many other properties (they are considered in a later lesson).
The warehouse environment must also be capable of managing the hardware,
operating system, and overall network infrastructure.
Warehousing environments normally employ a relational database management
system (RDBMS) or server.
Tools Oracle provides tools (such as Oracle Enterprise Manager) that can be used to
manage and control access to the warehouse environment.
[Slide: Tools that retrieve data from the warehouse database for business analysis, such
as forecasting and drill-down. Imperatives: ease of use, intuitive, metadata, training.
More than one tool may be required.]
Tools It is important that the tools are intuitive and easy to use. It is imperative that
the warehouse data is presented to the user in a meaningful, business-specific manner,
one that the user can easily interpret. Metadata provides the user with these data
descriptions and navigation information.
Users have different query requirements, and one query tool may not fit all
requirements. Users may need to perform simple to complex business modeling; trend
analysis using data spanning time periods; complex drill-down; simple queries on
prepared summary information; what-if analysis; detailed trend analysis and
forecasting; and data mining.
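Of the query types listed above, drill-down is perhaps the easiest to illustrate: the user starts from a summary and then asks for the detail behind one of its lines. The Python sketch below assumes a tiny in-memory set of printer sales rows with invented regions, states, and unit counts. It is not a description of how any Oracle tool implements drill-down.

```python
from collections import defaultdict

# Hypothetical printer sales rows: (region, state, units sold).
sales = [
    ("West", "CA", 120),
    ("West", "OR", 30),
    ("East", "NY", 90),
    ("East", "MA", 45),
    ("West", "CA", 60),
]


def summarize(rows, level, region=None):
    """Total units at the requested level; pass region to drill down into it."""
    index = {"region": 0, "state": 1}[level]
    totals = defaultdict(int)
    for row in rows:
        if region is None or row[0] == region:
            totals[row[index]] += row[2]
    return dict(totals)


by_region = summarize(sales, "region")                   # summary level
west_by_state = summarize(sales, "state", region="West")  # drill down into West
print(by_region, west_by_state)
```

The same detail rows serve both levels. The summary is simply a coarser grouping of them.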
Note: Data warehouse implementors, or WTI partners, may need to provide extensive
and intensive training in the use and optimization of selected extraction and reporting
tools. If the tools are SQL-based, for example, the user needs to know how many
tables or indexes can be used before execution impedes system performance.
[Slide: Labels include operational data, external data, relational, multidimensional,
text, image, spatial, audio, video, Web, relational tools, OLAP tools, and
applications/Web.]
Oracle Warehouse Vision, Products, and Services
Loading Any Source Oracle and a variety of third parties provide solutions to extract
and load data from multiple data sources into the warehouse. You can gather data from
multiple sites and multiple applications.
Managing Any Data Oracle warehouses using Oracle7, Oracle8, and Oracle8i
relational database management systems can store any data, including atomic,
summary, and transient data. You can also store metadata definitions about the data.
[Slide: Data Modeling. Labels include Oracle Data Mart Designer, OLTP databases, data
mart database, OLTP engines, warehousing engines, Oracle8, and SQL*Plus.]
Oracle Data Mart Suite This suite consists of seven products, all of which are used
in this course except Oracle Web Application Server and Oracle Reports. Each of the
products in the Oracle Data Mart Suite (ODMS) plays a role in the implementation or
use of the data mart. ODMS delivers an integrated package with the software and
documentation needed to implement a data mart quickly and easily. ODMS consists of
these products:
• Oracle Enterprise Server
• Oracle Enterprise Manager
• Oracle Data Mart Designer
• Oracle Data Mart Builder
• Oracle Discoverer
• Oracle Web Application Server
• Oracle Reports and Reports Server
Oracle Reports, Oracle Discoverer, and Oracle Express are interoperable today,
providing seamless analysis across the entire business intelligence spectrum.
Discoverer users are able to dynamically pass the contents of a workbook to Express,
building a multidimensional cube “on the fly” and invoking the Express calculation
engine for more sophisticated analysis. Conversely, Express users are able to “drill
out” to Discoverer to explore the detail-level data in the relational system from data
summarized in an Express cube. Oracle Reports publishes views of data from both
Discoverer worksheets and Express data cubes.
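The relationship between detail-level data and a summarized cube can be illustrated with a minimal Python sketch. This is not Oracle Express or Discoverer code. The row layout, the function names `build_cube` and `drill_out`, and the sample figures are hypothetical, chosen only to show a summary being built from detail rows and a summarized cell being traced back to the detail behind it.

```python
from collections import defaultdict

# Hypothetical detail-level rows, as they might sit in a relational table.
SALES_DETAIL = [
    {"region": "East", "product": "Books", "amount": 120.0},
    {"region": "East", "product": "Books", "amount": 80.0},
    {"region": "East", "product": "Music", "amount": 50.0},
    {"region": "West", "product": "Books", "amount": 200.0},
]


def build_cube(rows, dimensions, measure):
    """Summarize detail rows into cells keyed by the chosen dimensions."""
    cube = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        cube[key] += row[measure]
    return dict(cube)


def drill_out(rows, dimensions, cell):
    """Return the detail rows that sit behind one summarized cell."""
    return [row for row in rows
            if tuple(row[d] for d in dimensions) == cell]


# Build a two-dimensional summary, then drill out from one cell to its detail.
cube = build_cube(SALES_DETAIL, ("region", "product"), "amount")
detail = drill_out(SALES_DETAIL, ("region", "product"), ("East", "Books"))
```

Here `cube` holds one total per region and product combination, and `detail` holds the individual rows that were summed into the East/Books cell.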
[Figure: Oracle Education, Oracle Consulting, Customers]
Oracle Consulting This service provides full life-cycle implementation services for
data warehousing solutions. Oracle Consulting has leveraged Oracle’s heavy
investment in new technology development through involvement in leading-edge
client engagements. It has also built knowledge repositories and problem-solving
approaches in data warehousing and incorporated them in its Data Warehouse Method.
Major new programs are being planned by Oracle Consulting’s Data Warehousing
Practice to help companies think about and manage their customers and their
businesses in better ways. Concepts such as one-to-one marketing and balanced
scorecard are brought to life with data warehousing technology and by professionals
who can provide a transition from management vision to fully operational systems.
Oracle Education This service offers a suite of products and services to meet your
training needs, including instructor-led training, online interactive learning, interactive
courseware, in-depth seminars, customized classes, and enterprisewide performance
consulting services. Oracle offers courses in a variety of media such as:
• Instructor-led training (ILT) courses run either at an Oracle Education Center or
even on your site
• Customized training (combining media offerings)
• Media-based training using computer-based training (CBT) courses
Oracle Support Services This service offers a range of program options, enabling
customers to select the best fit for their organization. Ranging from basic telephone
support and Web-based systems to highly customized, on-site support, the programs
include OracleFoundation, OracleMetals, OracleExpertise, and OracleLifecycle. Three
global support centers and more than 90 local centers worldwide constitute a global
support infrastructure that enables Oracle Support Services to provide
around-the-clock, around-the-world coverage for core technology and mission-critical
applications.
Summary
This lesson covered the following topics:
• Identifying a common, broadly accepted definition of the data warehouse
• Distinguishing the differences between OLTP systems and analytical systems
• Defining some of the common data warehouse terminology
• Identifying some of the elements and processes in a data warehouse
• Identifying and positioning the Oracle Warehouse vision, products, and services
Practice 3-1
1 Indicate whether the following statements about warehouse data are true or false.
Statement                                            True    False
a Data is organized by time.                         ____    ____
b Data is always stored in a relational database.    ____    ____
c Data relates to business-specific areas.           ____    ____
d Data is sometimes integrated.                      ____    ____
e Data is replaced according to a refresh cycle.     ____    ____
f Data warehouses may contain any type of data.      ____    ____
2 _______ is a set of rules or structures providing a framework for the overall design
of a system or product.
a Technical infrastructure
b Data access environment
c Architecture
3 The ________ is closely related to the architecture and consists of the
technologies, platforms, databases, gateways, and other components necessary to
make the architecture functional within the corporation.
a Data access environment
b Technical infrastructure
c Data warehouse
4 A telecommunications company needs to understand its network traffic to better
pinpoint frequent trouble spots and predict network expansion and usage. Storing call
detail records and summarizing them by switch and trunk group, among other things,
in a separate environment will satisfy this need.
Which of the following are you going to design?
a Operational data store (ODS)
b Data warehouse
5 An online bookstore has customers in its Sales Order System and in its
Marketing System. These customers do not match between systems, because
Marketing staff do not always update the Marketing System with current and
complete customer data. The bookstore also wants to develop profiles of its
customers according to buying patterns and to summarize product sales to get the
feedback necessary to improve marketing programs and promotions.
Which of the following are you going to design?
a Operational data store (ODS)
b Data warehouse
6 Discussion: Discuss the questions below about data warehousing concepts and
terminology and present your points to the class at the end of the discussion.
a Discuss whether a data warehouse, enterprisewide data warehouse,
independent data mart, dependent data mart, or operational data store is most
suitable for your company’s needs.
b Discuss how the pieces of Inmon’s classic definition of a data warehouse,
“A data warehouse is a subject-oriented, integrated, time-variant, nonvolatile
collection of data in support of management’s decision-making process,” apply
to your environment.
c How will your recommendations in question 6a above deliver benefits?
Lesson 4: Driving Implementation Through a Methodology
Overview
[Course road map: Meeting a Business Need; Planning for a Successful Warehouse; Analyzing User Query Needs; Modeling the Data Warehouse; ETT (Building the Warehouse); Supporting End User Access; Managing the Data Warehouse; Project Management (Methodology, Maintaining Metadata)]
The previous lesson covered data warehouse concepts and terminology. This lesson
discusses the need to drive a data warehouse implementation project through a
methodology. Note that the “Project Management” block is highlighted in the course
road map.
Specifically, this lesson introduces the Oracle Data Warehouse Method, a
methodology employed by Oracle Consulting Services for incremental development
of a total warehouse solution by using a phased development approach. It also
describes partnering initiatives launched by Oracle.
Objectives
After completing this lesson, you should be able to do the following:
• Explain the different approaches to warehouse development and the benefits of an
incremental approach to development
• Identify the purpose of the Oracle Method
• Discuss the purpose and fundamental elements of the Oracle Consulting Data
Warehouse Method
• Identify the Data Warehouse Method as a series of processes and approaches
• Discuss the objectives of the Oracle Warehouse Technology Initiative
Warehouse Development Approaches
Advantages of the “Big Bang” Approach This approach offers few real advantages
over other approaches and should be avoided in most cases.
• It may be justified where the warehouse is being built as part of another major
project or program, such as reengineering, and the two depend on each other.
• It provides a “big picture” of the data warehouse before the data warehousing
project starts.
Disadvantages of the “Big Bang” Approach The following are the disadvantages
of this approach.
• Involves a high risk
• Takes a longer time to deliver any perceived business benefit
• Runs the risk that requirements will change during analysis
Incremental Approach
The incremental approach manages the growth of the data warehouse by developing
incremental solutions that comply with the full-scale data warehouse architecture.
Rather than starting by building an entire enterprisewide data warehouse as a first
deliverable, start with just one or two subject areas, implement them as scalable data
marts, and roll them out to your end users. Then, after observing how users are actually
using the warehouse, add the next subject area or the next increment of functionality to
the system. This is also an iterative process. It is this iteration that keeps the data
warehouse in line with the needs of the organization.
Think big and start small. In other words, your strategy identifies the enterprisewide
warehouse, which is delivered in small increments over short timeframes.
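As a rough illustration of “think big and start small,” the sketch below keeps the full planned scope in one place and records subject areas as they are rolled out. The class `WarehousePlan`, its methods, and the subject area names are hypothetical. They are not part of any Oracle product or method and only show the bookkeeping the approach implies.

```python
from dataclasses import dataclass, field


@dataclass
class WarehousePlan:
    """Enterprisewide scope ("think big") delivered in increments ("start small")."""
    planned: list                                  # subject areas in priority order
    delivered: list = field(default_factory=list)  # subject areas already rolled out

    def next_increment(self, size=1):
        """Return the next subject areas to build, without delivering them."""
        remaining = [a for a in self.planned if a not in self.delivered]
        return remaining[:size]

    def roll_out(self, areas):
        """Record subject areas as delivered to end users."""
        for area in areas:
            if area not in self.planned:
                raise ValueError(f"{area!r} is outside the planned scope")
            if area not in self.delivered:
                self.delivered.append(area)

    def is_complete(self):
        """True when every planned subject area has been delivered."""
        return set(self.delivered) == set(self.planned)


plan = WarehousePlan(planned=["Sales", "Marketing", "Operations"])
plan.roll_out(plan.next_increment(size=2))  # first deliverable: two subject areas
plan.roll_out(plan.next_increment())        # a later increment, after user feedback
```

Rejecting subject areas that fall outside `planned` is one way to express the requirement that every increment comply with the full-scale architecture.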
Benefits
Some of the benefits of the incremental approach to warehouse development are listed
below.
• Delivers a strategic data warehouse solution through incremental development
efforts
• Provides extensible, scalable architecture
• Supports the information needs of the enterprise organization
• Quickly provides business benefit and ensures a much earlier return on investment
• Allows a data warehouse to be built based on a subject or application area at a time
• Allows the construction of an integrated data mart environment
Top-Down Approach
[Figure: Legacy data; Operations data; External data sources; Sales; Marketing]
Top-Down Approach: Advantages and Disadvantages
• Advantages:
– Provides a relatively quick implementation and payback
– Offers significantly lower risk
– Emphasizes high-level business needs
– Achieves synergy among subject areas
• Disadvantages:
– Requires an increase in up-front costs
– Difficult to define the boundaries
– May not be suitable unless the client needs cross-functional reporting
Bottom-Up Approach
[Figure: Data marts; Data warehouse; Legacy data; Operations data; External data sources; Sales; Marketing]
Bottom-Up Approach: Advantages and Disadvantages
• Advantages:
– Appealing to IT
– Easier to get buy-in from IT
• Disadvantages:
– Requires source systems to encapsulate the current business processes
– Design may be out-of-date before delivery
– Requires reengineering for each increment
– Solutions may be rejected by the next line of business to be involved
– Overall benefit to the business may be minimized
The Need for an Iterative and Incremental Methodology
Oracle Method
The Oracle Method (OM) methodology provides the means to document, standardize,
reuse, and improve the way that we deliver services. It consists of online guidelines
and manuals, workplan templates, and deliverable templates created by experienced,
field-based practitioners for estimating, managing, developing, and delivering
business solutions.
Oracle Data Warehouse Method
Method Materials
The Oracle Method includes software and hard copy handbooks for all lines of
business. These components of the Oracle Method assist all members of your project
team, from project managers to analysts to developers.
The software includes:
• Workplan templates*
• Deliverable templates*
• Online handbooks
• Estimating software
The hard copy handbooks contain:
• Method handbook
• Process and task reference*
• Deliverable reference*
* Not yet available in production; planned for later releases.
The Oracle Data Warehouse Method (DWM):
• Focuses on scoping
• Manages risk
• Relies on user involvement throughout
• Delivers an extensible, scalable solution
• Uses a variety of technologies
• Identifies tasks with clear objectives and
deliverables
• Employs common techniques, skills, and
dependencies
• Assigns tasks to processes and processes to
phases
Benefits
The experience and best practices provide the following benefits:
• Consistency is achieved among consultants and practitioners because all
organizations are working from a common set of tasks and deliverables with a
clear understanding of the development processes.
• Productivity is increased by following established approaches and adhering to
successful practices. Productivity is also improved by the reduction in mistakes
and rework, and by the ability of a consultant to understand the structure and flow
of the project very quickly.
• Flexibility is gained by providing a structured development environment that
allows personnel to be used efficiently based on skills and availability. Flexibility
is also achieved by using a common set of tasks as a foundation for the project
with the ability to customize the tasks based on the needs of each client.
• Low risk is achieved through the use of a common set of tasks that outlines the best
ways of developing a warehouse. Mistakes are avoided and the impacts of
decisions can be evaluated within the framework and guidelines of experience.
DWM Fundamental Elements
• Approaches
• Phases
• Processes
• Tasks and deliverables
• Roles
[Figure: Phase 1, Phase 2, Phase 3; Process 1, Process 2]
Approaches
[Figure: Incremental approach: Increment I (proof of concept), with warehouse infrastructure implementation and business application implementation, then Increment II through N. Packaged data mart approach: data marts and warehouse.]
[Figure: Incremental Approach. Labels: Business Strategy; IT Strategy; Warehouse Strategy Phase (Requirements Capture, Scoping Services, Technical Architecture Services); Warehouse Infrastructure Services (Increment 1 Proof of Concept, Increment 2, Increment 3, Increment n); Warehouse Business Solution Services (Increment A, Increment B, Increment C, Increment z)]
[Figure: Incremental Development. Focus on business functionality; deliver business benefit. Labels: Strategy, Definition, Analysis]
Incremental Approach
The incremental approach is the preferred Oracle approach to building an enterprise
data warehouse solution; it is effective and proven. This approach manages the growth
of the data warehouse by developing incremental solutions that comply with the full-
scale data warehouse architecture.
The architecture is designed to provide a solid framework for the long-term data
warehouse. It includes a central data warehouse with corporate data for all functional
areas, and the functionality to populate, manage, and access the full-scale data
warehouse.
The data warehouse also controls and feeds each data mart within the architecture. By
establishing this architecture, the strategic data warehouse can grow incrementally
while supporting data extensibility and avoiding a divergent group of data marts.
Incremental Development The increments start with the strategy phase, which
defines the overall data warehouse solution and architecture at a high level, including:
• Scope of entire solution
• Identification and prioritization of increments
• Initial technical architecture
• Initial data warehouse architecture
An initial increment is then developed following the phasing model. The increment is
usually scoped to provide maximum benefit, target a specific user audience, and
ensure that the concept can be proved.
At the end of each increment, the discovery phase acts as the review and evaluation
phase. Subsequent increments follow the same phasing approach, building on
experiences gained and lessons learned from development of the first increment.
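The flow described above can be sketched as a simple loop. This is only an illustration: the phase names come from this lesson, but their exact order within an increment, the function `run_increments`, and the increment names are assumptions made for the example.

```python
# Phases named in this lesson; the ordering within an increment is assumed.
INCREMENT_PHASES = ["Definition", "Analysis", "Design", "Build",
                    "Transition", "Discovery"]


def run_increments(increments):
    """Return the ordered (scope, phase) steps for a warehouse project.

    The strategy phase runs first for the overall solution. Each increment
    then follows the same phasing model and ends with the discovery phase,
    which acts as the review and evaluation step.
    """
    steps = [("overall solution", "Strategy")]
    for name in increments:
        for phase in INCREMENT_PHASES:
            steps.append((name, phase))
    return steps


steps = run_increments(["Increment 1", "Increment 2"])
for scope, phase in steps:
    print(f"{scope}: {phase}")
```

Keeping the phase list in one constant reflects the point that later increments reuse the same phasing approach as the first.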
Data Mart Development DWM also provides an approach for the development of a
solution scoped to address the requirements of a specific functional area or
organization—a data mart solution.
[Figure: DWM phases (Analysis, Design, Build, Transition, Discovery) and processes (Data acquisition, Architecture, Data quality, Administration, Metadata management, Data access, Documentation, Testing, Training)]
Strategy Phase The goal of the strategy phase is to clearly define the business
objectives and purpose of the data warehouse solution. Business objectives for the data
warehouse project must be driven by top management and must be business-centric.
The purpose and objectives for the total data warehouse solution are essential to
setting and managing expectations. The strategy phase also clearly defines the data
warehouse team and the executive sponsor.
The overall objectives of the strategy phase include:
• Achieve a clear awareness of the business goals and objectives.
• Derive the data warehouse scope from business objectives.
• Document a clear definition of the data warehouse scope in its entirety.
• Document the incremental approach used to support the business objectives.
• Define success measurements.
• Identify the operational and external data sources required to support the business
goals.
• Outline the strategies for data acquisition and data quality.
• Define the strategy for warehouse administration.
• Identify the role of metadata and document the strategy for metadata management.
• Define the data access methods necessary to support business objectives.
• Describe the strategy for warehouse documentation and training.
• Identify the testing methods necessary to support user acceptance.
• Identify the existing technical architecture and capacity plan.
• Create the enterprise data warehouse architecture.
• Determine the configuration and capacity requirements.
Prerequisite information needed for the strategy phase includes:
• High-level business descriptions and existing reference material
• Source system documentation and data models, including external data providers
Note: Without a complete understanding of the business objectives and scope of the
overall warehouse, you will not be able to proceed successfully.
Definition Phase The goal of the definition phase is to clearly define the scope and
objectives for the incremental development effort. For the initial increment, conceptual
models are created, data sources are documented, and the scope of data quality is
clearly defined. The technical architecture and data warehouse architecture are also
created.
The overall objectives of the definition phase include:
• Document a clear scope of the definition phase.
• Understand operational and external data sources.
• Plan for the initial load and refresh of the warehouse.
• Define the interface, configuration, and capacity requirements.
• Integrate metadata.
• Define the scope of the data quality effort.
• Outline warehouse administration efforts.
• Outline data access methods.
• Train the user community.
Prerequisite information needed for the definition phase includes:
• Business goals and objectives
• Data warehouse purpose, objectives, and scope
• Enterprise data warehouse logical model
• Source system data flows
• Subject area gap analysis
• Data acquisition strategy
• Data warehouse architecture and technical infrastructure
• Data access environment and data quality strategy
• Data warehouse administration strategy, metadata strategy, and training strategy
Analysis Phase The goal of the analysis phase is to focus on the users’ information,
data acquisition, and data access requirements for business analysis and decision
making. Relational and multidimensional models are produced for the data warehouse,
metadata, and if appropriate, the data marts. Tool selection is also completed for all
appropriate warehouse components during this phase.
The overall objectives of the analysis phase include:
• Collect and model detailed data requirements, including summarization, to support
the business requirements.
• Identify and model multidimensional structures.
• Map source data to target database objects.
• Resolve design conflicts and data quality issues.
• Collect and model metadata requirements.
• Collect detailed data access, reports, and query requirements.
• Select the appropriate tools for data acquisition, data quality, administration,
metadata, and data access components of the warehouse project.
Prerequisite information needed for the analysis phase includes:
• Business goals and objectives
• Data warehouse purpose, objectives, and scope
• Detailed data load, refresh, and summarization plan
• Detailed data quality acceptance plan
• Data warehouse architecture, technical infrastructure, and capacity plan
• Warehouse administration and metadata integration plans
• Data access and training plans
• Viable data acquisition tools, data quality tools, metadata tools, and data access
tools lists
Design Phase The goal of the design phase is to transform the requirements
identified during the analysis phase into detailed design specifications and to complete
the technical architecture installation.
The overall objectives of the design phase include:
• Document a clear scope of the design phase.
• Design the initial data load and refresh modules.
• Execute the hardware and software installation plan.
• Design the data cleansing, error and exception handling, and audit and control
modules.
• Outline the metadata specifications for reporting, bridging, and capturing.
• Design the end user layer and standard queries and reports.
• Establish and document the user and role access privileges.
• Create the database designs for the data warehouse, data mart, metadata repository,
and multidimensional structures identified during the analysis phase.
• Document the initial version of all modules designed.
• Create the test plans for integration testing, system testing, regression testing,
volume testing, and ad hoc query testing.
Prerequisite information needed for the design phase includes:
• The initial data load and refresh requirements
• The technical infrastructure and data warehouse architecture
• The data acquisition plan
• The metadata requirements
• The data access requirements
• The test strategy
Build Phase The goal of the build phase is to create and test the database structures,
data acquisition modules, warehouse administration modules, metadata modules, data
access modules, and reports and queries.
The overall objectives of the build phase include:
• Deliver a well-designed, thoroughly tested, and integrated data warehouse
solution.
• Optimize the database structures to meet design standards and performance
objectives.
• Deliver access components.
• Deliver documentation for using and maintaining the warehouse.
Prerequisite information needed for the build phase includes:
• The data acquisition module designs
• The technical architecture and capacity plan
• The data quality and issue resolution plans
• The warehouse administration and scheduling plan
• The metadata implementation plan
• Specifications for the end-user layer, standard queries and reports, roles and
privileges, and query governor limits
• The logical and physical database and multidimensional database design
• The index and data storage design
• The user guide, the metadata reference guide, and the warehouse administration
reference
• Test plans for integration testing, system testing, environment testing, regression
testing, and ad hoc access testing
Discovery Phase The goal of this phase is to evaluate the implemented increment,
identify increment opportunities, and identify and plan for the next increment. This
enables the users and developers to analyze the effort most recently undertaken,
make adjustments, review the possible increments, and select the next effort based on
business need and data warehouse infrastructure need.
The overall objectives of this phase include:
• Perform a detailed evaluation of the implemented increment.
• Identify opportunities and select the next increment.
• Evaluate the completed project plan and consider experiences and lessons learned
from previous efforts.
• Drive ongoing data warehouse development with business need and user input.
Prerequisite information needed for the discovery phase includes:
• System in production
• Increment project plan
• Use log evaluation
• Enterprise data warehouse implementation road map and infrastructure road map
• Enterprise data warehouse architecture and technical architecture
• Increment technical architecture
• Enterprise data warehouse requirements
Processes
A process is a cohesive set of related tasks that meets a specific project objective and
results in key deliverables.
Each process is a discipline involving similar skills to perform the tasks within the
process. You might think of a process as a simultaneous subproject within a larger
development project.
Every data warehouse project involves most if not all of the following processes,
whether they are the responsibility of the consulting team, the client, IT staff, a third
party, or a combination of these. Most processes overlap in time with others and are
interrelated through common deliverables, while some are strict predecessors of
others.
• Business Requirements Definition
• Data Acquisition
• Architecture
• Data Quality
• Warehouse Administration
• Metadata Management
• Data Access
• Database Design and Build
• Documentation
• Testing
• Training
• Transition
• Post-Implementation Support
Business Requirements Definition
• Defines requirements
• Clarifies scope
• Establishes implementation road map
• Provides initial focus on enterprise implementation
• Identifies information needs
• Models the requirements
Data Acquisition
The Data Acquisition process identifies, extracts, transforms, and transports all source
data necessary for the operation of the data warehouse. Data acquisition takes place
between several components of the warehouse: from operational and external data
sources to the data warehouse, from the data warehouse to data marts, and from data
marts to individual marts.
Early in the data acquisition process, data sources are identified and evaluated against
the subject areas, and gap analysis is conducted to ensure that the data is available to
support the information requirements. Strategies are developed for the first-time load
of the warehouse and for the subsequent refreshes of the warehouse.
You evaluate tools against high-level requirements and make recommendations.
With the detailed analysis output, modules are designed and built to extract, transform,
transport, and load the source data into the warehouse. Once built, the modules are
tested and executed and the production database objects are populated.
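As an illustration only, the following Python sketch shows how a first-time load and a subsequent refresh might share the same extract, transform, and load modules. The table names (src_orders, dw_orders), the columns, and the date watermark are hypothetical and are not part of the DWM. An in-memory SQLite database stands in for both the source system and the warehouse.

```python
import sqlite3

def extract(conn, since):
    """Extract source rows changed after the watermark (None means all rows)."""
    if since is None:
        return conn.execute(
            "SELECT id, amount, updated FROM src_orders ORDER BY id").fetchall()
    return conn.execute(
        "SELECT id, amount, updated FROM src_orders WHERE updated > ? ORDER BY id",
        (since,)).fetchall()

def transform(rows):
    """Transform: convert amounts from cents to currency units."""
    return [(row_id, cents / 100.0, updated) for row_id, cents, updated in rows]

def load(conn, rows):
    """Load: insert new rows or overwrite existing ones in the target table."""
    conn.executemany(
        "INSERT OR REPLACE INTO dw_orders (id, amount, updated) VALUES (?, ?, ?)",
        rows)
    conn.commit()

def run_acquisition(conn, since=None):
    """Run one acquisition cycle and return the new watermark."""
    rows = transform(extract(conn, since))
    load(conn, rows)
    return max((updated for _, _, updated in rows), default=since)

# Hypothetical source and target tables in one in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE src_orders (id INTEGER PRIMARY KEY, amount INTEGER, updated TEXT)")
conn.execute(
    "CREATE TABLE dw_orders (id INTEGER PRIMARY KEY, amount REAL, updated TEXT)")
conn.executemany("INSERT INTO src_orders VALUES (?, ?, ?)",
                 [(1, 1250, "1999-05-01"), (2, 800, "1999-05-02")])

# First-time load: no watermark, so every source row is taken.
watermark = run_acquisition(conn)

# The source changes after the initial load: one update and one new row.
conn.execute(
    "UPDATE src_orders SET amount = 900, updated = '1999-05-03' WHERE id = 2")
conn.execute("INSERT INTO src_orders VALUES (3, 400, '1999-05-04')")

# Refresh: only rows changed since the watermark are extracted.
watermark = run_acquisition(conn, since=watermark)
```

Keeping the transform and load steps identical for both cases is one possible design choice; a real project may well use separate modules for the initial load and for refreshes.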
Architecture
The Architecture process specifies elements of the technical foundation and
architectural design of the data warehouse. The focus is on integrating different
products and the data warehouse components to ensure an extensible and scalable
architecture.
For the technical architecture, an evaluation is performed to determine whether the
database environment should be distributed or centralized. Network, hardware and
software requirements, including acquisition; infrastructure changes; and the platform
configuration are defined and implemented.
The platform configuration covers the data acquisition environment, server
architecture, middleware, database sizing, and disk striping.
The data warehouse architecture ensures an integrated strategic data warehouse
architecture while delivering incremental solutions.
Data Quality
The Data Quality process ensures the consistency, reliability, and accuracy of the data
in the warehouse. A data quality strategy is developed based upon a clear
understanding of the agreements and contractual obligations for data cleansing, audit
and control, and integrity functions.
Data management procedures are defined.
Data quality tools are evaluated and recommended.
The process identifies the business rules for error and exception handling, scrubbing
and cleansing, and audit and control. The business rules for error handling may vary
between the initial load and subsequent updates to the data warehouse. Using the data
quality strategy, procedures, and tools, modules are developed to support the
requirements for data quality.
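The following Python sketch suggests how such a module might combine cleansing, exception handling, and audit counts, and how a rule might differ between the initial load and later refreshes. The field names, the rules themselves, and the policy of tolerating a missing country only during the initial load are assumptions made for illustration; they do not come from the DWM.

```python
from dataclasses import dataclass, field

@dataclass
class AuditResult:
    """Audit and control totals for one load run."""
    accepted: list = field(default_factory=list)
    rejected: list = field(default_factory=list)  # (row, reason) pairs

def cleanse(row):
    """Scrubbing: trim whitespace and standardize the country code."""
    cleaned = dict(row)
    cleaned["name"] = cleaned["name"].strip()
    cleaned["country"] = cleaned["country"].strip().upper()
    return cleaned

def check(row, initial_load):
    """Return a rejection reason, or None if the row passes the rules."""
    if not row["name"]:
        return "missing name"
    if row["amount"] < 0:
        return "negative amount"
    # Hypothetical policy: historical rows without a country are tolerated
    # during the initial load but rejected during later refreshes.
    if not row["country"] and not initial_load:
        return "missing country"
    return None

def apply_quality_rules(rows, initial_load):
    """Cleanse each row, then route it to the accepted or rejected list."""
    result = AuditResult()
    for row in rows:
        cleaned = cleanse(row)
        reason = check(cleaned, initial_load)
        if reason is None:
            result.accepted.append(cleaned)
        else:
            result.rejected.append((cleaned, reason))
    return result

rows = [
    {"name": " Acme ", "country": "us", "amount": 10.0},
    {"name": "Globex", "country": "", "amount": 5.0},
    {"name": "", "country": "DE", "amount": 7.5},
    {"name": "Initech", "country": "fr", "amount": -1.0},
]

initial = apply_quality_rules(rows, initial_load=True)
refresh = apply_quality_rules(rows, initial_load=False)
```

Because every input row ends up in either the accepted or the rejected list, the two counts can be reconciled against the number of rows read, which is one simple form of audit and control.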
Warehouse Administration
The Warehouse Administration process specifies the strategy and requirements for the
maintenance, use and ongoing update of the data warehouse. Strategies are established
for configuration management, warehouse administration, and data governing.
Warehouse administration workflow, tool evaluation, and testing are addressed.
Modules are designed and built for scheduling, backup and recovery, archiving,
security, audit, and data governing. Several data access management and monitoring
tasks are addressed during this process, including authorizing access to appropriate
levels of data, monitoring usage, governing queries, identifying repetitive queries,
calculating metrics, defining access thresholds, adding or removing users, and
updating access authority.
To provide successful ongoing support and maintenance of the warehouse, this process
focuses on the automation of the warehouse management tasks.
The process also defines strategies for security and control, backup and recovery,
disaster recovery, archiving, and restoration.
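To make two of the monitoring tasks above more concrete, the following Python sketch governs queries against a row limit and identifies repetitive queries in a usage log. The log layout, the ROW_LIMIT and REPEAT_THRESHOLD values, and the table name sales_fact are hypothetical; an actual warehouse would take this information from its own monitoring tools.

```python
from collections import Counter

# Hypothetical query log entries: (user, SQL text, estimated rows scanned).
QUERY_LOG = [
    ("ana", "SELECT region, SUM(sales) FROM sales_fact GROUP BY region", 120_000),
    ("raj", "select region, sum(sales) from sales_fact group by region", 120_000),
    ("ana", "SELECT * FROM sales_fact", 9_500_000),
    ("lee", "SELECT region,  SUM(sales) FROM sales_fact GROUP BY region", 120_000),
]

ROW_LIMIT = 1_000_000   # assumed governor threshold
REPEAT_THRESHOLD = 3    # assumed count at which a query counts as repetitive

def normalize(sql):
    """Collapse case and whitespace so equivalent query text compares equal."""
    return " ".join(sql.lower().split())

def governed(log, row_limit):
    """Return the entries a query governor would stop for exceeding the limit."""
    return [entry for entry in log if entry[2] > row_limit]

def repetitive(log, threshold):
    """Return normalized queries that were issued at least `threshold` times."""
    counts = Counter(normalize(sql) for _, sql, _ in log)
    return [sql for sql, count in counts.items() if count >= threshold]

blocked = governed(QUERY_LOG, ROW_LIMIT)
candidates = repetitive(QUERY_LOG, REPEAT_THRESHOLD)
```

Queries that appear in candidates might be worth reviewing as possible standard queries or reports.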
Metadata Management
The Metadata Management process specifies the metadata strategy and the
requirements for the metadata repository, integration, and access. The primary
objective of this process is to provide technical and business views of the warehouse
metadata.
• The technical view focuses on compiling the metadata to support warehouse
management. This view includes data acquisition rules; transformation of source
data to the target database; time and date of data; data authorization; refresh,
archive, and backup schedules and results; and the data accessed, including
metrics such as frequency and volume of requests.
• The business view focuses on enabling users to understand the information
available in the warehouse and how it may be accessed. The business metadata
focuses on what data is in the warehouse, the source of the data, how it was
transformed from source to target, and information compiled while accessing the
warehouse.
The Metadata Management process also develops the modules for capturing, bridging,
and accessing the metadata. Metadata is created by several data warehouse
components, such as data acquisition, database design, and data access. Each
component, particularly if supported by a tool, has its own metadata storage facility
and access capabilities. The disparate metadata must therefore be linked using bridging
capabilities to ensure consistency and to facilitate access by the appropriate personnel.
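The technical and business views described above can be thought of as two projections of the same repository entry. The following Python sketch is a minimal illustration under that assumption; the ColumnMetadata class, its field names, and the sample values are hypothetical and do not represent any particular metadata repository.

```python
from dataclasses import dataclass

@dataclass
class ColumnMetadata:
    """One repository entry; every field name here is illustrative."""
    target: str            # warehouse column
    source: str            # originating source field
    transformation: str    # rule applied between source and target
    business_name: str     # name shown to business users
    last_refresh: str      # date of the most recent refresh
    request_count: int     # how often the column has been queried

def technical_view(entry):
    """Fields that support warehouse management."""
    return {
        "target": entry.target,
        "source": entry.source,
        "transformation": entry.transformation,
        "last_refresh": entry.last_refresh,
        "request_count": entry.request_count,
    }

def business_view(entry):
    """Fields that help users understand what the data means."""
    return {
        "name": entry.business_name,
        "comes_from": entry.source,
        "how_derived": entry.transformation,
    }

entry = ColumnMetadata(
    target="dw.sales_fact.net_amount",
    source="orders.amount_cents",
    transformation="divide by 100 and subtract discounts",
    business_name="Net sales amount",
    last_refresh="1999-05-04",
    request_count=42,
)

tech = technical_view(entry)
biz = business_view(entry)
```

Both views read from one entry, so a change to the transformation rule is reflected in each of them, which is the kind of consistency that bridging is meant to provide.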
Data Access
The Data Access process focuses on identifying, selecting, and designing tools to
support user access to data. A strategy is established and the user requirements are
defined as a framework for the data access environment.
Tools are evaluated, tested, and recommended.
User profiles are defined based on the level of data required to support their analysis,
decision-making requirements, and skill level. Detailed requirements are also
collected for the user interface style and for queries and reports.
With the user profiles, functional requirements, and levels of data to be accessed, tool
criteria are established for each data access component. In most cases, data access is
supported by a variety of tools rather than one tool to support everyone.
After tools are selected and installed, the data access objects are designed and
developed, including canned queries and reports, catalogs, metadata retrieval,
hierarchies, dimensions, user layer schemas, and user interfaces.
Documentation
The Documentation process focuses on producing all user and technical
documentation for the data warehouse, including references, user and system
operations guides, and online help.
To ensure active and successful use of the warehouse, the metadata reference guide
describes the contents of the data warehouse in business terms and provides a
navigational road map to the contents of the data warehouse.
In addition, the warehouse management documentation outlines the workflow and
manual and automated management procedures.
The new features guide highlights any enhancements to warehouse functionality that
result from the implementation of the solution.
Testing
The Testing process is an integrated approach to testing the quality of all components
of the data warehouse. The testing strategy is developed and approved before the test
system is created. System integration and module test plans, test scripts, and test
scenarios are developed. Each test is performed and proven. Testing includes proving
the physical design of the database.
Data acquisition modules, data access tools, and canned queries and reports also
undergo thorough module and integration testing. The testing strategy addresses all
components of the solution, including the ad hoc access processes.
Regression testing is performed, testing changes to the data warehouse against a
baseline, to ensure past functionality works when an enhancement is added.
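One way to picture regression testing against a baseline is to capture the results of a fixed set of queries before a change and compare them with the results afterward. The following Python sketch does this with an in-memory SQLite database. The query names, the sales table, and the two simulated changes are assumptions made for illustration only.

```python
import sqlite3

# Hypothetical regression queries; the names and SQL are illustrative only.
REGRESSION_QUERIES = {
    "total_sales": "SELECT SUM(amount) FROM sales",
    "sales_by_region":
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region",
}

def snapshot(conn):
    """Run every regression query and collect its result set."""
    return {name: conn.execute(sql).fetchall()
            for name, sql in REGRESSION_QUERIES.items()}

def regressions(baseline, current):
    """Return the names of queries whose results differ from the baseline."""
    return sorted(name for name in baseline
                  if baseline[name] != current.get(name))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("East", 100), ("West", 250)])

baseline = snapshot(conn)   # captured before the enhancement

# Enhancement: add a column. Existing results should be unaffected.
conn.execute("ALTER TABLE sales ADD COLUMN channel TEXT")
after_enhancement = regressions(baseline, snapshot(conn))

# Faulty change: a row is accidentally duplicated, so totals drift.
conn.execute("INSERT INTO sales (region, amount) VALUES ('East', 100)")
after_fault = regressions(baseline, snapshot(conn))
```

An empty list means that past functionality still produces the baseline results; any name in the list points to a query that needs investigation.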
Volume testing is conducted on the production platform to ensure that performance
meets established objectives.
Preparation of the acceptance environment and support for acceptance testing are also
performed during the Testing process.
Training
• Define requirements:
– Technical
– End user
– Business
• Identify staff to be trained
• Establish time frames
• Design and develop materials
• Focus on tool training and use of the warehouse
The Training process defines the development and user training requirements,
identifies the technical and business personnel requiring training, and establishes time
frames for executing the training plans.
Training plans and training materials are designed and developed. User and technical
training is conducted.
The key objective is to provide both users and administrators with adequate training to
take on the tasks of operating, maintaining and using the data warehouse solution.
Training should focus on tool training and how business value is generated from the
information in the data warehouse.
Transition
The Transition process focuses on the tasks required to move to the production data
warehouse, including creating the installation plan and preparing the maintenance and
production environments. During this process, the warehouse management workflow is
implemented and the production data warehouse becomes available.
Post-Implementation Support
The Post-Implementation Support process provides an opportunity to evaluate and
review the solution. You evaluate use of the warehouse by accessing metadata and
evaluating queries and reports run against the warehouse. The information assists with
management of standard queries and reports, and the user layer, and identifies required
indexes.
The process also focuses on refreshing the warehouse, monitoring and responding to
system problems, correcting errors, and conducting performance and tuning activities
for all components of the data warehouse. Other actions at this time include:
• Change control for information requirements
• Roll out of metadata, queries, reports, filters, and conditions
• Library of shared objects
• Security
• Incorporation of new users
• Distribution of data marts and catalogs
During this process, responsibility for the data warehouse may be transferred from
information system (IS) staff to the owning organization.
Roles
A warehouse project is complex in many ways, especially in the makeup of the project
team. The DWM identifies the roles required and the main responsibilities of each role.
It identifies roles that are common within technology departments, such as:
• Development database administrator, who works closely with the system
administrator
• Lead tester, who oversees the test script planning, development, and execution
activities
• Production database administrator, who installs and configures the production
database and maintains database access controls
It identifies roles that are unique to data warehouse projects, for example:
• Data warehouse administrator: The data warehouse administrator is responsible for
warehouse management, maintenance, and the total data warehouse production
environment.
• Data warehouse architect: The data warehouse architect establishes the strategic
data warehouse architecture and manages the integration of the developed
increments with the wider data warehouse architecture.
• Data warehouse database designer: The data warehouse database designer is
responsible for producing the logical and physical database designs for the data
warehouse and data mart and for metadata objects.
Within this element of the method, other roles are identified.
Oracle Warehouse Technology Initiative (WTI)
• Customer driven
– Warehouse products only
– Quality, not quantity
– High-value partnerships
• Requires
– Oracle certified solution partner level
– Product certification
– References
Design and Administration Enables you to plan and design a data warehouse from
the ground up. These products help you identify and qualify the source data, lay out
the data structures, and define the mapping between data sources and the target data
warehouse.
Source WTI partners in this category produce tools that help you build and
implement the data warehouse. IT professionals use these tools and utilities to extract,
transform, cleanse, and move data from source systems into the data warehouse or
data marts.
Access Enables you to view the contents of your data warehouse or data mart
database for analysis. Tools include report writers, query products, OLAP software,
executive information systems, and data mining. The products embrace a broad range
of architectures—from server-only to client-server to Web-based servers.
Data Content Provider This category includes any enterprise that sells or rents data
sets suitable for data warehousing. The data can range from market-share information
to demographics to financial time series.
Summary
This lesson discussed the following topics:
• Explaining the different approaches to warehouse development and the benefits of
an incremental approach
• Identifying the purpose of the Oracle Method
• Discussing the purpose and fundamental elements of the Data Warehouse Method
• Discussing the objectives of the Oracle Warehouse Technology Initiative
Practice 4-1
Exercise Background
Task You and a team of two or three other people are about to embark on Phase I of
a data warehouse project: determining the business requirements. This task
involves interviewing executives in your company to define the purpose, goals, and
strategies of the data warehouse.
In this exercise, you are going to form small groups and role-play the interviewing
session with your teammates. Do the following:
• Read through this worksheet. (5 mins)
• Form into groups of four and role-play the interviewing session with your
teammates. Each of you will assume a role: the DW team manager, the
chief financial officer (CFO), the chief operating officer (COO), or the information
technology (IT) director. Use the interview questions and the background about
each character to help you in this exercise. (15 mins)
• Regroup and answer the questions in the class discussion. Give feedback
based on your observations. (20 mins)
Mission Statement “We exist to create value for our share owners on a long-term
basis by building a business that enhances Krispan’s trademarks. We do this by
maintaining our market leading status developing superior soft drinks, both carbonated
and noncarbonated, and profitable nonalcoholic beverage system, financial analysis,
and distribution services using empowered team dynamics in a total quality
paradigm.”
Role 1: Data Warehouse Team Manager He is the manager of the Data Warehousing
Implementation team. He is going to interview the following key people using the
interview questions that follow the role descriptions.
• The chief financial officer (CFO), who is also the board-appointed sponsor of
this data warehouse implementation project
• The chief operating officer (COO)
• The IT director (IT Director)
Role 2: CFO He is the board-appointed project sponsor and has profited considerably
from the company’s success. He does not want the new
systems because they will require a lot of change within his group. He is conservative
in his thinking and wants things to go on as before. He supports the company’s mission
statement but only so far as it meets his own agenda.
Role 3: COO She wants the system because she realizes the power of information,
believes that the data warehouse will give her real control in the company, and
acknowledges that the data warehouse will enable the company to be more
competitive in the marketplace. The COO has a good high-level understanding of what
she wants the system to provide her but she will need significant help in sorting out the
details. She understands the vision for the business and fully supports it.
Role 4: IT Director She does not understand the vision of the business but pretends
that she does by quoting it on a regular basis. She is technically savvy but lacks a
business understanding of the organization. She wants power and influence, and
believes she can get both of these through the new infrastructure and big systems that
are planned.
Interview Questions
Ask the key persons the following questions.
Question to Ask CFO COO IT Director
1 What is the business vision?
Class Discussions
1 Identify the major challenges for a data warehousing implementation project, as
shown in this exercise.
2 Give your suggestions on how to overcome these challenges.
3 If you applied the Oracle Data Warehouse Method to this project, how would you
apply it, and where do you see the benefits of using this
method?
Lesson 5
[Slide: Overview. Course road map: Meeting a Business Need; Planning for a Successful Warehouse (highlighted); Modeling the Data Warehouse; ETT (Building the Warehouse); Managing the Data Warehouse; Analyzing User Query Needs; Supporting End User Access; Project Management (Methodology, Maintaining Metadata)]
Overview
The previous lesson introduced the importance of driving a warehouse project by a
methodology.
This lesson introduces the planning that is critical to the success of a data warehouse
project. Planning phases, deliverables, and project roles are identified. Overall
warehouse strategy and project scope are defined.
Note that the “Planning for a Successful Warehouse” block is highlighted in the
overview slide on the facing page.
Objectives
After completing this lesson, you should be able to do the following:
• Explain the financial issues that must be managed in developing and implementing
a data warehouse
• Outline techniques for obtaining business commitment to the warehouse
• Outline the key tasks involved in managing a warehouse project
• Identify the major warehouse planning phases and their deliverables
• List warehouse strategy phase deliverables
• List warehouse scope phase deliverables
Lesson 5: Planning for a Successful Warehouse
Managing Financial Issues
Financial Justification
Return on Investment The financial justification must set out a strong case that
clearly establishes measurements such as cost versus return on investment, and
increased efficiency and profit. It must also set clearly defined objectives that can be
monitored and measured.
Associated Costs Along with cost justification, you should provide a plan that
specifies other factors that will impact the cost of the project and other aspects of the
business.
• The cost of developing ETT or purchasing the ETT tools
• The actual time required for data cleansing, transformation, and extraction, which
may impact day-to-day operations
• Storage requirements for extract, summarization, work space, log space, backup,
recovery, and maintenance
• The cost of redundant data
• Hardware and software costs
• The cost of server and system software licenses
• Labor costs
You may regard this as a negative approach because some of these factors affect the
business adversely. However, given the enormous size of a data warehouse
project, every issue, good or bad, must be clearly understood and appreciated.
Funding
Initially, the information technology group may fund the project up until the pilot run
of the first increment. After the pilot, when the process is proven, funding usually
passes to the individual departments, particularly if the implementation is a
departmentalized data mart.
Debates often arise between information systems and individual departments about
who should pay for resources such as hardware and software, system (warehouse)
monitoring tools, and OLAP tools. Departments that fund tools while developing one
of the first subject areas often argue that they should be able to recoup part of the
investment from other departments that later build subject areas and benefit from
the same tools.
If the information systems department funds the tools, it either absorbs the cost or
bills it back to individual departments as required, over the depreciation life of the
tools. For departmental data marts, the cost is often the responsibility of
the local department.
Some warehouses do not charge for the first few months, usually while the project is
being funded by information systems development groups. Once the warehouse is
piloted and has proved successful, then charges are normally levied.
Charge Models
There are different models that you may use; none of them are completely fair. There
are no chargeback models strictly for the warehouse environment and the best model
may be a hybrid, specifically developed in house for the purpose.
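As a rough illustration of what a hybrid model could look like, the sketch below divides a monthly warehouse cost into a fixed portion that is shared equally and a usage portion that is allocated by measured usage. The department names, the 40/60 split, the CPU-seconds metric, and all figures are hypothetical assumptions made for this example; they do not come from the Oracle Method or from any Oracle product.

```python
from typing import Dict


def hybrid_chargeback(monthly_cost: float,
                      usage: Dict[str, float],
                      fixed_share: float = 0.4) -> Dict[str, float]:
    """Split monthly_cost across departments.

    fixed_share of the cost is divided equally; the remainder is divided
    in proportion to each department's measured usage (hypothetical metric).
    """
    if not usage:
        raise ValueError("at least one department is required")
    if not 0.0 <= fixed_share <= 1.0:
        raise ValueError("fixed_share must be between 0 and 1")

    fixed_pool = monthly_cost * fixed_share
    usage_pool = monthly_cost - fixed_pool
    total_usage = sum(usage.values())

    charges = {}
    for dept, used in usage.items():
        equal_part = fixed_pool / len(usage)
        # If nothing was used this month, fall back to an equal split.
        if total_usage > 0:
            usage_part = usage_pool * used / total_usage
        else:
            usage_part = usage_pool / len(usage)
        charges[dept] = round(equal_part + usage_part, 2)
    return charges


# Hypothetical figures: tools costing 120,000, billed back over a
# 48-month depreciation life, give a monthly cost of 2,500.
monthly_cost = 120_000 / 48
usage_cpu_seconds = {"finance": 600.0, "marketing": 300.0, "operations": 100.0}
charges = hybrid_chargeback(monthly_cost, usage_cpu_seconds)
print(charges)
```

Raising fixed_share moves the model toward an equal split, which softens the discouraging effect of usage charges. Lowering it rewards departments that use the warehouse sparingly.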
Chargeback Benefits
• Encourages efficient and sensible use of resources
• Promotes realistic ongoing additional requirement requests
• Allows users to share the cost for the data warehouse processing and maintenance
Chargeback Drawbacks
• Users may avoid exploring detail because they know they are being charged for
the service.
• Users may be discouraged from further discovery if they anticipate that costs will
run too high.
• Machine resources are needed to monitor and maintain a charging system.
The business value of tangible, measurable results, in most cases, far outweighs the
overhead costs. Even if chargeback strategies are not deployed, the information
systems team still needs to monitor warehouse use and can use those metrics to justify
future direction.
Obtaining Business Commitment
[Slide: Steering Committee. Members: business executives, information systems representatives, knowledge workers. Role: provides direction, decides upon implementation issues, sets priorities, assists with resource allocation, communicates to all levels at all times]
Steering Committee
The steering committee should comprise representatives of different sectors within the
business:
• Business executives
• Information systems representatives
• Users
The aim of the committee is to:
• Provide business direction.
• Decide upon enterprisewide implementation issues.
• Determine and set development priorities.
• Assist with resource allocation.
• Communicate consistently to all areas and levels of the organization.
Each subject area may have its own detailed project plan, which can be rolled up to a
master plan weekly or monthly. The steering committee must be aware of how
changes to business direction and priorities affect existing project plans, milestones,
and deliverables. They must approach the renegotiation of existing plans tactfully and
diplomatically.
Note: The steering committee is not a substitute for the project manager.
Managing a Warehouse Project
[Slide: Setting Expectations. Labels: Incremental; Scope; Rollout over time; Phases]
Setting Expectations
Expectations for each data warehouse project phase should be established early on.
Every organization has heard something about data warehousing, data marts, data
mining, and related topics. To set expectations throughout the organization, you first
need to determine what each member of the organization expects from the data
warehouse.
Set Expectations for the Incremental Approach Educate all members of the
organization in advance that the data warehouse project will be incrementally
developed. Explain that there is no formal implementation of the entire data
warehouse all at once. Help the user community to understand that the data warehouse
provides views of the business over time and under continually changing strategic
environments.
[Slide: Managing Expectations. Documenting; informing sponsors; reporting progress to end users]
Managing Expectations
Keeping Sponsors Informed Keep the executive sponsor of the warehouse, as well
as the end-user community, abreast of the iterative development that is taking place
during each phase.
Reporting Incremental Progress to End Users Highlight all new progress and
functionality to inform the user community of the incremental advances that are being
made to increase the amount of information that can be gained from the data
warehouse.
Architect
• Designs and documents data warehouse architecture and technical infrastructure
• On a small data warehouse project, may also be responsible for integrating all
networking products and host connectivity
Executive Sponsor
• Provides clout; influences resource availability, funding, and scheduling
• Provides understanding of the organization and its business
Data Analyst
• Is responsible for the data model and schema design
• Manages data quality, data integration, aggregation, and updates
• On a small data warehouse project, may also be involved in data extraction and
transformation
• On a large data warehouse project, may also be involved in exploring end-user
data requirements and deploying business intelligence and analysis tools
Percentage of Project Effort   A   B   C   D   E   F   Total
Requirements definition   3.2   .25   .79   4.1%
Data acquisition   .74   .23   1.36   6.69   6.26   .85   16.1%
Architecture   1   .59   .84   2.22   5.28   9.9%
Data quality   .2   .32   .39   3.22   .2   4.3%
Administration   .3   .12   .23   4.51   5.84   11.0%
. . .
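To show how such percentages might be used in planning, the sketch below converts the row totals quoted in the table into person-days. The 400 person-day project size is a hypothetical assumption chosen only for illustration. Only the rows visible in the table are included, so the estimates do not add up to the full project.

```python
# Row totals (percent of total project effort) copied from the table rows shown.
effort_percent = {
    "Requirements definition": 4.1,
    "Data acquisition": 16.1,
    "Architecture": 9.9,
    "Data quality": 4.3,
    "Administration": 11.0,
}


def person_days(total_project_days: float, percent: float) -> float:
    """Convert a percentage of project effort into person-days."""
    return round(total_project_days * percent / 100.0, 1)


# Hypothetical project size, chosen only for illustration.
TOTAL_PROJECT_DAYS = 400

estimate = {task: person_days(TOTAL_PROJECT_DAYS, pct)
            for task, pct in effort_percent.items()}

for task, days in estimate.items():
    print(f"{task}: {days} person-days")
```

Estimates of this kind are only a starting point for the project plan and should be revised as each increment is scoped.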
[Slide: phases shown: Scope, Analysis, Design, Build, Production]
Identifying Planning Phases
Identifying Warehouse Strategy Phase Deliverables
[Slide: Training strategy]
Identifying Project Scope Phase Deliverables
Break the Project Down into Manageable Phases One challenge in defining
manageable phases is dealing with numerous tasks coupled to numerous
interdependencies, all occurring within a short time frame. Breaking this complexity
down into manageable pieces improves the project’s chances of success.
Involve End Users Iterative development works only when users are active
participants on the delivery team. In a data warehouse project there should be no
technical decisions, only business decisions. Business requirements drive all technical
decisions.
[Slide: Project scope phase deliverables: warehouse administration plan, metadata integration plan, data access plan, training plan. Phases shown: Scope, Analysis, Design, Build, Production]
Summary
This lesson discussed the following topics:
• Cultivating management support, both financial and political, for the warehouse
• Developing a realistic scope that produces deliverables in short time frames to help
ensure success
• Assessing your organization’s readiness for a data warehouse
• Setting realistic expectations
Practice 5-1
Warehouse Organizational Readiness Checklist
1 For each item in the following list that measures warehouse readiness, rate your
own organization’s readiness. Rate each item’s relative importance in measuring
your organization’s readiness.
Readiness Measure | Your Organization’s Readiness
• Are the objectives and business drivers clearly defined, compelling, and agreed upon?
• Have you selected a methodology for design, development, and implementation?
• Is the project scope clearly defined, with a focus on business rather than technology?
• Is there strong support from a business management sponsor?
• Does the business management sponsor have specific expectations?
• Are there cooperative relations between business and Information Systems staff?
• Have you identified which source data will be used to populate the data warehouse?
• What is the quality and “cleanliness” of the source data?
• Are you authorized to choose and acquire hardware and software to implement the warehouse?
• Are you prepared to select and train your implementation team?
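One way to combine the two ratings that the checklist asks for is an importance-weighted average. The sketch below assumes a 1-to-5 scale for both readiness and importance and uses shortened, hypothetical labels and ratings for three of the checklist items. Neither the scale nor the formula comes from the course material.

```python
from typing import Dict, Tuple


def weighted_readiness(ratings: Dict[str, Tuple[int, int]]) -> float:
    """Return the importance-weighted average readiness.

    ratings maps an item label to (readiness, importance), each rated 1-5.
    """
    total_weight = sum(importance for _, importance in ratings.values())
    if total_weight == 0:
        raise ValueError("at least one item needs a nonzero importance")
    weighted_sum = sum(readiness * importance
                       for readiness, importance in ratings.values())
    return weighted_sum / total_weight


# Hypothetical ratings for three checklist items: (readiness, importance).
ratings = {
    "Objectives clearly defined": (4, 5),
    "Methodology selected": (2, 3),
    "Business sponsor support": (5, 5),
}

score = weighted_readiness(ratings)
print(f"Weighted readiness: {score:.2f} out of 5")
```

A low score on a high-importance item pulls the result down more than the same score on a minor item, which helps a group decide where to focus its discussion.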
6
[Figure: course road map with the blocks Overview, Planning for a Successful Warehouse, Meeting a Business Need, Modeling the Data Warehouse, ETT (Building the Warehouse), Managing the Data Warehouse, Analyzing User Query Needs, Supporting End User Access, and Project Management (Methodology, Maintaining Metadata)]
Overview
The previous lesson covered planning for a successful warehouse. This lesson
discusses analyzing user query needs. Note that the “Analyzing User Query Needs”
block is highlighted in the course road map on the facing page.
Specifically, this lesson describes the analysis required to identify and categorize the
users who may need to access data from the warehouse, and it helps you determine
how their requirements differ. Data access and reporting tools are also considered.
Objectives
After completing this lesson, you should be able to do the following:
• Identify the warehouse users
• Identify how to gather user requirements
• Identify tasks involved with managing query access
• Identify the different database models that support OLAP query tools
• Describe query access architectures
Lesson 6: Analyzing User Query Needs
Types of Users
• Executives
• Casual users or managers
• Business analysts or power users
[Figure: User Access, with the labels Structured and Unstructured]
Types of Users
In any warehouse environment, the user communities and their query requirements
vary according to their roles and responsibilities.
Types of Users: Definitions and Requirements
Executives
Definition: They are in charge of the business and have overall responsibility for controlling the business at an enterprise level, determining profitability, competitiveness, and strategy. They need to see bottom-line figures.
Requirements:
• They may interface to the warehouse only through printed reports, although these users will experience the power of the data warehouse as the reports become more accurate, consistent, and easier to produce.
• Their needs drive the development of the applications, the architecture of the warehouse, the data it contains, and the priorities for implementation.
Casual users or managers
Definition: They are in charge of a smaller component of the business and need information to manage the profitability, direction, planning, and control of that subset of the business. They also need to see the enterprisewide picture in order to fit localized plans into the corporate goal.
Requirements:
• They need an easy-to-use tool that helps them specify what they want to see and that determines on its own how to produce the desired results.
• The tool must allow construction of all the reporting elements without being too complicated.
• A single interface and invisible multipass SQL are critical.
Business analysts or power users
Definition: They have a solid understanding of the business process and also have a technical understanding of dimensional modeling and SQL, which are required to extract the answers to business questions from the data warehouse and produce the reports needed by the managers and executives. They often function as a liaison between business and technical groups.
Requirements:
• They need a tool that reflects the way they would break down and solve the business problem.
• The tool should handle reporting elements such as ranking and comparison across summary levels.
Gathering User Requirements
• Areas to focus on:
– How users do business and what the business drivers are
– What attributes users need (required versus good to have)
– What the business hierarchies are
– What data users use and what they would like to have
– What levels of detail or summary are needed
– What type of front-end data access tool is used
– How users expect to see the query results
Managing User Data Access
• Simple reports
• Complex trend analysis
• Regression analysis
• Multidimensional data analysis
• Exceptions reporting
• Forecasting
• Data manipulation
• Data mining
• Parameterized reports for batch execution
• Web-based or client-server-based (or both)
• Starts simple
• Becomes more analytical
• Requires different techniques and flexible tools
[Figure labels: What? Why? Why? Why?]
Training
• Methods
– Informal: one-to-one or small class
– Formal: larger class
– Self-study
• Basic topics
– Logging on
– Accessing metadata
– Creating and submitting a query
– Interpreting results
– Saving queries and storing results
– Utilizing resources
– Learning warehouse fundamentals
[Figure labels: ILT, IDL, CBT]
Training Methods Users must be trained in using the system you have put in place.
There are a number of ways of teaching. The common methods are:
• Informal sessions with a small number of users who can disseminate the
information after the class (Typically the sessions are on a one-to-one basis, as
there are few real users of the warehouse initially.)
• Formal sessions in a classroom environment with larger numbers of students
• Self-study using interactive video, computer-based training (CBT), or reference
manuals
Fundamental Training Topics The basic training should include some of the
following fundamental topics:
• How to switch on the hardware and log on to the data warehouse
• How to find out what data is there (access the metadata) and interpret its meaning
• How to create and issue a query
• How to prioritize queries
• How to monitor query execution
• How to interpret query results
• How to save the query and store results
• How resources are used within the query environment, particularly in an
environment where query governors are used (as in a warehouse)
• How the warehouse works:
– Where the data comes from
– The level of data quality and integrity (or lack of it)
– What mapping is and how it is important
– Backup and recovery responsibilities (if any)
– Data and query availability
– Scheduled downtime
Query Efficiency
User considerations
• Successful completion
• Faster query execution
• Less CPU used
• More opportunity for further analysis
Query Efficiency
Designer considerations
• Use indexes
• Select minimum data
• Employ resource governors
• Minimize bottlenecks
• Develop metrics
• Use prepared and tested queries
• Use quiet periods
Query Efficiency
Designer’s Role Efficient query access is dependent on the good design of the data
warehouse. The following points are important to ensure query efficiency:
• Create indexes on key values to minimize full-table scans.
• Select only the minimum amount of data required.
• Administer resource governors on the server to:
– Prevent access
– Cut off a query after it has run for a specified time
– Inform the user how long a query will take
(Resource governors may be set for the entire application or by user group. Governors are vital where data volumes are very large.)
• Minimize intensive I/O bottlenecks.
• Develop metrics to support queries.
• Make more use of prepared and tested queries.
• Submit large jobs outside working hours, or when CPU usage and network and I/O
contention are minimal.
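The first two points above (indexing key values and selecting only the data required) can be illustrated with a small sketch. It assumes Python's built-in sqlite3 module rather than an Oracle server, and the table sales, the index idx_sales_store, and the helper query_plan are hypothetical names chosen for illustration. The exact wording of the plan text depends on the SQLite version.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the plan detail strings SQLite reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row is the human-readable plan detail.
    return [row[-1] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("London", "P1", 10.0), ("London", "P2", 20.0), ("Paris", "P1", 5.0)],
)

# Select only the columns and rows required, not SELECT * on the whole table.
sql = "SELECT product, amount FROM sales WHERE store = ?"

# Without an index on the key value, the plan is a scan of the table.
plan_before = query_plan(conn, sql, ("London",))

# After indexing the key value, the plan searches through the index instead.
conn.execute("CREATE INDEX idx_sales_store ON sales (store)")
plan_after = query_plan(conn, sql, ("London",))

print(plan_before)
print(plan_after)
```

Comparing the two printed plans shows the scan being replaced by an index search; a production warehouse would use the query plan facility of its own database server.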
Note: The Database Resource Manager in Oracle8i lets you control and limit the total
amount of processing resources available to a given user or set of users. Using this
facility, you can:
• Guarantee certain users a minimum amount of processing resources regardless of
the load on the system and the number of users
• Distribute available processing resources by allocating percentages of CPU time to
different users and applications
• Limit the degree of parallelism that a set of users can use
• Configure an instance to use a particular method of allocating resources
• Select the priority from a given set of priorities that the DBA has assigned to the
user
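The governor behavior listed under the designer's role, cutting off a query after it has run for a specified time, can be sketched generically. This is an illustration only and is not the Oracle8i Database Resource Manager: it assumes Python's built-in sqlite3 module, and the helper run_with_time_limit is a hypothetical name.

```python
import sqlite3
import time


def run_with_time_limit(conn, sql, max_seconds):
    """Run a query, returning its rows or None if it exceeded max_seconds."""
    deadline = time.monotonic() + max_seconds

    def over_deadline():
        # A nonzero return value tells SQLite to abort the running query.
        return 1 if time.monotonic() > deadline else 0

    # Call the check every 1000 SQLite virtual machine instructions.
    conn.set_progress_handler(over_deadline, 1000)
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.OperationalError:
        # Assumption: in this sketch any OperationalError is treated as a cutoff.
        return None
    finally:
        conn.set_progress_handler(None, 0)  # remove the governor again


conn = sqlite3.connect(":memory:")

quick = run_with_time_limit(conn, "SELECT 1", max_seconds=1.0)

# A deliberately expensive query: count fifty million generated rows.
expensive_sql = (
    "WITH RECURSIVE c(x) AS "
    "(SELECT 1 UNION ALL SELECT x + 1 FROM c WHERE x < 50000000) "
    "SELECT count(*) FROM c"
)
cut_off = run_with_time_limit(conn, expensive_sql, max_seconds=0.05)

print(quick)
print(cut_off)
```

A real governor would also distinguish cutoffs from other errors and would report the limit to the user, as the guidelines above suggest.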
Charge Models There are a number of different models that may be used to charge
for services. Some examples are:
• Flat allocation model: The cost is allocated by a central group (Financial
Controller) based on the percentage of resources used by the organization, such as
office space, number of users, and budgets.
• Transaction-based model: The cost is based on query usage, which may mean
calculations based on CPU use, I/O, data, or table elements accessed and reported.
• Telephone service model: The cost is based on connection time.
• Cable TV model: The cost is based on simple standard service charges plus
charges for special services.
Some of these models may not apply to your installation; you may consider
developing a unique model based on your own unique requirements.
Note: Whatever model you employ should balance the needs of the users to access the
data they need against the cost of that data, without discouraging use.
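The arithmetic behind the four charge models can be sketched as follows. This is a minimal illustration, not a billing implementation: the Usage record, the function names, and every rate and figure in the example are hypothetical assumptions chosen only to show how each model derives a charge.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Hypothetical usage figures for one department over one period."""
    cpu_seconds: float
    io_operations: int
    connect_minutes: float
    special_services: int


def flat_allocation(total_cost, share_of_resources):
    # Central group allocates cost by the department's share (0.0 to 1.0).
    return total_cost * share_of_resources


def transaction_based(usage, cpu_rate, io_rate):
    # Cost follows query usage, here CPU seconds and I/O operations.
    return usage.cpu_seconds * cpu_rate + usage.io_operations * io_rate


def telephone_service(usage, rate_per_minute):
    # Cost follows connection time only.
    return usage.connect_minutes * rate_per_minute


def cable_tv(usage, standard_charge, special_charge):
    # Standard service charge plus a charge per special service used.
    return standard_charge + usage.special_services * special_charge


usage = Usage(cpu_seconds=120.0, io_operations=5000,
              connect_minutes=90.0, special_services=2)

charges = {
    "flat": flat_allocation(total_cost=10000.0, share_of_resources=0.25),
    "transaction": transaction_based(usage, cpu_rate=0.5, io_rate=0.25),
    "telephone": telephone_service(usage, rate_per_minute=0.5),
    "cable_tv": cable_tv(usage, standard_charge=20.0, special_charge=5.0),
}
print(charges)
```

Running the same usage figures through each model makes it easier to see which model would discourage use for a given group of users.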
• Query scheduling
– Manages information usage
– Directs queries
– Executes queries
– Sets job queue priorities
• Query monitoring
– Track resource-intensive queries
– Detect unused queries
– Catch queries that use summary data inefficiently
– Catch queries that perform regular summary calculations at the time of query execution
– Detect illegal access
Managing Queries
Query Scheduling Once the warehouse is operational, queries are submitted to the
warehouse server. You need to create a process that:
• Manages the use of information in the data warehouse
• Directs queries to the appropriate data source, using metadata
• Schedules the execution of a query
• Sets job queue priorities
Query Monitoring You need to keep a check on warehouse query activity. The
query management program (or tool) must:
• Track resource-intensive queries, which require analysis to identify why they are
so resource-intensive, followed by tuning to improve performance.
• Detect queries that are never used and remove them. Remember to advise the
users of this kind of change.
• Catch queries that use summary data inefficiently; the summary strategy may need
revision.
• Catch queries that perform regular summary calculations at the time of query
execution. You may decide to include another summary table in the data
warehouse with the presummarized data to provide immediate access, which
improves overall speed of access.
• Detect illegal access. An attempt may indicate that a user needs access to data that is currently denied.
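Two of the monitoring tasks above, tracking resource-intensive queries and detecting queries that are never used, can be sketched against a simple query log. The log format, the query names, the threshold, and the function names are all hypothetical; a real monitoring tool would read these figures from the server or the DSS tool.

```python
from collections import defaultdict


def summarize_query_log(log):
    """Return run count and total CPU seconds for each query in the log."""
    totals = defaultdict(lambda: {"runs": 0, "cpu_seconds": 0.0})
    for entry in log:
        stats = totals[entry["query"]]
        stats["runs"] += 1
        stats["cpu_seconds"] += entry["cpu_seconds"]
    return dict(totals)


def resource_intensive(totals, cpu_threshold):
    """Names of queries whose total CPU seconds exceed the threshold."""
    return sorted(
        name for name, stats in totals.items()
        if stats["cpu_seconds"] > cpu_threshold
    )


def never_used(saved_queries, totals):
    """Names of saved queries that do not appear in the log at all."""
    return sorted(set(saved_queries) - set(totals))


# Hypothetical saved queries and execution log.
saved_queries = ["monthly_sales", "store_ranking", "old_forecast"]
log = [
    {"query": "monthly_sales", "cpu_seconds": 40.0},
    {"query": "monthly_sales", "cpu_seconds": 45.0},
    {"query": "store_ranking", "cpu_seconds": 2.0},
]

totals = summarize_query_log(log)
print(resource_intensive(totals, cpu_threshold=60.0))
print(never_used(saved_queries, totals))
```

Queries flagged as resource-intensive are candidates for tuning or for a presummarized table; queries flagged as never used are candidates for removal after users have been advised.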
Query Management and Monitoring Tools For scheduling you can use custom in-
house developed programs, a UNIX scheduler, third-party tools, or Oracle Enterprise
Manager.
For monitoring you may use the DSS tools themselves (where they have the
capability), in-house developed tools, and server management products such as Oracle
Enterprise Manager.
Consider the automation levels, technology interfaces, and cost of the query
management and monitoring tools before purchasing them.
Security
• Do not overlook
• Subject area sponsors:
– Review and authorize requests for access rights
– Identify enhancements
• Transparent security
• Easy to implement, maintain, and manage
Security Plan
• Define a strategy:
– Allocate business area owners
– Ensure invisibility
• Ensure easy management
• Consider auditing
• Manage passwords
Security
Security is commonly controlled by the database administrator (DBA). It must be
considered early in development to ensure that access to key information is
controlled. Information is a key company resource that needs protection. Therefore,
never assume that you can overlook security because user access is query-only.
Some simple guidelines on security follow:
• Ensure that each subject area has a sponsor who can carry out the following tasks:
– Review and authorize requests for access rights
– Identify further enhancements to the security setup (Data may be separated
into that which is accessible to all users and that which is accessible to a select
few.)
• Ensure that the security is transparent and does not impair access from the user
perspective
• Ensure that the strategy is easy for you to implement, maintain, and manage
Security Plan
• Allocate an owner to every business area within the warehouse. The owner should
be able to advise what access any requestor should be given and define the data
that can be made available publicly, compared with data that must be restricted.
• Ensure that the security levels are virtually invisible to the users.
• Ensure that you can manage and administer the security simply and define a clear,
simple strategy for:
– Access requests
– Allocating predefined roles, both public and restricted, to subject areas
– Auditing to identify unauthorized access attempts
– Password management
[Figure: Role-Based Security. Labels: “Who am I?”, “Where am I?”, Table, Access policy, Application context]
Role-Based Security
Use database roles, the usual technique in an operational environment. However,
consider implementing role-based security somewhat differently, because the
warehouse and operational systems work differently.
For example, you should set up roles that do the following jobs:
• Provide users with access to specific subject areas
• Provide users with access by department
• Limit access to specific objects within any subject area
• Control access when loading data (You need a role to REVOKE and a role to
GRANT if you are using Oracle databases.)
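The role jobs listed above can be sketched as a small access check. This is a conceptual model in Python, not database syntax: the role names, users, subject areas, and the can_read helper are hypothetical, and in an Oracle database the equivalent effect would come from roles managed with GRANT and REVOKE.

```python
# Each role grants read access to (subject area, object) pairs.
# "*" means every object in the subject area.
ROLE_GRANTS = {
    "sales_reader": {("SALES", "*")},
    "finance_summary_reader": {("FINANCE", "MONTHLY_SUMMARY")},
}

# Roles allocated to each user, for example by department.
USER_ROLES = {
    "alice": {"sales_reader"},
    "bob": {"sales_reader", "finance_summary_reader"},
}


def can_read(user, subject_area, obj, loading=frozenset()):
    """Return True if any of the user's roles allows reading the object.

    Subject areas named in `loading` are closed to queries, which models
    revoking access while data is loaded and granting it again afterward.
    """
    if subject_area in loading:
        return False
    for role in USER_ROLES.get(user, ()):
        grants = ROLE_GRANTS.get(role, ())
        if (subject_area, "*") in grants or (subject_area, obj) in grants:
            return True
    return False


print(can_read("alice", "SALES", "DAILY_SALES"))
print(can_read("alice", "FINANCE", "MONTHLY_SUMMARY"))
print(can_read("bob", "FINANCE", "GL_DETAIL"))
print(can_read("bob", "SALES", "DAILY_SALES", loading={"SALES"}))
```

Keeping the grants on roles rather than on individual users means that an access request is handled by allocating a predefined role, which is easier to review and audit.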
OLAP
The term online analytical processing (OLAP) was coined by Dr. E. F. Codd to
describe a technology that could bridge the gap between personal computing and
enterprise data management. Decision support systems (DSS) are systems that enable
decision makers in organizations to access data relevant to the decisions they are
required to make. The definitions of OLAP and DSS are often confused with each
other.
OLAP Online analytical processing covers a wide spectrum of usage and a wide
variety of requirements. Online analytical processing has a number of different
definitions, such as a loosely defined set of principles that provide a dimensional
framework for decision support. Essentially OLAP is a flexible analytical tool that is
commonly used to analyze and interpret data in a data warehouse or data mart.
DSS Decision support systems are not new. They have been around for many years.
In an earlier lesson, you saw that decision support systems were provided with
information obtained from data extract processing.
DSS, therefore, provide users with data, enabling decision making. A DSS may or may
not be built on a data warehouse or data mart. It may instead draw on an operational
environment, or on an operational environment with data extracts used for specific
decision-making activities.
There is little distinction between decision support and online analytical processing.
Online analytical processing tools provide a decision support capability. Both online
analytical processing and decision support query and reporting tools provide the
means for informed decision making.
The Functionality of OLAP OLAP provides much more than just the ability to
perform rotating or drilling down. It offers the ability to create and examine calculated
data interactively on large volumes of data, the ability to determine comparative or
relative differences, as well as the ability to perform exception and trend analysis on
calculated data. Some of the advanced analytical functions of OLAP are forecasting,
modeling, regression analysis, and solving simultaneous equations.
Note: OLAP and DSS are also referred to as EIS (executive information systems) or
KBS (knowledge based systems).
6. Generic dimensionality
7. Dynamic sparse matrix handling
8. Multiuser support
9. Unrestricted cross-dimensional operations
10. Intuitive data manipulation
11. Flexible reporting
12. Unlimited dimensions and aggregation levels
[Figure: two cubes. SALES, with the dimensions Customer, Store, Time, and Product; FINANCE, with the dimensions Store, Time, and GL_Line]
The Multidimensional Database Model You can visualize the data model for a
multidimensional database as a cube (the equivalent of a table in a relational database).
Each cube has several dimensions (equivalent to index fields in relational tables).
The cube acts like an array in a conventional programming language. Logically, the
space for the entire cube is preallocated. To find or insert data, you use the dimension
values to calculate the position.
For example, sales for Product P2, Store London, and Time Jan97 may be in position
[2,50,13]. In practice, a multidimensional product would have techniques to compress
the amount of disk space used.
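The idea of using dimension values to calculate a position in preallocated space can be sketched as follows. This is a minimal illustration under stated assumptions: the Cube class, its method names, and the small member lists are hypothetical, positions are zero-based, and no compression is attempted, unlike a real multidimensional product.

```python
class Cube:
    """A dense cube stored as one preallocated, flat list of cells."""

    def __init__(self, dimensions):
        # dimensions: mapping of dimension name -> ordered list of members.
        self.names = list(dimensions)
        self.index = {
            name: {member: i for i, member in enumerate(members)}
            for name, members in dimensions.items()
        }
        self.sizes = [len(dimensions[name]) for name in self.names]
        total = 1
        for size in self.sizes:
            total *= size
        self.cells = [None] * total  # space for the entire cube

    def position(self, **members):
        """Translate dimension values into a tuple of array indexes."""
        return tuple(self.index[name][members[name]] for name in self.names)

    def offset(self, position):
        """Row-major offset of a position within the flat cell list."""
        result = 0
        for size, i in zip(self.sizes, position):
            result = result * size + i
        return result

    def put(self, value, **members):
        self.cells[self.offset(self.position(**members))] = value

    def get(self, **members):
        return self.cells[self.offset(self.position(**members))]


sales = Cube({
    "product": ["P1", "P2", "P3"],
    "store": ["London", "Paris"],
    "time": ["Jan97", "Feb97"],
})
sales.put(125.0, product="P2", store="London", time="Jan97")

print(sales.position(product="P2", store="London", time="Jan97"))
print(sales.get(product="P2", store="London", time="Jan97"))
```

Because the position is computed rather than searched for, finding or inserting a cell takes the same work regardless of how much data the cube holds; the cost is that empty cells still occupy space unless the product compresses them.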
In the diagram, the database contains two cubes. Sales is a four-dimensional cube of
information collected over time by store, product and customer. The Financial
information cube is three-dimensional, collected by time, store, and general ledger
account line. The store and time dimensions are common to the two cubes. Because
the database can contain many cubes, this approach is sometimes referred to as
multicube storage.
A cube can also be a formula rather than a variable. In this case the cube is stored as a
calculation formula such as Profit = Revenue – Expenses, and the data is calculated on
demand from the stored cubes for revenue and expenses. This is like a view in a
relational system.
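A formula cube such as Profit = Revenue – Expenses can be sketched as an object that stores only the calculation and reads the stored cubes on demand. The class names StoredCube and FormulaCube, the dictionary-based storage, and the sample figures are hypothetical assumptions made for illustration.

```python
class StoredCube:
    """A cube whose cell values are physically stored."""

    def __init__(self, cells):
        self.cells = dict(cells)

    def get(self, key):
        return self.cells[key]


class FormulaCube:
    """A cube that stores a formula and computes cells when asked."""

    def __init__(self, formula, *inputs):
        self.formula = formula
        self.inputs = inputs

    def get(self, key):
        # Read the same cell from every input cube, then apply the formula.
        return self.formula(*(cube.get(key) for cube in self.inputs))


# Cells are keyed by (store, time); the figures are made up.
revenue = StoredCube({("London", "Jan97"): 1000, ("Paris", "Jan97"): 800})
expenses = StoredCube({("London", "Jan97"): 600, ("Paris", "Jan97"): 500})

# Profit = Revenue - Expenses, held as a calculation rather than as data.
profit = FormulaCube(lambda r, e: r - e, revenue, expenses)

print(profit.get(("London", "Jan97")))

# A change to a stored cube is reflected the next time profit is read.
revenue.cells[("London", "Jan97")] = 1100
print(profit.get(("London", "Jan97")))
```

As with a relational view, nothing is stored for profit itself, so it can never be out of date with respect to revenue and expenses.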
The power of this model is the high degree of analysis it puts at your fingertips, when
combined with online analytical processing tools. Online analytical processing today
generally involves the use of a separate multidimensional server that contains a
relatively small amount of highly indexed data from operational systems.
Relational Server
• Benefits:
– Well-known environment with many experts in most organizations able to support the product
– Can be used with data warehousing and operational systems
– Many tools available with advanced features including improvements made to performance with report servers
• Disadvantages:
– Does not have any complex functions or analysis capabilities provided by OLAP tools
– These products may also be restricted to the volumes of data they can access
Multidimensional Server
• Benefits:
– Quick access to very large volumes of data
– Extensive and comprehensive libraries of complex functions specifically for analysis
– Strong modeling and forecasting capabilities
– Can access multidimensional and relational database structures
• Disadvantages:
– Difficulty of changing dimensions without reaggregating to time
– Lack of support for very large volumes of data
MOLAP Server
• Data
– Arrays
– Cached
– Offloaded from server
• Efficient storage and processing
• Complexity hidden from the user
• Analysis using preaggregated summaries and precalculated measures
[Figure labels: DSS client, MOLAP engine, Application layer, Warehouse]
Characteristics
• Data is stored as a precalculated array.
• The data resides, or is cached, in a proprietary multidimensional database, with a
multidimensional viewer. Both the data and index values are held in arrays.
• The database is organized to allow rapid retrieval of related data across multiple
dimensions.
• Data can be offloaded from the server onto the client for local access, reducing
network traffic. However, it can take time to form the cubes.
• The MOLAP tools store and process multidimensional data efficiently.
• The calculation engine creates new information from existing data through
formulas and transformations.
• The complexity of the underlying data is transparent to the user.
• The tools can exploit the complexity of the analysis involved.
• The complex analytical querying capabilities enable a business to respond to
change faster.
• Preaggregated summary data and precalculated measures enable quick and easy
analysis of complex data relationships.
ROLAP Server
[Slide: ROLAP server connected to the warehouse server.]
Characteristics
• Data and metadata are stored as records in the relational database. The OLAP server
uses this metadata dynamically to generate the SQL statements necessary to
retrieve the data as the user requests it.
• Users see a multidimensional view of data that is stored in relational tables.
• End users are supplied with a multidimensional viewing tool to view the relational
data.
• There is high capacity connectivity to powerful servers.
• There are no limitations on the size of the database or the kind of analysis that may
be performed. However, if the server is SQL-driven, some engines may severely
affect performance if the user joins several tables or performs complex
computations.
• Complex SQL code is generated by the ROLAP tool. The tools create a number of
SQL statements when they access the database; this may adversely affect
performance.
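The way a ROLAP tool turns metadata into SQL can be sketched as follows. This is a minimal illustration, assuming a hypothetical METADATA dictionary and table names, with SQLite (through Python's standard sqlite3 module) standing in for the warehouse. It is not the code that any real ROLAP engine generates.

```python
import sqlite3

# Hypothetical metadata: maps user-facing names to relational tables and columns.
METADATA = {
    "fact_table": "sales",
    "measures": {"Units Sold": "SUM(sales.units)"},
    "dimensions": {
        "Product": ("product", "product_id", "product.name"),
        "Region": ("store", "store_id", "store.region"),
    },
}


def generate_sql(measure, dimensions):
    """Build a SELECT for one measure grouped by one or more dimensions."""
    fact = METADATA["fact_table"]
    select_cols, joins = [], []
    for dim in dimensions:
        table, key, column = METADATA["dimensions"][dim]
        select_cols.append(column)
        joins.append(f"JOIN {table} ON {fact}.{key} = {table}.{key}")
    group_by = ", ".join(select_cols)
    return (
        f"SELECT {group_by}, {METADATA['measures'][measure]} "
        f"FROM {fact} {' '.join(joins)} "
        f"GROUP BY {group_by} ORDER BY {group_by}"
    )


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE store (store_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE sales (product_id INTEGER, store_id INTEGER, units INTEGER);
    INSERT INTO product VALUES (1, 'Widget'), (2, 'Gadget');
    INSERT INTO store VALUES (1, 'East'), (2, 'West');
    INSERT INTO sales VALUES (1, 1, 10), (1, 2, 5), (2, 1, 7);
""")

# The user asks for "Units Sold by Product and Region"; the tool writes the SQL.
sql = generate_sql("Units Sold", ["Product", "Region"])
rows = conn.execute(sql).fetchall()
print(sql)
print(rows)
```

Each extra dimension adds a join to the generated statement, which is one reason the performance cautions above apply.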
OLAP The consistent theme in each of these configurations is the key concept of
online analytical processing. OLAP tools and applications must be able to manipulate
and display data using a multidimensional view. The multidimensional data model is
specifically designed for this type of analysis, and reflects the way users think about
their businesses.
• Performance Versus Storage: The central issue surrounding this OLAP
configuration question is the trade-off between performance and storage space.
When data is stored in the multidimensional model (MOLAP), data-access
performance is maximized for the end user. However, some redundancy of storage
results, and multidimensional databases can become extremely large.
When data is stored only in the warehouse and is brought into the
multidimensional cache when queried (ROLAP), added storage is not an issue, but
query performance suffers.
• Flexible OLAP Access: A complete OLAP solution should be able to provide any of
these options. Oracle Express technology is based on a multidimensional data model,
but the underlying data can be structured in a number of ways.
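The storage side of this trade-off can be estimated with simple arithmetic. The sketch below compares the number of cells that a fully dense multidimensional array would reserve with the number of fact rows actually present. All of the figures are hypothetical and were chosen only to make the calculation concrete.

```python
from math import prod

# Hypothetical dimension sizes (number of members in each dimension).
dimension_sizes = {"product": 1_000, "store": 200, "day": 365}

# Hypothetical number of fact rows actually present in the warehouse.
fact_rows = 5_000_000

# A dense array reserves one cell per combination of members.
potential_cells = prod(dimension_sizes.values())

# Density: the share of cells that would hold a real value.
density = fact_rows / potential_cells

print(f"potential cells: {potential_cells:,}")
print(f"density: {density:.1%}")
```

Multidimensional products may store sparse data more compactly than this, so treat the dense figure as a rough upper bound rather than a prediction.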
[Slide: MOLAP and ROLAP with Oracle Express.
MOLAP: data is loaded periodically from the data warehouse into the multidimensional database (MDDB) on the Express Server, and the Express user queries the MDDB.
ROLAP: in response to a query from the Express user, the Express Server fetches data live from the data warehouse into a data cache.]
Hybrid (HOLAP)
[Slide: the Express Server holds both an MDDB and a cache. Data is loaded periodically from the data warehouse into the MDDB and is also fetched and cached on demand; the Express user queries the Express Server.]
HOLAP The MOLAP and ROLAP approaches can be combined into a hybrid
(HOLAP) solution, which takes advantage of the strengths of both the ROLAP and
MOLAP methods.
In the hybrid solution, the relational database is used to store the bulk of the detail
data, and the multidimensional model is used to store summary data.
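The division of labor in a hybrid solution can be sketched in a few lines of Python. In this illustration a dictionary plays the part of the multidimensional summary store and SQLite plays the part of the relational database. The table, the sample data, and the total_units function are hypothetical.

```python
import sqlite3

# Relational side: holds the bulk of the detail data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales_detail (product TEXT, sale_day TEXT, units INTEGER);
    INSERT INTO sales_detail VALUES
        ('Widget', '1999-05-01', 10),
        ('Widget', '1999-05-02', 5),
        ('Gadget', '1999-05-01', 7);
""")

# Multidimensional side: summary data, loaded periodically from the detail rows.
summary_store = dict(
    conn.execute("SELECT product, SUM(units) FROM sales_detail GROUP BY product")
)


def total_units(product, day=None):
    """Answer summary questions from the summary store, detail questions from SQL."""
    if day is None:
        return summary_store[product]  # summary level: precalculated
    row = conn.execute(
        "SELECT COALESCE(SUM(units), 0) FROM sales_detail "
        "WHERE product = ? AND sale_day = ?",
        (product, day),
    ).fetchone()
    return row[0]  # detail level: fetched on demand


print(total_units("Widget"))
print(total_units("Widget", "1999-05-02"))
```

The summary store must be refreshed whenever new detail data is loaded; the sketch omits that step.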
• Business needs
• User adaptability
• GUI
• Computer architecture
• Network architecture
• Network throughput
• Openness
[Chart: query performance (OK to Good) against analysis (Simple to Complex), positioning ROLAP and MOLAP.]
Factors Influencing Query Tool Choice The diagram shows that ROLAP serves
the user who requires simple analysis and MOLAP serves the user who needs more
complex analysis, because of the performance and summarization benefits of MOLAP.
There are a number of key issues to consider when determining which product to use:
• Business need: Does the tool fit current and future reporting requirements?
Can it access the data sources and models needed to provide the required
information? Can it handle the volumes of data that the required analysis
involves?
• User: Some tools have a steep learning curve and are specialized in their
presentation. Is there room in your organization for yet another specialist tool?
Does the tool provide the flexibility, functionality, and speed needed?
• GUI: Consider how organized, intuitive, user-friendly, and robust the interface is.
• Computing architecture: Consider existing computer architectures. Decide
whether the fat client with its associated features and functionality could be
replaced by the thin client. Do the selected tools fit in with your current and
planned architecture?
• Network architecture: Consider how the products deploy their requests across the
network, and the effects on the network and server. Can the chosen network
(WAN, LAN, or MAN) support the analysis approaches chosen? Conversely, can
the tool fit within the network architecture defined?
• Network throughput: Can the network carry the required load? Is it likely to be
affected by access contention? Do you have a networking strategy and, if so,
what is it?
• Openness: Is the product portable and does it have the necessary application
program interface (API) to connect to the databases you have in place? Can you
write or customize APIs?
• Performance
• Scalability
• Management
• Enterprisewide perspective
[Chart: query performance (OK to Good) against analysis (Simple to Complex), positioning ROLAP and MOLAP.]
Query Access Architectures
[Slide: Client-Server Access (Windows, Macintosh, OS/2, and UNIX clients) and Web Access.]
Client-Server Access
The principle behind the client-server approach is to split the processing between
the servers and the client, which performs localized processing.
This openness among systems makes the configuration very flexible.
Different users may run different tools that access the data warehouse. These tools include:
• Simple query tools
• Complex analysis tools
• Data mining tools
Web Access
At this time, data warehouse information is being provided through Web-based
applications on intranets (networks within a company), as an alternative to other DSS
delivery mechanisms.
Internet and intranet access to a warehouse may bring these benefits:
• Lower hardware costs
• Lower communication costs
• Lower application licensing and maintenance costs
• Minimized burden on administrators
Internet Security Issues Security issues abound in this environment, and you must
carefully consider the impact of providing global access to your data. You should
consider:
• View-based security techniques, with a permissions table identifying users’
clearance codes. The codes themselves match to clearance codes held with the data
in the warehouse.
• Caching techniques that allow only queries available to users of a certain code to
actually access the cached data.
• Password abstraction, in which the password that a user specifies for access is
converted behind the scenes before access to the database is made available.
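The first technique in the list, view-based security driven by a permissions table, can be sketched as follows. The table names, user names, and clearance codes are hypothetical. SQLite is used only for convenience and has no user accounts of its own, so the user name is passed in as a parameter. In a multiuser database the view would normally filter on the connected user, and the base tables would not be granted to end users directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Warehouse data, each row tagged with a clearance code.
    CREATE TABLE sales (region TEXT, amount INTEGER, clearance_code INTEGER);
    INSERT INTO sales VALUES ('East', 100, 1), ('West', 250, 2), ('HQ', 900, 3);

    -- Permissions table: one row per clearance code that a user holds.
    CREATE TABLE user_permissions (
        user_name TEXT,
        clearance_code INTEGER,
        PRIMARY KEY (user_name, clearance_code)
    );
    INSERT INTO user_permissions VALUES
        ('analyst', 1), ('analyst', 2),
        ('executive', 1), ('executive', 2), ('executive', 3);

    -- The view pairs each user with the rows whose code matches one of theirs.
    CREATE VIEW secure_sales AS
        SELECT p.user_name, s.region, s.amount
        FROM sales s
        JOIN user_permissions p ON s.clearance_code = p.clearance_code;
""")


def visible_sales(user_name):
    """Return the rows of the view that the given user is cleared to see."""
    return conn.execute(
        "SELECT region, amount FROM secure_sales "
        "WHERE user_name = ? ORDER BY region",
        (user_name,),
    ).fetchall()


print(visible_sales("analyst"))
print(visible_sales("executive"))
```

A user with no rows in the permissions table sees nothing, because the join produces no match.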
Fat Client
In a client-server architecture, a fat client is a client that performs the bulk of the data
processing operations. The data itself is stored on the server.
During the 1980s, the industry introduced PCs (clients) with graphical interfaces and
high-end servers that can house databases. As these became more popular, companies
downsized, rightsized, and reduced mainframe computing architectures. Today, the PC
is the foundation of most modern enterprise systems, and gives many users the ability
to perform many tasks with ease.
PCs create some challenges, however:
• They have become “fat,” demanding more software and hardware.
• Administering multiple copies of software is difficult.
• Once developed, client software offers limited reusability in extending
applications.
• Users require only a limited selection of the software available on the PC.
• PCs are costly to purchase and maintain in terms of the amount of software
required to support each device.
Thin Client
In client-server applications, a thin client is designed to be especially small so that the
bulk of the data processing occurs on the server. A thin client is a network computer
without a hard disk drive, whereas a fat client includes a disk drive.
Advances in Internet technology, decreases in the cost of high-end servers, and
increases in the total cost of purchasing, supporting, and maintaining PCs are
prompting IT departments to reconsider their client-server strategy. They are starting
to use the features of the Web to eliminate the reliance on PCs. To this end, the “thin”
client (a browser) is a device that holds minimal application logic and is connected to
the high-end server, where the bulk of the processing occurs.
Thin client access to a data warehouse across the Web has a number of advantages:
• Lower hardware cost per user
• Lower licensing costs per user (The software is centralized on the server.)
• Open deployment platform
Web access is still in its early years and has some challenges to face. It needs to:
• Evolve from a library of documents to an electronic business platform that can
conduct secure transactions on intranets and the Internet
• Provide rich levels of security, data integrity, and distributed transaction support
• Provide robust, scalable, and reusable extensibility
The network computer (NC), available from Oracle, is an example of a thin client.
Summary
The lesson discussed the following topics:
• The purpose of building a data warehouse is to enable users to access the
information in the warehouse
• Determining user query needs is an important part of the data warehouse project
implementation
• Planning for good data access capability is important to the success of the data
warehousing project
Practice 6-1
1 Complete the user profile column in this exercise with one of the following user
types:
– Executive
– Casual user or manager
– Business analyst or power user
Name: Brian O’Reilly
Access needs:
• Need to develop simple forecast, such as budgets
• Ease of use is important
Technology:
• Microsoft Office
• Internet browser
• Spreadsheets
User profile:

Name: Mary Ramos
Access needs:
• One click access
• Only need highly summarized information
• Ease of use is very important
Technology:
• E-mail
• Microsoft Office
• Internet browser
User profile:

Name: Kim Seng
Access needs:
• Constantly wants to “get more data”
• Understands the organization’s business processes
Technology:
• Spreadsheets
• Oracle Reports
• Oracle Discoverer
• Oracle Express Analyzer
User profile:

Name: Amber Salinas
Access needs:
• Lots of drilling
• Customize graphical user interface (GUI)
• Needs to know data structures
Technology:
• Extensive SQL programming
• Oracle7X, Oracle8X Server
• Oracle Express
User profile:
3 Security Consideration Checklist exercise: Form into small groups, and discuss
each of the following questions. For each question, discuss briefly whether you
would use it in your own security consideration checklist back at your workplace,
and rate its importance relative to the other questions on the checklist.
Columns: Security Consideration Question, Will You Use?, Why?
a Security should be addressed at column
level (and in some cases at the row level),
at the table level, at the database level, at
the tools level, at the client and server
level, and at the network level.
b Create views to limit access to particular
columns or, in unusual circumstances,
rows.
c Do not rely on anything to protect the
database except the database security.
d How are reports upgraded when new
versions are released?
e Security should be implemented based on
what makes the most sense for both the
short-term and long-term health of the
business. Judge security not only by its
structure, but by how well it supports the
entire corporate organization’s needs and
survival.
Lesson 7: Modeling the Data Warehouse
[Slide: course overview diagram. The recoverable block labels are Defining DW Concepts & Terminology, Choosing a Computing Architecture, Planning Warehouse Storage, Analyzing User Query Needs, Supporting End User Access, and Project Management (Methodology, Maintaining Metadata).]
Overview
This lesson examines the role of data modeling in a data warehousing environment.
The lesson presents a very high-level overview of warehouse modeling steps. You
consider the different types of models that can be employed, such as the star schema.
Tools available for warehouse modeling are introduced.
Note that the “Modeling the Data Warehouse” block is highlighted in the overview
slide on the facing page.
Objectives
After completing this lesson, you should be able to do the following:
• List generic phases for modeling a data warehouse
• List the components of a warehouse data model
• Identify tools available for warehouse modeling
Note: Oracle offers a two-day, instructor-led course entitled Data Warehouse
Database Design. That course teaches comprehensive database design by using a case
study, whereas this lesson provides a high-level overview.
Lesson 7: Modeling the Data Warehouse
Data Warehouse Database Design Phases
1. Defining the business model (conceptual model)
2. Creating the dimensional model (logical model)
3. Modeling summaries
4. Creating the physical model
[Diagram labels: 1, Select a business process; 2, 3; Physical model.]
Phase One: Defining the Business Model
[Slide: inputs to the business model: business requirements and other inputs.]
Primary Input The business requirements are the primary input to the design of the
data warehouse. Information requirements as defined by the business people—the end
users—will lay the foundation for the data warehouse content.
Other Inputs Overlaying those requirements with source information and further
research regarding how data is used helps to determine the specific data that the data
warehouse will provide. Other sources may be:
• Existing metadata
• Source ER diagrams from relational OLTP systems
• Research
• Legacy nonrelational systems data
[Slide: examples of measures and dimensions.]
Measures
• Balance
• Units Sold
• Cost
• Sales
Dimensions
• Description
• Location
• Color
• Size
Determining Granularity
[Slide: which time grain is needed: YEAR? QUARTER? MONTH? WEEK? DAY?]
When gathering more specific information about measures and analytic parameters, it
is important also to understand the level of detail that is required for analysis and
business decisions. This level of detail is called granularity. The greater the level of
detail, the finer the level of granularity.
The Key Question What do your users really need for now and for the near-term
future? Determine that and then design for one grain finer.
Consider that users typically perform fine-grain analysis on a short horizon, maybe six
weeks. Thus, as a solution, you can retain six weeks of data online and roll off the aged
data automatically.
Note: Remember that you can always aggregate upward, but you cannot disaggregate
below the level of detail that is stored in the data mart.
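Both points, aggregating upward and rolling off aged fine-grain data, can be sketched briefly in Python. The daily facts, the six-week default, and the function names are hypothetical and serve only to illustrate the idea.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical daily-grain facts: (sale_date, units_sold)
daily_facts = [
    (date(1999, 3, 1), 4),
    (date(1999, 4, 1), 10),
    (date(1999, 4, 15), 6),
    (date(1999, 5, 3), 8),
]


def aggregate_to_month(facts):
    """Roll daily facts up to monthly totals (aggregating upward)."""
    monthly = defaultdict(int)
    for day, units in facts:
        monthly[(day.year, day.month)] += units
    return dict(monthly)


def roll_off(facts, today, weeks=6):
    """Keep only the fine-grain facts that fall inside the retention window."""
    cutoff = today - timedelta(weeks=weeks)
    return [(day, units) for day, units in facts if day >= cutoff]


monthly_totals = aggregate_to_month(daily_facts)
online_facts = roll_off(daily_facts, today=date(1999, 5, 10))

print(monthly_totals)
print(online_facts)
```

Once only the monthly totals remain, there is no way to recover the individual days from them. This is why the stored grain has to be chosen before data is rolled off.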
[Slide: dimensions (Location, Product, Time, Store) with example hierarchies.
Time: Month > Quarter > Year
Store: Store > District > Region]
Phase Two: Creating the Dimensional Model
[Slide: Dimension Tables and Fact Tables. A central fact table (units, price) is surrounded by the Product, Channel, Customer, and Time dimension tables.]
Dimension Tables
Dimensions are the textual descriptions of the business. Dimension tables are typically
smaller than fact tables, and their data changes much less frequently. Dimension tables
give perspective on the whys and hows of the business and its individual
transactions.
While dimensions generally contain relatively static data, customer dimensions are
updated more frequently.
Dimensions Are Essential for Analysis The key to a powerful dimensional model
lies in the richness of the dimension attributes because they determine how facts can
be analyzed. Dimensions can be considered as the entry point into “fact space.”
Always name attributes in the users’ vocabulary. That way, the dimension will
document itself and its expressive power will be apparent.
Fact Tables
Facts are the numerical measures of the business. The fact table is the largest table in
the star schema and is composed of large volumes of data.
Although a star schema typically contains one fact table, other DSS schemas can
contain multiple fact tables.
Raw facts such as dollar sales can be combined or calculated with other facts to create
measures. Measures can be stored in the fact table or created when necessary for
reporting purposes.
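As a small illustration of creating measures from raw facts at reporting time, the following Python sketch derives margin, margin percentage, and average price from fact rows. The column names and figures are hypothetical.

```python
# Hypothetical raw facts as they might appear in fact-table rows.
fact_rows = [
    {"product": "Widget", "units": 10, "dollar_sales": 250.0, "cost": 150.0},
    {"product": "Gadget", "units": 4, "dollar_sales": 120.0, "cost": 100.0},
]


def add_measures(row):
    """Derive measures from the raw facts in one row."""
    margin = row["dollar_sales"] - row["cost"]
    return {
        **row,
        "margin": margin,
        "margin_pct": margin / row["dollar_sales"],
        "avg_price": row["dollar_sales"] / row["units"],
    }


report = [add_measures(row) for row in fact_rows]
for line in report:
    print(line["product"], line["margin"], line["avg_price"])
```

Storing such measures in the fact table instead trades extra storage for less work at query time. Ratios such as the margin percentage should be recomputed from summed facts rather than summed themselves.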
Dimensional Model
[Slide: a central fact table (units, price) joined to the Product, Channel, Customer, and Time dimension tables.]
The dimensional model has a single fact table and one or more
lookup or dimension tables for analytical purposes.
Star Schema The star schema is the simplest form of a dimensional model.
The fact table contains foreign keys that reference primary keys in the dimension
tables.
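A minimal star schema can be sketched with Python's standard sqlite3 module. The example assumes two hypothetical dimension tables (product_dim and time_dim) and one fact table (sales_fact). A real warehouse would have more dimensions and attributes, and the names here are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE product_dim (product_id INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE time_dim (time_id INTEGER PRIMARY KEY, month TEXT, year INTEGER);

    -- Fact table: a foreign key to each dimension, plus the numeric facts.
    CREATE TABLE sales_fact (
        product_id INTEGER REFERENCES product_dim (product_id),
        time_id INTEGER REFERENCES time_dim (time_id),
        units INTEGER,
        price REAL
    );

    INSERT INTO product_dim VALUES (1, 'Widget'), (2, 'Gadget');
    INSERT INTO time_dim VALUES (1, 'Jan', 1999), (2, 'Feb', 1999);
    INSERT INTO sales_fact VALUES (1, 1, 10, 2.5), (1, 2, 4, 2.5), (2, 1, 7, 4.0);
""")

# A typical star query: constrain on dimension attributes, aggregate the facts.
units_by_product = conn.execute("""
    SELECT p.product_name, SUM(f.units)
    FROM sales_fact f
    JOIN product_dim p ON f.product_id = p.product_id
    JOIN time_dim t ON f.time_id = t.time_id
    WHERE t.year = 1999
    GROUP BY p.product_name
    ORDER BY p.product_name
""").fetchall()

print(units_by_product)
```

The query reads the dimension attributes (product name, year) to decide which facts to aggregate. In this sense the dimensions act as the entry point into the fact table.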
Lightly and Highly Summarized Data Summary data falls into two loose
categories:
• Lightly summarized data is summarized from the incoming fact data and normally
stored over a unit of time. Please refer to the earlier discussion on granularity.
• Highly summarized data is more compact. It may be distilled from lightly
summarized data or introduced into the warehouse already in the highly compact
format.
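As a rough illustration of the two categories, the sketch below first rolls incoming fact rows up to daily totals (lightly summarized) and then distills those into monthly totals (highly summarized). The sample rows and function names are assumptions made for this example only.

```python
from collections import defaultdict

# Hypothetical incoming fact rows: (ISO date, product, sale amount).
fact_rows = [
    ("1999-05-01", "Product A", 100.0),
    ("1999-05-01", "Product A", 40.0),
    ("1999-05-02", "Product A", 60.0),
    ("1999-05-01", "Product B", 75.0),
    ("1999-06-03", "Product B", 25.0),
]

def summarize_daily(rows):
    """Lightly summarized: one total per (day, product)."""
    totals = defaultdict(float)
    for day, product, amount in rows:
        totals[(day, product)] += amount
    return dict(totals)

def summarize_monthly(daily):
    """Highly summarized: distilled from the daily summary, one total per (month, product)."""
    totals = defaultdict(float)
    for (day, product), amount in daily.items():
        totals[(day[:7], product)] += amount  # "YYYY-MM-DD"[:7] gives "YYYY-MM"
    return dict(totals)

daily_summary = summarize_daily(fact_rows)
monthly_summary = summarize_monthly(daily_summary)
```

The monthly summary is built from the daily summary, not from the raw fact rows, which mirrors the case where highly summarized data is distilled from lightly summarized data.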
Slide: summary tables holding aggregates such as average, maximum, total, and percentage, with a total for each of Product A, Product B, and Product C.
How Many Summaries? The issue with summary tables is not whether you are
going to have any, but how many you are going to have. Business users require
summary information. For example, a manager needs the bottom line figures that
show how well the company is performing. Analysis of the requirement is
instrumental in ensuring that the users get the information they need and that they get
it quickly. A warehouse may contain hundreds of summary tables.
Summary Table Management The requirement for summary tables may change
over time as the set of popular queries changes. Queries may be seasonal; for
example, you may have specific queries for spring, summer, autumn, and winter. The
query management process should be able to identify the summaries that are used, the
summaries that need to be created, and the summaries that may be removed.
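One way to picture this process is a periodic review of a query log. The sketch below is a simplified assumption, not a description of any Oracle tool: the function name, the grouping names, and both thresholds are hypothetical.

```python
from collections import Counter

def review_summaries(existing, query_log, min_uses=1, create_threshold=3):
    """Classify summaries from a log of the groupings that queries asked for.

    existing:  set of grouping names that already have a summary table.
    query_log: iterable of grouping names, one entry per query.
    Returns (used, to_create, to_remove) as sorted lists.
    """
    uses = Counter(query_log)
    used = sorted(g for g in existing if uses[g] >= min_uses)
    to_remove = sorted(g for g in existing if uses[g] < min_uses)
    to_create = sorted(g for g, n in uses.items()
                       if g not in existing and n >= create_threshold)
    return used, to_create, to_remove

# Hypothetical review period: the seasonal summary was not queried at all,
# while an unsummarized grouping by city was requested four times.
existing = {"sales_by_month", "sales_by_region", "sales_by_season"}
query_log = (["sales_by_month"] * 5 + ["sales_by_region"] * 2
             + ["sales_by_city"] * 4 + ["sales_by_week"])
used, to_create, to_remove = review_summaries(existing, query_log)
```

In practice the thresholds would depend on the cost of building and storing each summary; a seasonal summary might be retained even when it is idle for part of the year.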
Slide: Summary Management in Oracle8i (Sales and Sales summary tables; Region, State, City, Product, and Time; summary advisor reporting summary usage, space requirements, and summary recommendations).
Summary Advisor
The Oracle8i summary advisor offers the following information:
• Summary usage: such as the number of times a rewrite was made to use a
summary, the space used by a summary, and a cost-benefit ratio for each summary.
• Summary recommendations: such as creation, retention and dropping of
summaries.
• Space requirements: based on queries for possible summaries.
Materialized Views
Summaries are stored in materialized views. While creating materialized views, you
can specify storage options to control the size and location of the views.
Query Rewrite
The Oracle8i cost-based optimizer may use a summary to satisfy a query on the base
table (SALES). The process of transforming a query on a base table, such as SALES,
so that it accesses a materialized view instead is called query rewrite.
If the SALES table consists of several million rows and the materialized view
contains a few thousand rows, the rewritten query executes much faster. Query rewrite
is the key benefit enabled by materialized views.
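The idea behind query rewrite can be illustrated with a toy example. This sketch does not show how the Oracle optimizer works: it uses SQLite, which has no materialized views, so the summary is an ordinary table, and the rewrite rule (choose_table) is a deliberately simple assumption. All names other than SALES are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, city TEXT, product TEXT, amount REAL);
INSERT INTO sales VALUES
    ('East', 'Boston',   'A', 100.0),
    ('East', 'Boston',   'B',  50.0),
    ('East', 'New York', 'A', 200.0),
    ('West', 'Seattle',  'B',  75.0);
-- Stand-in for a materialized view (SQLite has none): an ordinary
-- table holding the precomputed totals per region and city.
CREATE TABLE sales_by_city AS
    SELECT region, city, SUM(amount) AS amount
    FROM sales GROUP BY region, city;
""")

COLUMNS = {"sales": {"region", "city", "product"},
           "sales_by_city": {"region", "city"}}

def choose_table(group_column):
    """Toy rewrite rule: prefer the summary when it still carries the column."""
    if group_column in COLUMNS["sales_by_city"]:
        return "sales_by_city"
    return "sales"

def total_by(group_column, table=None):
    """Total amount per value of group_column, read from the chosen table."""
    table = table or choose_table(group_column)
    if group_column not in COLUMNS[table]:
        raise ValueError(f"{table} has no column {group_column!r}")
    sql = (f"SELECT {group_column}, SUM(amount) FROM {table} "
           f"GROUP BY {group_column} ORDER BY {group_column}")
    return conn.execute(sql).fetchall()
```

A request for totals by region is answered from the smaller summary and returns the same rows as the base table would, whereas totals by product must fall back to SALES because the summary no longer carries that column.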
Storing the Time Dimension Typically, the data warehouse has a time dimension
table, although time elements may instead be stored in the fact table. Before deciding
where to store time, consider the following:
• Almost every data warehouse has a time dimension.
• Organizations use a variety of time periods for data analysis.
• A row whose key is an SQL date may be populated with additional time qualifiers
needed to perform business analysis, such as workday, fiscal period, and special
events.
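The last point can be made concrete with a small generator for time dimension rows. The sketch below is illustrative only; the fiscal year starting in July, the special events lookup, the treatment of special events as non-workdays, and all field names are assumptions.

```python
from datetime import date, timedelta

# Assumptions for illustration only: the fiscal year starts on 1 July,
# and special_events is a hand-maintained lookup.
FISCAL_YEAR_START_MONTH = 7
special_events = {date(1999, 7, 4): "Independence Day"}

def time_dimension_row(day):
    """Build one time dimension row keyed by the calendar date."""
    months_into_fiscal_year = (day.month - FISCAL_YEAR_START_MONTH) % 12
    return {
        "date_key": day.isoformat(),
        "day_of_week": day.isoweekday(),  # 1 = Monday ... 7 = Sunday
        # Assumed rule: weekdays count as workdays unless a special event falls on them.
        "is_workday": day.weekday() < 5 and day not in special_events,
        "fiscal_period": months_into_fiscal_year + 1,        # 1 to 12
        "fiscal_quarter": months_into_fiscal_year // 3 + 1,  # 1 to 4
        "special_event": special_events.get(day),
    }

def build_time_dimension(start, end):
    """Rows for every date from start to end, inclusive."""
    days = (end - start).days + 1
    return [time_dimension_row(start + timedelta(days=i)) for i in range(days)]

time_dim = build_time_dimension(date(1999, 7, 1), date(1999, 7, 7))
```

Because the qualifiers are stored once per date, a query can filter on workdays or fiscal periods by joining to the time dimension instead of recomputing them for every fact row.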
Data Modeling Tools
These tools are also referred to as computer-aided software engineering (CASE) tools.
Rather than use such tools, many warehouse implementers simply use spreadsheets or
paper and pencil to model their designs and document the metadata.
Note: Logic Works was acquired by Platinum, which in turn was acquired by
Computer Associates.
Summary
In this lesson, you explored one process for modeling the warehouse database. This
lesson discussed the following topics:
• Creating a business model driven by business processes
• Creating a logical dimensional model containing a central fact characterized by
several dimensions
• Modeling the summaries needed for end-user analysis
• Translating the logical model to a physical model
Note: Oracle offers a two-day, instructor-led course entitled Data Warehouse
Database Design. That course uses a case study to teach comprehensive database
design, whereas this lesson provided a high-level overview.
Practice 7-1
1 Identify whether each of the following statements is true or false.
• The business model is a logical representation of selected business processes.
• The star model is normalized.
• The snowflake model is denormalized.
• All warehouses must have a time dimension.
• In a warehouse environment, data loading performance is less important than query performance.
Lesson 8: Choosing a Computing Architecture
Course road map (slide): Defining DW Concepts & Terminology; Planning for a Successful Warehouse; Meeting a Business Need; Analyzing User Query Needs; Modeling the Data Warehouse; Choosing a Computing Architecture; Planning Warehouse Storage; ETT (Building the Warehouse); Supporting End User Access; Managing the Data Warehouse; Project Management (Methodology, Maintaining Metadata).
Overview
The previous lesson covered modeling the data warehouse. This lesson discusses
choosing a computing architecture for the warehouse. Note that the “Choosing a
Computing Architecture” block is highlighted in the course road map on the facing
page.
Specifically, this lesson examines the computer architectures that commonly support
data warehouses. The benefits of each hardware architecture and reasons for using
distributed warehouses are examined. Students examine the technology requirements
of a database server for warehousing.
Objectives
After completing this lesson, you should be able to do the following:
• Discuss the architectural requirements for the data warehouse
• Consider the benefits of each hardware architecture
• Describe the database server characteristics required in a warehouse environment
• Review the importance of parallelism for the data warehouse environment
Slide: Architectural Requirements (flexibility, integration; user, business, budget, technology).
Architecture Requirements
The tenets described on the slide are generally regarded as the primary requirements
in a data warehouse environment: the architecture must be scalable, manageable,
available, extensible, flexible, and integrated. This list can be extended to include
tunable, reliable, robust, supportable, and recoverable.
Making Compromises
You may need to compromise when balancing user needs against business requirements,
for example if budgetary constraints restrict your choices or if the technical
difficulties are too great.
The architecture requirements definition must be considered at an early stage, in
parallel with the user requirements. Only at this time can successful choices be made.
Architecture requirements definition is a specific phase of the Oracle Data Warehouse
Method (DWM).
The Hardware Architecture
Hardware Requirements
The choice of hardware architecture is critical to the success of the data warehouse and
its infrastructure. Warehouses require hardware architectures that are:
• Robust
• Available
• Reliable
• Flexible
• Extensible
• Scalable
• Supportable
• Recoverable
• Parallel
In addition, the architecture should
• Have a very large memory (VLM) capability
• Be able to use 64-bit addressing
• Be connective and conform to open system standards
Note: Do not confuse the term database server with a file server on a local area
network or any other server. For our purposes, the term database server describes the
Relational Database Management System (RDBMS) or Database Management
System (DBMS).
Slide: Hardware Architectures, Evaluation Criteria (chart with Scalability and Maturity axes, each ranging from low to high).
Evaluation Criteria
By specifying the hardware requirements early on in the development of the
warehouse, you have enough lead time to acquire and test the chosen components.
Determining the platform depends upon a number of factors, and the different
architectures have advantages and disadvantages that you must evaluate before
making a final decision:
• A symmetric multiprocessing architecture may be sufficient if you have a small
database, can afford a longer response time, and have problems that are not
complex. Problem complexity is determined by the number of users, the type of
calculations, and the types of queries that the system must handle.
• The larger your database, the more complex your problems, and the shorter the
required response time, the closer you are to specifying a massively parallel
processing system.
Parallel Processing
Hardware architectures that contain parallel processors are often categorized
according to the resources they share.
• Memory: SMP machines are often described as tightly coupled.
• Disk: Clustered architectures are often described as loosely coupled.
• Nothing: MPP machines are described as loosely or tightly coupled, according to
the way communication is accomplished among nodes.
NUMA is an SMP architecture with loosely coupled memory using uniform and
nonuniform memory access.
Slide: SMP (common bus, shared memory, shared disks).
Symmetric Multiprocessing
A symmetric multiprocessing (SMP) machine comprises a set of CPUs that share
memory. It has a shared everything architecture:
• Each CPU has full access to the shared memory through a common bus.
• Communication between the CPUs uses the shared memory.
• Disk controllers are accessible to all CPUs.
This is a proven technology, particularly in the data warehousing environment.
Note: A bus is a cable or circuit used to transfer data or electrical signals among
devices.
Benefits
• High concurrency
• Workload balancing
• Moderate scalability, although SMP is not as scalable as MPP or NUMA
• Easier to administer than a cluster environment, with proven tools
Limitations
• Available memory may be limited; clustering can increase it.
• Bandwidth is limited for CPU-to-CPU communication, I/O, and bus communication.
Note: SMP machines are often nodes in a cluster. Multiple SMP nodes can be used
with certain vendors’ architectures (DEC, Pyramid, Sequent, Sun, SparcServer),
where disk is shared among the multiple nodes. Some warehouse sites are exploring
the evolving concept of loaning excess memory or processing capacity among
applications or hardware.
Some SMP vendors allow you to scale to MPP without losing your SMP box. You
simply add interconnect software and associated technology.
Nonuniform Memory
Shared memory systems are systems with loosely coupled memory. The shared
memory may be accessed by using uniform memory access from CPUs or by
nonuniform memory access (NUMA).
The Oracle Parallel Server can work with either form of memory access, but NUMA is
a more costly form of access and synchronization than uniform memory access. While
any CPU can access the memory, it is more costly for remote nodes.
Benefits
• A fully scalable architecture that can overcome some of the scalability problems of
SMP
• Incremental growth: you can add disk, CPU, and bandwidth incrementally to any
level
• Better performance than an MPP system for ad hoc or mixed workloads
• Suited to the Oracle server
Limitations
• The technology is new and less proven.
• You need new tools for easy system management.
• NUMA is more expensive than SMP.
Clusters
Shared disk, loosely coupled systems have the following characteristics:
• Each node consists of one or more CPUs and associated dedicated memory.
• Memory is not shared between nodes.
• Communication occurs over a high-speed bus.
• Each node has access to all of the disks and other resources.
• An SMP machine can be a node, if the hardware supports it.
Benefits
• High availability; all data is accessible even if one node fails
• The concept of one database, which is an advantage over shared nothing systems
such as MPP
• Incremental growth
Limitations
• Bandwidth of the high-speed bus limits the scalability of the system.
• Internode synchronization is required. Each node has a data cache; cache
consistency must be maintained for the locking mechanisms to work effectively.
• The shared disk software imposes overhead on the operating system.
MPP
Benefits
• Practically unlimited, incremental growth
• Very scalable (given careful data placement)
• Fast access between nodes
• Low cost per node (each node is an inexpensive processor)
• Each node has its own devices, but on most systems other nodes can access the
devices of a failed node; a failure may be local to the node.
• Good for DSS and read-only databases
Limitations
• Many database servers require rigid data partitioning for parallelism and
scalability (this is not necessary with Oracle).
• Cache consistency must be maintained.
• Disk access is restricted.
• The memory cost per node is high.
• The management burden is high.
• Careful data placement is required for scalability.
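Why data placement matters on a shared nothing system can be seen with a small experiment. The sketch below is an assumption-laden illustration, not a description of any vendor's partitioning scheme: rows are assigned to nodes by hashing a partitioning key, and the node count, keys, and function names are all hypothetical.

```python
import hashlib
from collections import Counter

def node_for(key, node_count):
    """Map a partitioning key to a node using a stable hash (SHA-256)."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count

def placement(keys, node_count):
    """Count how many rows land on each node."""
    counts = Counter(node_for(k, node_count) for k in keys)
    return [counts[n] for n in range(node_count)]

def skew(counts):
    """Largest node load divided by the average load (1.0 is perfectly even)."""
    return max(counts) / (sum(counts) / len(counts))

# Hypothetical fact rows: 10,000 orders spread over 4 nodes.
by_order = placement(range(10_000), 4)  # many distinct key values
by_region = placement(["East"] * 9_000 + ["West"] * 1_000, 4)  # two key values
```

Partitioning on the order number spreads the rows across all four nodes, whereas partitioning on a key with only two values leaves at least two nodes idle and one node holding most of the data, so adding nodes would not improve scalability.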
Windows NT
The architecture for Windows NT is based on the client-server model. The approach
divides the operating system into an executive running in kernel mode and several
server processes, each running in user mode. Each server process implements a unique
operating system environment.
Benefits
• Windows NT server operating system includes built-in Web services that provide a
complete, integrated intranet solution.
• Windows NT offers scalability improvements of up to 33 percent, yielding more
linear scalability on machines with eight or more processors.
• User profiles and system policies make it easier for system administrators to
manage user desktops, including controlling access to the network and to desktop
resources and supporting users who access multiple workstations.
Limitations
• Windows NT is not as secure as other operating systems such as UNIX.
• On other operating systems, you can execute programs on your machine remotely,
but you cannot do this with Windows NT.
• Although Windows NT can support SMP with up to 32 processors, Windows NT
has been criticized for its lack of linear scalability beyond four processors.
• Addressing space limits Windows NT applications to two gigabytes. This is
insufficient for large data warehouses.
Architectural Tiers
• Tiered structures:
– Modular
– Logical separation
• Distributed structures:
– Two-tier
– Three-tier
– Four-tier (and more)
Architectures can be the simple two-tier type, the more complex three-tier type, or,
if Web applications are involved, up to a four-tier type. This enables a useful division
of labor for specific tasks and processes, and can assist and complement the network
setup.
Middleware
Middleware is a term used to describe technologies that allow you to integrate
multiple server technologies seamlessly. Middleware tools are
common in today’s computing environment. Oracle gateway technology is one
example of middleware available off the shelf.
In a multitier data warehousing environment with Internet access, the role of
middleware is continually being redefined and refined.
• Robust
• Available
• Reliable
• Extensible
• Scalable
• Supportable
• Recoverable
• Parallel
Parallelism
• Database
• Query
• Load
• Index
• Sort
• Backup
• Recovery
Database Server Requirements
Parallelism
The driving force behind the warehouse implementation is the needs of the end users
who require access to the information. The database environment must handle all
operational tasks and processes quickly and efficiently. Parallel capabilities
minimize the time taken to perform all the major functions of the warehouse and
maximize availability.
As you have seen, parallelism at all levels is becoming mandatory for warehouses:
• Database (server)
• Query
• Load
• Index
• Sort
• Backup
• Recovery
Further Considerations
• Optimization strategy
• Partitioning strategy
• Summarization strategy
• Indexing techniques
• Hardware and software scalability
• Availability
• Administration
Server Environments
Further Considerations
Parallelism is not the only consideration; you must also consider the following:
• The optimization strategy, particularly star query techniques employed with star
and snowflake structures (Today’s servers enable you to optimize data access in
many different ways.)
• The partitioning strategy
• Summarization strategies, to ensure that the overhead of creating summaries does
not affect the load
• Indexing techniques, in particular, bitmap indexes
• Hardware and software scalability
• Availability of the warehouse
• The system administration, which must easily manage the entire infrastructure
Server Environments
Many different database servers and hardware architectures can be employed for a
warehouse solution. It is generally assumed that data warehouse database technology
means relational technology.
• Operational Servers: Open, mainframe, or proprietary database servers (whether
network database server, hierarchical database server, or relational database
server), such as Oracle, IMS, DB2, DB2/PE, VSAM, Rdb, NonStop SQL, or
RMS.
• Warehouse Servers: Open (usually relational) database servers that may be
warehouse specific or general purpose, such as Oracle, Informix, Adabas D,
OpenIngres, or Red Brick.
• Data Mart Servers: Relational databases, multidimensional (OLAP) databases, or
both; they may be warehouse specific or general purpose, such as Oracle, Oracle
Express, Arbor Essbase, MS SQL Server, and NT-based environments.
Parallel Processing
[Figure: a task divided across Processor 1 through Processor 4, running in parallel]
Parallel Database
• Increased speed
• Improved scalability
• Performance gains
– Availability
– Flexibility
– More users
Parallel Processing
A parallel processor takes a task (usually a large task) and divides it into smaller tasks
that can be executed concurrently on one or more nodes (separate processors). As a
result, a large task requested by a single user completes more quickly. Before
examining the individual parallel features, consider the parallel database.
Parallel Database
A parallel database takes advantage of architectures that share access to data, software,
and peripheral devices by running multiple instances that share a single physical
database.
This type of processing has two key features:
• Increased speed: The server can perform the same task in less time
• Improved scalability: The ability to perform a task many times larger, on a system
many times larger, without any performance degradation
These key features give you the following benefits:
• Higher performance
• Greater availability
• Greater flexibility
• Greater accessibility to online users
All of these features directly benefit the warehouse and are supported by the Oracle7,
Oracle8, and Oracle8i Server.
Parallel Query
[Figure: a query divided into three subqueries that are processed in parallel]
Parallel Load
[Figure: parallel load into the Order table]
Parallel Query
Most database servers today support parallel query. Specifically, the Oracle Server
parallel query option divides the work of processing a single SQL statement among
multiple query server processes. In some applications, particularly decision support
systems, an individual query may use vast amounts of CPU resource and disk I/O. The
server parallelizes individual queries into units of work that can be processed
simultaneously.
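The idea of dividing one request into units of work can be sketched in a few lines of Python. This is only an illustration of the principle, not of how the Oracle Server implements parallel query; the function names, the worker count, and the sample data are assumptions made for this example.

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_chunks(rows, n_chunks):
    """Divide the rows into at most n_chunks contiguous units of work."""
    size = max(1, -(-len(rows) // n_chunks))  # ceiling division
    return [rows[i:i + size] for i in range(0, len(rows), size)]

def parallel_sum(rows, n_workers=4):
    """Sum a column by summing each chunk in a separate worker."""
    chunks = split_into_chunks(rows, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_sums = list(pool.map(sum, chunks))  # one unit of work per chunk
    return sum(partial_sums)  # the coordinator combines the partial results

# Hypothetical stand-in for a numeric column of a fact table.
dollars = list(range(1, 1001))
total = parallel_sum(dollars)
```

The threads here only show the division of work and the final combine step. A database server would assign each unit of work to a separate query server process, typically reading from different disks.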
Parallel Load
Parallelism can dramatically speed up data loading. Database servers can bypass
standard SQL processing (that is, data manipulation language commands, such as
INSERT) and load the data directly into the database tables.
Parallel Index
Creating an index in parallel decreases the time required to create and reconfigure a
warehouse. Many indexes exist in the warehouse database. Nearly every attribute on
dimension tables and composite key values on the fact table are indexed. Indexes take
up a lot of space in the warehouse, and you must consider the direct access storage
device (DASD) needed for indexes as well as fact and dimension tables.
Parallel Sort
Sorting is an intensive task that requires a substantial amount of memory. If you are
working in a parallel environment, sort areas are allocated more efficiently to reduce
serialization and cross-instance pinging. Sort space is cached in memory (in the Oracle
server this is in the System Global Area).
Parallel Backup
With parallel operations, backups can be performed simultaneously from any node of a
parallel server.
• Online backups enable the database to be backed up while active, allowing users
continuous access.
• Offline backups enable the database to be backed up while it is shut down,
preventing user access.
Parallel Recovery
The goal of parallel recovery is to employ I/O parallelism to reduce the elapsed time
required to perform crash recovery, instance recovery, or media failure recovery. The
server uses one process to read files sequentially and dispatch redo information to
several recovery processes to apply the changes from the log files to the data files.
Summary
This lesson covered the following topics:
• The basic architecture requirements for a warehouse
• The benefits and limitations of the different hardware architectures
Practice 8-1
1 Form into small groups, and consider each of the following hardware
architectures. With your books closed, create a short definition for each
architecture. Each answer should include the benefits and limitations of each
architecture.
Architecture Definition Benefits Limitations
SMP
NUMA
Clusters
MPP
Lesson 9: Planning Warehouse Storage
[Figure: course road map with the “Planning Warehouse Storage” block highlighted. Blocks: Planning for a Successful Warehouse; Meeting a Business Need; Defining DW Concepts & Terminology; Choosing a Computing Architecture; Planning Warehouse Storage; Modeling the Data Warehouse; ETT (Building the Warehouse); Managing the Data Warehouse; Analyzing User Query Needs; Supporting End User Access; Project Management (Methodology, Maintaining Metadata)]
Overview
The previous lesson covered choosing a computing architecture. This lesson discusses
planning warehouse storage. Note that the “Planning Warehouse Storage” block is
highlighted in the course road map on the facing page.
Specifically, this lesson examines the database setup and management issues such as
partitioning, indexing, and ways to protect your database.
Objectives
After completing this lesson, you should be able to do the following:
• Discuss different partitioning methods and types of indexes
• Consider the benefits and limitations of different RAID levels in protecting the
database
Data Partitioning
Drop
– Management
– Archiving
– Indexing
Other data is not affected
Objects to Partition
• Tables:
– Fact
– Dimension
• Indexes
The Server Data Architecture
Horizontal Partitioning
Vertical Partitioning
Partitioning Methods
The partitioning methods that are available for Oracle8 and Oracle8i are listed
below.
• Range Partitioning (Oracle8 and Oracle8i)
Range partitioning has been available since Oracle8. This option partitions data
based on ranges of values. Range partitioning guarantees that only data with a
particular set of values is contained in each partition. Range partitioning is well
suited to rolling windows of data.
• Hash Partitioning (Oracle8i)
Hash partitioning is a new feature of Oracle8i. Hash partitioning reduces
administrative complexity by providing many of the manageability benefits of
partitioning, with minimal configuration effort. When implementing hash
partitioning, the administrator simply chooses a partitioning key and the number of
partitions. Oracle8i automatically distributes the data evenly across all partitions.
Hash partitioning is particularly appropriate for tables that do not have a natural
partitioning key.
• Composite Partitioning (Oracle8i)
Composite partitioning partitions data using the range method and, within each
partition, subpartitions it using the hash method. This new type of partitioning,
which is available only in Oracle8i, supports historical operations data at the
partition level, and parallelism (parallel DML) and data placement at the
subpartition level. Composite partitioning is ideal for both historical data and data
placement.
The two partitioning methods introduced in Oracle8i, hash and composite
partitioning, offer improvements for tables that do not naturally lend themselves to
range partitioning in one or more of the following areas:
• Ease of specification
• Simplicity of management for support of parallelism
• Reduction in skew in the amount of resources required to perform maintenance
operations (such as export or backup) on different partitions of a table
• Performance, by adding support for partitionwise joins and intrapartition parallel
data manipulation language (DML)
• Better use of hierarchical storage management solutions
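The three methods differ only in how a row is assigned to a partition. The following Python sketch shows that assignment logic under stated assumptions: the quarterly bounds, the CRC-based hash, and all function names are hypothetical and are not taken from the Oracle server, whose internal hash function is not described in this course.

```python
import bisect
import zlib

# Hypothetical exclusive upper bounds for four range partitions, by month
# number: months 1-3, 4-6, 7-9, and 10-12.
RANGE_BOUNDS = [4, 7, 10, 13]

def range_partition(month):
    """Return the index of the first partition whose bound exceeds month."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    return bisect.bisect_right(RANGE_BOUNDS, month)

def hash_partition(key, n_partitions):
    """Assign a key to a partition using a stable hash of the key."""
    return zlib.crc32(str(key).encode("utf-8")) % n_partitions

def composite_partition(month, key, n_subpartitions):
    """Range-partition by month, then hash-subpartition by key."""
    return range_partition(month), hash_partition(key, n_subpartitions)

# A May sale for a hypothetical customer key lands in the second range
# partition (index 1) and in one of four hash subpartitions.
partition, subpartition = composite_partition(5, "CUST-1042", 4)
```

With range partitioning the administrator chooses the bounds, so partition sizes depend on the data. With hash partitioning only the key and the number of partitions are chosen, which is why it suits tables without a natural partitioning key.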
Star Transformation
[Figure: a star query. A fact table (Key 1, Key 2, Key 3, Dollars) is joined to dimension tables, including Product_Table, to produce the query result.]
STAR_TRANSFORMATION_ENABLED
Star Transformation
The star transformation is a cost-based query transformation aimed at executing star
queries efficiently. Whereas the star optimization works well for schemas with a small
number of dimensions and dense fact tables, the star transformation may be considered
as an alternative if any of the following holds true:
• The number of dimensions is large.
• The fact table is sparse.
• There are queries where not all dimension tables have constraining predicates.
The STAR_TRANSFORMATION_ENABLED parameter specifies whether a cost-based
query transformation is applied to star queries. The default value is FALSE. This
parameter can be set dynamically using the ALTER SESSION command.
Indexing
B-Tree Index
Indexing Data
By intelligently indexing data in your data warehouse, you can increase both the
performance and scalability of your warehouse solution. Using indexes, you can
replace a full table scan by a quick read of the index followed by a read of only those
disk blocks that contain the rows needed. The types of indexes are described below.
B-Tree Indexes: This is the most common type of indexing, used for high-cardinality
columns and designed for queries that return few rows. Rather than scanning an entire
table to find rows where a certain column satisfies a WHERE clause predicate, you
instead create a separate index structure on that column. This index structure contains
a sorted list of all the actual discrete column values, and each value in the index is
associated with a list of pointers to all the rows in the original table that contain that
value. The index is stored internally using a balanced tree (B-tree) representation in
order to allow the database engine to quickly find any element in the sorted list.
Note: Cardinality is defined as the number of distinct key values expressed as a
percentage of the number of rows in the table. For example, a million-row table with
five distinct values has a low cardinality, while a 100-row table with 80 distinct values
has a high cardinality.
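The sorted list of values with row pointers, and the cardinality measure from the note, can be sketched as follows. This is a simplification: it uses a flat sorted list searched by binary search rather than a real multilevel B-tree, and the function names and sample rows are assumptions made for this example.

```python
import bisect
from collections import defaultdict

def build_index(rows, column):
    """Map each distinct column value to the row numbers that contain it."""
    pointers = defaultdict(list)
    for row_id, row in enumerate(rows):
        pointers[row[column]].append(row_id)
    keys = sorted(pointers)  # sorted list of the distinct values
    return keys, pointers

def lookup(index, value):
    """Binary-search the sorted keys, then follow the row pointers."""
    keys, pointers = index
    pos = bisect.bisect_left(keys, value)
    if pos < len(keys) and keys[pos] == value:
        return pointers[value]
    return []  # value not present: no table rows need to be read

def cardinality_pct(distinct_values, row_count):
    """Distinct key values as a percentage of the number of rows."""
    return 100.0 * distinct_values / row_count

# Hypothetical table with a high-cardinality order_id column.
orders = [{"order_id": 1007}, {"order_id": 1002}, {"order_id": 1005}]
order_index = build_index(orders, "order_id")
matching_rows = lookup(order_index, 1005)
```

Applied to the note's examples, `cardinality_pct(5, 1_000_000)` gives a very small percentage (low cardinality), whereas `cardinality_pct(80, 100)` gives 80 percent (high cardinality).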
Bitmap Indexes
Blue - 1000100100010010100
Green - 0001010000100100000
Mauve - 0100000011000001001
Gold - 0010001000001000010
Bitmap Indexes
Bitmap indexes provide substantial performance benefits and storage savings. When a
bitmap index is created on a column, a bit stream (ones and zeros) is created for each
distinct value in the indexed column. Bitmap indexes are useful on low-cardinality
data, because scanning 1s and 0s is much more efficient than scanning data values.
Bitmap indexes are an alternative to normal B-tree indexes in the following situations:
• The table is large (millions of rows).
• Columns have low cardinality index key values.
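A minimal Python sketch of the idea follows, using one integer per distinct value as the bit stream (bit i is set when row i holds that value). The column contents, the second `sizes` column, and the function names are assumptions made for this example; real servers also compress the bitmaps, which is not shown.

```python
def build_bitmap_index(values):
    """Return {distinct value: bitmap}; bit i is set if row i has the value."""
    bitmaps = {}
    for row_id, value in enumerate(values):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << row_id)
    return bitmaps

def rows_from_bitmap(bitmap):
    """List the row numbers whose bits are set in the bitmap."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

# Two hypothetical low-cardinality columns of the same six-row table.
colors = ["Blue", "Mauve", "Gold", "Green", "Blue", "Green"]
sizes = ["S", "L", "S", "S", "L", "S"]
color_index = build_bitmap_index(colors)
size_index = build_bitmap_index(sizes)

# WHERE color = 'Green' AND size = 'S' becomes a single bitwise AND.
green_small = rows_from_bitmap(color_index["Green"] & size_index["S"])

# WHERE color = 'Blue' OR color = 'Gold' becomes a single bitwise OR.
blue_or_gold = rows_from_bitmap(color_index["Blue"] | color_index["Gold"])
```

Because each predicate reduces to bitwise operations on compact bit streams, combining several low-cardinality conditions is cheap before any table rows are read.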
Protecting the Database
RAID
RAID achieves data accessibility benefits in a cost effective manner:
• Improved reliability (fault tolerance)
• Enhanced storage management
RAID Levels
There are a number of different levels of RAID:
• RAID Level 0: Striping without parity (DSA)
• RAID Level 0+1: Mirrored striping
• RAID Level 1: Mirrored disk array (MDA)
• RAID Level 3: Data striping with byte level parity
• RAID Level 4: Same as RAID 3, but with block level parity
• RAID Level 5: Independent Disk Array (IDA)
Note: RAID Levels 0, 1, and 5 are discussed on the following pages because these are
found to be most useful. In a data warehouse where the workload profile is unknown,
you should use machine striping for all objects. To eliminate contention for disks, you
should ensure that tables that are subject to multiple concurrent parallel scans are
given a dedicated set of disks, striped to give the necessary I/O bandwidth and load
balancing.
The stripe size is a hotly debated issue. It affects table scan performance as well as
database operational tasks, such as backups and restores. When setting the stripe size,
the administrator should ensure that each I/O can be satisfied within one stripe.
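The stripe-size guideline can be checked with a little arithmetic. The sketch below assumes a simple round-robin striping layout and takes "stripe" to mean one row of stripe units across all disks; both the layout and the function names are assumptions made for this example, and actual volume managers may differ.

```python
def stripe_of(block, n_disks, stripe_unit_blocks):
    """Return the stripe (row across all disks) that holds a logical block."""
    return block // (n_disks * stripe_unit_blocks)

def disk_of(block, n_disks, stripe_unit_blocks):
    """Return the disk holding a logical block (round-robin by stripe unit)."""
    return (block // stripe_unit_blocks) % n_disks

def stripes_touched(first_block, n_blocks, n_disks, stripe_unit_blocks):
    """Return the set of stripes that one I/O request has to visit."""
    last_block = first_block + n_blocks - 1
    first = stripe_of(first_block, n_disks, stripe_unit_blocks)
    last = stripe_of(last_block, n_disks, stripe_unit_blocks)
    return set(range(first, last + 1))

# Hypothetical array: 4 disks, 8 blocks per stripe unit, 32 blocks per stripe.
aligned = stripes_touched(0, 32, n_disks=4, stripe_unit_blocks=8)
misaligned = stripes_touched(24, 16, n_disks=4, stripe_unit_blocks=8)
```

In this layout the aligned 32-block request stays within a single stripe, whereas the 16-block request that starts at block 24 crosses into a second stripe and therefore needs an extra round of disk access.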
RAID 0: Striping
• Benefits:
– Good for simultaneous reads and writes
– No redundancy
– Scalable
• Limitations:
– Not recommended for mission-critical systems
– No recovery from data loss
– One bad sector affects entire disk of data
RAID 1: Mirroring
[Figure: Disk 1 and Disk 2, each with a mirror copy]
• Benefits:
– Complete data redundancy
– No performance penalty
– Improves reads
– Scalability
• Limitations:
– Highest cost of all RAID configurations
RAID 5
• Benefits:
– Efficient data integrity
– Data reconstruction
– Multiple concurrent seeks across array
– Scalable
• Limitations:
– Disk overhead
– Data write rate
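The data reconstruction benefit and the write-rate limitation listed above both follow from parity. The following Python sketch assumes simple XOR parity over equal-sized blocks, as used by parity-based RAID levels; the function names and block contents are hypothetical, and rotation of parity across disks is not shown.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def make_parity(data_blocks):
    """Compute the parity block stored alongside the data blocks."""
    return xor_blocks(data_blocks)

def reconstruct(surviving_blocks, parity):
    """Rebuild the one missing data block from the survivors and the parity."""
    return xor_blocks(surviving_blocks + [parity])

# Three hypothetical data blocks, each on a different disk.
data = [b"fact", b"dims", b"meta"]
parity = make_parity(data)

# The disk holding data[1] fails; its block is rebuilt from the others.
rebuilt = reconstruct([data[0], data[2]], parity)
```

Every write to a data block also requires the parity block to be recomputed and rewritten, which is the source of the write-rate penalty.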
Backup
The backup and recovery strategy for a warehouse needs to be considered at the design
stage. Details such as how the data is partitioned greatly affect the strategy. For small
and medium databases, daily cold backups (taken while all instances of the database
are shut down) and export/import are viable backup tools.
However, once you move to very large databases (VLDBs), complete cold backups
become difficult to fit into an overnight window. In addition, the disk space required
for a complete export of a large database becomes an issue. You need to consider other
strategies, such as using tape or other devices.
The defined backup strategy for the warehouse should allow for hot backups, where
you can back up any part of the database at any time of the day, while the database
instances are still active. With Oracle, this means backing up individual and active
tablespaces.
You should back up every component that is essential to warehouse operations—
everything required to restore a working environment:
• Fact data
• Dimension data
• Data warehouse and metadata schema
• Data warehouse metadata
Export/Import
The export/import utility enables all or part of a database to be extracted into a
dump file and then imported into another database (under another owner if required).
Generally, export/import of a VLDB uses too much disk space. You could use named
pipes to a disk on a UNIX system to overcome space problems. However, this
technique would be very time-consuming.
Summary
This lesson covered the following topics:
• Vertical partitioning and horizontal partitioning
• The different types of partitioning methods
• The difference between B-tree indexes and bitmap indexes
• Why a warehouse typically uses RAID 0, 1, or 5 to protect the database
Practice 9-1
1 For each of the following descriptions, state the partitioning method that it
best describes. The partitioning methods are range partitioning, hash partitioning,
and composite partitioning.
a Description: Places specific ranges of table entries on different disks. For
example, records keyed on “name” may have names beginning with A-B in one
partition, C-D in the next, and so on. Likewise, a DSS managing monthly
operations might partition each month onto a different set of disks.
Partitioning method: ____________
b Description: Distributes DBMS data evenly across the set of disk spindles. This
partitioning method is applied to one or more database keys, and the records are
distributed across disk subsystems accordingly.
Partitioning method: ____________
c Description: The drawback of this partitioning method is that the quantity of
data may vary significantly from one partition to another, and the frequency of
data access may vary as well. For example, as the data accumulates, it may turn
out that a larger number of customer names fall into the M-N range than the A-B
range.
Partitioning method: ____________
d Description: This partitioning method is a combination of two partitioning
methods. A table that is partitioned using this method is initially partitioned by
range, and then subpartitioned using the hash method.
Partitioning method: ____________
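The three partitioning methods named in this question can be sketched in Python. This is a minimal illustration under stated assumptions, not how any particular DBMS implements partitioning: the function names are hypothetical, the two-letter ranges follow the A-B, C-D example in the text, and zlib.crc32 stands in for whatever hash function a real system would use.

```python
import zlib
from collections import Counter


def range_partition(name):
    """Return the two-letter range (A-B, C-D, ...) that a name falls into."""
    first = name[:1].upper()[:1]
    if not ("A" <= first <= "Z"):
        return "other"
    low = ord("A") + 2 * ((ord(first) - ord("A")) // 2)
    return f"{chr(low)}-{chr(low + 1)}"


def hash_partition(key, partitions):
    """Map a key to one of `partitions` buckets using a stable hash."""
    return zlib.crc32(str(key).encode("utf-8")) % partitions


def composite_partition(month, customer_id, subpartitions):
    """Partition by range (here, the month), then subpartition by hash."""
    return (month, hash_partition(customer_id, subpartitions))


# Hypothetical sample data: count how many names land in each range.
names = ["Adams", "Baker", "Martin", "Miller", "Moore", "Nelson", "Zhou"]
range_counts = Counter(range_partition(n) for n in names)
```

Comparing the counts in range_counts with the spread produced by hash_partition on the same keys is one way to see how evenly each method distributes a given data set.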
2 For each of the following descriptions, state the type of indexing method it best
describes. The indexing methods are B-tree, bitmap, and index-organized tables.
a Description: Contains a hierarchy of highest-level and succeeding lower-level
index blocks. The upper-level blocks are called branch blocks, and they point to
the lower-level blocks. The leaf blocks are the lower-level blocks, and they
contain the unique ROWID that points to the location of the actual row.
Indexing method: ____________
b Description: This indexing method benefits queries in which the WHERE clause
contains multiple predicates on low-cardinality columns.
Indexing method: ____________
c Description: This method merges table data and index data into one structure.
Thus, the data is the index and the index is the data.
Indexing method: ____________
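The remark about multiple predicates on low-cardinality columns can be made concrete with a small Python sketch. It assumes a simplified model in which an index keeps one bit string per distinct column value, with one bit per row, and it stores each bit string as a Python integer. The function names and the sample rows are hypothetical; real index structures are compressed and considerably more elaborate.

```python
def build_bitmap_index(rows, column):
    """Build one bitmap per distinct value; bit i is set if row i has that value."""
    bitmaps = {}
    for position, row in enumerate(rows):
        value = row[column]
        bitmaps[value] = bitmaps.get(value, 0) | (1 << position)
    return bitmaps


def matching_rows(bitmap):
    """Return the row positions whose bit is set in the bitmap."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Hypothetical sample table with two low-cardinality columns.
rows = [
    {"region": "EAST", "status": "OPEN"},
    {"region": "WEST", "status": "OPEN"},
    {"region": "EAST", "status": "CLOSED"},
    {"region": "EAST", "status": "OPEN"},
]
region_index = build_bitmap_index(rows, "region")
status_index = build_bitmap_index(rows, "status")

# WHERE region = 'EAST' AND status = 'OPEN' becomes a single bitwise AND.
combined = region_index["EAST"] & status_index["OPEN"]
```

In this model, each additional predicate costs one more bitwise operation over the bitmaps before any table row is read.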
3 Form into small groups, and consider each of the following questions. For each
question, discuss in your groups and present your group’s answers to the class at
the end of the discussion.
a How does RAID-5 differ from RAID-1?
b How do I decide between RAID-5 and RAID-1?
c What variables can affect the performance of a RAID-5 device?
d What types of files are suitable for placement on RAID-5 devices?
4 For each of the descriptions below, assign the RAID level, such as RAID Level 0,
RAID Level 1, or RAID Level 5.
a Description: This RAID level has the lowest cost and highest performance.
RAID level: ____________
b Description: This RAID level is low cost and has high availability.
RAID level: ____________
c Description: This RAID level has high performance and high availability.
RAID level: ____________
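When comparing cost and availability across RAID levels, a little arithmetic on usable capacity can help. The Python sketch below uses the standard definitions: RAID 0 stripes with no redundancy, RAID 1 is assumed here to be two-way mirroring, and RAID 5 spends one disk's worth of space on parity. The function name raid_summary is hypothetical, and the sketch ignores performance, controllers, and hot spares.

```python
def raid_summary(level, disks, disk_gb):
    """Return usable capacity and guaranteed tolerated disk failures for an array."""
    if level == 0:
        if disks < 2:
            raise ValueError("RAID 0 needs at least 2 disks")
        # Striping only: all space is usable, but any disk failure loses data.
        return {"usable_gb": disks * disk_gb, "failures_tolerated": 0}
    if level == 1:
        if disks < 2 or disks % 2 != 0:
            raise ValueError("two-way mirroring needs an even number of disks")
        # Assumption: two-way mirrors, so half of the raw space is usable.
        return {"usable_gb": (disks // 2) * disk_gb, "failures_tolerated": 1}
    if level == 5:
        if disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        # One disk's worth of space holds parity, spread across all disks.
        return {"usable_gb": (disks - 1) * disk_gb, "failures_tolerated": 1}
    raise ValueError("only RAID levels 0, 1, and 5 are covered")
```

For example, raid_summary(5, 4, 100) and raid_summary(1, 4, 100) show how much raw disk space each level gives up in exchange for surviving a single disk failure.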