Oracle Data Mart

Oracle Data Mart Suite
The Oracle Data Mart Suite Cookbook
Release 2.6
August 1999 Part No. A75671-01
The Oracle Data Mart Suite Cookbook, Release 2.6
Part No. A75671-01
Copyright 1997, 1999, Oracle Corporation. All rights reserved.

Primary Author: Alejandro Butinof

Contributors: Rob Abbott, Dan Abugov, Paula Bingham, Janet Blowney, Dan Carwin, Olaf Fermum, Christina Gibb, Gita Gupta, Jean Howard, Rajeev Jain, Karlene Jensen, Sudip Majumder, Karen McKeen, Paul Narth, Bimal Patel, Hanne Rasmussen, Hari Sankar, Phil Slater, Mike Schmitz, Lyne Therien

The Programs (which include both the software and documentation) contain proprietary information of Oracle Corporation; they are provided under a license agreement containing restrictions on use and disclosure and are also protected by copyright, patent, and other intellectual and industrial property laws. Reverse engineering, disassembly, or decompilation of the Programs is prohibited.

The information contained in this document is subject to change without notice. If you find any problems in the documentation, please report them to us in writing. Oracle Corporation does not warrant that this document is error free. Except as may be expressly permitted in your license agreement for these Programs, no part of these Programs may be reproduced or transmitted in any form or by any means, electronic or mechanical, for any purpose, without the express written permission of Oracle Corporation.

If the Programs are delivered to the U.S. Government or anyone licensing or using the Programs on behalf of the U.S. Government, the following notice is applicable:

Restricted Rights Notice
Programs delivered subject to the DOD FAR Supplement are "commercial computer software" and use, duplication, and disclosure of the Programs, including documentation, shall be subject to the licensing restrictions set forth in the applicable Oracle license agreement. Otherwise, Programs delivered subject to the Federal Acquisition Regulations are "restricted computer software" and use, duplication, and disclosure of the Programs shall be subject to the restrictions in FAR 52.227-19, Commercial Computer Software - Restricted Rights (June, 1987). Oracle Corporation, 500 Oracle Parkway, Redwood City, CA 94065.
The Programs are not intended for use in any nuclear, aviation, mass transit, medical, or other inherently dangerous applications. It shall be the licensee's responsibility to take all appropriate fail-safe, backup, redundancy, and other measures to ensure the safe use of such applications if the Programs are used for such purposes, and Oracle Corporation disclaims liability for any damages caused by such use of the Programs.

Oracle is a registered trademark, and Better Decisions Made Simple, Oracle8, Oracle Discoverer, PL/SQL, SQL*Loader, SQL*Net, and SQL*Plus are trademarks or registered trademarks of Oracle Corporation. All other company or product names mentioned are used for identification purposes only and may be trademarks of their respective owners.
Contents
Send Us Your Comments  xvii
About the Cookbook  xix

1 Data Mart Concepts

What Is a Data Mart?  1-1
How Is It Different from a Data Warehouse?  1-1
Dependent and Independent Data Marts  1-2
What Are the Steps in Building a Data Mart?  1-3
Designing  1-4
Constructing  1-5
Populating  1-6
Accessing  1-6
Managing  1-7
Oracle Data Mart Suite  1-7
What Is in Oracle Data Mart Suite?  1-8
Oracle Data Mart Designer  1-8
Oracle8 Enterprise Edition  1-9
Oracle Enterprise Manager  1-9
Oracle Data Mart Builder  1-9
Oracle Discoverer  1-10
Oracle Reports  1-10
Oracle Web Application Server  1-10
How Do You Build a Data Mart Using These Products?  1-11
Data Modeling: Data Mart Design and Construction  1-11
Process Modeling: Populating the Data Mart  1-12
Business Modeling: Accessing the Data  1-13
Managing the Data Mart  1-13

2 Case Study: The Global Computing Company

Case Study Scenario  2-1
Global Computing Information Requirements  2-2
Business Analysis Questions  2-3
Information Requirements from the Information Technology (IT) Department  2-3
Information Requirements from the Sales and Marketing Department  2-3
The Oracle Data Mart Suite Databases  2-4
What Does the DMDB Database Include?  2-4
Database Files  2-4
Database Users  2-5
Data Mart Builder Repository  2-6
Database Structures for the Global Computing Case Study  2-7
Source Data  2-7
Target Tables  2-7
Overview of the Hands-on Exercises  2-7

3 Requirements and Design

What Happens in This Step of the Process?  3-1
What Is the Role of Data Mart Designer?  3-2
Defining the Scope of the Data Mart Project  3-2
Defining the Requirements for the Data Mart  3-3
Business Requirements  3-3
Technical Requirements  3-4
How Do You Know If You Have Done It Right?  3-4
Global Computing Company: Requirements Definition  3-5
Input from Interviews  3-5
Information Requirements  3-6
Performance Requirements  3-6
Requirements Summary  3-7
Data Mart Design  3-7
Creating a Logical Design  3-9
Creating a Wish List of Data  3-10
Identifying Sources  3-11
Classifying Data for the Data Mart Schema  3-13
Dimensions  3-13
Facts  3-14
Granularity  3-14
Designing the Star Schema  3-14
Moving from Logical to Physical Design  3-16
Estimating the Size of the Data Mart  3-16
What Is Metadata?  3-18
Global Computing Company: Data Mart Design  3-18
Setting Up Data Mart Designer  3-19
Creating a Designer User  3-19
Creating a Designer Application  3-21
Creating a Wish List of Data  3-23
Adjusting Your Wish List to Reality  3-24
Determining the Granularity Level  3-25
Classifying Data for the Global Computing Star Schema  3-26
Creating the Logical Design  3-27
Sketching an Initial Logical Design  3-27
Capturing the Design of the Source OLTP System  3-33
Creating a New Version  3-40
Completing the Logical Design  3-41
Adding Attributes  3-44
Creating the Physical Design  3-46
Transforming the Logical Design into a Physical Design  3-46
Displaying the Default Physical Design  3-47
Adding a Staging Table  3-50
Generating the DDL Script  3-52
Estimating Size  3-54
Translating the Physical Design to a Physical Implementation  3-55

4 Construct

What Happens in This Step of the Process?  4-1
What Is the Role of Oracle8 in the Data Mart?  4-2
Storage Management  4-2
Fast Access  4-2
Data Protection: Backup and Recovery  4-2
Access by Multiple Users  4-3
Database Security  4-3
Understanding Oracle8 Database Server: The Building Blocks  4-4
Oracle Processes  4-4
Oracle Memory Structures  4-4
Oracle Instances  4-4
Client/Server Architecture  4-4
Oracle Files  4-5
Putting the Building Blocks Together  4-7
How Oracle8 Organizes and Manages Storage  4-7
Tablespaces, Datafiles, and Data Blocks  4-8
Segments and Extents and How They Relate to Tablespaces  4-11
Schema Objects in Oracle8  4-11
Tables Revisited  4-11
Indexes: Bitmap and B-Tree  4-12
Views  4-13
SQL Optimization and Execution  4-14
How SQL Statements Are Processed  4-15
Who Does All of the Work?  4-16
Different Ways of Getting to Data: Access Paths  4-16
The Cost-Based Optimizer  4-17
Generating Statistics  4-18
Parallel Processing  4-18
Parallel Query  4-19
Parallel CREATE TABLE ... AS SELECT Statements  4-21
Parallelizing SQL Statements  4-21
Setting the Degree of Parallelism  4-22
Parallel Direct Path Load  4-22
Parallel Index Create  4-23
Star Query Processing  4-23
Star Schema Revisited  4-23
What Is a Star Query?  4-24
Optimizing Star Query Execution: Star Transformation  4-25
Partitioned Tables and Indexes  4-25
Global Computing Case Study: Managing Data Mart Storage  4-28
Logging In to Oracle Enterprise Manager Components  4-28
Adding Space to the Database  4-30
Creating a New Tablespace  4-31
Adding Space to the USERDATA Tablespace  4-35
Miscellaneous Tasks  4-35
Starting Up and Shutting Down the Database  4-35
Shutting Down the Database  4-36
Starting Up the Database  4-38

5 Extraction, Transformation, and Transportation

What Happens in This Step of the Process?  5-1
What Is the Role of Oracle Data Mart Builder?  5-2
Oracle Data Mart Builder  5-2
Parts of a Data Flow Plan  5-3
Built-in Transforms  5-4
Source Transforms  5-4
Data Flow Transforms  5-5
Record Manipulation Transforms  5-6
Column Manipulation Transforms  5-7
Sort Transforms  5-8
Sink Transforms  5-8
Data Mart Population Transforms  5-9
VBScript Transforms  5-11
Transform Software Developer's Kit  5-12
Oracle Data Mart Builder Admin  5-12
Understanding Data Extraction and Oracle Data Mart Builder  5-13
Primary Data Sources and Targets  5-13
Source Metadata: Providing Information About the Source  5-13
Capturing Changes in Source Data  5-14
Maintaining Referential Integrity  5-14
Understanding Data Transformation  5-15
Understanding Data Transportation  5-16
What Are the Steps in Populating a Data Mart?  5-16
Global Computing Case Study: Populating the Data Mart  5-18
Starting Assumptions  5-18
What Happens in This Section?  5-19
Logging In to Data Mart Builder  5-21
Representing Source Data  5-22
Creating a BaseView and a MetaView of Source Data  5-22
Representing Target Data  5-25
Creating a BaseView for Target Data  5-25
Creating a BaseView and MetaView for the Staging Area  5-27
Extracting and Transforming the Source Data  5-29
Using the Data Flow Editor to Create and Run Data Flow Plans  5-29
Displaying the Plan and Results  5-29
Selecting the Tables to Use in the Plan  5-30
Adding Transforms to the Plan  5-31
Creating Data Flow Plans to Populate the Target Star Schema  5-31
Loading the Time Dimension Table DAYS  5-32
Creating Plans for Other Dimension Tables  5-36
Loading the Customer Dimension Table CUSTOMERS  5-36
Loading the Product Dimension Table PRODUCTS  5-39
Loading the Channel Dimension Table CHANNELS  5-44
Populating the Fact Table SALES  5-47
Reenabling Constraints  5-53
Analyzing the Table SALES  5-53
What Have You Learned and What Is Next?  5-54

6 Access to the Database

What Happens in This Step of the Process?  6-1
Reviewing Your Data Model  6-1
What Is the Role of Oracle Discoverer in the Data Mart?  6-2
Discoverer User Edition  6-3
Discoverer Administration Edition  6-4
End User Layer  6-4
Discoverer Viewer for the Web  6-4
Basic Steps in Designing a Business Area  6-4
Creating and Managing a Business Area  6-5
Managing End-User Access  6-5
Using Oracle Discoverer to Improve Query Performance  6-6
Automatic Summary Redirection: Summary Tables  6-6
Query Prediction  6-7
Query Governor  6-7
Client-Side Cubic Cache  6-8
Global Computing Case Study: Creating the Metalayer  6-8
Installing the End User Layer  6-8
Creating a New Business Area  6-10
Granting Business Area Access  6-15
Optional Formatting  6-17
Renaming Folders and Adding Descriptions  6-17
Renaming an Item  6-18
Hiding Items in the Business Area  6-19
Formatting Currency  6-21
Creating a New Item for Calculations  6-21
Defining a Hierarchy  6-22
Creating a Join  6-23
Defining a New Item Class  6-24
Creating a Condition  6-26
Creating a Summary Table  6-27
Analyzing Your Data: Discoverer User Edition  6-28
Types of Reports  6-28
Table Format  6-29
Page-Detail Table Format  6-29
Crosstab Format  6-30
Page-Detail Crosstab Format  6-31
Graphical Format  6-32
Modifying Formats  6-32
Drilling Down and Drilling Up  6-33
Investigating Your Data  6-33
Where Do You Begin?  6-34
Global Computing Case Study: Accessing the Data  6-35
Creating a Report  6-35
Reformatting a Report  6-38
Creating a Page-Detail Crosstab Report  6-40
Drilling Down  6-42
Creating Graphs  6-44
Drilling to Detail  6-47
Drilling Out  6-47
Custom Formatting  6-47
Publishing Your Work for Others to View  6-48
Exporting to Excel  6-48
Exporting to HTML  6-48
Exporting to Other File Formats  6-48
Scheduling a Workbook  6-49
What Have You Learned and What Is Next?  6-49
7 Report Creation
  What Happens in This Step of the Process?
  What Is the Role of Oracle Reports in the Data Mart?
  Scalability
  Components of Oracle Reports
  Global Computing Case Study: Building a Report
  Invoking Report Builder
  Building a Report with the Report Wizard
  Finishing the Report
  Formatting the Report in the Live Previewer
  Making Changes to the Report
  Deploying a Report
  What Have You Learned and What Is Next?
8 Manage
  What Happens in This Step of the Process?
  Data Mart Maintenance Issues
  Managing Users and Security
  User Authentication
  User Quotas and Space Requirements
  Resource Profiles
  Roles and Privileges
  Global Computing Case Study: Adding Users and Roles
  Creating a New User ID
  Creating a New Role
  Managing Database Backup and Recovery
  Types of Database Failure
  Instance Failure
  Media Failure
  Structures Used for Database Recovery
  Database Backup Files
  Redo Logs
  Control File
  Backing Up a Database
  Database Modes: ARCHIVELOG and NOARCHIVELOG
  Types of Database Backups
  Recovering a Database
  Backing Up and Recovering Read-Only Tablespaces
  Managing Data Mart Performance
  Capacity Planning
  How Many CPUs Do You Need?
  Memory Requirements
  Requirements for the I/O Subsystem
  Tuning Physical Database Configuration
  Setting Initialization Parameters
  Parameters Affecting Resource Consumption
  Parameters Enabling New Features
  Parameters Related to I/O
  Managing Growth
  Global Computing Case Study: Adding a Datafile to an Existing Tablespace
  Global Computing Case Study: Creating a Summary Table
  Managing the ETT Environment
  Using Oracle Data Mart Builder Admin
  Managing the Repository
  Registering a Repository
  Backing Up the Repository
  Cleaning Up the Repository
  Managing Data Collection Agents
  Updating Data in the Data Mart
  Updating Data in the Fact Table
  Updating Dimension Tables
  Scheduling Data Refreshes
  Scheduling a Plan
  Deleting a Scheduled Run of a Plan
9 External Data
  The Value of External Data
  InfoBase Package for the Oracle Data Mart Suite
  How to Get the External Data
10 Summary
  Project Team
  Project Plan
  Checklists
Index
Figures
3-1 Sources for the Independent Sales and Marketing Data Mart
3-2 Star Schema
3-3 Primary Keys in the Star Schema
3-4 OLTP Source Schema
3-5 Data in Flat Files
5-1 Global Computing Company: OLTP Source Schema
5-2 Global Computing Company: Flat File Sources
5-3 Global Computing Company: Target Star Schema
Tables
2-1 Tablespaces in the DMDB Database
2-2 User Accounts for the DMDB Database
3-1 Global Computing Company: Description of OLTP Schema
3-2 Global Computing Company: Description of Flat Files
5-1 Fields in the Fact Table
Send Us Your Comments

The Oracle Data Mart Suite Cookbook, Release 2.6
Part No. A75671-01
Oracle Corporation welcomes your comments and suggestions on the quality and usefulness of this publication. Your input is an important part of the information used for revision.
- Did you find any errors?
- Is the information clearly presented?
- Do you need more information? If so, where?
- Are the examples correct? Do you need more examples?
- What features did you like most about this manual?
If you find any errors or have any other suggestions for improvement, please indicate the chapter, section, and page number (if available). You can send comments to us in the following ways:
- Electronic mail: nedc_doc@us.oracle.com
- FAX: 603-897-3316. Attn: Oracle Data Mart Suite Documentation
- Postal service: Oracle Corporation, Oracle Data Mart Suite Documentation, One Oracle Drive, Nashua, NH 03062-2698, USA
If you would like a reply, please give your name, address, and telephone number:
If you have problems with the software, please contact Oracle Support Services.
About the Cookbook

Congratulations on purchasing Oracle Data Mart Suite! Now, you have in one place all of the products that you need to get your data mart up and running quickly. This cookbook provides a how-to overview of what you need to know to build, use, and maintain your Oracle data mart.
This cookbook presents an overview of the basics and a guided tour of how to use the products in the suite to quickly implement a data mart. This book does not emphasize formal definition of concepts or detailed discussion of the underlying issues; for details on each of the products and its use, you should refer to the product documentation available online. You probably have signed up for the classroom training that is offered with the suite; think of the cookbook as detailed notes to complement the class.
The organization of the cookbook follows the process flow of building and managing a data mart. At each step in the process, this book provides some technical background on the products used and then uses a case study to walk you through the implementation steps.
The cookbook uses the same case study throughout to illustrate each step in building a complete data mart. It shows you how to develop the business requirements, design and build the data mart, and provide your end users with access to the data mart. When you have finished the case study, you should be comfortable enough with the process and tools to begin your real-life data mart project.
Who Should Read This Cookbook

This book is intended for anyone involved in implementing a data mart using Oracle Data Mart Suite. Given the cross-section of organizational units and individuals involved in building a data mart, you could fit any of the following roles:
- Database administrator
- Data mart designer
- Information technology (IT) professional: systems administrator, network administrator, business analyst
- Data extract programmer or end-user applications programmer
- Systems integrator
In general, this book defines technical terms and concepts when they are introduced. However, it assumes that you know the underlying operating system and are familiar with basic system administration tasks. It also assumes some knowledge of relational database systems.
How This Cookbook Is Organized

The cookbook is organized into the following chapters:
- Chapter 1, Data Mart Concepts, gives you an overview of what a data mart is, the business reasons for building a data mart, the components of Oracle Data Mart Suite, and how you can expect to use each of these components to build and manage a data mart.
- Chapter 2, Case Study: The Global Computing Company, introduces the case study that is used throughout the cookbook.
- Chapter 3, Requirements and Design, describes how you develop the business requirements and prepare the project plan for the data mart, then translate these requirements into a logical and physical design.
- Chapter 4, Construct, describes how you manage data storage and retrieval, and how you exploit the data warehousing features of the Oracle8 database server.
- Chapter 5, Extraction, Transformation, and Transportation, describes how you extract data from different sources and filter, transform, and load the data into the target data mart schema.
- Chapter 6, Access to the Database, describes how you let the user query and analyze the data and get the information needed to support business decisions.
- Chapter 7, Report Creation, describes how you can create sophisticated reports in a short time.
- Chapter 8, Manage, describes how you manage user privileges, back up data, and refresh the data in the data mart to reflect changes in the data source.
- Chapter 9, External Data, provides information about how to get data from external sources.
- Chapter 10, Summary, describes how to get started building your own data mart and provides checklists for project planning.
Conventions Used in This Cookbook

The following notation and style conventions are used throughout this book:
- Technical terms are indicated in bold when they are first mentioned.
- Screen prompts, menu selections, radio buttons, and values to be filled in appear in bold.
- SQL statements are shown in monospaced font.
- The following icons are used in the cookbook to highlight hints and pitfalls:
  - Tips and points to remember
  - Pitfalls
Where You Can Get Additional Help

Online tutorials and product documentation provide additional details on each product component of the suite. If you need further help, here are some sources:
- Oracle Customer Support: Oracle Support Services can help you with answers to immediate questions related to the installation and use of Oracle Data Mart Suite.
- Oracle Consulting Services: Oracle Consulting provides consulting services to customers who need onsite help.
- Oracle Education Classes: Besides the three-day course that is included in the Oracle Data Mart Suite package, Oracle Education offers a comprehensive portfolio of classes on Oracle software. Classes are offered all over the world. A catalog is available online at the Oracle Web site www.oracle.com.
1
Data Mart Concepts
This chapter reviews some basic concepts relating to data marts and establishes some working definitions for use in the rest of this book. Although there is a lot of agreement among users and vendors on the definitions and terminology, they have not yet reached complete consensus. If you talk to a dozen people, you are likely to hear about half a dozen similar but slightly differing answers for even something as basic as "What is a data mart?" This chapter takes a quick look at some definitions and explains what a data mart is (and is not).
What Is a Data Mart?

A data mart is a simple form of a data warehouse that is focused on a single subject (or functional area), such as Sales or Finance or Marketing. Data marts are often built and controlled by a single department within an organization. Given their single-subject focus, data marts usually draw data from only a few sources. The sources could be internal operational systems, a central data warehouse, or external data.
How Is It Different from a Data Warehouse?

A data warehouse, in contrast, deals with multiple subject areas and is typically implemented and controlled by a central organizational unit such as the Corporate Information Technology (IT) group. Often, it is called a central or enterprise data warehouse. Typically, a data warehouse assembles data from multiple source systems.
Nothing in these basic definitions limits the size of a data mart or the complexity of the decision-support data that it contains. Nevertheless, data marts are typically smaller and less complex than data warehouses; hence, they are typically easier to build and maintain. The following table summarizes the basic differences between a data warehouse and a data mart:

                     Data Warehouse     Data Mart
Scope                Corporate          Line-of-Business (LoB)
Subjects             Multiple           Single subject
Data Sources         Many               Few
Size (typical)       100 GB - TB+       < 100 GB
Implementation Time  Months to years    Months

Beta Draft
Dependent and Independent Data Marts

Two basic types of data marts, dependent and independent, are shown in the following figure. The categorization is based primarily on the data source that feeds the data mart. Dependent data marts draw data from a central data warehouse that has already been created. Independent data marts, in contrast, are standalone systems built by drawing data directly from operational or external sources of data or both.
The main difference between independent and dependent data marts is how you populate the data mart; that is, how you get data out of the sources and into the data mart. This step, called the Extraction-Transformation-Transportation (ETT) process, involves moving data from operational systems, filtering it, and loading it into the data mart.
With dependent data marts, this process is somewhat simplified because formatted and summarized (clean) data has already been loaded into the central data warehouse. The ETT process for dependent data marts is mostly a process of identifying the right subset of data relevant to the chosen data mart subject and moving a copy of it, perhaps in a summarized form.
With independent data marts, however, you must deal with all aspects of the ETT process, much as you do with a central data warehouse. The number of sources is likely to be smaller, and the amount of data associated with the data mart is less than that of the warehouse, given your focus on a single subject.
The motivations behind the creation of these two types of data marts are also typically different. Dependent data marts are usually built to achieve improved performance and availability, better control, and lower telecommunication costs resulting from local access of data relevant to a specific department. The creation of independent data marts is often driven by the need to have a solution within a shorter time.
What Are the Steps in Building a Data Mart?

Simply stated, the major steps in building a data mart are:
- Designing
- Constructing
- Populating
- Accessing
- Managing
You design the schema, construct the physical storage, populate the data mart with data from source systems, access it to make informed decisions, and manage it over time, as shown in the following figure:
That's all there is to it!
Designing
The design step is first in the data mart process. This step covers all of the tasks from initiating the request for a data mart through gathering information about the requirements and developing the logical and physical design of the data mart. The design step involves the following tasks:
- Gathering the business and technical requirements
- Identifying data sources
- Selecting the appropriate subset of data
- Designing the logical and physical structure of the data mart
What Products and Technologies Do You Need?

You accomplish these design tasks using a tool that facilitates design of the data mart at the logical and physical levels. In the process, the tool creates metadata relating to the data mart structures. Ideally, you reuse this metadata at a later stage of your data mart project.
Constructing
This step involves creating the physical database and the logical structures associated with the data mart to provide fast and efficient access to the data. This step involves the following tasks:
- Creating the physical database and storage structures, such as tablespaces, associated with the data mart
- Creating the schema objects, such as tables and indexes defined in the design step
- Determining how best to set up the tables and the access structures, such as bitmap indexes, for optimal query execution

You accomplish these construction tasks using the following tools:
- A relational database management system (RDBMS) performs several functions that are required for the creation and management of a data mart:
  - Storage management: An RDBMS stores and manages the data as you create, add, and delete data.
  - Fast data access: A query is a SQL statement that selects data satisfying a set of conditions that you define. You expect your users to issue frequent queries to access the data stored in the data mart. Your users want prompt answers to their queries. The RDBMS provides mechanisms to process queries quickly and efficiently. Such fast data access is usually facilitated by the use of special algorithms and structures.
  - Data protection: The RDBMS provides a way to recover from system failures such as power failures. It also allows you to back up data on disk to a safe location and restore data from these backups if the disk fails. Such backup mechanisms do not require you to shut off user access; the data mart can be running while you protect your data.
  - Multiuser support: The RDBMS provides concurrent access, the ability for multiple users to access and modify data without interfering with each other or overwriting changes made by other users.
  - Security: You may want to restrict users to certain sets of data; for example, you may want only a very specific set of users to have access to payroll information. The database management system provides a way to regulate access by users to objects and certain types of operations.
- A graphical system management tool: Most operations required to manage a data mart can be accomplished by issuing the appropriate SQL commands. However, it is much more intuitive and efficient to manage the data mart using a graphical system management tool.
Populating
The populating step covers all of the tasks related to getting the data from the source, cleaning it up, modifying it to the right format and level of detail, and moving it into the data mart. More formally stated, the populating step involves the following tasks:
- Mapping source data to target data
- Extracting data
- Cleansing and transforming the data
- Loading data into the data mart
- Creating and storing metadata

You accomplish these population tasks using an ETT tool. This tool enables you to look at (browse) the data sources, perform source-to-target mapping, extract the data, transform and cleanse it, and load it into the data mart. In the process, the tool also creates some metadata (data about data) relating to things like where the data came from, how recent it is, what kind of changes were made to the data, and what level of summarization was done. Another name for an ETT tool is a data movement tool. You can perform the same functions using more than one tool.
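The population tasks described above can be illustrated with a small sketch. This is not how any tool in the suite works internally; it is a minimal Python illustration, assuming hypothetical source rows, a hypothetical source name (orders_oltp), and an in-memory list standing in for the target table. It shows the extract, transform, and load steps, and the kind of metadata a run might record: where the data came from, how recent it is, what changes were made, and the level of summarization.

```python
from datetime import datetime, timezone

# Hypothetical source rows, standing in for an extract from an operational system.
SOURCE_ROWS = [
    {"order_id": 1, "product": " Laptop ", "amount": "1200.50", "region": "east"},
    {"order_id": 2, "product": "Monitor", "amount": "300.00", "region": "EAST"},
    {"order_id": 3, "product": "Laptop", "amount": None, "region": "west"},
]


def extract(rows):
    """Extract: copy rows out of the source so the source is not modified."""
    return [dict(row) for row in rows]


def transform(rows):
    """Cleanse and transform: drop incomplete rows, normalize text, fix types."""
    cleaned = []
    for row in rows:
        if row["amount"] is None:  # reject rows that cannot be loaded
            continue
        cleaned.append({
            "order_id": row["order_id"],
            "product": row["product"].strip(),
            "region": row["region"].upper(),
            "amount": float(row["amount"]),
        })
    return cleaned


def load(rows, target):
    """Load: append the cleaned rows to the target table (a list in this sketch)."""
    target.extend(rows)
    return len(rows)


def run_ett(source_name, source_rows, target):
    """Run the three steps and return metadata describing the run."""
    extracted = extract(source_rows)
    transformed = transform(extracted)
    loaded = load(transformed, target)
    return {
        "source": source_name,
        "loaded_at": datetime.now(timezone.utc).isoformat(),
        "rows_extracted": len(extracted),
        "rows_rejected": len(extracted) - len(transformed),
        "rows_loaded": loaded,
        "transformations": ["strip product", "uppercase region", "amount to float"],
        "summarization": "none (detail level)",
    }


sales_fact = []
metadata = run_ett("orders_oltp", SOURCE_ROWS, sales_fact)
```

Keeping the metadata as a by-product of the run, rather than writing it by hand afterward, is what lets later steps reuse it.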
Accessing
The accessing step involves putting the data to use: querying the data, analyzing it, creating reports, charts, and graphs, and publishing these. Typically, the end user uses a graphical front-end tool to submit queries to the database and display the results of the queries. The accessing step requires that you perform the following tasks:
- Set up an intermediate layer for the front-end tool to use. This layer, the metalayer, translates database structures and object names into business terms, so that the end user can interact with the data mart using terms that relate to the business function.
- Maintain and manage these business interfaces.
- Set up and manage database structures, like summarized tables, that help queries submitted through the front-end tool execute quickly and efficiently.

You accomplish the steps needed to give your users fast, easy access to the data mart using a graphical front-end tool designed and optimized for decision-support functions.
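To make the idea of a metalayer concrete, the following sketch maps business terms to physical column names and builds a query from them. It is only an illustration of the concept, written in Python with the standard sqlite3 module rather than any product in the suite. The table names, column names, and the BUSINESS_TERMS mapping are all hypothetical.

```python
import sqlite3

# Hypothetical metalayer: business terms on the left, physical expressions on the right.
BUSINESS_TERMS = {
    "Sales Amount": "SUM(f.sale_amt)",
    "Product Name": "p.prod_nm",
    "Region": "f.rgn_cd",
}
FROM_CLAUSE = "sales_fact f JOIN product_dim p ON p.prod_id = f.prod_id"


def build_query(group_by_term, measure_term):
    """Translate two business terms into a SQL query over the physical tables."""
    dimension = BUSINESS_TERMS[group_by_term]
    measure = BUSINESS_TERMS[measure_term]
    return (
        f'SELECT {dimension} AS "{group_by_term}", {measure} AS "{measure_term}" '
        f"FROM {FROM_CLAUSE} GROUP BY {dimension} ORDER BY {dimension}"
    )


# A tiny in-memory database with deliberately terse physical names.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product_dim (prod_id INTEGER PRIMARY KEY, prod_nm TEXT);
    CREATE TABLE sales_fact (prod_id INTEGER, rgn_cd TEXT, sale_amt REAL);
    INSERT INTO product_dim VALUES (1, 'Laptop'), (2, 'Monitor');
    INSERT INTO sales_fact VALUES (1, 'EAST', 1000), (1, 'WEST', 500), (2, 'EAST', 300);
""")

# The end user asks for "Sales Amount" by "Product Name"; the layer supplies the SQL.
sql = build_query("Product Name", "Sales Amount")
result = conn.execute(sql).fetchall()
```

The user never sees prod_nm or sale_amt; if the physical names change, only the mapping needs to be maintained.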
Managing
This step involves managing the data mart over its lifetime. In this step, you perform management tasks such as:
- Providing secure access to the data
- Managing the growth of the data
- Optimizing the system for better performance
- Ensuring the availability of data even with system failures

You manage your data mart using an intuitive and efficient graphical system management tool, although you can accomplish most tasks by issuing appropriate SQL commands.

Today, selecting a product to suit all your needs is a challenging task. There are too many products, too many vendors, and too much confusion. Often, you must act as a systems integrator for products that are designed, implemented, sold, and supported by different vendors. Products designed and sold separately are not guaranteed to work well together. It would be nice to do one-stop shopping and to start your project confident that these products are of high quality and work well together.
That is the idea behind Oracle Data Mart Suite: Better Decisions Made Simple. It is complete and integrated with all software needed to quickly and simply implement a data mart. The products in the suite address the entire data mart life cycle, from design through build, analysis, and management.
What Is in Oracle Data Mart Suite?

Oracle Data Mart Suite is built on the Oracle8 database server, the industry's most widely used database for data warehousing. The suite integrates a visual tool for designing the data mart; an easy-to-use graphical tool for extracting information from operational systems; a high-performance, scalable database that serves as a shared repository for both data and metadata; a Web server for intranet access to the data mart; a new-generation querying, reporting, and analysis tool; and documentation and online tutorials.
All components in Oracle Data Mart Suite install through a single integrated process. Once installed, the components work together and share information seamlessly. This tight integration enables you to rapidly develop and deploy your data mart. The components of Oracle Data Mart Suite include:
- Oracle Data Mart Designer
- Oracle8 Enterprise Edition
- Oracle Enterprise Manager
- Oracle Data Mart Builder
- Oracle Discoverer
- Oracle Reports
- Oracle Web Application Server
Oracle Data Mart Designer

You use Oracle Data Mart Designer to design data marts and then store the designs in a repository for reference by Oracle Data Mart Builder. Usually, existing operational systems are the source of data for independent data marts, and the logical structure of these systems is the basis for the data mart design. Oracle Data Mart Designer lets you reverse-engineer the data model of the operational source database using a graphical interface called the Data Schema Diagrammer, and then modify these models to create data mart schemas like the star schema. A schema is a collection of database objects. A star schema is a particular type of schema, called that because the graphical representation looks like a star, with a large fact table in the center and the smaller dimension tables arranged around it. Once you design the data mart and store it in the repository, Oracle Data Mart Designer generates SQL commands to create the physical data structures, like tables and indexes, in the Oracle8 database.
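The star schema described above can be sketched as follows. This is a minimal illustration, not output generated by Oracle Data Mart Designer. It uses Python's standard sqlite3 module in place of an Oracle8 database, and every table and column name is a hypothetical choice made for the example. One fact table sits at the center and references three small dimension tables, and a typical query joins the fact table to the dimensions it needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: small and descriptive, one row per member.
    CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, calendar_year INTEGER, quarter TEXT);
    CREATE TABLE region_dim (region_key INTEGER PRIMARY KEY, region_name TEXT);

    -- Fact table: numeric measures plus one foreign key per dimension.
    CREATE TABLE sales_fact (
        product_key INTEGER REFERENCES product_dim (product_key),
        time_key    INTEGER REFERENCES time_dim (time_key),
        region_key  INTEGER REFERENCES region_dim (region_key),
        units_sold  INTEGER,
        revenue     REAL
    );

    INSERT INTO product_dim VALUES (1, 'Laptop'), (2, 'Monitor');
    INSERT INTO time_dim VALUES (1, 1999, 'Q1'), (2, 1999, 'Q2');
    INSERT INTO region_dim VALUES (1, 'East'), (2, 'West');
    INSERT INTO sales_fact VALUES
        (1, 1, 1, 10, 12000.0),
        (1, 2, 2,  5,  6000.0),
        (2, 1, 1, 20,  6000.0);
""")

# Join the fact table to two of its dimensions and aggregate a measure.
star_query = """
    SELECT p.product_name, t.quarter, SUM(f.revenue)
    FROM sales_fact f
    JOIN product_dim p ON p.product_key = f.product_key
    JOIN time_dim t ON t.time_key = f.time_key
    GROUP BY p.product_name, t.quarter
    ORDER BY p.product_name, t.quarter
"""
rows = conn.execute(star_query).fetchall()
```

Every join in the query runs between the fact table and a dimension table, never between two dimensions; that is the shape that gives the schema its name.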
Oracle8 Enterprise Edition

This is the heart of the mart! Oracle8 Enterprise Edition provides data management designed specifically for decision support. Data marts require different processing techniques than transaction processing applications due to the complex, ad hoc queries running against large amounts of data. To address these special requirements, Oracle8 Enterprise Edition offers a rich variety of query processing techniques, sophisticated query optimization for choosing the most efficient data access path, and a scalable architecture. The parallelism of Oracle8 Enterprise Edition speeds query response time and provides exceptional analytical and drill-down capabilities. The server executes all critical query operations, including scans, sorts, joins, and aggregations, in parallel. Its cost-based optimizer not only selects the best execution plan for complex queries, but also provides support for star queries. Also, Oracle8 Enterprise Edition implements hash joins and bitmapped indexing to provide dramatic improvements in query execution.
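The general idea behind bitmapped indexing can be shown in a few lines. The sketch below illustrates the technique in general terms and says nothing about how Oracle8 implements it. It is written in Python, uses plain integers as bitmaps, and works on hypothetical rows. Each distinct value of a column gets one bitmap, and bit i is set when row i holds that value, so a condition that combines two columns becomes a single bitwise AND.

```python
from collections import defaultdict

# Hypothetical fact rows; the row position acts as the row identifier.
ROWS = [
    {"region": "EAST", "product": "Laptop"},
    {"region": "WEST", "product": "Laptop"},
    {"region": "EAST", "product": "Monitor"},
    {"region": "EAST", "product": "Laptop"},
]


def build_bitmap_index(rows, column):
    """Return {value: bitmap}, where bit i is set if row i holds that value."""
    index = defaultdict(int)
    for position, row in enumerate(rows):
        index[row[column]] |= 1 << position
    return dict(index)


def rows_from_bitmap(bitmap):
    """Decode a bitmap back into the list of matching row positions."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


region_index = build_bitmap_index(ROWS, "region")
product_index = build_bitmap_index(ROWS, "product")

# WHERE region = 'EAST' AND product = 'Laptop' becomes one bitwise AND.
matches = rows_from_bitmap(region_index["EAST"] & product_index["Laptop"])
```

Bitmaps of this kind suit columns with few distinct values, which is common for the attributes that decision-support queries filter on.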
Oracle Enterprise Manager

Oracle Enterprise Manager provides a complete systems management solution, with applications to manage storage, schema objects, logical and physical database backup, database instance startup and shutdown, and user privileges to access the data mart. Your interface to this powerful set of applications is an easy-to-use graphical user interface that lets you point and click through most data mart administration tasks. If you would rather write your own scripts, Oracle Enterprise Manager provides support for that as well. Oracle Enterprise Manager also provides the optional Oracle Diagnostics Pack, Oracle Tuning Pack, and Oracle Change Management Pack. These packs allow the administrator to monitor and diagnose the operation of the database system and applications, tune the performance of the database to meet the needs of a specific data mart, and manage changes to the database and applications.
Oracle Data Mart Builder

Oracle Data Mart Builder provides the technology needed to perform the extraction and transformation tasks, managing the flow of data from a data source into the data mart. You define the extraction and transformation using an intuitive, visual metaphor, called a data flow plan, that defines how data is to be processed as it flows from data sources to a sink, which is typically a table in the data mart. You create and maintain data flow plans using a graphical interface, so you can use drag-and-drop operations to model where your data is coming from, how you want it transformed, and where you want the data to go.
Oracle Discoverer
Oracle Discoverer is an end-user analysis and reporting tool that allows your users to easily navigate through the data mart, as well as drill out to other applications such as Microsoft Excel spreadsheets, Web browsers, and Word documents. This tool is built on an end-user layer architecture that abstracts the complexity of underlying database structures and allows users to refer to data in business terms, rather than technical terms. Oracle Discoverer provides a mechanism to automatically create and maintain summary tables, and redirect queries to these tables if that is faster than accessing detailed data. It also collects query statistics to help you build and delete summary tables based on the query activity of your users. You probably know how important the performance of an application is: if a query tool cannot provide rapid responses, users will reject it. Oracle Discoverer is designed to provide maximum query processing performance. It incorporates several performance-oriented features, such as an expert SQL query engine that dynamically generates performance-optimized SQL queries and exploits Oracle8 warehousing functionality like bitmapped indexes. Oracle Discoverer Viewer lets you view and manipulate reports from a Web browser interface.
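The idea of redirecting a query to a summary table can be illustrated with a minimal Python sketch. This is not how Oracle Discoverer is implemented; the data, the names (detail_rows, summary_tables, total_sales_by), and the matching rule are all assumptions made for illustration. The sketch answers a request from a precomputed summary when one exists for the requested grouping, and otherwise falls back to scanning the detail rows.

```python
from collections import defaultdict

# Hypothetical detail data: (month, product, dollar_sales).
detail_rows = [
    ("1999-01", "CPU", 400.0),
    ("1999-01", "CPU", 450.0),
    ("1999-02", "Monitor", 300.0),
]

# Position of each groupable column within a detail row (assumed layout).
COLUMN_INDEX = {"month": 0, "product": 1}


def aggregate(rows, group_cols):
    """Total dollar sales for each combination of the grouping columns."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[COLUMN_INDEX[col]] for col in group_cols)
        totals[key] += row[2]
    return dict(totals)


# A summary "table" built ahead of time for one common grouping.
summary_tables = {("month",): aggregate(detail_rows, ["month"])}


def total_sales_by(group_cols):
    """Use a matching summary if one exists; otherwise scan the detail rows."""
    key = tuple(group_cols)
    if key in summary_tables:
        return summary_tables[key], "summary"
    return aggregate(detail_rows, group_cols), "detail"


by_month, month_source = total_sales_by(["month"])
by_product, product_source = total_sales_by(["product"])
print(month_source, by_month)
print(product_source, by_product)
```

Both paths return the same kind of result, so the caller does not need to know which one was used. That is the property that makes the redirection transparent to the end user.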
Oracle Reports
Oracle Reports provides an easy-to-use productive approach to developing and delivering sophisticated reports in a timely fashion. While Oracle Discoverer lets users construct ad hoc reports, Oracle Reports lets you quickly create a wide variety of reports that your users can use and reuse.
Oracle Web Application Server

Oracle Web Application Server provides a scalable, secure, Web application platform that works with all popular browsers. Included is Web Request Broker, which provides integration with Web servers from Netscape and Microsoft.
How Do You Build a Data Mart Using These Products?

Building a data mart using these products is the focus of the rest of the book. This section gives you a bird's-eye view of the process by reviewing the major steps.
Data Modeling: Data Mart Design and Construction

Data modeling involves activities such as investigating the data sources that will populate the data mart, identifying appropriate subsets of source data based on the subject-specific needs of the data mart, and creating a logical and physical design for the target data mart that facilitates decision support. One step in the data modeling process is to understand the structure of the source database that contains the transactional information that will populate the data mart. A design tool like Oracle Data Mart Designer lets you automatically represent an existing database as a visual data diagram, either by accessing the source directly or by using an existing SQL data definition language (DDL) file that describes the structure of the source database. Using this visual data diagrammer, you create a subset of the source data model that contains only the tables and columns that you will later use to populate your data mart. You design the target database by converting the identified subset in the transaction-centric model to a query-centric model, typically a star schema, popular in data warehousing. In the star schema, you classify the data mart tables into two groups: facts and dimensions. The fact table contains the core data element being analyzed (the transactions), and the dimension tables contain attributes about the facts. The star schema definition of the data mart is automatically generated from the visual diagram. You do not have to manually enter SQL commands.
This metadata, information about the source and target definitions, is stored in a repository to be retrieved later (during the process modeling phase) by Oracle Data Mart Builder. The data mart schema that you design is stored in an Oracle8 database, administered by the Oracle8 database server. Oracle Data Mart Suite has a preinstalled database that you can use as the repository for the tables that you generate from your design.
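The classification of tables into facts and dimensions can be sketched in a few lines of Python, using the standard sqlite3 module as a stand-in for the Oracle8 server. The table and column names below (product_dim, channel_dim, sales_fact) are hypothetical and are not the DMDB definitions; the point is only to show one fact table joined to its dimensions.

```python
import sqlite3

# In-memory database used purely for illustration.
conn = sqlite3.connect(":memory:")

# Two dimension tables hold descriptive attributes; the fact table holds
# the measures and one key per dimension. All names are hypothetical.
conn.executescript("""
    CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE channel_dim (channel_key INTEGER PRIMARY KEY, channel_name TEXT);
    CREATE TABLE sales_fact (
        product_key  INTEGER REFERENCES product_dim(product_key),
        channel_key  INTEGER REFERENCES channel_dim(channel_key),
        units        INTEGER,
        dollar_sales REAL
    );
""")
conn.executemany("INSERT INTO product_dim VALUES (?, ?)",
                 [(1, "CPU"), (2, "Monitor")])
conn.executemany("INSERT INTO channel_dim VALUES (?, ?)",
                 [(1, "Catalog"), (2, "Internet")])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)",
                 [(1, 1, 2, 800.0), (1, 2, 1, 400.0), (2, 2, 3, 600.0)])


def sales_by_channel(connection):
    """Join the fact table to one dimension and total the measure."""
    return connection.execute("""
        SELECT c.channel_name, SUM(f.dollar_sales)
        FROM sales_fact f
        JOIN channel_dim c ON f.channel_key = c.channel_key
        GROUP BY c.channel_name
        ORDER BY c.channel_name
    """).fetchall()


print(sales_by_channel(conn))
```

A typical decision-support query follows this shape: it filters or groups by dimension attributes and aggregates measures from the fact table.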
Process Modeling: Populating the Data Mart

Process modeling involves the specification and application of the changes (such as cleansing, transformation, and aggregation operations) to the source data and populating the data mart with the transformed data. Oracle Data Mart Builder helps you to design and execute the extraction, transformation, and transportation of operational data into the data mart. Oracle Data Mart Builder provides an intuitive and graphical environment for describing the process needed to populate the data mart from operational sources. Those sources can be online transaction processing (OLTP) sources and enterprise data warehouses. Oracle Data Mart Builder supports sources including Oracle, DB2, Sybase, Informix, and Microsoft SQL Server databases, as well as flat files and ODBC sources.
First, you model the process of extracting data from the source database, transforming the data and loading it into the data mart using a data flow plan. You use the drag-and-drop interface of the Oracle Data Mart Builder component, the Data Flow Editor, and a set of predefined transformations to create the data flow plan. To enable the development of custom transforms, the Data Flow Editor also supports SQL, VBScript, and C++.
Oracle Data Mart Builder uses the concept of BaseViews, which provide a physical representation of the source databases, and MetaViews, which allow you to create business representations of the source information. Because Oracle Data Mart Builder is integrated with Oracle Data Mart Designer, it can retrieve the BaseView information directly from the Oracle Data Mart Designer repository. After you create the data flow plans, you load the data mart by running the plans using the Oracle Data Mart Builder Data Collection Agent. Because Oracle Data Mart Builder is integrated with the Oracle8 database server, you can load data efficiently using the Oracle8 direct-path bulk loader. You can schedule these plans through the Windows NT scheduling service, a capability that is important for the ongoing process of refreshing the information in the data mart.
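Conceptually, a data flow plan is a chain of steps that moves rows from a source, through transformations, to a sink. The following Python sketch illustrates that shape only; it does not use or imitate the Data Flow Editor, and the function names, the row layout, and the cleansing rules are assumptions chosen for the example.

```python
def extract(source_rows):
    """Yield a copy of each raw row from the source (an in-memory list here)."""
    for row in source_rows:
        yield dict(row)


def cleanse(rows):
    """Drop rows with no customer and normalize the channel name."""
    for row in rows:
        if not row.get("customer"):
            continue
        row["channel"] = row["channel"].strip().title()
        yield row


def add_margin(rows):
    """Derive a margin column from price and cost."""
    for row in rows:
        row["margin"] = row["price"] - row["cost"]
        yield row


def load(rows, sink):
    """Append the transformed rows to the sink and report how many were loaded."""
    count = 0
    for row in rows:
        sink.append(row)
        count += 1
    return count


def run_plan(source_rows, sink):
    """Run the whole plan: extract, cleanse, transform, load."""
    return load(add_margin(cleanse(extract(source_rows))), sink)


# Hypothetical order rows; the second one is rejected by the cleansing step.
orders = [
    {"customer": "Acme", "channel": " internet ", "price": 500.0, "cost": 350.0},
    {"customer": "", "channel": "catalog", "price": 100.0, "cost": 80.0},
]
mart_table = []
loaded = run_plan(orders, mart_table)
print(loaded, mart_table)
```

Because each step consumes and produces a stream of rows, steps can be added, removed, or reordered without changing the others, which is the same property the graphical plan gives you.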
Business Modeling: Accessing the Data

After you populate the data mart, you must give your users access to the data. This is called business modeling. The intent is to hide the complexities of the physical data mart structures and present information using business-oriented logical structures to which end users can easily relate. Typically, you accomplish this step using an end-user tool such as Oracle Discoverer or Oracle Reports.
Managing the Data Mart

At this point, to generate the best execution plan for complex queries, you need to generate the statistics that the Oracle8 cost-based optimizer requires. Also, you set up the indexes required for star query optimization and any other indexes that you determine are necessary, based on your knowledge of your users' query patterns. You need to create user IDs in the database and assign different privileges to users based on their roles in the organization. You also need to determine how you will protect your data from system and machine failures by creating a backup strategy. You can either use the GUI-based applications available as part of Oracle Enterprise Manager or write your own scripts to perform these functions. Finally, as data changes on the source system, you need to reflect the changes in the data stored in the data mart. Remember, you can use the Oracle Data Mart Builder Data Collection Agent to run the data flow plans, and you can schedule these plans through the Windows NT scheduling service. You will use this capability for the ongoing process of refreshing the information in the data mart. This chapter presented an overview of the fundamental concepts. Now, on to the detailed information that will help you build a successful data mart!
2
Case Study: The Global Computing Company
This cookbook presents a case study to illustrate the use of Oracle Data Mart Suite products in building and managing a data mart. The case study represents activities at a fictitious organization, Global Computing Company. This chapter describes the case study and the physical and logical database structures associated with it. Oracle Data Mart Suite includes a database called DMDB, which holds all data referred to in the case study. The installation takes care of all of the setup needed to access this database. Oracle Corporation recommends that you use this DMDB database as the starting point for building your data mart, rather than creating a new database. That is the approach taken in this book.
Case Study Scenario

The Global Computing Company, established in 1990, distributes computer hardware and software components to customers all over the world. The Sales and Marketing department is not meeting its budgeted numbers and has been challenged to develop a strategy to increase the Marketing return on investment (ROI). Global traditionally experiences low third-quarter sales (July through September). The company has experienced bursts of growth, but for no apparent reason has had lower first-quarter sales during the last two years compared with prior years. Margins have been shrinking, and sales for its flagship product have been declining. Recently, a new sales channel, the Internet, has opened, yet profits are declining. Global wants to analyze where the business is going, which components of its business are profitable, and which could become more profitable.
Global Computing Information Requirements
The Sales and Marketing organization has been struggling with a lack of timely information about what is selling, who is buying, and how they are buying. In a meeting with the Chief Information Officer (CIO), the Sales VP stated, "By the time I get the information, it is no longer useful! I am only able to get information at the end of each month, and it does not have the details I need to do my job." When asked to be more specific about what she needed, she identified the following information requirements:
- Provide sales data for specific customers.
- Provide sales detail for mail order, phone, and e-mail sales on a weekly and monthly basis and compare them to past time periods: when, how, and what is being sold by each channel?
- Provide margin information on products to understand dollar contribution for each sale.
The CIO has discussed these new requirements with the team and concluded that running these reports against the current production system would be too expensive and too risky. The business analysis reporting requirements are so diversified that the projected cost of development and expected turnaround time for requests would make this solution unacceptable. The team recommends that the Sales and Marketing department's information technologies (IT) group work with Corporate IT to build a data mart to meet their information analysis needs. Here are the high-level business goals that the data mart project must meet:
- Global's Strategic Goal: Increase company profits by offering one-stop shopping for all hardware and software needs.
- Sales and Marketing Objective: Analyze industry trends and target specific market segments. Analyze sales channels and increase profits. Identify product trends and develop a strategy for developing the appropriate channels.
The Line of Business (LoB) IT team, in conjunction with the Corporate IT team, has identified Oracle Data Mart Suite as the integrated software package it needs to get the data mart up and running as quickly as possible.

The LoB IT team first conducts a high-level meeting with the Sales and Marketing department staff to define the following high-level issues and requirements.
Business Analysis Questions

The Sales and Marketing department wants the data mart to answer the following questions:
- What products are selling?
- What are the customers buying?
- What are the sales by location?
- What are the sales by selling channel?
- What products are selling together?
- What are the trends in purchases over time?
- What products are not selling?
- Where are we making money: products, customers?
Information Requirements from the Information Technology (IT) Department

To build and manage the data mart, the LoB IT team needs to work with Corporate IT to get answers to questions like:
- Where is the data located today, on what systems, at which physical locations?
- What is the format of the data? Is it in a database or flat files?
- How often is the data refreshed or changed?
- How volatile is the data, meaning how often does it change?
- How is the data accessed today?
- What networking and protocols exist for accessing the data?
- What is the quality of the data? Is it accurate? Are the models fully populated? Is it really what it purports to be?
Information Requirements from the Sales and Marketing Department

The Sales and Marketing department staff need the following information:
- Customer account information by region
- Customer account information by ship-to address
- The industry, such as manufacturing, consulting, or retailing, associated with the customer
- Standard marketing department quarterly reports
- Standard sales department weekly reports
- Geographic sales reports by any combination of user-specified attributes
- The identity of the best and worst sellers
- Trends in the marketplace over time
The team determines that the order entry database is the primary source of data for the data mart. The order entry database can provide data for:
- All of the company's customers
- All of the packages under which the company sells products
- All of the off-invoice discounts and allowances
Data for all company products is located in flat files. Use these requirements as your guide as you move through the development steps of the data mart.
The Oracle Data Mart Suite Databases

The installation of Oracle Data Mart Suite provides you with a database instance, DMDB. The DMDB database holds all of the system tables and the Global Computing Company data.
What Does the DMDB Database Include?

During the Oracle8 database server package installation, the DMDB database is installed. DMDB includes all of the database structures and user IDs that you need to start building your data mart without becoming overwhelmed with database creation activities. The database also contains all structures that the tools in Oracle Data Mart Suite need to access and work with the database.
Database Files
Oracle8 server database files are organized into logical units called tablespaces. Table 2-1 lists all of the tablespaces and associated files in the DMDB database, along with a short description of the type of data in each tablespace.
Table 2-1 Tablespaces in the DMDB Database

| Tablespace | Type of Data | File Name |
|---|---|---|
| SYSTEM | Consists of the data dictionary, including definitions of tables, views, and stored procedures needed by the RDBMS. | sys1dmdb.ora |
| ROLLBACK | Holds the rollback segments, which record the data needed to undo changes made by database transactions. | rbs1dmdb.ora |
| USERDATA | Holds the SAMPLEOLTP and SAMPLESTAR data. Used by the case study to hold data for user MARTY. | usr1dmdb.ora |
| TEMP | Serves as a temporary workspace for operations that sort a large amount of data and cannot do it all in memory. | tmp1dmdb.ora |

Other files required for the functioning of the DMDB database are:
- log1dmdb.ora, log2dmdb.ora: Redo log files, which hold data required to protect the database from system failures
- ctl1dmdb.ora, ctl2dmdb.ora: Control files, which hold bookkeeping information about the status of the database and its associated files
- initdmdb.ora: Initialization file, which records values of configurable parameters that influence the functioning of the database
Database Users
Table 2-2 describes the user accounts that are used in this book and that are already configured in the DMDB database. All information is provided here so that you can refer to it when you need it. The DMDB database contains other accounts, but you do not use them in the exercises in this book. At this point, you may not be familiar with all technical terms used in this table. The technical terms are explained in later sections.
Table 2-2 User Accounts for the DMDB Database

| Account | Password | Type | Description | Roles |
|---|---|---|---|---|
| system | manager | Administrator | Has privileges that allow database administration tasks. | CONNECT, RESOURCE, DBA |
| sys | change_on_install | Administrator | Owns the data dictionary. Not used in this book. | Most available roles |
| dmadmin | manager | Administrator | Owns Data Mart Designer and Data Mart Builder repositories. Also owns a public Discoverer End User Layer with a defined business area for Global Computing Business. | CONNECT, RESOURCE, DBA |
| marty | marty | User | Owns a schema that represents the target data mart schema as it would be if the exercises in Chapter 3 are completed correctly. The tables contain no data. | CONNECT, RESOURCE, DBA |
| sampleoltp | sampleoltp | User | Owns the order entry application tables that feed the data mart. | CONNECT, RESOURCE |
| samplestar | samplestar | User | Owns the installed target tables for the Global Computing case study. | CONNECT, RESOURCE |
| yves | yves | User | Used in Chapter 3 as a subordinate Data Mart Designer user. | CONNECT, RESOURCE |
As you step through the process of creating your data mart using this cookbook, you use the user IDs yves and marty. Think of the accounts and all structures created by these accounts as your play area, where you are free to experiment. Because it is quite possible that you may make a mistake or two along the way, most of the exercises requiring changes to the database are confined to this area. You will be directed to switch to the installed demo tables owned by user samplestar when you need to access a consistent set of data. Note that the user dmadmin owns the default Data Mart Builder repository. During the Data Mart Suite installation, the user dmadmin is mapped to the Data Mart Builder registered user called system, as explained in "Data Mart Builder Repository". This Builder user is not the same as the database user system.
Data Mart Builder Repository

In this release, Oracle Data Mart Builder does not use the database for user authentication. The list of registered Builder users, as well as their privileges, is maintained in the Builder repository. This repository, named DEFAULT, ships with two predefined Builder users: system (password manager) and guest (no password).
Database Structures for the Global Computing Case Study

This section describes the database structures you will use in creating the data mart for the Global Computing case study.
Source Data
The primary source of data for the data mart is the order entry database. All data in the order entry database is owned by one user, sampleoltp, and is stored in the DMDB database. Flat files contain additional data needed for the data mart. This arrangement, unusually simple, was created for this exercise. A closer representation of reality would have meant that the tables reside on a different database on a different machine. The source data, from both the database tables and flat files, is discussed in detail in Chapter 3.
Target Tables
To represent the process of designing and creating the tables that hold the data mart information, in Chapter 3 and Chapter 4 you will design and create the tables for user yves. Because you may make some mistakes in the design and creation of the tables, the installation provides tables that are already created, but unpopulated. These tables are owned by the user marty. In Chapter 5, you will load the tables as user marty using the ETT tool. In Chapter 6, which describes how to use Oracle Discoverer, you will use tables owned by samplestar to look at a consistent set of data when analyzing the data.
Overview of the Hands-on Exercises

In the exercises in this cookbook, file specifications use the Windows NT format. If your database is on another platform, such as Sun Solaris, use the appropriate format for any database-related file specifications. Here is a quick overview of what you will learn in the hands-on exercises:
- Design: Design the data mart star schema from the operational sources, an OLTP schema and flat files.
- Construct storage: Learn the fundamentals of how to create storage using the Oracle8 database server. The exercises use Oracle Enterprise Manager as the system management tool.
- Populate: Use Oracle Data Mart Builder to construct data flow plans and populate the fact and dimension tables.
- Access: Set up Oracle Discoverer for end-user access and use Oracle Discoverer to analyze data. Create reports using Oracle Reports.
- Manage: Manage the data mart for efficient performance, back up the data to protect against system failures, and keep the data current by scheduling refreshes.
3
What Happens in This Step of the Process?
This chapter looks at the issues involved in the design of a data mart. Think of this chapter as a collection of tips on how to run your data mart implementation project. Just as important as learning what you should do is learning what to watch out for: the things that can trip you up on a project like this (and these may not always be technical issues). You build the data mart in an iterative manner: the end users tell you what they want, you deliver some data, and the end users examine it and develop new requirements. Then, you revisit your design and the cycle begins again. This is the "I know it only when I see it" phenomenon. However, you have to start somewhere. The driving business factor for the data mart is the need for information, and the best way to start the design process is by identifying the business needs. You should involve those who have an investment in the data mart, such as the business sponsor and the end user, as early as possible in the design process. Together, you should agree on the information requirements that the data mart must fulfill, the data sources, the technical requirements (such as how often the data needs to be refreshed from the source), and the success criteria for the project. The steps in designing a data mart are:
1. Conducting a study to define the scope of the project
2. Defining the business and technical requirements for the data mart
3. Developing the logical and physical design of the data mart
In the rest of this chapter, you look at the issues involved in each one of these steps. Then, you apply these general principles to the Global Computing case study to create a logical and physical data mart design.
What Is the Role of Data Mart Designer?

In addition to a powerful graphical interface, Oracle Data Mart Designer provides a metadata repository that holds detailed design information about your databases. The repository is implemented as a set of tables in an Oracle database. The data you enter in the repository is available to any user who has at least read access to the repository application system. Thus, the metadata generated in Designer can be shared by Oracle Data Mart Builder and Oracle Discoverer. The data can also be accessed through the open repository metamodel views by any SQL reporting tool, other third-party products, or custom in-house applications. It can even be made available through a Web browser, using Oracle Web Application Server. Oracle Data Mart Designer also includes a set of standard Repository Reports that you can access from the Designer Front Panel. You can use components of Data Mart Designer to help you create the logical and physical design of the data mart:
- Use Entity Relationship Diagrammer to create the logical design of the data mart.
- Use Database Design Transformer and Design Editor to create the physical design of the data mart.
- Use Server Generator to generate scripts that contain SQL data definition language (DDL) commands, such as CREATE TABLE. You can use these scripts in later steps to create the objects in the database.
Defining the Scope of the Data Mart Project

Before you begin to implement the data mart, you need to develop a plan for its delivery. Critical inputs to this plan are the information requirements and priorities of your users. After this information has been defined and approved by your business sponsor, you can develop your list of key deliverables and assign responsibilities to your team. Often, additional requirements are added to the project after the project has started, without much thought of the impact on resources and the scheduled delivery date. Although such small changes are usually perceived to have no impact, they can add up quickly to affect the scope of the project. This phenomenon is called scope creep. Your first task is to define the scope of the project. The scope of the data mart defines the boundaries of the project and is typically expressed in some combination of geography, organization and application, or business functions. Defining scope usually requires making compromises as you try to balance resources (such as people, systems, and budget) with the scheduled completion date and the capabilities you promised to deliver. Defining your scope and making it clear to everyone involved is important because it:
- Sets the right expectations
- Prioritizes incremental development
- Highlights risks and issues
- Allows you to estimate costs
Defining the Requirements for the Data Mart

To start the implementation of the data mart, you need to define the business and technical requirements. However, you should expect the requirements to change as users use the initial implementation and are better able to communicate their requirements. The development of the data mart is an iterative process, because the data mart evolves in response to feedback from users.
Business Requirements
The purpose of the data mart is to provide access to data that is specific to a particular department or functional area. The data should be at a meaningful level of detail for the kind of analysis that the end users want to perform and should be presented in the business terms that they understand. The expectation is that the analysis of the data in a data mart will lead to more informed business decisions. Therefore, you need to understand how the business person makes decisions: what questions the users ask in the decision-making process, and what data is necessary to answer those questions. The best way to understand the business processes is through interviews with the business people. The requirements identified as a result of these interviews comprise the business requirements for your data mart.
General Guidelines for Business Requirements Definition

As you gather requirements for the data mart, you should focus your efforts on the information needs of a single subject area. Remember that no requirements document will be complete enough to get all information at the outset, and you need to design the data mart to accommodate changing needs. However, if you introduce new subjects or deviate from your primary theme, you will lose your focus and schedule. Even though the data mart addresses one subject area, it usually has many business users, each with different requirements and expectations. Try to identify at least one representative from each area of the business for your interviews. For example, if you are building a marketing data mart, interview several people involved in various aspects of the marketing function (such as a marketing analyst, channel specialist, direct marketing manager, and promotion manager). Use a consistent set of questions or an interview template for each interview. The questions should focus on the users' information requirements, such as content, frequency of update, priorities, and level of detail. Finally, set a definite time limit on the interview and requirements gathering phase; otherwise, it could continue indefinitely as you try to refine each requirement. You cannot collect all requirements in this time frame, but you will get enough to create a road map. Needs will change during the implementation period, and you will need a way to evaluate and accommodate requests for changes or to reject and consider them for a future phase.
Technical Requirements
You must identify the technical requirements. These specify where you get the data that feeds the data mart. The primary sources of data for data marts are the operational systems that handle the day-to-day transactional activities. Usually, these operational systems are online transaction processing (OLTP) systems. Your data mart may be fed from more than one such operational source. However, you cannot usually transfer the data from the operational system into the data mart without intermediate processing. You need to understand how clean the operational data is and how much formatting or translation is needed to integrate it with other sources. Also, you need to determine how often you must refresh or update the data. For example, if you use the data for relatively long-term planning and analytical horizons, you may need a weekly or monthly feed rather than a daily feed. Note that the frequency of update does not necessarily determine the level of detail in the data mart. You can have a monthly feed of data summarized by week. In this initial phase of data mart design, you need to identify data sources, the kind of data cleansing needed, and the frequency with which data should be refreshed.
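The distinction between feed frequency and level of detail can be made concrete with a small Python sketch: one monthly batch of daily transactions is rolled up to weekly totals before loading. The function name summarize_by_week, the sample amounts, and the choice of ISO week numbers as the weekly grain are assumptions for illustration only.

```python
from collections import defaultdict
from datetime import date


def summarize_by_week(transactions):
    """Total the amounts in one feed by (ISO year, ISO week number)."""
    weekly = defaultdict(float)
    for day, amount in transactions:
        iso = day.isocalendar()          # (ISO year, ISO week, ISO weekday)
        weekly[(iso[0], iso[1])] += amount
    return dict(sorted(weekly.items()))


# Hypothetical monthly feed: the whole month arrives at once,
# but the data mart stores it at weekly grain.
july_feed = [
    (date(1999, 7, 5), 100.0),
    (date(1999, 7, 7), 50.0),
    (date(1999, 7, 14), 200.0),
]
weekly_totals = summarize_by_week(july_feed)
print(weekly_totals)
```

The feed still arrives once a month; only the grain of the stored rows changes, so the two decisions can be made independently.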
How Do You Know If You Have Done It Right?

When you finish your interviews, you have a set of information and performance requirements that your data mart application must meet. You should be realistic and prioritize the needs and develop a list of success criteria. To prioritize your list, ask yourself these questions:
- Is performance the primary concern?
- Are you constrained by your systems configuration?
- How often do you want to update or append to the data?
- Do the users expect the data mart to be a comprehensive source for departmental data or is the data mart limited in scope to a particular topic within that department?
- Is your scope consistent with IT architecture or can you develop autonomously?
Consider the answers to these questions as you develop your priorities and critical success factors. In summary, here are some guidelines to facilitate your requirements definition process:
- Involve the end users throughout the process.
- Classify the requirements analysis framework: define the requirements for the business sponsor, the IT architect, the data mart developer, and the end users.
- Manage the expectations of the end users.
Global Computing Company: Requirements Definition

Now that you understand the general process of requirements definition for a data mart application, you are ready to apply this to the Global Computing case study.
Input from Interviews

Global Computing Company has been losing sales over the past few months. The President has requested the Sales and Marketing department to figure out what is wrong with the business and develop a plan to turn sales around. From interviews with the Marketing Manager, you have identified the following questions that need to be addressed:
- Is our business skewed by industry?
- What is our customer reach?
- What customers have we lost?
- How are our customers buying from us?
- What are they buying?
- Where are our customers using the products?
- Are we selling unprofitable products?
- What products are profitable?
- Should we diversify our product offerings?
- Should we develop different packages of products?
- Are our current promotions effective?
- Does our "dollar off snow days" promotion work?
- Is there a pattern of purchases from CPUs to accessories?
- Is the business still seasonal?
- Are catalogs still a profitable vehicle for stimulating sales?
- Should we increase our direct sales force?
- Should we form partnerships with value-added resellers (VARs)?
- What is the effect of the change in dollar value on our margins?
- What is the effect on total sales of offering products through the Internet?
Information Requirements
Based on the questions that need to be answered, you define the following information requirements for the data mart:
- Analyze industry trends and target specific market segments
- Identify best and worst sellers
- Identify sales opportunities through custom package offerings or by capitalizing on buying trends
- Identify product trends and develop a channel strategy
- Analyze sales over multiple promotional cycles
- Analyze sales channels
Performance Requirements
You define the following end-user access and performance requirements:
- Access to the data by four analysts simultaneously
- Data updated monthly, no later than two weeks after the end of the month
- Ad hoc and customized reporting capabilities for each analyst and the ability to access reports on the intranet
Requirements Summary
The data mart for Global Computing must include data about customers, products, time, and sales channels with information about dollar sales, unit sales, and margin. The Sales and Marketing department will maintain the data mart server, and data must be accessible through a Web browser interface.
Data Mart Design

At the beginning of the design stage, business requirements are already defined, the scope of your data mart application has been agreed upon, and you have a conceptual design. Now, you need to translate your requirements into a system deliverable. In this step, you create the logical and physical design for the data mart and, in the process, define the specific data content, relationships within and between groups of data, the system environment supporting your data mart, the data transformations required, and the frequency with which data is refreshed. The logical design is more conceptual and abstract than the physical design. In the logical design, you look at the logical relationships among the objects. In the physical design, you look at the most effective way of storing and retrieving the objects. Your data mart design should be oriented toward the needs of your end users. End users typically want to perform analysis and look at aggregated data, rather than at individual transactions. Your design is driven primarily by end-user utility, but the end users may not know what they need until they see it. A well-planned design allows for growth and changes as the needs of users change and evolve. The quality of your design determines your success in meeting the initial requirements. Because you do not have the luxury of unlimited system and network resources, optimal utilization of resources is determined primarily by your design.
By beginning with the logical design, you focus on the information requirements without getting bogged down immediately with implementation detail.
Note that you are not forced to work in a top-down fashion. You can reverse-engineer an existing data schema and use this as a starting point for your design. If your data requirements are very clear and you are familiar with the source data, you might be able to begin at the physical design level and then proceed directly to the physical implementation. In practice, it takes several iterations before you achieve the right design. The Data Mart Designer tools help you keep all three levels synchronized.
Creating a Logical Design

A logical design is a conceptual, abstract design. You do not deal with the physical implementation details yet; you deal only with defining the types of information that you need. The process of logical design involves arranging data into a series of logical relationships called entities and attributes. An entity represents a chunk of information. In relational databases, an entity often maps to a table. An attribute is a component of an entity and helps define the uniqueness of the entity. In relational databases, an attribute maps to a column. You can create the logical design using a pen and paper, or you can use a design tool. In this chapter, you use the Data Mart Designer and its Entity Relationship Diagrammer. While entity-relationship diagramming has traditionally been associated with highly normalized models, such as OLTP applications, the technique is still useful in dimensional modeling. You just approach it differently. In dimensional modeling, instead of seeking to discover atomic units of information and all of the relationships between them, you try to identify which information belongs to a central fact table and which information belongs to its associated dimension tables. The Entity Relationship Diagrammer lets you mark entities as either facts or dimensions, and later the Diagrammer reuses this information to automatically propose the right kinds of indexes for the resulting fact and dimension tables. Attention to design is critical. Keep your business requirements on hand throughout the design process. Nothing else is more important! As part of the design process, you map operational data from your source into subject-oriented information in your target data mart schema. You identify business subjects or fields of data, define relationships between business subjects, and name the attributes for each subject. The elements that help you to determine the data mart schema are the model of your source data and your user requirements. 
Sometimes, you can get the source model from your company's enterprise data model and reverse-engineer the logical data model for the data mart from this. The physical implementation of the logical data mart model may require some changes due to your system parameters: size of machine, number of users, storage capacity, type of network, and software. You will need to make decisions as you develop the logical design:
- Facts and dimensions
- Granularity of the facts
- Relationship between the entities
- Historical duration of the data
Creating a Wish List of Data

You generate the wish list of your data elements from the business user requirements. This cookbook assumes that the scope of the data mart is fully specified by the users. Often, you must look beyond the specific requests of the users and anticipate future needs. Start with the business parameters that matter to your subject area. For a Sales and Marketing data mart, parameters might be Customer, Geography, Product, Sales, and Promotions. Remember Time: do you want to look at monthly, daily, or weekly figures? Then, create a list of desired data elements, either from the requirements provided by the users, or by brainstorming with them. At the end of this exercise, you should have the following:
- A list of data elements, both raw and calculated
- Attributes of the data, such as character or numeric data types
- Reasonable groupings of the data, such as geographical regions for the elements country, county, city
- An idea of the relationship between the data, such as a city is within a county
Typical data fields of interest in the Sales and Marketing example might be dollar sales, unit sales, product names, packages, promotion characteristics, regions, and countries. Identify the critical fields, those that drove the creation of the data mart. Data such as dollar sales or unit sales are critical for a sales data mart. Users may provide you with reports to give you an idea of their data requirements. These reports may be existing reports or the kind of reports they would like to see. Reports are a good vehicle to get the users to articulate their needs. At this point, you can separate the data into numeric data (the facts) and textual or descriptive data (the dimensions). Consider the report shown in the following table. From this report, you see that Region, State, Fiscal Year, and Dollar Sales are important. The facts are Dollar Sales, and the dimension elements are Region, State, and Fiscal Year.
Fitzgerald Worldwide Sales                                3/5/99
Eastern Regional Sales Projections
Northeast

State             FY1996      FY1997      FY1998     YTD1999
New York         $10,111     $13,400     $20,900     $12,090
New Jersey       $22,100     $24,050     $27,890     $14,099
Rhode Island     $13,270     $15,670     $19,850      $5,671
Connecticut      $10,800     $21,500     $28,970      $8,277
Massachusetts    $23,400     $25,600     $26,500      $7,571
Vermont          $11,700     $12,285     $12,899      $3,686
New Hampshire     $5,850      $6,143      $6,450      $1,843
Maine             $4,095      $4,586      $5,504      $1,572
Total           $101,326    $123,234    $148,963     $54,809
During the iterative process of interaction with the end user, ask why certain data is important: what decisions are driven by this data? Some insight into the business processes will help you anticipate future data needs.
Identifying Sources
Now, you have a list of dimensions and facts that you want for your data mart. The question is, can you get the data? And if yes, at what price? Data sources can range from operational systems, such as order processing systems, to spreadsheets, as shown in Figure 3-1. You need to map the individual elements from your wish list to the sources. You should start with the largest, most comprehensive source and seek other sources as needed.
Figure 3-1 Sources for the Independent Sales and Marketing Data Mart
Typically, a large percentage of the data comes from one or two sources. The dimensions can usually be mapped to lookup tables in your operational system. In their raw form, the facts can be mapped to the transaction tables. For use in the data mart, the transaction data usually needs to be aggregated, based on the specified level of granularity. Granularity is the lowest level of information that the user might want. You may find that some of the requested data cannot be mapped. This usually happens when groupings in the source system are not consistent with the desired groups within the data mart. For example, in a telecommunications company, calls can be aggregated easily by area code. However, your data mart needs data by postal code. Because an area code contains multiple postal codes and one postal code may span multiple area codes, it is difficult to map these dimensions. You may find that some data is too costly to acquire. For example, the promotion data that the users requested may not be obtained easily because the information is not consistent across time or promotion. To translate to a common system format would be very costly.
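As a minimal sketch of the aggregation described above, the following Python fragment rolls transaction rows up to a chosen grain of one row per day and item. The transaction tuples, item codes, and the function name aggregate_daily are hypothetical and serve only as an illustration; a real load would read from the operational source.

```python
from collections import defaultdict
from datetime import date

# Hypothetical order lines: (order_date, item, units, dollars)
transactions = [
    (date(1999, 1, 5), "CPU-100", 2, 2400.0),
    (date(1999, 1, 5), "CPU-100", 1, 1200.0),
    (date(1999, 1, 6), "CPU-100", 1, 1200.0),
    (date(1999, 1, 5), "MOUSE-7", 5, 100.0),
]

def aggregate_daily(rows):
    """Sum units and dollars per (day, item), the chosen grain."""
    totals = defaultdict(lambda: [0, 0.0])
    for order_date, item, units, dollars in rows:
        key = (order_date, item)
        totals[key][0] += units
        totals[key][1] += dollars
    return {key: tuple(value) for key, value in totals.items()}

daily = aggregate_daily(transactions)
for (day, item), (units, dollars) in sorted(daily.items()):
    print(day.isoformat(), item, units, dollars)
```

Choosing a coarser grain, such as month and item, only changes how the key is built. Detail that has been summed away at load time cannot be recovered later from the data mart.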
Classifying Data for the Data Mart Schema

At this point, you have started thinking about the classification of your data as facts and dimensions. A common representation of facts, dimensions, and the relationships between them in data mart applications is the star schema, as shown in Figure 3-2. Typically, it contains a dimension of time and is optimized for access and analysis. It is called a star schema because the graphical representation looks like a star with a large fact table in the center and the smaller dimension tables arranged around it.
Figure 3-2 Star Schema
Advanced design modeling may involve schemas, called snowflake or constellation schemas, that are more complex than the simple star schema shown. The next sections provide more information about dimensions, facts, and level of granularity.
Dimensions
In your classification exercise, many of the fields from the OLTP source will end up as dimensions. The big design issue is to decide when a field is just another item in an existing dimension, or when it should have its own dimension. The time dimension is generated independently using the discrete dates in the OLTP source. This offers flexibility in doing any time series analysis.
For a true star schema, the creation order of the dimension tables does not matter as long as they are created before the fact table. Generally, a table must be created before it can be referenced by other tables. Therefore, be sure to create all dimension tables first.
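One way to generate a time dimension independently of the source tables is sketched below. It assumes that you already know the earliest and latest dates found in the OLTP source and that quarters follow the calendar; a fiscal calendar would need a different quarter rule. The function and column names are illustrative only.

```python
from datetime import date, timedelta

def build_days_dimension(start, end):
    """Return one row per calendar day from start to end, inclusive."""
    rows = []
    current = start
    date_id = 1  # generated key: a plain sequence with no business meaning
    while current <= end:
        rows.append({
            "date_id": date_id,
            "day": current,
            "month": current.month,
            "quarter": (current.month - 1) // 3 + 1,  # assumes calendar quarters
            "year": current.year,
        })
        date_id += 1
        current += timedelta(days=1)
    return rows

days = build_days_dimension(date(1999, 1, 1), date(1999, 12, 31))
print(len(days), days[0]["quarter"], days[-1]["quarter"])
```

Because every day in the range gets a row, days without sales still appear in the dimension, which keeps time series comparisons straightforward.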
Facts
Facts are the numeric metrics of the business. They support mathematical calculations used to report on and analyze the business. Some numeric data are dimensions in disguise, even if they seem to be facts. If you are not interested in a summarization of a particular item, the item may actually be a dimension. Database size and overall performance will improve if you categorize borderline fields as dimensions. For example, assume that you have a membership database for a health club and want to find out how much of the club brand vitamins the members buy. In your wish list, you have several queries like "Give me the usage by age by. . ." and "Give me the average age of members by. . .". Is age a fact or a dimension? Make it a dimension.
Granularity
After you define the facts and dimensions, you determine the appropriate granularity for the data in the data mart. At this point, you know why your users have requested a particular level of information within a dimension. You need to estimate the resource requirements to provide the requested level of granularity and, based on the costs, decide whether or not you can support this level of granularity.
Designing the Star Schema

After you have a list of all facts, dimensions, and the desired level of granularity, you are ready to create the star schema. The next step is to define the relationships between the fact and dimension tables using keys. A primary key is one or more columns that make the row within a table unique. The primary key of the fact table can consist of several columns. Such a key is called a composite or concatenated key. Figure 3-3 shows a star schema and its primary keys.
Figure 3-3 Primary Keys in the Star Schema
It is a good idea to use system-generated keys (synthetic keys), in place of natural keys, to link the facts and the dimensions. This provides the data mart administrator with control of the keys within the data mart environment, even if the keys change in the operational system. A synthetic key is a generated sequence of integers. You include the synthetic keys in the dimension table, in addition to the natural key. Then, you use the synthetic key in the fact table as the column that joins the fact table to the dimension table. Although creating synthetic keys requires additional planning and work, the keys can provide benefits over natural keys:
- Natural keys are often long character strings, such as in a product code. Because synthetic keys are integers, response time to queries is improved.
- The data mart administrator has control over the synthetic key. If a manufacturing group changes the product code naming conventions, the changes do not affect the structure of the data mart.
Consider using synthetic keys for most dimension tables. (In the rest of this cookbook, we refer to synthetic keys as warehouse keys.) The process of translating the data from the OLTP database and loading the target star schema requires mapping between the schemas. The mapping may require aggregations or other transforms.
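The following sketch illustrates the idea of a warehouse key. It uses SQLite from the Python standard library as a stand-in for the target database, which is an assumption made only so that the example is self-contained; the table layout, product codes, and the helper warehouse_key are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (
        product_id   INTEGER PRIMARY KEY,   -- warehouse (synthetic) key
        product_code TEXT NOT NULL UNIQUE,  -- natural key from the source
        description  TEXT
    );
    CREATE TABLE sales (
        product_id INTEGER NOT NULL REFERENCES products (product_id),
        units      INTEGER NOT NULL
    );
""")

def warehouse_key(code, description=None):
    """Return the warehouse key for a natural key, creating it if needed."""
    row = conn.execute(
        "SELECT product_id FROM products WHERE product_code = ?", (code,)
    ).fetchone()
    if row:
        return row[0]
    cur = conn.execute(
        "INSERT INTO products (product_code, description) VALUES (?, ?)",
        (code, description),
    )
    return cur.lastrowid

# Fact rows carry only the integer key, never the product code.
for code, units in [("CPU-100-US", 3), ("MOUSE-7", 5), ("CPU-100-US", 1)]:
    conn.execute("INSERT INTO sales VALUES (?, ?)", (warehouse_key(code), units))

# A renamed product code touches one dimension row; fact rows are unchanged.
conn.execute(
    "UPDATE products SET product_code = 'CPU-0100' WHERE product_code = 'CPU-100-US'"
)
total = conn.execute(
    "SELECT SUM(s.units) FROM sales s JOIN products p USING (product_id) "
    "WHERE p.product_code = 'CPU-0100'"
).fetchone()[0]
print(total)
```

The lookup from natural key to warehouse key is the kind of mapping that the load process has to perform for every dimension that uses generated keys.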
Moving from Logical to Physical Design

During the physical design process, you convert the data gathered during the logical design phase into a description of the physical database, including tables and constraints. This description optimizes the placement of the physical database structures to attain the best performance. Because data mart users execute certain types of queries, you want to optimize the data mart database to perform well for those types of queries. Physical design decisions, such as the type of index or partitioning, have a huge impact on query performance. In this chapter, you use the Database Design Transformer, a Data Mart Designer tool, to create an initial physical design. Then, you use the Design Editor to refine the physical design. As the data mart becomes successful and more widely used, more and more users will access it. Over time, the volume of data will also grow. Scalability, the ability to increase the volume of data and number of users, is an important consideration when you move from your logical design to a physical representation. The following figure shows some reasons you might need the data mart to be scalable.
To accommodate the need for scalability, you should minimize the limitations of factors such as hardware capacity, software, and network bandwidths.
Estimating the Size of the Data Mart

In estimating the size of your data mart, you need to develop a method that will accommodate its future growth. There are several methods for estimating the size of the database. Here is one approach (see "Estimating Size" on page 3-54 for an example):
1. Use a representative sample of the source data to determine the number of rows in the fact table.
2. Estimate the size of one row in the fact table.
3. Estimate the size of the fact table by multiplying the number of rows by the size of one row.
4. Estimate the size of the data mart. Generally, the total size of the data mart is three to five times the size of the fact table.
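The four steps above can be sketched as a small calculation. All input values here (a 10 percent sample holding 50,000 rows and a 40-byte fact row) are hypothetical, as is the function name; only the multiplier range of three to five comes from the guideline above.

```python
def estimate_mart_size(sample_rows, sample_fraction, row_bytes, overhead_factor):
    """Return (fact_rows, fact_bytes, mart_bytes) for a rough size estimate."""
    fact_rows = round(sample_rows / sample_fraction)   # step 1
    fact_bytes = fact_rows * row_bytes                 # steps 2 and 3
    mart_bytes = fact_bytes * overhead_factor          # step 4
    return fact_rows, fact_bytes, mart_bytes

# Hypothetical inputs: a 10% sample holding 50,000 rows, 40-byte fact rows.
for factor in (3, 5):
    rows, fact_bytes, mart_bytes = estimate_mart_size(50_000, 0.10, 40, factor)
    print(factor, rows, fact_bytes, mart_bytes)
```

With these inputs the fact table comes to about 20 MB, so the data mart as a whole would be planned at roughly 60 MB to 100 MB.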
This process is usually iterative. Each time the design changes, you should estimate the size again. Even if you think that your star schema is small, you should do this calculation once. After you calculate the size, you can validate your assumptions by doing the following:
1. Extract sample files.
2. Load data into the database.
3. Compute exact expected row lengths.
4. Add overhead for indexing, rollback, and temporary tablespaces, and a file system staging area for flat files.
To plan for future growth, you can use the ratio of the estimated size to the largest possible size of the fact table to calculate the future size of the data mart.
1. For each dimension, check the granularity that you want and estimate the number of entries in the finest level.
2. Multiply the number of entries of all dimensions to get the maximum possible rows.
3. Calculate the ratio of actual rows from representative data to possible rows.
4. Estimate the growth for each dimension table over a period of time.
5. Multiply the number of rows of all dimension tables.
6. Adjust the number, using the ratio calculated in Step 3.
7. Multiply the result by the fact table row size.
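The seven steps above can be worked through as follows. The dimension counts, growth multipliers, actual row count, and row size are all hypothetical values chosen to keep the arithmetic easy to follow, and the sketch assumes that the ratio of actual to possible rows stays constant as the dimensions grow.

```python
from math import prod

def project_fact_size(current_entries, actual_rows, growth, row_bytes):
    """Follow the seven steps; return the projected fact table size in bytes."""
    possible_rows = prod(current_entries.values())                 # steps 1 and 2
    density = actual_rows / possible_rows                          # step 3
    future_entries = {name: count * growth.get(name, 1.0)
                      for name, count in current_entries.items()}  # step 4
    future_possible = prod(future_entries.values())                # step 5
    future_rows = future_possible * density                        # step 6
    return future_rows * row_bytes                                 # step 7

# Hypothetical counts at the finest level of each dimension.
current = {"days": 365, "products": 100, "customers": 200, "channels": 3}
# Hypothetical growth multipliers over the planning period.
growth = {"days": 2.0, "customers": 1.5}

size = project_fact_size(current, actual_rows=219_000, growth=growth, row_bytes=40)
print(round(size))
```

Here the maximum possible number of rows is 21,900,000 and the sample fills 1 percent of them. After growth the possible rows rise to 65,700,000, which at the same ratio gives about 657,000 rows, or roughly 26 MB at 40 bytes per row.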
You may need to schedule a regular batch job to refresh your data mart from your sources. Depending on the data volumes and system load, this job may take several hours. Plan your data mart refresh so that under normal circumstances it can be accomplished within the time allowed for batch processing, usually at night.
In your planning process, you should also estimate the data volume that will be refreshed. Develop a strategy for purging the data beyond the specified retention period.
What Is Metadata?
Metadata is information about the data. For a data mart, metadata includes:
- A description of the data in business terms
- Format and definition of the data in system terms
- Data sources and frequency of refreshing data
The primary objective for the metadata management process is to provide a directory of technical and business views of the data mart metadata.
Metadata can be categorized as technical metadata and business metadata. Technical metadata consists of metadata created during the creation of the data mart, as well as metadata to support the management of the data mart. This includes data acquisition rules, the transformation of source data into the format required by the target data mart, and schedules for backing up and refreshing data. Business metadata allows end users to understand what information is available in the data mart and how it can be accessed.
You use the technical metadata to determine data extraction rules and refresh schedules for the Oracle Data Mart Builder component. Similarly, you use the business metadata to define the end-user layer used by the Oracle Discoverer query tool.
Global Computing Company: Data Mart Design

In this section, you use the Data Mart Designer to create the logical and physical design of the data mart for the Global Computing Company. You take the following steps using Data Mart Designer components:
1. Create the logical design for the data mart schema using the Entity Relationship Diagrammer.
2. Move the logical design to an initial physical design using the Database Design Transformer.
3. Refine the physical design using the Design Editor.
4. Generate DDL scripts to create the objects for the target database.
Setting Up Data Mart Designer

Before you begin to use Designer to design your database, you need to create a Designer user and a Designer application.
Creating a Designer User

It is good practice to create a separate, subordinate user, expressly for designing a particular database, in the Designer Repository. You can create a subordinate user for Designer from any database user who has at least CONNECT and RESOURCE privileges. You grant the subordinate user object privileges on the database objects owned by the repository owner and create synonyms in the subordinate user's data schema to point to these objects. You can create the user easily by using the Designer Repository Administration Utility:
1. From the Windows NT Program group, select Data Mart Designer R2.1, then Repository Administration Utility.
2. To connect as the user who owns the repository, enter the following in the dialog box:
   - For Username, enter dmadmin.
   - For Password, enter manager.
   - For Connect, enter dmdb.
   Click OK.
3. In the Repository Administration Utility, click the Maintain Users icon.
4. In the User Maintenance tree list, select Managers and click the Add button (the top button on the left with the plus sign (+)).
5. In the Repository User Properties sheet, enter the following:
   - For Oracle User Name, select YVES.
   - For Full User Name, enter YVES.
   - For Type, select MANAGER. This type gives the user the privilege to create applications in the repository.
6. Click OK.
7. In the User Maintenance dialog box, click Reconcile. If you do not click Reconcile, you will not be able to connect to the repository. Designer grants object privileges and creates synonyms. You may see a message asking you to make sure that all subordinate users have the correct system privileges. Click OK.
8. When the process is complete, Designer displays a status box. Click OK.
9. Click OK to close the User Maintenance box.
10. From the File menu, select Exit.
Creating a Designer Application

In Designer, the unit of work is the application. An application is a group of logically related elements, such as table definitions. To create an application, take the following steps:
1. From the Windows NT Program group, select the Data Mart Designer R2.1 group, then select Data Mart Designer.
2. In the Connect dialog box, enter the following:
   - For Username, enter yves.
   - For Password, enter yves.
   - For Connect String, enter dmdb.
3. Click OK.
4. In the Application System dialog box, enter DATAMART as the Application System name.
5. Click Create. Designer creates an initial application.
6. Select the DATAMART application from the list box and click OK. Designer displays the Designer Front Panel.
Creating a Wish List of Data

The first step in creating the logical design is to form a wish list of data. Begin with a question and identify the elements that are necessary to answer the question. This gives you an idea of what needs to be stored in the data mart.
"How are our customers buying from us?"
One interpretation of this question may be that the user is asking about the use of channels: direct sales, catalog, or Internet. You need to know who the customers are: an account, a member of a segment. You also need an element of time, weekly or monthly, related to a season or quarter. Of course, you want to measure the sales, either in terms of units or dollars.
You can interpret the same question differently. Maybe the user is asking about customer buying patterns. Does the customer buy at a particular time of the year, at the beginning or end of the fiscal year? Or does the customer tend to buy on promotion only? Does a specific customer group follow a specific buying pattern? To answer, you need the definition of the customer; promotion information such as dates, terms, affected products, unit and dollar sales; time such as weeks, months, and years to compare year over year; and the sales in dollars or units.
As you review each question, the same information appears repeatedly. You can identify the important information entities:
- Channel
- Products
- Time
- Promotions
- Customers
- Industry
- Sales
Then, you can identify the suitable items within each entity:
- Channel: Direct sales, catalog, Internet
- Products: packages, families, such as CPUs or accessories
- Time: seasons, months, quarters, weeks
- Promotions
- Customers: Accounts
- Industry (economy): Dollar value
- Sales: Dollars, units
Now that you have identified the entities, you need to expand each entity and think about the lowest level of information that the user might want. This important concept, called granularity, is critical in determining both size and physical design.
Adjusting Your Wish List to Reality

You have a wish list of data. Where are you going to get the data? Global Computing Company has an order entry system that uses an Oracle8 database as its repository. As you review this database, you recognize that much of the data can come from this source, shown in Figure 3-4.
Figure 3-4 OLTP Source Schema
Some of the information that you need is not in the OLTP schema. The product and channel information is in a flat file database, shown in Figure 3-5.
Figure 3-5 Data in Flat Files
You try to find a source for promotion information and discover that this kind of data has been recorded for the last three years on paper by the product manager and in no common format. You propose to the Sales and Marketing user group that promotion data be excluded from the first implementation. Considering the cost-benefit analysis of including the promotion data, the Sales and Marketing group agrees with your approach.
Determining the Granularity Level

How do you determine the appropriate level of granularity? Unfortunately, there is a price to pay for a low level of granularity. Therefore, you need to balance the business requests with your system environment to determine the ideal and manageable level of granularity. Examine each data category within Global Computing to determine the appropriate level of granularity based on the data in the OLTP database:
Data Category    Requested Granularity    Appropriate Granularity
Channel          Channel Description      Channel Description
Product          Item Level               Item Level
Customer         Decision Maker           Receiving Location
Time             Daily                    Daily
Sales            Units/Dollars            Units/Dollars
For the Customer category, using the decision maker, the person who makes the decision to buy a product, as the granularity level is not appropriate. The name of the decision maker is not available in the source database. For this reason, you decide that you will use the receiving location as the granularity level.
Classifying Data for the Global Computing Star Schema

As "Classifying Data for the Data Mart Schema" on page 3-13 explains, you need to classify your data as facts and dimensions. A sale is the core element that will be analyzed, and the descriptive data about the sale becomes the dimensions. Thus, from the categories listed on page 3-23 in "Creating a Wish List of Data," you categorize Sales as the fact and Products, Customers, Channels, and Days (for Time) as the dimensions. Notice that in the star schema, you can include identifiers and attributes in one table that would be separated into multiple tables in a normalized or OLTP database. This is a very simple star schema; more complex data may require a more complex schema.
The Days dimension is one of the easiest to populate. Global Computing Company measures its business by quarter and is looking for year-over-year comparisons. There is a clear relationship between year, month, and day. The identifiers are often generated or dictated by the ETT or query tool. The Days dimension typically uses a synthetic, system-generated (warehouse) key with no particular meaning as its primary key. In this table, the primary key is DATE_ID, a warehouse key.
The Customers dimension is usually the most difficult dimension to determine. The combination of SHIPTO_NUM and ACCOUNT_ID gives you information about the account and the particular location to which the order was shipped. You identified this as the lowest level of detail that you want to track. Decisions and business activities can be directed to that level. This data and its associated description have a many-to-one relationship to Account, which has a many-to-one relationship to Segment. (Segment is a synonym for industry.) In addition, you want to know the type of location: LOCTYPE_DESC. The Customers dimension can use a natural key from the source or a warehouse key. In this case, it uses a warehouse key, CUSTOMER_ID.
The Channels dimension contains only an identifier and the description of the Channel. The CHANNEL_ID acts as the primary key.
The elements of the Products dimension are fairly easy to identify. PACKAGING refers to a grouping of products and is an attribute of the item. ITEM_SOURCE indicates whether or not the item was manufactured in the U.S. and is an attribute of the item alone. Items can be grouped into families, a many-to-one relationship, and families are grouped into classes. The dimension uses a warehouse key, PRODUCT_ID. The fact table, SALES, holds the metrics of the business. You need to decide which of the calculated measures of the business should be stored rather than calculated as needed. The fundamental facts that cannot be derived are UNITS, SALES, and COST. Because Global Computing Company is quite concerned about the erosion of profits, you decide to store MARGIN rather than calculate it as needed to facilitate reporting. Notice that the primary key is a composite key made up of the primary keys (natural or warehouse) of each dimension table.
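The shape of the SALES fact table described above can be sketched as follows. SQLite from the Python standard library stands in for the target database so that the example is self-contained, the dimension tables are reduced to their keys, and the rule margin = sales - cost is an assumption made for illustration; the source does not define how MARGIN is calculated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE days      (date_id     INTEGER PRIMARY KEY);
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY);
    CREATE TABLE channels  (channel_id  INTEGER PRIMARY KEY);
    CREATE TABLE products  (product_id  INTEGER PRIMARY KEY);

    -- Fact table: created last, keyed by the four dimension keys together.
    CREATE TABLE sales (
        date_id     INTEGER NOT NULL REFERENCES days (date_id),
        customer_id INTEGER NOT NULL REFERENCES customers (customer_id),
        channel_id  INTEGER NOT NULL REFERENCES channels (channel_id),
        product_id  INTEGER NOT NULL REFERENCES products (product_id),
        units  INTEGER,
        sales  REAL,
        cost   REAL,
        margin REAL,   -- stored at load time rather than derived per query
        PRIMARY KEY (date_id, customer_id, channel_id, product_id)
    );
    INSERT INTO days VALUES (1);
    INSERT INTO customers VALUES (1);
    INSERT INTO channels VALUES (1);
    INSERT INTO products VALUES (1);
""")

def load_sale(date_id, customer_id, channel_id, product_id, units, sales, cost):
    """Insert one fact row; margin = sales - cost is an assumed definition."""
    conn.execute(
        "INSERT INTO sales VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        (date_id, customer_id, channel_id, product_id,
         units, sales, cost, sales - cost),
    )

load_sale(1, 1, 1, 1, 2, 2400.0, 1800.0)
try:
    load_sale(1, 1, 1, 1, 1, 1200.0, 900.0)   # same key combination again
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

margin = conn.execute("SELECT margin FROM sales").fetchone()[0]
print(margin, duplicate_rejected)
```

The composite key allows only one fact row per combination of day, customer, channel, and product, so the load process has to aggregate source transactions to that grain before inserting them.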
Creating the Logical Design

You create the logical design in three steps:
1. Sketching an initial, rough logical design based on user requirements
2. Capturing the design of the OLTP source data
3. Completing the logical design, using the information from the OLTP source, as well as other data sources
Sketching an Initial Logical Design

You use the Entity Relationship Diagrammer to sketch an initial, rough logical design. To invoke the Diagrammer, click Entity Relationship Diagrammer in the Designer Front Panel. First, you create a new diagram and the entities by taking these steps:
1.
From the File menu in the Entity Relationship Diagrammer, select New.
The Diagrammer displays a window with a default name, such as ERD1:
If the Tool Palette is not displayed, select Tool Palette from the View menu.
2.
Click the Entity button (the second button in the Tool Palette) and then position your cursor in the middle of the diagram window. Holding the mouse button, draw a rectangle indicating the size of the entity. When you release the mouse button, Designer displays the Create Entity dialog box. In the Create Entity dialog box, enter the following:
I
3.
For Name, enter SALE. For Short Name, enter SAL. For Plural, enter SALES. Give some thought to what you enter in the Plural field. The Database Design Transformer, which you will use in a later step, uses this field as the default name of the table.
4. 5.
Click OK. To assign some attributes to this entity, double-click the entity SALE in the diagram and select the Attributes tab from the Edit Entity dialog box. Create the following attributes:
3-28
Name UNITS SALES COST MARGIN
Seq 10 20 30 40
Opt (check box?) Yes Yes Yes Yes
Format NUMBER NUMBER NUMBER NUMBER
Seq is the ordering sequence for attributes as they are displayed in the diagram. Opt means that the attribute is optional, that is, the attribute does not always need to contain data.
6. 7.
Click OK. Designer adds the attributes to the SALE entity. Create four more entities, using the information in the following table. Remember to click the Entity button to draw each entity.
Name DAY CUSTOMER Short Name DA CUS Plural DAYS CUSTOMERS
Name PRODUCT CHANNEL
Short Name PRO CHA
Plural PRODUCTS CHANNELS
You will enter additional attributes for each dimension in a later exercise.
8. Drag the entities to arrange them in a star shape.
   You can customize the font size, fill color, and line size using the buttons in the Tool Palette.
9. Save the diagram by selecting Save Diagram from the File menu. In the Save Diagram As box, name it GCC STAR. Click OK.
Now, you mark each entity as a fact (numeric data) or a dimension (textual or descriptive data).
1. Double-click the SALE entity. In the Definition tab, set the Datawarehouse Type to Fact. Click OK.
2. Double-click the DAY entity. In the Definition tab, set the Datawarehouse Type to Dimension. Click OK.
3. Repeat the previous step for the CUSTOMER, PRODUCT, and CHANNEL entities, marking each as a dimension.
Next, you define the relationships between the entities. A relationship exists between the fact entity, SALE, and each dimension entity.
Any sale must refer to a channel, a customer, a product, and a day. Otherwise, the information about the sale is incomplete. On the other hand, a dimension may be referenced by a sale, but not necessarily; for example, a particular product may never be sold. Each sale is associated with only one product, but a product may be sold many times. This is a many-to-one relationship and a mandatory-to-optional relationship (the product side of the relationship is optional). To create the relationship between SALE and PRODUCT, take these steps:
1. Click the M:1 (M to O) Relationship button (the one immediately to the right of the Entity button on the tool palette).
2. Click the SALE entity, draw a line toward the PRODUCT entity, and then click the PRODUCT entity.
3. In the Create Relationship dialog box, enter the following:
   - For From Name, enter referencing.
   - For To Name, enter referenced by.
4. Click OK.
The diagram looks like the following:
To read or create an entity-relationship diagram, keep in mind these points:

- The cardinality of "one or many" is represented by a crow's foot. The cardinality of "one" is represented by a dash.
- The mandatory characteristic is represented by a straight line. The optional characteristic is represented by a broken line.
- You read a diagram from the point of view of each entity:
     Left-to-right: A SALE must be referencing one PRODUCT.
     Right-to-left: A PRODUCT may be referenced by one or many SALES.

When you draw a relationship, always begin with the "many" end of the relationship. You can always modify the relationship's cardinality, optionality, and name by double-clicking the line representing the relationship. Now, create a relationship between SALE and each of the other dimensions:
1. Repeat the previous steps for the CHANNEL, CUSTOMER, and DAY dimensions. All relationships are M:1 (M to O). Remember to click the relationship button and start with the SALE entity each time.
2. Drag the entities to make the diagram easier to read. The diagram now looks like the following:
   At this point, you have drawn a rough sketch of your logical design. You have defined the entities and some of the attributes of the entities.
3. Save the diagram by selecting Save Diagram from the File menu. Exit the Entity Relationship Diagrammer.
You do not use foreign key attributes in logical data modeling because the relationship already signifies that the two entities are connected. You consider foreign keys during the physical design. Furthermore, when you use the Database Design Transformer to convert your logical data model to an initial physical design, the Transformer automatically creates foreign key columns and foreign key constraints based on the relationships. It creates the foreign key column in the entity that has the many side of the relationship. If you define foreign key columns, the Transformer generates redundant columns.
In the next section, you take a closer look at your data in the source OLTP system.
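The placement of the foreign key on the "many" side can be illustrated with a small sketch. It is not output from the Database Design Transformer: SQLite (through Python's sqlite3 module) stands in for the target database, and the column names PRODUCT_ID and UNITS are assumptions chosen for the example.

```python
import sqlite3

# Illustrative only: SQLite stands in for the target database, and the
# column names (PRODUCT_ID, UNITS) are assumptions for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# PRODUCT is the "one" end of the relationship: it holds no reference to SALE.
conn.execute("CREATE TABLE PRODUCTS (PRODUCT_ID INTEGER PRIMARY KEY)")

# SALE is the "many" end: the foreign key column lives here. NOT NULL
# expresses the mandatory end (every sale must reference a product).
conn.execute(
    """
    CREATE TABLE SALES (
        PRODUCT_ID INTEGER NOT NULL REFERENCES PRODUCTS (PRODUCT_ID),
        UNITS      NUMERIC
    )
    """
)

conn.execute("INSERT INTO PRODUCTS VALUES (1)")
conn.execute("INSERT INTO PRODUCTS VALUES (2)")  # a product that is never sold
conn.execute("INSERT INTO SALES VALUES (1, 8)")
conn.execute("INSERT INTO SALES VALUES (1, 3)")  # the same product sold again


def sale_is_accepted(product_id):
    """Return True if a sale referencing product_id can be inserted."""
    try:
        conn.execute("INSERT INTO SALES VALUES (?, 1)", (product_id,))
        return True
    except sqlite3.IntegrityError:
        return False


orphan_ok = sale_is_accepted(99)     # refers to a product that does not exist
missing_ok = sale_is_accepted(None)  # a sale with no product at all

# Count the sales per product; a product with no sales is still valid.
sales_per_product = dict(conn.execute(
    "SELECT p.PRODUCT_ID, COUNT(s.PRODUCT_ID) FROM PRODUCTS p "
    "LEFT JOIN SALES s ON s.PRODUCT_ID = p.PRODUCT_ID GROUP BY p.PRODUCT_ID"
))
print(orphan_ok, missing_ok, sales_per_product)
```

Both rejected inserts correspond to the mandatory end of the relationship, while the product with no sales corresponds to the optional end.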
Capturing the Design of the Source OLTP System

You know that much of the data that you need for your data mart currently exists in an OLTP system. You want to look at the design of that system and capture the metadata so that you can use it. In this section, you reverse-engineer the source OLTP system to capture this metadata. Before you begin, it is wise to create a new application specifically for this purpose. Otherwise, you mix source and target table definitions in the same application, which can be confusing and can lead to naming conflicts. Because you are already connected to the Data Mart Designer and the Designer Front Panel is running, you can create an application using the Repository Object Navigator. Take these steps:
1. In the Designer Front Panel, click Repository Object Navigator.
2. From the File menu, select New Application.
3. In the Navigator tree, name the application OLTP SOURCE.
4. Save the application by clicking the Save button (a diskette) to the left of the Navigator tree.
5. From the File menu, select Exit to exit the Repository Object Navigator.
6. Change to the new application by selecting Change Application System from the File menu of the Designer Front Panel.
7. In the Application System dialog box, select OLTP SOURCE and click OK.
To capture the design of the source OLTP system, take the following steps:
1. In the Designer Front Panel, click Design Editor. The Design Editor is largely wizard-driven and guides you through the process.
2. In the Welcome box, choose Server Model as the starting point, then click OK.
3. From the Server Model Guide Wizard, choose Run the Design Capture.
   The wizard prompts you to create a database user in the repository. Note that you are not creating a user in the database, just in the repository. You are merely trying to mirror, as faithfully as possible, the reality of the source database. In the source database, a database user, sampleoltp, owns the tables and other database objects that make up the schema. You need to create a definition of this user in the repository.
4. In the Create Database User dialog box, for Database, select DEFAULT_DATABASE.
   You can create multiple databases in the repository. You might need to do this if your design includes replication among multiple Oracle instances. You can create a database definition in the Server Model Navigator DB Admin tab. For this exercise, however, accept the default.
5. Set the Database Type to Oracle 8 and the User Name to yves.
6. Click OK.
7. In the Source tab, connect to the database as the user who owns the OLTP database objects, by entering the following:
   - For Username, enter sampleoltp.
   - For Password, enter sampleoltp.
   - For Connect, enter dmdb.
8. Select the Objects tab. In the background, the Design Capture tool connects to the database and reads the database object definitions from the database data dictionary.
9. In the Don't Capture window, expand Relational Tables to see a listing of tables owned by sampleoltp.
10. Select all of the tables by using the Control key with the mouse. Move them to the Capture window by clicking the single arrow button.
11. Click Start to begin the design capture process. A message window appears in the Design Editor, providing the status of the operation. You can ignore any warnings about tablespaces.
12. When the process is complete, click Save.
    The Design Editor writes information to the repository. Then, it displays the reverse-engineered table definitions. This diagram corresponds to the physical design level. (You may have to expand the window or move other windows out of the way.)
13. Save the diagram by selecting Save Diagram from the File menu and entering OLTP SOURCE DATA SCHEMA as the Diagram Name. Note that the diagram must be selected to perform this task.

Take some time to explore the data. Right-click a table or column and then select Properties to see information about the items. When you are finished, exit from the Design Editor. Table 3-1 is a summary of the data in the OLTP source.
Table 3-1 Global Computing Company: Description of OLTP Schema

Field Name        Data Type     Definition                     Examples             Related to

Table: ACCOUNT
ACCOUNT_ID        NUMBER        Key field: Account number      23
AC_NAME           VARCHAR2(35)  Customer name                  Bavarian Industries
AC_ADDRESS        VARCHAR2(20)  Customer street address        123 Main St.
AC_CITY           VARCHAR2(10)  Customer city                  Nashua
AC_STATE          VARCHAR2(2)   Customer state                 NH
AC_COUNTY         VARCHAR2(10)  Customer county                Hillsboro
AC_POST_CODE      NUMBER        Customer postal code           03062
AC_TAX_RATE       NUMBER        Customer tax rate              0.06
SEGMENT_ID        NUMBER        Industry identifier            4

Table: ORDER_LINE
ORDER_ID          NUMBER        Key field: Order number        21000
ORDER_LINE_NUM    NUMBER        Line number on the order form  11300
ITEM_ID           VARCHAR2(11)  Product identifier             9700
UNITS             NUMBER        Number of units purchased      8
SALES             NUMBER        Total sales                    8.4
COST              NUMBER        Total cost                     3.6

Table: ORDER_HEADER
ORDER_ID          NUMBER        Key field: Order number        34590                Foreign key for ORDER_LINE: ORDER_ID
CHANNEL_ID        NUMBER        Channel identifier             2
ACCOUNT_ID        NUMBER        Account identifier             23
SHIPTO_NUM        NUMBER        Ship to location               145                  Foreign key for SHIPTO: SHIPTO_NUM
ORDERED_DATE      DATE          Date of order                  3/15/95
SALES_PERSON_ID   NUMBER        Salesperson                    889
PAYMENT_ID        NUMBER        Method of payment              4
TOTAL_SALES       NUMBER        Total sales for the order      89
TAX_AMOUNT        NUMBER        Total tax amount               4.5

Table: SHIPTO
ACCOUNT_ID        NUMBER        Key field: Account number      23                   Foreign key for ACCOUNT: ACCOUNT_ID
SHIPTO_NUM        NUMBER        Key field: Ship to number      45
WAREHOUSE_ID      NUMBER        Warehouse identifier           345
SH_ADDRESS        VARCHAR2(20)  Address of warehouse           244 Mill St.
SH_CITY           VARCHAR2(10)  City of warehouse              Nashua
SH_STATE          VARCHAR2(2)   State                          NH
SH_COUNTY         VARCHAR2(10)  County                         Hillsboro
SH_POSTAL_CODE    NUMBER        Ship location postal code      03062
SH_TAX_RATE       NUMBER        Ship location tax rate         0.06
SH_DESCRIPTION    VARCHAR2(35)  Description of ship location   RAF Blygh
SH_LOCTYPE_DESC   VARCHAR2(30)  Description of location type   Plant

Table: PAYMENT_CODE
PAYMENT_ID        NUMBER        Key field: Payment code        23
PAYMENT_DESC      VARCHAR2(20)  Description of payment method

Table: SALESPERSON
SALESPERSON_ID    NUMBER        Key field: Salesperson         23
SALESPERSON_NAME  VARCHAR2(20)  Salesperson name               Greg White
MANAGER_ID        NUMBER        Manager's identifier           45

Table: SEGMENTS
SEGMENT_ID        NUMBER        Key field: Segment number      23
SEGMENT_DESC      VARCHAR2(20)  Description of segment
Not all of the data is available in the OLTP system. Some of the data, such as the product data and channel data, is available only in flat files, and is shown in Table 3-2.

Table 3-2 Global Computing Company: Description of Flat Files

Field Name    Data Type     Definition                 Examples           Related to

Table: CHANNEL
CHANNEL_ID    NUMBER        Key field: Channel ID      2
CHANNEL_DESC  VARCHAR2(20)  Description of channel     Internet

Table: PRODUCT
ITEM_ID       VARCHAR2(11)  Key field: Product ID      L14-3467-EE
PACKAGING     VARCHAR2(20)  Package identifier         Executive          Attribute of item
ITEM_DESC     VARCHAR2(35)  Product description        Sentinel Standard
ITEM_SOURCE   VARCHAR2(30)  Domestic or foreign        D or F
FAMILY_ID     NUMBER        Product family identifier  45                 Grouping of items; subset of class
FAMILY_DESC   VARCHAR2(20)  Family description         Portable PCs
CLASS_ID      NUMBER        Product class identifier   23                 Grouping of product families
CLASS_DESC    VARCHAR2(20)  Class description          Hardware
Creating a New Version

It is good design practice to freeze the design at every major milestone so that you can move back to an earlier version of the design, if necessary. You use the Repository Object Navigator to do this:
1. In the Designer Front Panel, click Repository Object Navigator.
2. From the Application menu, select New Version.
3. Select the DATAMART application and click New Version.
   This utility creates a new version of the application. The new version is an identical copy of the previous version and has the same name, but its version number is incremented to 2. Designer freezes the previous version, version 1, and sets it to read-only.
4. Close the dialog box and exit the Repository Object Navigator.
5. In the Designer Front Panel, from the File menu, select Change Application System.
6. Select DATAMART, version 2 to change the default application to the new version and click OK.
To revert to an earlier version, unfreeze the versions by selecting Freeze/Unfreeze from the Application menu of the Repository Object Navigator. Then, you can revert to a previous version of the application.
Completing the Logical Design

In "Sketching an Initial Logical Design," you developed an initial design. Now, you fill in the rest of the logical design, which means that you define the following:
- Unique identifiers for each dimension. In the physical design process, the unique identifiers are mapped to primary keys.
- Unique identifiers for the fact table.
- Additional attributes for each dimension table.

Note that in a star schema, you can include identifiers and attributes in one table that, in a normalized OLTP database, would be separated into multiple tables. As "Designing the Star Schema" explained, you should consider using warehouse keys as the unique identifiers in dimension tables. For the Global Computing star schema, you use warehouse keys for all dimension tables except CHANNELS. In the following exercise, you create unique identifiers for each of the dimension tables, using the Entity Relationship Diagrammer.
1. In the Designer Front Panel, select Entity Relationship Diagrammer.
2. From the File menu, select Open, then select the diagram you created earlier, GCC STAR, and click OK.
3. Double-click the CHANNEL entity, select the Attributes tab, and create an attribute using the following information:
   - For Name, enter CHANNEL ID.
   - For Seq, enter 10.
   - For Opt, clear the box.
   - For Format, enter NUMBER.
   - For MaxLen, enter 3.
   - For Primary, check the box.
4. Click OK. The entity looks like the following:
In an entity diagram, the symbols have the following meanings:

- A number sign (#) means that the attribute is a primary (unique) identifier.
- The letter o means that the attribute is optional.
- An asterisk (*) means that the attribute is mandatory.
In the Entity Relationship Diagrammer, entity and attribute names can contain spaces. Later, when you use the Database Design Transformer to transform the logical design into the physical design, the Transformer automatically converts spaces into underscores.
5. Repeat steps 3 and 4 to create identifiers for the remaining entities, using the following information:

   Entity     Name          Seq   Opt (checked?)   Format   MaxLen   Primary (checked?)
   PRODUCT    PRODUCT ID    10    No               NUMBER   3        Yes
   DAY        DATE ID       10    No               NUMBER   3        Yes
   CUSTOMER   CUSTOMER ID   10    No               NUMBER   3        Yes
The diagram now looks like this:
Now, you create identifiers for the fact table, SALE. This step is a little more complex because each sale is associated with a specific product, sold to a specific customer through a specific channel on a specific day. It is this combination that is unique and all four elements must be included in the unique identifier. In the following exercise, you mark each relationship as part of the unique identifier for the SALE entity:
1. Double-click the relationship line connecting SALE and CHANNEL.
2. In the Definition tab, check the Primary UID (unique identifier) box for the entity SALE. Click OK.
   Now the relationship is displayed with a vertical bar near the SALE entity, which is the "many" end of the relationship. The vertical bar signifies that the relationship is part of the unique identifier for the SALE entity.
3. Repeat steps 1 and 2 for the relationships between the SALE entity and the remaining dimension entities.
4. From the File menu, select Save Diagram.
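To see what a four-part unique identifier implies once it becomes a primary key, consider the following sketch. It is an illustration only: SQLite stands in for the target database, and the table layout is an assumption rather than the DDL that Designer generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical physical form of the SALE entity: the four relationship
# columns together make up the primary key.
conn.execute(
    """
    CREATE TABLE SALES (
        CHANNEL_ID  INTEGER NOT NULL,
        CUSTOMER_ID INTEGER NOT NULL,
        PRODUCT_ID  INTEGER NOT NULL,
        DATE_ID     INTEGER NOT NULL,
        UNITS       NUMERIC,
        PRIMARY KEY (CHANNEL_ID, CUSTOMER_ID, PRODUCT_ID, DATE_ID)
    )
    """
)


def try_insert(row):
    """Insert one row; return False if the unique identifier is violated."""
    try:
        conn.execute("INSERT INTO SALES VALUES (?, ?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


first = try_insert((1, 10, 100, 1000, 5))
other_day = try_insert((1, 10, 100, 1001, 2))  # differs only in DATE_ID
duplicate = try_insert((1, 10, 100, 1000, 7))  # same combination again
print(first, other_day, duplicate)
```

A row is rejected only when all four key values repeat, which is why every relationship must be marked as part of the unique identifier.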
Adding Attributes
At this point, you create the rest of the attributes for each entity. To do so, double-click an entity, select the Attributes tab, and create the attributes listed in the following table:
Entity    Name             Seq  Opt  Format    MaxLen  Primary
CHANNEL   CHANNEL DESC     20   Yes  VARCHAR2  20      No
CUSTOMER  ACCOUNT ID       20   No   NUMBER    3       No
          SHIPTO NUM       30   No   NUMBER            No
          SHIP TO DESC     40   Yes  VARCHAR2  35      No
          LOCTYPE DESC     50   Yes  VARCHAR2  30      No
          ACCOUNT DESC     60   Yes  VARCHAR2  35      No
          SEGMENT ID       70   Yes  NUMBER    3       No
          SEGMENT DESC     80   Yes  VARCHAR2  15      No
PRODUCT   ITEM ID          20   No   VARCHAR2  11      No
          ITEM DESC        30   Yes  VARCHAR2  35      No
          ITEM SOURCE      40   Yes  VARCHAR2  30      No
          PACKAGING        50   Yes  VARCHAR2  20      No
          FAMILY ID        60   Yes  NUMBER    3       No
          FAMILY DESC      70   Yes  VARCHAR2  20      No
          CLASS ID         80   Yes  NUMBER    3       No
          CLASS DESC       90   Yes  VARCHAR2  20      No
DAY       DATE DESC        20   No   DATE              No
          DAY OF MONTH     30   Yes  NUMBER            No
          DAY OF YEAR      40   Yes  NUMBER            No
          WEEK OF YEAR     50   Yes  NUMBER            No
          MONTH NUMBER     60   Yes  NUMBER            No
          MONTH DESC       70   Yes  VARCHAR2          No
          QUARTER NUMBER   80   Yes  NUMBER            No
          YEAR NUMBER      90   Yes  NUMBER            No
The final version of the logical design for your data mart schema looks like this:
Before you move to the physical design phase, take these precautions:
1. Remember to save your diagram! From the File menu, select Save Diagram.
2. Freeze the version of your application, using the Repository Object Navigator. If you do not remember how to do that, see the directions in "Creating a New Version."
3. From the File menu in the Designer Front Panel, select Change Application System. Select DATAMART, version 3 to change the default application to the new version and click OK.
Creating the Physical Design

Now that you have a good logical design, you need to create the physical design. Remember that the physical design process does not involve the physical implementation of the database. It involves deciding the characteristics of database objects, such as tables, columns, and indexes. Creating a physical design involves the following steps:
1. Transforming the logical design into a default physical design
2. Refining the physical design
Transforming the Logical Design into a Physical Design

The Database Design Transformer is a sophisticated tool that reads your logical design and, based on that, generates a default physical design. The Database Design Transformer can:
- Create table definitions based on entities
- Create column definitions based on attributes
- Create primary key constraint definitions based on unique identifiers
- Create foreign key constraint definitions based on relationships

The Database Design Transformer uses built-in rules to create the new elements. You encountered an example of such a rule earlier: the Transformer uses the plural name of an entity as the name of the corresponding table it creates. You can change many settings, but this chapter uses the default settings with one exception: automatic prefix generation. You do not want Designer to generate prefixes for the names of foreign key columns. Change this setting:
1. In the Designer Front Panel, click Database Design Transformer.
2. Click Settings. Select the Other Settings tab.
3. Clear the check box for Foreign key columns. Click OK and then click OK on the informational message box.
To begin generating the default physical design, take these steps:

1. In the Database Design Transformer Mode tab, select Run the Transformer in Default Mode.
2. Click Run. The Database Design Transformer shows the progress in an output window.
3. When the process is finished, close the output window, and exit the Database Design Transformer by clicking Close.
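The mapping rules described in this section (plural name to table name, attributes to columns, unique identifiers to a primary key, spaces to underscores) can be sketched in a few lines of Python. This is a simplified illustration, not the Transformer's algorithm; the dictionary layout and the helper names to_identifier and sketch_table are hypothetical, and the DDL that Designer actually generates will differ in detail.

```python
def to_identifier(name):
    """Convert a logical name to a physical one: spaces become underscores."""
    return name.strip().replace(" ", "_")


def sketch_table(entity):
    """Build a CREATE TABLE statement from a simple entity description."""
    lines = []
    for attr in entity["attributes"]:
        col_type = attr["format"]
        if attr.get("maxlen") is not None:
            col_type += "({})".format(attr["maxlen"])
        # Attributes that are not optional become NOT NULL columns.
        null_clause = "" if attr["optional"] else " NOT NULL"
        lines.append("  {} {}{}".format(
            to_identifier(attr["name"]), col_type, null_clause))
    key_cols = [to_identifier(a["name"])
                for a in entity["attributes"] if a.get("primary")]
    if key_cols:
        lines.append("  PRIMARY KEY ({})".format(", ".join(key_cols)))
    # The plural name of the entity becomes the table name.
    return "CREATE TABLE {} (\n{}\n)".format(
        to_identifier(entity["plural"]), ",\n".join(lines))


# The CHANNEL entity as defined in this chapter.
channel = {
    "name": "CHANNEL",
    "plural": "CHANNELS",
    "attributes": [
        {"name": "CHANNEL ID", "format": "NUMBER", "maxlen": 3,
         "optional": False, "primary": True},
        {"name": "CHANNEL DESC", "format": "VARCHAR2", "maxlen": 20,
         "optional": True, "primary": False},
    ],
}

ddl = sketch_table(channel)
print(ddl)
```

For the CHANNEL entity, the sketch yields a table named CHANNELS with the columns CHANNEL_ID and CHANNEL_DESC and a primary key on CHANNEL_ID.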
Displaying the Default Physical Design

You can see the default physical design generated by the Database Design Transformer by using the Design Editor:
1. In the Designer Front Panel, click Design Editor. In the opening dialog box, choose Server Model and click OK.
2. In the Navigator window, choose the Server Model tab and expand the Relational Table Definitions folder.
3. Select all of the tables and drag them into the free space in the Design Editor. Close the Server Model Guide to clear some space. The Design Editor displays the default database design. You can move the entities to make the diagram easier to read.
4. From the File menu, select Save Diagram. Name the diagram GCC STAR DATA SCHEMA.

In the Server Model diagram, you can edit table and constraint definitions in much the same way that you edited entities and relationships in the Entity Relationship Diagrammer: by double-clicking the element. Designer displays a property sheet for the element. Much like the Entity Relationship Diagrammer, the Design Editor is context-sensitive when you click on an element. For example, if you double-click a column name, the Edit Table property sheet pops up, the Columns tab is selected, and the column is highlighted. You can change the style of the property sheets by choosing Use Property Dialogs (novice to intermediate) or Use Property Palette (expert) from the Options menu. For the exercise in this section, use the default Use Property Dialogs.
In addition, you can control the amount of information displayed in the diagram by toggling the buttons at the top of each table diagram, shown in this diagram:
For example, to see the names of the foreign keys for the SALES table, click the Foreign keys button in the SALES table in the diagram. Designer displays the foreign keys below the columns:
To see information about a foreign key, double-click the key. The Design Editor displays the Edit Foreign Key property sheet:
Alternatively, you can make changes to your design in the Navigator window. Click the Create icon to create new elements; click with the right mouse button on an element and choose Properties to bring up the property sheet of the element.
Adding a Staging Table

As you will learn in Chapter 5, you can use a staging table into which you initially load the data. After you perform some transformations on the data, you load it into the fact table. Using a staging table in this way can lessen the load on the source OLTP database. For example, in the Global Computing case study, the fact table uses warehouse (synthetic) keys; the staging table uses natural keys. The warehouse keys are generated during the transformation process in Chapter 5. In this section, you create a staging table called STAGE1. Because the table is very similar to the fact table, you can copy the SALES table and then make modifications. Take these steps:
1. In the Server Model Navigator window of the Design Editor, expand Relational Table Definitions and right-click the SALES table.
2. From the pop-up menu, choose Copy Object.
3. In the Copy Objects dialog box, select the Context List tab.
4. Enter S1 for the New Short Name and STAGE1 for the New Name.
5. Copy only the column definitions of the table, not its constraint or index definitions. Select the Copy Rules tab and clear the check boxes for all objects except columns.
6. Click Copy. When the process completes, exit from the dialog box by clicking Close.
7. In the Server Model Navigator window, select Relational Table Definitions and choose Requery Selection from the Edit menu. The new table definition STAGE1 appears in the tree.
8. Drag STAGE1 into the free space in the diagram.
9. Double-click the STAGE1 table in the diagram. In the Columns tab, delete the following columns, which are warehouse key columns:
   - CUSTOMER_ID
   - DATE_ID
   - PRODUCT_ID
10. For the CHANNEL_ID column, clear the Mandatory check box.
11. Create the other key columns, which are natural keys. To create each column, click Add. Enter the name of the column in the Column name box and enter the remaining fields using the information in the following table:

    Column Name    Data Type   Length   No. of Decimal Places
    ACCOUNT_ID     NUMBER      3        0
    SHIPTO_NUM     NUMBER
    ITEM_ID        VARCHAR2    11       N/A
    ORDERED_DATE   DATE        N/A      N/A

12. Click Finish.
13. From the File menu, select Save Diagram.
Your physical design is complete. Usually, the next step is to implement the physical database and the objects in it. However, because Data Mart Designer can generate the data definition language script you will need during the physical implementation, take advantage of that capability before you move on.
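To make the role of STAGE1 concrete, the following sketch shows how a staging row that carries natural keys might be translated into a fact row that carries warehouse keys. It is only an illustration of the idea; the actual transformation is built in Chapter 5. The lookup dictionaries, the key values, and the assumption that a customer is identified by the pair (ACCOUNT_ID, SHIPTO_NUM) are all hypothetical.

```python
# Hypothetical lookup tables: natural key -> warehouse (synthetic) key.
# All key values below are made up for illustration.
customer_keys = {(23, 145): 1}     # (ACCOUNT_ID, SHIPTO_NUM) -> CUSTOMER_ID
product_keys = {"L14-3467-EE": 7}  # ITEM_ID -> PRODUCT_ID
date_keys = {"1995-03-15": 440}    # ORDERED_DATE -> DATE_ID


def to_fact_row(stage_row):
    """Translate one STAGE1 row (natural keys) into a SALES row (warehouse keys)."""
    return {
        # CHANNELS keeps its natural key, so CHANNEL_ID passes through unchanged.
        "CHANNEL_ID": stage_row["CHANNEL_ID"],
        "CUSTOMER_ID": customer_keys[
            (stage_row["ACCOUNT_ID"], stage_row["SHIPTO_NUM"])],
        "PRODUCT_ID": product_keys[stage_row["ITEM_ID"]],
        "DATE_ID": date_keys[stage_row["ORDERED_DATE"]],
        # The measures are copied as they are.
        "UNITS": stage_row["UNITS"],
        "SALES": stage_row["SALES"],
        "COST": stage_row["COST"],
    }


stage_row = {
    "CHANNEL_ID": 2,
    "ACCOUNT_ID": 23,
    "SHIPTO_NUM": 145,
    "ITEM_ID": "L14-3467-EE",
    "ORDERED_DATE": "1995-03-15",
    "UNITS": 8,
    "SALES": 8.4,
    "COST": 3.6,
}

fact_row = to_fact_row(stage_row)
print(fact_row)
```

The three columns deleted from STAGE1 are exactly the ones the lookups produce, and the four columns added to it are the ones the lookups consume.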
Generating the DDL Script

You can use the Designer component Server Generator to generate a data definition language (DDL) script. You can use this script to create the objects in your database.
1. From the Generate menu in the Design Editor, select Generate Database from Server Model.
2. In the dialog box, choose the Database radio button and specify Username yves and Password yves. For Connect, enter dmdb.
3. For File Prefix, enter yves. For Directory, enter a file specification using the following format:
   <oracle_home>\datamart
4. Select the Objects tab. In the Don't Generate window pane, expand Relational Tables and move all of the table definitions to the Generate window pane.
   The table STAGE1 is already in the Generate window pane.
5. Click Start. When the Generator finishes, it presents a dialog box with the following options:
   - View DDL lets you view the generated DDL commands in the Notepad text editor.
   - View Report generates a short report that compares the database object definitions in the repository with the real database objects that the user owns. This utility is particularly useful when you are generating DDL for a user who already owns objects in the database.
   - Execute DDL creates the database objects in the user's schema.
6. Click View DDL. The Generator created four files: yves.sql, yves.tab, yves.con, and yves.ind.
7. When you have finished reviewing the DDL, close the Notepad and click Cancel in the dialog box.
8. Click Execute DDL. This will execute the DDL to create the tables and indexes.
Estimating Size
Using the methods suggested in "Estimating the Size of the Data Mart," estimate the size of the data mart:
1. Use a representative sample of the source data to determine the number of rows that will be loaded into the fact table. To estimate the number of rows in the SALES fact table, count the number of rows in the ORDER_LINE table of the source database. There are 110,166 rows, meaning that there were that many order lines in the four-year period.
2. Estimate the size of one row of the fact table. For this exercise, assume that one row equals 52 bytes, calculated as 12 + 40 = 52. For information on calculating row size, see the Oracle8 Administrator's Guide.
3. Multiply the number of rows by the size of one row:
      110,166 * 52 = 5,728,632 bytes
   This is the estimated size of the fact table.
4. Estimate the size of the data mart. Generally, a data mart is three to five times the size of the fact table. For this example, multiply the fact table size by four:
      5,728,632 * 4 = 22,914,528 bytes
   Converted to megabytes, the estimated size of the database is 21.85 MB.

To plan for future growth, you can use a ratio of the estimated size to the largest possible size of the fact table:
1. Estimate the current entries in the finest level of granularity within each dimension:

   Dimension   Estimated Number of Entries
   Channel     3 channels
   Customer    61 shipto locations
   Product     36 items
   Day         1461

2. Multiply the number of rows for each dimension.
      3 * 61 * 36 * 1461 = 9,625,068 maximum possible rows
   This is the number of possible rows in the fact table, assuming that each customer buys each product each day, using each channel. Of course, this is improbable. You must estimate the sparsity of the table.
3. To estimate the sparsity of the table, calculate the ratio of actual rows from representative data (110,166) to possible rows (9,625,068):
      110,166 / 9,625,068 = .01145
4. Estimate the growth for each dimension table in the next two years. For example, assume that the number of channels and days stays the same, but the number of customers increases to 80 and the number of products increases to 40.
5. Multiply the number of entries for each dimension.
      3 * 80 * 40 * 1461 = 14,025,600 entries
6. Adjust the number, using the ratio calculated in Step 3:
      14,025,600 * .01145 = 160,593.12
7. Multiply the result by the size of the fact table row (52 bytes for this example).
      160,593.12 * 52 = 8,350,842.24 bytes

The estimated size of the fact table in two years is 7.96 MB (8,350,842.24 bytes); the size of the data mart is four times that, approximately 31.86 MB. After you have created the extract routines for the OLTP source, you can recalculate your estimate using the second approach to sizing.
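The sizing arithmetic in this section can be reproduced with a short script, which makes it easy to rerun the estimate with different assumptions. The variable and function names are illustrative; the row size (52 bytes), the multiplier (4), and the dimension counts are the ones used in the exercise above.

```python
ROW_BYTES = 52             # assumed size of one fact table row
MART_FACTOR = 4            # data mart taken as four times the fact table
BYTES_PER_MB = 1024 * 1024

sample_rows = 110166       # rows counted in ORDER_LINE


def possible_rows(channels, customers, products, days):
    """Largest possible number of fact rows: every combination occurs."""
    return channels * customers * products * days


# Current size, from the representative sample.
fact_bytes = sample_rows * ROW_BYTES
mart_bytes = fact_bytes * MART_FACTOR

# Sparsity: actual rows relative to possible rows, rounded as in the text.
current_possible = possible_rows(3, 61, 36, 1461)
sparsity = round(sample_rows / current_possible, 5)

# Projection: 80 customers and 40 products in two years.
future_possible = possible_rows(3, 80, 40, 1461)
future_rows = future_possible * sparsity
future_fact_bytes = future_rows * ROW_BYTES
future_mart_bytes = future_fact_bytes * MART_FACTOR

print("Fact table today:      {:,} bytes".format(fact_bytes))
print("Data mart today:       {:.2f} MB".format(mart_bytes / BYTES_PER_MB))
print("Sparsity ratio:        {:.5f}".format(sparsity))
print("Fact table in 2 years: {:.2f} MB".format(future_fact_bytes / BYTES_PER_MB))
print("Data mart in 2 years:  {:.2f} MB".format(future_mart_bytes / BYTES_PER_MB))
```

Changing the dimension counts passed to possible_rows, or the MART_FACTOR constant, shows how sensitive the estimate is to each assumption.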
Translating the Physical Design to a Physical Implementation

Now that you have your physical design and the sizing calculations in place, you are ready to translate this paper design to a physical implementation. You will see how to manage the physical aspects of data mart implementation in Chapter 4. Then, you will populate the star schema after extracting the data from the OLTP source and transforming it as required. This is the focus of Chapter 5.
4
Construct
At this point, you have designed your data mart and determined the data that will be loaded into the tables in the data mart. Now, let's take a look at how you use the Oracle8 database server to construct storage for your data, organize data for fast access, and protect your data.
You will perform several of these operations using Oracle Enterprise Manager, a graphical administration tool that is part of Oracle Data Mart Suite. You can also perform all of these operations by issuing SQL commands in a utility called Server Manager.
What Is the Role of Oracle8 in the Data Mart?

The data mart is a specific application running on the Oracle8 database server. You store the data for the data mart in an Oracle8 database, which lets multiple users access the data quickly and efficiently and protects the data against system failure. The rest of this section reviews some of the general functions of the Oracle8 database server that apply to the data mart.
Storage Management
The data in a relational database like Oracle8 is stored physically on disk as operating system files, but the physical storage is organized internally into different logical structures. Oracle8 provides a way to create these structures and manage them as data is added, changed, or deleted. As the data mart grows, you will need to add space to store the additional data; you may want to add new tables or other database objects; you may want to get rid of some existing objects and use the space they occupied. Oracle8 provides a way to do all of this, and more.
Fast Access
You expect your users to issue queries frequently to access the data stored in the data mart. Naturally, your users want their queries to return answers as fast as possible. Oracle8 provides mechanisms to process queries very quickly and efficiently. The section "Understanding Oracle8 Database Server: The Building Blocks" examines some of the mechanisms that enable fast processing of queries. Some of those mechanisms are parallel query, the cost-based optimizer, star query processing, and index structures.
Data Protection: Backup and Recovery

Let's say you are making changes to the data stored in your data mart. These changes will be made permanent when the changed data is moved to disk by Oracle8. However, your system may crash before all changed data has been stored safely. Worse, you may lose a few disks that hold several days' worth of work. How do you recover from power failure or other system malfunctions? Oracle8 maintains a record of all changes so that these changes can be recovered even if the computer crashes while changes are still in memory and have not been written to permanent storage. Oracle8 also provides a way to back up data on disk to a safe location and restore data from these backups if the disk fails. Such backup mechanisms do not require you to shut off access to users; the database can be up and running while you go about the business of protecting your data.
Access by Multiple Users

If only one user is allowed to access and modify data, that user can change data without concern for other users modifying the same data. However, in real life, many users running multiple applications at the same time can update the same data. Therefore, some mechanism needs to ensure that many users can access data at the same time and yet see a consistent view of the data. A consistent view of data in a multiuser environment means that every user sees the changes that user makes as well as changes made by other users; no changes are lost or overwritten.
Oracle8 provides a mechanism to allow concurrency, or simultaneous access by multiple users, while still maintaining integrity of data. Primarily, Oracle8 uses locks on data to prevent destructive interaction between users accessing the same data. Thus, if one user wants to change a piece of data, the data is automatically locked until the change is completed and made permanent (committed). Any user attempting to change the same data must wait until the first user releases the lock. The second user can, in turn, lock the data.
What happens if you read data that is in the process of being changed? Do you need to wait as well? If you are merely reading the data, you do not wait. Oracle8 provides a read-consistency mechanism to make sure that you see data as it existed before the change began. Thus, you do not see data that is currently being modified and that is, therefore, inconsistent or "dirty."
Database Security
Database security involves allowing or not allowing users to perform actions on the database and objects within it. Through the use of privileges, Oracle8 provides a way to regulate all access by users to all objects. A privilege is explicit permission to access a certain object or execute a certain type of SQL statement. Some examples of privileges are:
- Permission to select rows from another user's table
- The right to create a table
- The right to connect to a database
Such privileges are granted to users at the discretion of the other users who own the objects or by the database administrator.
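The three example privileges above might be granted as follows. This is a sketch only: it assumes that user SCOTT owns the EMP table and that JOE is the user receiving the privileges.

```sql
-- Run by SCOTT, the assumed owner of EMP:
-- permission to select rows from another user's table.
GRANT SELECT ON emp TO joe;

-- Run by the database administrator: system privileges.
GRANT CREATE TABLE TO joe;     -- the right to create a table
GRANT CREATE SESSION TO joe;   -- the right to connect to a database

-- JOE can now query SCOTT's table by qualifying its name.
SELECT empno, ename FROM scott.emp;

-- Run by SCOTT: a granted privilege can be withdrawn again.
REVOKE SELECT ON emp FROM joe;
```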
Understanding Oracle8 Database Server: The Building Blocks

The following sections provide an overview of the building blocks for the Oracle8 database server.
Oracle Processes
Some Oracle processes execute constantly in the background to ensure smooth running of the Oracle8 database. These processes exist all of the time that the Oracle8 database is running. Other Oracle processes are created and destroyed as needed. For example, when you try to access the Oracle8 database to perform a query, the work is done by a server process that is created to execute the actions you specify.
Oracle Memory Structures

All Oracle processes communicate information to each other by filling in structures that sit in an area of shared memory called the shared global area (SGA). Think of the SGA as the message board in your kitchen where you leave messages for other people in your house. The SGA is the message board for all of these processes to communicate with each other. You specify the amount of memory to allocate to the SGA in the database parameter file, init.ora.
Oracle Instances
The term instance refers to the background processes and the shared memory structures of the Oracle8 database server. An instance is associated with a specific set of datafiles, or database. You will encounter this term often, as in "starting up an instance" and "shutting down an instance." Starting up an instance is a command to the Oracle8 database server to allocate the required memory structures according to the specifications in the init.ora file, and to start up the background processes so that you can access the database associated with the instance. Shutting it down is just the opposite: the processes are terminated and the shared memory is deallocated so that the database can no longer be accessed.
Client/Server Architecture
Computer networking has become more and more prevalent in current computing environments, and software must be able to take advantage of the distributed processing capabilities this provides. Distributed processing means that a set of related jobs is divided among several computers, instead of using one computer to run them all. This reduces the processing load on any single computer, improving the performance of the system as a whole. Oracle8 is designed to take advantage of distributed processing by using client/server architecture. In this architecture, the database system is divided into two parts: a front-end or client portion and a back-end or server portion.
- Client: The client is a database application, such as SQL*Plus or Oracle Discoverer, that interacts with the user through the keyboard, display devices, and pointing devices such as a mouse. The client concentrates on requesting and presenting data, which is retrieved from the database by the server. The client usually runs on a workstation or personal computer, which can be optimized for its job. For example, the workstation might not need large disk capacity, or it might benefit from graphic capabilities.
- Server: The server runs the Oracle8 software and handles all functions required for data access. The server receives and processes SQL statements originating from client applications. The computer that manages the server can be optimized for its duties. For example, it can have large disk capacity and fast processors.
Oracle Files
Oracle8 stores data physically in operating system files called datafiles. In addition, every Oracle8 database has some disk files that serve different purposes. Let's take a quick look at what these are:
- Redo logs: Oracle8 provides a mechanism to record all changes made to data so that work is never lost, even if the system fails. These changes are recorded in disk files called redo logs. Every Oracle8 database must have at least two redo logs. These files are used in a cyclical way. For example, if your database has two redo logs and the first is filled, Oracle8 starts writing information into the second redo log. When this fills up, Oracle8 switches back to the first redo log and starts reusing it.
- Control files: Oracle8 records information, such as the name of the database and the size and location of the datafiles and redo logs, in the control file. You can think of the control file as the place where Oracle8 stores some important bookkeeping information about the database. The Oracle8 database cannot function properly if the control file is damaged. To minimize the chances of this, Oracle8 lets you maintain multiple copies of the control file in more than one physical location.
- Parameter file: Oracle8 reads the parameter file (init.ora) to determine the size of the shared memory area. Oracle8 obtains other information from it, such as:
  - The name of the database
  - The names and locations of the control files
  - What to do with filled online redo logs: should a copy be made before the log is overwritten?
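One way to see the files described above on a running database is to query the dynamic performance views. The sketch below assumes that you are connected as a user with access to the V$ views (for example, the database administrator). The choice of the three parameter names shown is an assumption for illustration.

```sql
-- Redo logs: one row per log group; a database has at least two.
SELECT group#, members, bytes, status FROM v$log;

-- Control files: the location of each copy.
SELECT name FROM v$controlfile;

-- Datafiles: name and size in bytes.
SELECT name, bytes FROM v$datafile;

-- A few of the settings that were read from init.ora.
SELECT name, value
  FROM v$parameter
 WHERE name IN ('db_name', 'control_files', 'log_archive_start');
```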
Putting the Building Blocks Together

You have taken a quick look at the building blocks of the Oracle8 instance: disk files, processes, and memory structures. So how does it all fit together? The following figure represents the relationship between these different pieces:
How Oracle8 Organizes and Manages Storage

The following sections describe how Oracle8 organizes the storage of data.
Tablespaces, Datafiles, and Data Blocks

Oracle8 stores data in operating system files, referred to as datafiles, but imposes a logical structure on top of the physical storage. The logical unit of storage used by Oracle8 is called a tablespace. Each tablespace is made up of one or more datafiles. Within tablespaces, space is divided into logical chunks called data blocks. Tablespaces can be of different sizes depending on the total space in the datafiles comprising each tablespace, but for any one database, all data blocks are a fixed size that is determined when the database is created.
This logical structure is imposed by Oracle8 for important reasons from the point of view of a database administrator. The usefulness of organizing physical space as tablespaces becomes obvious when you look at the operations you can perform using tablespaces:
- Controlling disk space allocation: Each tablespace corresponds to a particular set of operating system files. By specifying which tablespace an object should be created in, you can group related objects in one set of datafiles. Let's look at an example of why this might be useful. In this example, you have two users, JOE and BOB, who tend to run resource-intensive queries at the same time. You need a way of separating their data so that both are not accessing the same set of disks and running the risk of overloading them. To separate the data, you could:
  - Create two separate tablespaces, JOE_DATA and BOB_DATA, on separate physical disks.
  - When creating the objects for user JOE, specify that they should go into JOE_DATA; similarly, specify that all objects for user BOB should go into tablespace BOB_DATA.
- Adding more space to a database: As data is added to the database, you need to allocate more disk space to the database. However, you also want to control the operating system files from which this space is allocated. You can add space to a database in the following ways:
  - By creating a tablespace. You can enlarge the database by creating a new tablespace defined by one or more files. The following figure shows how you add 100 MB to the database by creating the tablespace USERS defined by a datafile of size 100 MB.
  - By adding another datafile to an existing tablespace. You can enlarge a tablespace by adding a datafile of a specified size. The following figure illustrates enlarging the tablespace SYSTEM by adding two datafiles:
  - By changing the size of a datafile or allowing a datafile in an existing tablespace to grow dynamically in response to the need for added space. To do this, alter existing files to have dynamic extension properties.
- Limiting resource usage: Tablespaces let you specify quotas, the maximum amount of space that each user can take up.
- Controlling storage characteristics: You can specify storage settings for objects in a tablespace, such as how much you want them to grow every time additional space is needed and the maximum size.
- Controlling availability of logical groups of data: You can choose to make all data in a tablespace unavailable by issuing a command to take the tablespace offline, or make it available again by bringing the tablespace online.
- Removing data that is no longer needed: If you decide that you no longer need the objects in a particular tablespace, you can drop the tablespace to get rid of these objects and reclaim the disk storage used by the tablespace.
These are not the only operations that you can perform on tablespaces. You can back up and recover tablespaces and make tablespaces read-only so that you can query objects in them but cannot modify data. Chapter 8 discusses some of these operations.
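The tablespace operations listed above can be sketched in SQL as follows. This is an illustration under stated assumptions, not a recommended configuration: the file paths, sizes, and quota values are hypothetical, and the statements assume that you are connected as the database administrator.

```sql
-- Separate the data for JOE and BOB on separate physical disks.
-- (All paths and sizes are hypothetical.)
CREATE TABLESPACE joe_data
  DATAFILE '/disk1/oradata/joe_data01.dbf' SIZE 100M;
CREATE TABLESPACE bob_data
  DATAFILE '/disk2/oradata/bob_data01.dbf' SIZE 100M;

-- Add space by adding another datafile to an existing tablespace.
ALTER TABLESPACE joe_data
  ADD DATAFILE '/disk1/oradata/joe_data02.dbf' SIZE 50M;

-- Add space by resizing a datafile, or by letting it grow on demand.
ALTER DATABASE DATAFILE '/disk1/oradata/joe_data01.dbf' RESIZE 200M;
ALTER DATABASE DATAFILE '/disk1/oradata/joe_data02.dbf'
  AUTOEXTEND ON NEXT 10M MAXSIZE 500M;

-- Limit resource usage with a per-user quota.
ALTER USER joe QUOTA 150M ON joe_data;

-- Set default storage characteristics for objects in the tablespace.
ALTER TABLESPACE joe_data
  DEFAULT STORAGE (INITIAL 1M NEXT 1M MAXEXTENTS 100);

-- Control availability of all data in a tablespace.
ALTER TABLESPACE bob_data OFFLINE;
ALTER TABLESPACE bob_data ONLINE;

-- Allow queries but no modifications, then allow changes again.
ALTER TABLESPACE bob_data READ ONLY;
ALTER TABLESPACE bob_data READ WRITE;

-- Remove a tablespace together with the objects stored in it.
DROP TABLESPACE bob_data INCLUDING CONTENTS;
```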
Segments and Extents and How They Relate to Tablespaces

Previous sections discussed managing space using tablespaces. You can think of tablespaces as pools of available space organized into smaller units called data blocks. When you create an object in a tablespace, Oracle8 allocates space from this pool to the object in chunks called extents, which consist of contiguous data blocks. You can control the size of the extent by setting the storage parameters for the table. "Adding Space to the Database" on page 4-30 describes how to control the extent.
A segment is a set of extents that have been allocated for a specific type of data structure. For example, data for each table is stored in its own data segment, while data for each index is stored in its own index segment. If the table or index is partitioned, each partition is stored in its own segment. Oracle8 allocates space for segments in units of one extent. When the existing extents of a segment are full, Oracle8 allocates another extent for that segment. Because extents are allocated as needed, the extents of a segment may or may not be contiguous on disk.
A segment (and all its extents) is stored in one tablespace. Within a tablespace, a segment can span datafiles (have extents with data from more than one file). However, each extent can contain data from only one datafile.
When an object needs to expand because more data is being added to it, Oracle8 adds space one extent at a time. In the example of the table JOE_TABLE in tablespace JOE_DATA, the table initially contains 1000 rows of data contained in one extent of size 1 MB. What happens if another 1000 rows are inserted into the table and it needs to grow? Oracle8 looks at the extent already allocated to JOE_TABLE and checks whether there is room left to insert the 1000 new rows. If not, Oracle8 allocates one additional extent to the table.
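The JOE_TABLE example might look like the following in SQL. This is a minimal sketch: the column definitions are hypothetical, the statements assume that they are run by user JOE, and they assume that the tablespace JOE_DATA already exists with a quota for JOE.

```sql
-- Create the table with a 1 MB first extent and 1 MB for each
-- additional extent. (Column names are hypothetical.)
CREATE TABLE joe_table (
  id    NUMBER,
  descr VARCHAR2(100)
)
TABLESPACE joe_data
STORAGE (INITIAL 1M NEXT 1M PCTINCREASE 0);

-- List the extents that make up the table's data segment.
-- A new row appears here each time Oracle8 allocates another extent.
SELECT segment_name, extent_id, bytes, blocks
  FROM user_extents
 WHERE segment_name = 'JOE_TABLE'
 ORDER BY extent_id;
```

Running the second query before and after a large insert shows whether the existing extent had room or whether an additional extent was allocated.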
Schema Objects in Oracle8

The following sections describe some of the schema objects that you use to create a data mart database.
Tables Revisited
At this point, you probably have a good grasp of the concept of a table in a relational system. Briefly, a table is the basic unit of data in an Oracle database. The tables of a database hold all of the user-accessible data. Table data is stored in rows and columns. Every table is defined with a table name and set of columns.
Indexes: Bitmap and B-Tree

Indexes are optional structures associated with tables and clusters. You create an index on a table to speed up the execution of SQL statements that refer to that table. Have you ever used an index in a book to quickly look up a specific topic? Indexes in databases are based on a similar idea: if a column of a table is indexed, you can look in the index to find the rows of the table that have a particular data value for that column. Keep in mind that indexes are independent of the data in a table; they are just a fast way to get to the table data. You can create or drop an index at any time without affecting the table to which the index refers or any other indexes on the table. If you add data to the table or delete some of the existing data, Oracle8 makes sure that these changes are reflected in the index. Oracle8 lets you create more than one kind of index. You can use the following types of indexes when you build a data mart:
- Bitmap indexes: You often use bitmap indexes when you build a data mart. Bitmap indexing benefits data warehousing applications, which have large amounts of data and ad hoc queries but a low level of concurrent transactions. For such applications, bitmap indexing provides:
  - Reduced response time for large classes of ad hoc queries
  - A substantial reduction of space usage compared to other indexing techniques
  - Dramatic performance gains even on very low-end hardware
  - Very efficient parallel DML and loads
Fully indexing a large table with a traditional B-tree index can be prohibitively expensive in terms of space because the index can be several times larger than the data in the table. Bitmap indexes are typically only a fraction of the size of the indexed data in the table. The advantages of using bitmap indexes are greatest for low cardinality columns: that is, columns in which the number of distinct values is small compared to the number of rows in the table. If the values in a column are repeated more than a hundred times, the column is a candidate for a bitmap index. Even columns with a lower number of repetitions (and thus higher cardinality) can be candidates if they tend to be involved in complex conditions in the WHERE clauses of queries.
- B-tree indexes: A B-tree index is organized like an upside-down tree. The bottommost level of the index holds the actual data values and pointers to the corresponding rows, much like the index in a book has a page number associated with each index entry. The rest of the blocks at the upper levels provide a road map to the right block at the bottommost level.
In general, you use B-tree indexes when you know that your typical query refers to the indexed column and retrieves a few rows. In these queries, it is faster to find the rows by looking at the index. However, there is a tricky issue here. Return to the analogy of a book index to understand it. If you plan to look at every single topic in a book, you might not want to look in the index for the topic and then look up the page. It might be faster to read through every chapter in the book. Similarly, if you are retrieving most of the rows in a table, it might not make sense to look in the index to find the table rows. Instead, you might want to read, or scan, the table.
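The two index types described above are created with similar statements. The following sketch uses the EMP table from this chapter. The index names and the value 'SMITH' are hypothetical, and it is an assumption that JOB has few distinct values while ENAME has many.

```sql
-- Bitmap index on a column assumed to have low cardinality.
CREATE BITMAP INDEX emp_job_bix ON emp (job);

-- B-tree index (the default type) on a column assumed to have
-- many distinct values.
CREATE INDEX emp_ename_idx ON emp (ename);

-- A query that retrieves few rows is a natural fit for the B-tree index.
SELECT empno, ename, job FROM emp WHERE ename = 'SMITH';

-- A query that retrieves most rows is usually better served by a scan.
SELECT empno, ename, job FROM emp;

-- Dropping an index does not affect the table or its other indexes.
DROP INDEX emp_ename_idx;
```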
Views
Let's say that your users need to see a few columns of a specific table (EMP) that contains nationwide information for all employees in your company, and they need to see only the rows that refer to the employees who are managers. Instead of requiring your users to create a query repeating these criteria every time they need to access this information, you might create a view. A view is a virtual table that you can create to capture and store often-used queries. Given the columns shown in the following figure for the EMP base table, you can create the MANAGERS view:
Your users issue the following query against the table EMP:
SELECT EMPNO, ENAME, JOB, MGR, DEPTNO FROM EMP WHERE JOB = 'MANAGER';
Instead, you can create a view as follows:

CREATE VIEW MANAGERS AS SELECT EMPNO, ENAME, JOB, MGR, DEPTNO FROM EMP WHERE JOB = 'MANAGER';
Then, instead of constantly having to enter the lengthy query, users can just query the view MANAGERS:
SELECT * FROM MANAGERS;
As you see, a view can be closer to a business representation of the data.
SQL Optimization and Execution

In a data mart application, you issue and execute mostly queries. This section describes how you can make SQL queries execute in the fastest possible time, but first it explains how the Oracle8 database server processes a SQL statement. This section introduces the basics of SQL statement processing and outlines the phases a SQL statement goes through from the time you issue the statement to the time that you see the results on your screen.
How SQL Statements Are Processed

A SQL statement is a request to Oracle8 for access to a database. A SQL statement can be a query that simply selects data from the database according to a set of criteria, or a statement that changes existing data, adds data to the database, or deletes data from the database. These SQL statements are called data manipulation language (DML) statements. You create tables and other internal database structures using another type of SQL statement called data definition language (DDL). In addition, you perform most operations to manage the database using other types of SQL statements. Statements like INSERT or DELETE simply return a message saying whether or not the operation completed successfully. However, if the statement is a query and the query is successful, you see data returned. A query can return one row or thousands of rows, but results of a query are always returned in a tabular format. The following example demonstrates how a SQL statement is processed:
1. User scott issues the query SELECT * FROM EMP;
2. Oracle8 makes sure the statement uses correct syntax and is a valid SQL statement.
3. Oracle8 checks whether user scott has the privileges to perform this operation, that is, to select all rows from the table EMP.
4. Oracle8 determines the best way to execute this statement. This is called generating an execution plan. Sometimes, some of the previous steps are simplified if Oracle8 finds that an identical statement was issued previously and some information about this statement already exists.
5. At this point, Oracle8 has done all of the groundwork to execute the statement. It returns the rows requested by the user.
An execution plan is merely a list of steps that Oracle8 must perform to execute a DML statement. Each step retrieves data from the database or prepares it for the user issuing the statement. The steps that retrieve data from the database are called access paths. The set of rows returned is called a row source. The steps that do not retrieve any data take these row sources as input and process them in different ways. Obviously, you want data to be retrieved and processed as quickly and efficiently as possible. The process of choosing the most efficient way to execute a SQL statement is called optimization. The component of the Oracle8 database server that is responsible for generating the execution plan for a SQL statement is called the optimizer.
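To look at the execution plan that the optimizer generates, you can use the EXPLAIN PLAN statement. The sketch below assumes that a PLAN_TABLE already exists in your schema (typically created with the utlxplan.sql script supplied with the server); the statement identifier 'mgr_query' is a hypothetical label.

```sql
-- Record the plan the optimizer would choose, without running the query.
EXPLAIN PLAN
  SET STATEMENT_ID = 'mgr_query'
  FOR
SELECT empno, ename, job, mgr, deptno
  FROM emp
 WHERE job = 'MANAGER';

-- Display the steps of the plan, indented by their depth in the tree.
SELECT LPAD(' ', 2 * (LEVEL - 1)) || operation || ' ' || options
       || ' ' || object_name AS plan_step
  FROM plan_table
 START WITH id = 0 AND statement_id = 'mgr_query'
CONNECT BY PRIOR id = parent_id AND statement_id = 'mgr_query';
```

Each row of the output is one step of the plan. The OPERATION and OPTIONS columns together name the step, such as a full table access or an index range scan.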
Who Does All of the Work?

The section "Client/Server Architecture" on page 4-4 explained client/server architecture for Oracle8. For example, when you use a tool like SQL*Plus to connect to the database, the process that is associated with the tool is the front-end (client) process. This process takes input from the user, sends the SQL statement for processing, and displays the results of the SQL statement to the user. The process that handles the query (or any other statement that you issue), retrieves the data that is requested, and returns it to the front-end process to display to the user is called the back-end (server) process. Thus, most of the steps in the processing of a SQL statement are carried out by the server process.
Different Ways of Getting to Data: Access Paths

One of the most important choices that the optimizer makes when it formulates the execution plan is how to retrieve data from the database, or what the access paths should be for a given statement. For any row in any table accessed by a SQL statement, there may be many access paths by which that row can be located and retrieved. The optimizer tries to choose the most efficient path. Let's take a look at some of the methods by which Oracle8 can access data. For a complete list of possible access methods, refer to Oracle8 Concepts.
- Full table scans: A full table scan retrieves rows from a table. To perform a full table scan, Oracle8 reads all rows in the table, examining each row to determine whether it satisfies the conditions in the statement's WHERE clause. Oracle8 reads every data block allocated to the table sequentially.
- Index scans: An index scan retrieves data from an index based on the value of one or more columns of the index. To perform an index scan, Oracle8 searches the index for the indexed column values that are mentioned in the SQL statement. If the statement references only columns of the index, Oracle8 can read the indexed column values directly from the index, rather than from the table. If the statement accesses other columns in addition to the indexed columns, Oracle8 uses the address of the row, called a rowid, to find the rows in the table.
- Joins: A join is a query that selects data from more than one table. A join is characterized by multiple tables being listed in the FROM clause of the SQL statement. Oracle8 pairs the rows from these tables based upon the condition specified in the WHERE clause and returns the resulting rows. This condition, called the join condition, usually compares columns of all joined tables. The following list describes some of the methods used to join two tables:
  - Sort-merge join: Before the tables are joined, Oracle8 usually retrieves the data for each table by a full table scan. Then, Oracle8 sorts this data on the values of the columns used in the join condition. It merges the two sorted sources so that each pair of rows (one from each source) that contains matching values for the columns used in the join condition is combined and returned as the join result.
  - Nested-loop join: First, the optimizer chooses one of the tables as the outer or driving table, and therefore designates the other table as the inner table. Next, for each row in the outer table, Oracle8 finds all rows in the inner table that satisfy the join condition. Finally, Oracle8 combines the data in each pair of rows that satisfy the join condition and returns the resulting rows.
  - Hash join: First, Oracle8 performs a full table scan on each of the tables and splits each into as many partitions as possible based on the available memory. Then, Oracle8 builds an internal structure called a hash table from one of the partitions and uses the corresponding partition in the other table to probe the hash table. For each pair of partitions (one from each table), Oracle8 uses the smaller one to build a hash table and the larger one to probe the hash table.
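The optimizer normally picks the join method itself, but you can request a particular method with a hint in order to compare the resulting plans. The following sketch assumes a DEPT table with DEPTNO and DNAME columns alongside EMP; DEPT and its columns are assumptions for illustration and do not appear in this chapter.

```sql
-- Request a sort-merge join.
SELECT /*+ USE_MERGE(e d) */ e.ename, d.dname
  FROM emp e, dept d
 WHERE e.deptno = d.deptno;

-- Request a nested-loop join. ORDERED makes the first table in the
-- FROM clause (DEPT) the outer table; USE_NL(e) names EMP as the inner.
SELECT /*+ ORDERED USE_NL(e) */ e.ename, d.dname
  FROM dept d, emp e
 WHERE e.deptno = d.deptno;

-- Request a hash join.
SELECT /*+ USE_HASH(e d) */ e.ename, d.dname
  FROM emp e, dept d
 WHERE e.deptno = d.deptno;
```

All three statements return the same rows; only the way Oracle8 pairs them differs. A hint is a suggestion, so the plan actually used is best confirmed by examining the execution plan.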
The Cost-Based Optimizer

To choose an execution plan for a SQL statement, Oracle8 uses one of these approaches:
- Rule-based: When using the rule-based approach, the optimizer looks at the different access paths available and uses a preconfigured set of rules to rank these access paths according to the speed of execution of each path. Then, it chooses the access path with the best rank. This approach does not take into account the size of the table or index, or other issues, like how the data is distributed.
- Cost-based: The optimizer considers available access paths and factors in information based on statistics for the tables or indexes accessed by the SQL statement to determine which path is the most efficient. These statistics are stored in the data dictionary tables and quantify the data distribution and storage characteristics of tables and indexes. Using these statistics, the optimizer calculates how much I/O, CPU time, and memory are required to execute a SQL statement for a particular access path, and assigns costs to each possible execution plan based on such calculations. The cost-based approach also considers hints or optimization suggestions placed in the SQL statement.
Oracle Corporation recommends that you use the cost-based optimization approach for most Oracle8 applications, including all data mart applications. So how do you control what optimization approach to use? You can set the init.ora parameter OPTIMIZER_MODE to CHOOSE, which causes the optimizer to choose between the rule-based and cost-based approach depending on whether or not the statistics needed for the cost-based approach are present. If statistics are available for at least one of the tables accessed by the SQL statement, the optimizer uses the cost-based approach.
Generating Statistics
The statistics that are required for the cost-based approach must be compiled for each table or index that you are likely to access in a SQL statement. These are compiled using the ANALYZE command. As part of the ANALYZE command, you can specify whether you want Oracle8 to estimate the statistics based on randomly sampling some of the data in the table or index, or to compute the statistics exactly. To perform a computation, Oracle8 needs to scan the table and then sort it. If the table is small enough, Oracle8 can sort it in memory, but more often, the table is too big to be sorted entirely in memory. In that case, Oracle8 sorts part of the data in memory and allocates temporary space to act as a workspace to hold the data that is not currently being processed in memory. Thus, to compute the statistics for a table, Oracle8 could need enough temporary disk space to hold the entire table while it is sorted. If you do not have enough time to analyze the entire table, you can estimate the statistics using a 20% sample size. Note, however, that estimated statistics are less exact than computed statistics and may not give you optimal query performance.
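The two ways of generating statistics described above correspond to two forms of the ANALYZE command. The following is a minimal sketch using the EMP table from this chapter.

```sql
-- Compute exact statistics: scans and sorts the whole table.
ANALYZE TABLE emp COMPUTE STATISTICS;

-- Estimate statistics from a 20% random sample instead.
ANALYZE TABLE emp ESTIMATE STATISTICS SAMPLE 20 PERCENT;

-- Inspect some of the statistics stored in the data dictionary.
SELECT table_name, num_rows, blocks, avg_row_len
  FROM user_tables
 WHERE table_name = 'EMP';

-- Remove the statistics again, if required.
ANALYZE TABLE emp DELETE STATISTICS;
```

The dictionary columns NUM_ROWS, BLOCKS, and AVG_ROW_LEN are empty until the table has been analyzed, which gives a quick way to check whether statistics are present.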
Parallel Processing
Parallel processing divides a large task into many smaller tasks, and executes the smaller tasks concurrently. As a result, the larger task completes much more quickly.
The following figure compares processing a task sequentially and processing the same task in parallel:
Some tasks can be effectively divided and are good candidates for parallel processing. Other tasks, however, are not a good fit for this approach. Let's illustrate this with a real-life example. In a bank with only one teller, all customers must form a single line to be served. With two tellers, the task can be effectively split so that the customers form two lines and are served twice as fast. Here, parallel processing is a good solution. By contrast, if the bank manager needs to review all loan requests, parallel processing will not necessarily speed up the flow of loans. No matter how many tellers are available to process loans, all requests must form a single queue for bank manager approval.
Parallel Query
The parallel query feature can dramatically improve performance for data-intensive operations, such as the typical queries you run against your data mart. Systems with multiple CPUs, such as SMP systems, gain the largest performance benefit from the parallel query feature because query processing can be effectively split up among the many CPUs. An important point is that the query is parallelized dynamically at execution time. Thus, if the distribution or location of data changes, the Oracle8 database server automatically adapts to optimize the parallelization of each SQL statement. The Oracle8 database server can use parallel processing for the following statements:
- SELECT
- INSERT, UPDATE, and DELETE
- CREATE TABLE ... AS SELECT
- CREATE INDEX
Remember that the server process does all of the work when you issue a query to access your data. The server process retrieves the data from the database and sends the results to the client process for display. The parallel query feature allows the operations listed to be performed by multiple processes. When Oracle8 employs parallel query processing, one process, known as the query coordinator, divides the execution of a statement among several query servers and coordinates the results from all of the servers to send the results back to the user. Typically, the query completes much faster when Oracle8 uses parallel processing. The following figures illustrate how parallel processing can help a table scan complete much faster. The query coordinator breaks down the execution of the full table scan into parallel pieces. Query servers are assigned to each operation in the processing of a SQL statement. The number of query servers assigned to a single operation is the degree of parallelism for the query. Then, the query coordinator integrates the partial results produced by each of the query servers to assemble the results for the full table scan. This figure illustrates a full table scan without parallel query:
This figure illustrates a full table scan with parallel query:
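One way to request a parallel full table scan for a single query is with hints. In the sketch below, the degree of parallelism of 4 is a hypothetical value; a suitable value depends on the CPUs and disks available.

```sql
-- Ask for a full table scan of EMP carried out by four query servers.
-- The query coordinator combines their partial results.
SELECT /*+ FULL(e) PARALLEL(e, 4) */ deptno, COUNT(*)
  FROM emp e
 GROUP BY deptno;

-- The same query with parallel processing explicitly turned off,
-- for comparison.
SELECT /*+ FULL(e) NOPARALLEL(e) */ deptno, COUNT(*)
  FROM emp e
 GROUP BY deptno;
```

The hint states how many query servers are requested; the number actually used can be lower if fewer servers are available when the statement runs.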
Parallel CREATE TABLE ... AS SELECT Statements

Decision-support applications often require that large amounts of data be summarized (rolled up) into smaller tables for use with ad hoc queries. This allows you to compute data once and reuse it many times. Often, such rollup operations must occur regularly (perhaps nightly or weekly) and should happen when the system is relatively inactive. The parallel query feature allows you to parallelize the operation of creating a table as a subquery from another table or set of tables. This is known as parallel CREATE TABLE ... AS SELECT (PCTAS).
Because the summary table is derived from data from other tables, you do not need to use redo logging, which logs the information required to recover the database or reconstruct changes to the database in the event of a system failure. The operation usually runs much faster if logging (recoverability) is turned off. If you use the unrecoverable option to disable recoverability during parallel table creation, you should back up the tablespace containing the table after the table is created to avoid loss of the table due to media failure.
When creating a table in parallel, each of the query server processes uses the value in the storage clause that specifies the size of the initial extent of the table. Therefore, a table created with a degree of parallelism of 12 and an initial extent of 1 MB consumes at least 12 MB of storage during table creation because each process starts with an extent of 1 MB. When the query coordinator combines the extents, some of the extents may be trimmed and the resulting table may be smaller than the 12 MB initially allocated if no further extents were allocated.
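A parallel CREATE TABLE ... AS SELECT matching the numbers in the example above might look like the following sketch. The table name, tablespace name, and summary columns are hypothetical. The sketch also assumes that the NOLOGGING keyword is the way your release expresses the unrecoverable option; older releases use the keyword UNRECOVERABLE instead.

```sql
-- Build a summary table with 12 query servers and no redo logging.
-- Each server starts with its own 1 MB initial extent, so at least
-- 12 MB is allocated while the table is being created.
-- (Table, tablespace, and column names are hypothetical.)
CREATE TABLE emp_job_summary
  TABLESPACE summary_data
  STORAGE (INITIAL 1M NEXT 1M)
  NOLOGGING
  PARALLEL (DEGREE 12)
AS
SELECT deptno, job, COUNT(*) AS emp_count
  FROM emp
 GROUP BY deptno, job;
```

Because the table is created without redo logging, back up the tablespace that contains it once the statement completes.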
Parallelizing SQL Statements

As "How SQL Statements Are Processed" on page 4-15 discussed, the optimizer determines the best execution plan for a SQL statement as part of the processing of the statement. After the execution plan is determined, the query coordinator process determines the parallel execution method; that is, which operations can be performed in parallel. When making these decisions, the query coordinator uses information specified in the table definition, the hints that are provided in a query, the initialization parameters, and whether at least one of the operations involved in processing is a full table scan. (Hints are suggestions that you give to the optimizer for optimizing a SQL statement. Hints let you control decisions that are usually made by the optimizer.) The query coordinator process examines the operations in the execution plan to determine whether the individual operations can be made parallel. The individual operations that can be made parallel are:
- Sorts
- Joins
- Table scans
- Index creation
- Table population
Setting the Degree of Parallelism

The query coordinator determines the degree of parallelism by considering three factors:
- Query hints.
- Table definition. "Global Computing Case Study: Creating a Summary Table" on page 8-21 describes how you can set the degree of parallelism at the table level.
- Initialization parameters. "Setting Initialization Parameters" on page 8-18 looks at the relevant initialization parameters.
For queries involving more than one table, the query coordinator uses the greatest number specified for any table in the query. All of the listed factors determine only how many query servers the query coordinator requests, not necessarily how many are finally used in processing the query. For PCTAS, the degree of parallelism is determined by what is specified in the table definition. If no degree of parallelism is specified, the degree of parallelism is derived from the parallelism of the subquery. If the subquery cannot be made parallel, the table is created using just one process. If you do not specify the degree of parallelism in any way, Oracle8 determines the number of disks on which the table is stored and the number of CPUs in the system and selects the smaller of these two values as the default degree of parallelism.
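The first two factors can be sketched as follows. The degree values and the product_key and total_sales columns are assumptions for illustration only.

```sql
-- Table definition: set a default degree of parallelism for the SALES table.
ALTER TABLE sales PARALLEL (DEGREE 4);

-- Query hint: request a full scan with a degree of 8 for this statement only.
-- The hint refers to the table by its alias (s).
SELECT /*+ FULL(s) PARALLEL(s, 8) */ product_key, SUM(total_sales)
FROM sales s
GROUP BY product_key;
```

In both cases the number is only a request: the query coordinator may obtain fewer query servers than the degree specifies.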
Parallel Direct Path Load

Because data marts maintain rolling windows of historical data, you must purge old data and load new data as part of ongoing maintenance. You need to do this efficiently so that loads complete within the window allowed for batch jobs. The fastest way to load data into an Oracle8 database is by using the direct path load option of the SQL*Loader utility. Direct path load eliminates much of the Oracle8 database overhead by writing directly to the database files. You can speed up direct path loads even further by specifying that redo should not be logged. You can also run multiple concurrent SQL*Loader sessions to load the table in parallel. If you run parallel direct path load, keep in mind that each loader process allocates an extent of the size specified by the storage parameter NEXT (the next extent size), and size the tablespace in which the table is being created accordingly.
Parallel Index Create

In a data mart application, the size of the data mart typically grows over time. This means that it can become difficult to finish administrative operations like loading data and re-creating indexes in finite batch windows unless index creation can be sped up in proportion to the growth in data volume. Parallel creation of indexes speeds up index creation by using multiple processes simultaneously to create an index. By dividing the work necessary to create an index among multiple query server processes, Oracle8 can create the index more quickly than if a single server process created the index sequentially. If you do not explicitly specify the degree of parallelism to use in creating the index, Oracle8 uses the degree of parallelism specified when the table was created.
To create the index in parallel, Oracle8 randomly samples the table and finds a set of index keys that equally divides the index into the same number of pieces as the degree of parallelism. A first set of query processes scans the table, extracts the index key and row location information, and sends this to a process in a second set of server processes based on key. Each process in the second set sorts the keys and builds an index in the usual way. After all index pieces are built, the query coordinator process concatenates all pieces to build the final index.
To further increase performance, you can specify that no redo entries be logged during index creation; that is, you can turn recoverability off. When creating an index in parallel, each query server process uses an amount of space equal to the initial extent value specified in the index creation statement. Therefore, you should make sure that an adequate amount of space is available in the tablespace that holds the index.
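A parallel index build of the kind described above might look like the following sketch. The index name, key column, tablespace name, and degree are assumptions for illustration.

```sql
-- Sketch only: index, column, and tablespace names are illustrative assumptions.
CREATE INDEX sales_product_idx
  ON sales (product_key)
  TABLESPACE index_ts
  STORAGE (INITIAL 1M NEXT 1M)   -- each query server process uses its own initial extent
  PARALLEL (DEGREE 4)            -- explicit degree; omit to inherit the table's setting
  NOLOGGING;                     -- no redo entries are logged during index creation
```

With these values, the build needs roughly 4 MB of free space in index_ts before any further extents are allocated.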
Star Query Processing

This section reviews information about star schemas and describes the issues involved in processing star queries.
Star Schema Revisited

You know by now that the star schema is the way that data is represented in many data warehousing applications. The star schema has one or more very large fact tables containing the primary information in the data warehouse, and a number of much smaller dimension tables, each of which contains information about the entries for a particular attribute in the fact table. The dimension tables are not related to each other, but each dimension table has a primary key/foreign key relationship with the fact table. In fact, this relationship is the defining characteristic of a star schema.
A typical fact table contains keys and measures. For example, a star schema in a retail environment could have a simple fact table containing the measure SALES that records the amount of total sales for each sales transaction, and the keys TIME, PRODUCT, SUPPLIER, and STORES. The SALES table would be very large because a retail chain could easily have millions of sales transactions per day.
In this example, there would be corresponding dimension tables for TIME, PRODUCT, SUPPLIER, and STORES. The PRODUCT dimension, for example, would typically contain information about each product that appears in the fact table.
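The primary key/foreign key relationship can be sketched for one dimension as follows. The column names and datatypes are assumptions; only the table names SALES and PRODUCT come from the example. The other dimension tables would be related to the fact table in the same way.

```sql
-- Sketch only: column names and datatypes are illustrative assumptions.
CREATE TABLE product (
  product_key   NUMBER(10) PRIMARY KEY,   -- one row per product
  name          VARCHAR2(60)
);

CREATE TABLE sales (
  time_key      NUMBER(10),
  product_key   NUMBER(10) REFERENCES product (product_key),  -- foreign key to the dimension
  supplier_key  NUMBER(10),
  store_key     NUMBER(10),
  total_sales   NUMBER(12,2)              -- the measure
);
```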
What Is a Star Query?

A star query is a typical query executed on a star schema. In a star query, each of the dimension tables is joined to the fact table using a primary key to foreign key join called the star join. Here is an example of a star query that calculates the total sales of a specific product to a group of customers:
SELECT SUPPLIER.NAME, STORES.NAME, SUM(TOTAL_SALES)
FROM SALES, CUSTOMER, PRODUCT, SUPPLIER, STORES
WHERE SALES.CUSTOMER_KEY = CUSTOMER.CUSTOMER_KEY
AND SALES.PRODUCT_KEY = PRODUCT.PRODUCT_KEY
AND SALES.SUPPLIER_KEY = SUPPLIER.SUPPLIER_KEY
AND SALES.STORE_KEY = STORES.STORE_KEY
AND SALES.DATE BETWEEN '01-AUG-94' AND '01-SEP-94'
AND CUSTOMER.NAME IN ('LINDBERGH', 'JOHNSON')
AND PRODUCT.NAME = 'EXTRA-STRENGTH MYLENOL'
GROUP BY SUPPLIER.NAME, STORES.NAME;
Optimizing Star Query Execution: Star Transformation

Oracle8 Enterprise Edition provides improved performance for star queries, using the star transformation algorithm and bitmap indexes. The star transformation is a cost-based query transformation that can process star queries with large or many dimension tables, unconstrained dimension tables, and dimension tables that have a snowflake schema design. It can also efficiently process queries that contain criteria that eliminate a great number of the rows in the fact table. The star transformation does not produce costly Cartesian-product joins.
The algorithm processes star queries in two phases. First, Oracle8 retrieves exactly the necessary rows from the fact table by using bitmap indexes. (Bitmap indexes offer significant storage savings over previous methods that required concatenated column B-tree indexes.) Then, Oracle8 joins this result set from the fact table to the relevant dimension tables. This allows for better optimization of more complex star queries, such as those with multiple fact tables. The algorithm also uses parallel processing, including parallel index scans on both partitioned and nonpartitioned tables.
To specify that Oracle8 use the star transformation, you set the initialization parameter STAR_TRANSFORMATION_ENABLED to TRUE and use the STAR_TRANSFORMATION hint. For more information about using the star transformation algorithm, see the Oracle8 Concepts manual.
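The following sketch pulls these pieces together for a shortened version of the earlier star query. The index names are assumptions, and the ALTER SESSION statement assumes that your release lets you set the parameter at the session level; otherwise, set it in the initialization file as described above.

```sql
-- Bitmap indexes on the fact table's foreign key columns (index names are assumptions).
CREATE BITMAP INDEX sales_store_bix   ON sales (store_key);
CREATE BITMAP INDEX sales_product_bix ON sales (product_key);

-- Enable the transformation for the current session (assumed to be session-settable).
ALTER SESSION SET STAR_TRANSFORMATION_ENABLED = TRUE;

-- Ask the optimizer to consider the star transformation for this query.
SELECT /*+ STAR_TRANSFORMATION */ stores.name, SUM(sales.total_sales)
FROM sales, stores, product
WHERE sales.store_key = stores.store_key
AND sales.product_key = product.product_key
AND product.name = 'EXTRA-STRENGTH MYLENOL'
GROUP BY stores.name;
```

Because the transformation is cost-based, the hint is a request rather than a guarantee; the optimizer may still choose a different plan.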
Partitioned Tables and Indexes

Partitioning enables better management of very large tables and indexes by dividing them into smaller parts. Partitioned tables and indexes can improve availability, ease administration, and enhance query performance in your data mart. A partitioned table is a table that is divided (partitioned) into several smaller parts, based on a range of key values that you specify. You can specify storage attributes, including the physical placement, for each partition. Note that the partitioning option is not part of the basic Data Mart Suite, but you can purchase the option separately.
Suppose that you load new data into your fact table every week. You choose to partition your fact table based on the time period, with one partition for each four-week period. When you load data into the data mart, you load data (and indexes) into only one partition, rather than the entire table. This provides much more efficient data load cycles.
When you create a partitioned table, you specify:
- The logical attributes of the table, such as column and constraint definitions, as you do for nonpartitioned tables
- The physical attributes of the table
  For a partitioned table, these attributes specify defaults for the individual partitions of the table.
- The table-level algorithm used to map rows to partitions
- A list of partition descriptions, one for each partition in the table
  Each partition description includes a clause defining supplemental, partition-level information about the algorithm used to map rows to partitions. This clause can also specify a partition name and physical attributes for the partition.
This SQL statement creates the partitioned table ACCOUNT_SALES:

CREATE TABLE account_sales (
  acct_no NUMBER(5),
  acct_name CHAR(30),
  amount_of_sale NUMBER(6),
  week_no INTEGER )
PARTITION BY RANGE ( week_no ) ...
( PARTITION sales1 VALUES LESS THAN ( 4 ) TABLESPACE ts0,
  PARTITION sales2 VALUES LESS THAN ( 8 ) TABLESPACE ts1,
  ...
  PARTITION sales13 VALUES LESS THAN ( 52 ) TABLESPACE ts12 );
The following figure shows the ACCOUNT_SALES table, with data divided by week number into 13 four-week partitions:
For maximum availability of data, store each partition in a separate tablespace and each tablespace on one or more separate storage devices. This separation of partitions boosts availability of the data, because the loss of a disk drive, for example, affects only one partition or part of one partition, not the entire table. Partitions enable data management operations like data loads, index creation, and data purges at the partition level, rather than on the entire table, resulting in significantly reduced times for these operations. Partitioning can significantly reduce the impact of scheduled downtime for maintenance operations:
- By introducing partition maintenance operations that operate on an individual partition rather than on an entire table or index
- By providing partition independence so that maintenance operations can be performed concurrently on different partitions
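As a sketch of partition-level maintenance on the ACCOUNT_SALES table, the statements below purge the oldest period and make room for a new one. The partition name sales14, the bound 56, and the tablespace ts13 are assumptions for illustration and do not appear in the original example.

```sql
-- Purge the oldest four-week period by dropping its partition.
ALTER TABLE account_sales DROP PARTITION sales1;

-- Add a partition for a new period (name, bound, and tablespace are assumptions).
ALTER TABLE account_sales ADD PARTITION sales14
  VALUES LESS THAN ( 56 ) TABLESPACE ts13;
```

Each statement touches a single partition, so the remaining partitions stay available while the maintenance runs.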
In a decision-support system (DSS), queries on very large tables present special performance problems. An ad hoc query that requires a table scan may take a long time, because it must inspect every row in the table. There is no way to identify and skip subsets of irrelevant rows. The problem is particularly important for historical tables, for which many queries concentrate access on rows that were generated recently.
Partitions help solve this performance problem. An ad hoc query that only requires rows that correspond to a single partition (or range of partitions) can be executed using a partition scan rather than a table scan. The optimizer is aware of the partitions and the range of values they represent and, as a result, can eliminate from the search any partitions that do not contain the data required by the query. For example, a query that requests data generated in the month of October 1997 can scan just the rows stored in the October 1997 partition, rather than rows generated over many years of activity. This improves response time, and it may also substantially reduce the temporary disk space requirement for queries that require sorts.
If your fact table is large, it is a good idea to partition the tables and indexes. For more information about table and index partitioning, see the Oracle8 Concepts manual.
Global Computing Case Study: Managing Data Mart Storage

In this part of the Global Computing case study, you learn how to manage storage and schema objects. You use the database named DMDB, which contains one file each for the tablespaces SYSTEM, USERDATA, ROLLBACK, and TEMP. In the exercises in this section, you learn how to accomplish some common database management tasks. In the previous chapter, you designed the database as the user yves. In this chapter, you will do the following tasks:
- Add space to the database by creating a new tablespace and adding datafiles to an existing tablespace.
- Create tables for the user yves, using the DDL generated in "Generating the DDL Script" on page 3-52.
- Start up and shut down a database.
Logging In to Oracle Enterprise Manager Components

You perform most of these database management tasks using Oracle Enterprise Manager. These exercises assume that you chose the standard installation and therefore automatically have an Enterprise Manager repository in the DMDB instance. Oracle Enterprise Manager includes the following components:
- Oracle Backup Manager: Lets you back up tablespaces, administer redo logs, and create backup scripts guided by a backup wizard.
- Oracle Data Manager: Lets you manage and move data to and from an Oracle database. This application is a front end for Oracle database utilities, such as Export and Import.
- Oracle Instance Manager: Lets you start up and shut down your database and edit initialization parameters.
- Oracle Schema Manager: Lets you create schema objects like tables, indexes, views, synonyms, and triggers.
- Oracle Security Manager: Lets you create, alter, and drop users, roles, and profiles.
- Oracle SQL Worksheet: Lets you enter SQL statements, PL/SQL code, and database administrator commands. It maintains a history of commands that you have used, allowing you to reuse them.
- Oracle Storage Manager: Lets you perform database administrator tasks associated with managing database storage elements, such as tablespaces and rollback segments, and adding and renaming datafiles.
- Oracle Software Manager: Lets you distribute software throughout a client/server network.
Except for the Instance Manager application, take the following steps to log in to any of these applications:
1. From the Windows NT Program menu, select Oracle Enterprise Manager.
2. From the Oracle Enterprise Manager menu, select the application you want to run.
3. In the dialog box that appears, specify that you wish to continue without connecting to the repository. (Note that you might not see this box.)
4. Log in as user system. In the Login Information dialog box, enter the following information:
   - Username: system
   - Password: manager
   - Service: dmdb
   - Connect As: Normal
5. Click OK.
Then, you see the main window for the application. The display on the right side of the window is determined by the objects selected from the tree list on the left side of the window. The usage conventions for most of the Oracle Enterprise Manager applications are as follows:
- Click the plus sign (+) to the left of a folder icon to expand and display the contents of a folder.
- Double-click a collapsed folder icon to expand the folder.
- Click the minus sign (-) to the left of the folder icon to collapse a folder.
- Click a folder icon to display a multicolumn scrollable list of the objects of that folder in the right side of the window.
- Select the Collapse and Expand menu options in the View menu to collapse and expand folders.
- Click an individual object to display the property sheet for the object.
Adding Space to the Database

As "Tablespaces, Datafiles, and Data Blocks" on page 4-8 discussed, you can add space to the database by creating new tablespaces, adding space to existing tablespaces, or extending the datafiles associated with an existing tablespace. This section describes how you use the Storage Manager to add space to the database.
First, invoke the Storage Manager and connect to it as indicated in "Logging In to Oracle Enterprise Manager Components" on page 4-28. When you successfully connect, the following window appears:
The Tablespaces, Datafiles, and Rollback Segments folders for the database DMDB display in a tree list on the left side of the window. To take advantage of all features, from the View menu, select Advanced Mode.
Creating a New Tablespace

Typically, you create a new tablespace rather than adding space to existing tablespaces if you want to allocate a designated area for a set of objects. In this exercise, you create a new tablespace called YVES_DATA and associate it with a datafile. You use a file name with the following format:
<Oracle_home>\database\<filename>
To list all tablespaces, expand the Tablespaces object folder in the Storage Manager window. All tablespaces in the database are displayed in alphabetical order. To view more information about the tablespaces, double-click the Tablespaces folder. Similarly, you can view information about datafiles and rollback segments.
To create a new tablespace, take these steps:

1. Select the Tablespaces folder and click the plus sign (+) on the toolbar.
2. Enter the name of the tablespace: YVES_DATA.
3. For Status options, select Online to indicate that you want the tablespace to be available for access after it is created.
4. Specify the type of tablespace by selecting Permanent. (Temporary is used only for sort segments.)
5. To create the initial datafile to associate with the tablespace, click Add.
6. In the Create Datafile property sheet, for Name, enter the full path name of the datafile. The full path name that you enter is similar to:
   g:\orant\database\yves_dmdb1.ora
7. For File Size, enter 25 and click M to specify the size in megabytes.
   Let's make a quick digression to explain the other option shown, the Reuse Existing Datafile option. When you drop a tablespace, the Oracle8 database server does not automatically delete the corresponding datafiles. So if you decide to re-create the tablespace using the existing files, check the Reuse Existing Datafile box to reuse a file that already exists on disk.
8. To specify that you want the tablespace to be extensible on demand, beyond the initial size specified, select the Auto Extend tab.
9. Check Enable Auto Extend and provide the Increment: 10M. In the Maximum Extent section, which specifies the maximum disk space allowed for automatic extension of the file, specify Unlimited.
10. Click OK to exit this property sheet.
11. From the Create Tablespace dialog box, select the Extents tab and select the Override Default Values check box to override default values. The defaults are more appropriate for a transaction processing system. The values specified in the next step have been chosen with a data mart application in mind.
12. Enter the values shown in the following figure:
    To review the SQL statement created by the Storage Manager, click the Show SQL button. The SQL statement is displayed at the bottom of the Create Tablespace window, as shown in the previous figure.
13. Click Create to create the new tablespace with assigned parameters.
The newly created tablespace appears in the right pane of the window, as shown in the following figure:
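The choices made in the steps above correspond roughly to the following statement. This is a sketch of what the Show SQL button might display, not a copy of it; the statement that the Storage Manager actually generates may differ. The DEFAULT STORAGE values entered on the Extents tab come from a figure and are therefore left out here.

```sql
-- Sketch only: approximates the tablespace created in this exercise.
-- The DEFAULT STORAGE clause is omitted because its values appear only in the figure.
CREATE TABLESPACE yves_data
  DATAFILE 'g:\orant\database\yves_dmdb1.ora' SIZE 25M   -- add REUSE to reuse an existing file
  AUTOEXTEND ON NEXT 10M MAXSIZE UNLIMITED               -- Auto Extend tab settings
  ONLINE
  PERMANENT;
```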
Adding Space to the USERDATA Tablespace

Because Chapter 5 uses the schema marty for the exercises and because that schema's tablespace, USERDATA, is too small, you need to increase the size of the USERDATA tablespace. In this exercise, you add a datafile to the tablespace:
1. In the Storage Manager tree list, expand Tablespaces and select USERDATA.
2. In the property sheet that appears in the right pane, click Add.
3. In the Create Datafile property sheet, for Name, enter the full path name of the datafile. The full path name that you enter is similar to:
   g:\orant\database\usr2dmdb.ora
4. For File Size, enter 25 and click M.
5. Select the Auto Extend tab.
6. Select Enable Auto Extend and provide the Increment: 10M. In the Maximum Extent section, which specifies the maximum disk space allowed for automatic extension of the file, specify Unlimited.
7. Click OK to exit this property sheet, then click Apply.
Exit the Storage Manager.
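If you prefer to work from SQL Worksheet, the same change can be sketched as a single statement. It assumes the example path shown in the exercise; substitute the path for your own Oracle home.

```sql
-- Sketch only: adds a second, auto-extending 25 MB datafile to USERDATA.
ALTER TABLESPACE userdata
  ADD DATAFILE 'g:\orant\database\usr2dmdb.ora' SIZE 25M
  AUTOEXTEND ON NEXT 10M MAXSIZE UNLIMITED;
```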
Miscellaneous Tasks
Two tasks remain that relate directly to the SALES fact table and for which you use Oracle Enterprise Manager. Both tasks require that the database is populated before you perform them:
- Analyzing the table SALES in the YVES schema. "Analyzing the Table SALES" on page 5-53 describes how to analyze the table.
- Creating a summary table, ANNUAL_SALES, in parallel, and with recoverability turned off. This table is created from the data in the fact table SALES. "Global Computing Case Study: Creating a Summary Table" on page 8-21 describes how to create the table.
Starting Up and Shutting Down the Database

Usually, the database instance starts up when the server starts. However, you may need to know how to manually start up and shut down the database instance. You can use the Instance Manager to perform these operations.
From the Oracle Enterprise Manager menu, invoke the Instance Manager. Provide the following login information:
- Login: internal
- Password: manager
- Service: dmdb
- Connect As: Normal
If an options box appears, select the radio button for Continue without connecting to Repository. Note that startup and shutdown are the primary operations you will perform connected as user internal. All other operations are performed as the user system. When you successfully log in to Instance Manager, the following window appears:
Shutting Down the Database

Before you shut down the database, you should create a stored configuration, particularly if you are connecting to a remote database.
A stored configuration lets you create multiple configurations without the need to track initialization parameters. The Instance Manager stores the configuration in the Windows NT Registry on the local machine. Subsequently, when you start up the database and select the stored configuration, the Instance Manager reads the configuration information from the registry. If you do not create a stored configuration for a remote database, you will not be able to start up the database remotely unless you copy the database initialization file from the remote system to the local system. To create a stored configuration, take the following steps:
1. From the navigation tree, select Initialization Parameters. You can edit the initialization parameters in the property sheet that appears. For this exercise, do not change any parameters.
2. Click Save.
3. In the Save Configuration dialog box, enter a name for the configuration. Enter DMDB_CONFIG.
4. Click OK.
Now you can shut down the database using the following steps:
1. In the navigation tree, select dmdb.
2. Select Shutdown in the Status tab.
3. Click Apply. The Shutdown Options dialog box appears:
   You can shut down the database using one of four options:
   - Normal: After all user sessions connected to the database complete processing, Oracle8 ensures that all changed data is written to disk and shuts down the instance.
   - Immediate: Oracle8 terminates user sessions and ensures that all changed data is written to disk before shutting down the instance.
   - Abort: Oracle8 terminates all user sessions and shuts down the instance without any additional processing to write changed data to disk. Instead, Oracle8 postpones making changes permanent until the database is next opened, when it performs instance recovery.
   - Transactional: After a specified length of time in which transactions can be completed, Oracle8 terminates all user sessions and shuts down the database.
   If no other user sessions are connected, select Normal.
4. Click OK.
Starting Up the Database

The Status tab of Instance Manager shows the status of the database, including whether or not it is started. Before you start up a database, you should verify that it is not currently started. If the database is not started, take the following steps:
1. From the Status tab, select Database Open. Click Apply.
2. In the Startup Database dialog box, select Stored OEM Configuration. The list box displays the name of the configuration that you created in the previous section. Select DMDB_CONFIG and click OK.
3. Click OK.
4. Exit the Instance Manager after the startup is completed.
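For comparison, the same shutdown and startup cycle can be sketched with administrator commands instead of the Instance Manager. This assumes a tool that accepts such commands (for example, Server Manager running on the database server) and a default initialization file that the STARTUP command can find; it does not use the stored configuration DMDB_CONFIG.

```sql
REM Sketch only: assumes an administrator command tool and a default initialization file.
CONNECT internal
SHUTDOWN NORMAL
STARTUP
```

SHUTDOWN also accepts IMMEDIATE and ABORT in place of NORMAL, matching the options in the Shutdown Options dialog box.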
5
In the last chapter, you learned how to design and create the target star schema. Now, you focus on how to extract data from different sources, change or transform it according to the needs of your target star schema, move the data automatically into your target, and handle errors that arise during this process. In the Global Computing case study, the source is an OLTP schema that is stored in an Oracle8 relational database. Although the source for a data mart is typically an OLTP database, it could also be a data warehouse or a nonrelational source, such as VSAM and ISAM flat files from database systems that predate relational systems. The steps in moving the data from source to target constitute the Extraction, Transformation, Transportation (ETT) process. The ETT process usually requires that you do the following:
- Define the set of tables that you want from the source, and set up structures describing these tables.
- Provide source-to-target mappings, which explain how you want the ETT tool to derive the target data from the sources you have just defined.
- Describe how you want the data transformed (filtered) and decoded to create the information stored in the data mart.
- Determine the process by which you will load the data into the data mart.
- Determine how you will refresh the data in the data mart to reflect changes in the source data.
This chapter describes how to perform each of these functions.
What Is the Role of Oracle Data Mart Builder?

Oracle Data Mart Builder consists of a set of graphical tools for managing the flow of information from data source to the target data mart. These tools use a visual metaphor called the data flow to describe the process of capturing data from a data warehouse or operational database and transforming and loading the data into the data mart. This cookbook uses the following components of Oracle Data Mart Builder:
- Oracle Data Mart Builder
- Oracle Data Mart Builder Admin
In addition, Data Mart Builder provides utilities to build, upgrade, and maintain a repository.

Oracle Data Mart Builder allows you to create a graphical representation, called a BaseView, of the physical data source. Similarly, Builder lets you create BaseViews to represent the target star schema. The following figure shows the source BaseView GCC_DM:
Builder also allows you to create MetaViews, which are business representations of the sources based on one or more BaseViews. MetaView components can include renamed columns and calculated columns, such as custom formulas and aggregates grouped by business category, as well as tables from multiple BaseViews. This hides the complexity of the underlying database architecture by referring to the source in meaningful business terms.
Data sources include flat files and relational databases such as Oracle8 or SQL Server. A typical data target, called a sink, can be a grid, which lets you preview your results, or a loader, which represents the mechanism by which data is loaded into the target table.
After you define the data sources and data targets, you can model the process of data flow between source and target by creating a data flow plan through the drag-and-drop interface of the Data Flow Editor. In creating the data flow plan, you typically use predefined transforms, which represent the steps in extracting, transforming, and loading data into a star schema. The Data Flow Editor also supports languages like Visual Basic to enable you to develop custom transforms. Oracle Data Mart Data Collection Agent, a program that brokers all interactions between the source database servers and Builder components, runs the plans.
Parts of a Data Flow Plan

The data flow plan represents the logical unit of work in the process of moving data from source to target. A plan always begins with a source and ends with a sink. The following figure shows a basic data flow plan that uses a SQL Query Transform to extract source data into a grid, which displays the data on the desktop in a spreadsheet format. A connector designates the path between the steps in a plan.
You can add steps to this plan using transforms, where each transform provides a specific type of processing. To add a transform, you drag the transform into an existing data flow and drop it on the connector between the desired input and output steps. Builder automatically redraws the connectors as appropriate. You can use built-in transforms or you can write your own transforms, as described in "Transform Software Developer's Kit" on page 5-12. Builder stores all components of a plan as separate objects. As a result, you can save each transform separately and reuse it.
Built-in Transforms
Transforms are the components of a data flow plan. They are program modules for performing different types of operations, such as controlling the flow of data in a plan, analyzing data, populating a data mart, and customizing data manipulation at the record level. Built-in transforms, as well as the custom transforms that you create and save, are stored in the Tool Bin. The following sections describe the built-in transforms.
Source Transforms
You use the source transforms to extract data from the source database or files. Builder provides the following source transforms:
- SQL Query Transform
  The SQL Query Transform is a source in a data flow plan. When you choose the tables and columns to be used as data sources by dragging and dropping these into the Workspace, Builder automatically creates a data flow that consists of a SQL Query source and a Grid sink, as shown in the following figure:
  The SQL Query Transform represents the SELECT statements required to extract the data from the source tables. You can view these SQL statements by using the Query Editor. The Query Editor allows you to remove individual columns (parts) from the SQL query or to add a filter to the SQL query. The filter is added to the SQL query in the form of a WHERE clause. In addition, you can edit the SQL statements directly by using the SQL Editor. Note, however, that once you use the SQL Editor to modify the SQL query, you cannot use the Query Editor.
  You can save any SQL Query Transform in the Tool Bin as a custom transform with any name or label you choose. Users with access to the Tool Bin can reuse these transforms.
- Delimited Text File Source Transform
  The Delimited Text File Source Transform imports character-separated values from delimited text files. You provide a path and file name and you specify whether or not the transform should read column names from the first row of the file. You can also specify the separator, quote, and comment characters, as well as the input file character set.
- Fixed Length File Source Transform
  The Fixed Length File Source Transform accepts input from flat files containing fixed-length fields and translates the data into formats supported by Oracle databases. Fixed-length means that the fields (columns) are a specified length and the file does not contain field separators. This allows you to load data stored in a flat file into a database in a data mart.
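To make the SQL Query Transform more concrete, the statement below sketches the kind of SELECT it might represent after a filter has been added in the Query Editor. The table and column names are hypothetical, and the SQL that Builder actually generates may be formatted differently.

```sql
-- Sketch only: table and column names are hypothetical.
SELECT order_id, product_id, quantity, order_date   -- columns chosen in the Workspace
FROM orders
WHERE quantity > 0;                                  -- filter added as a WHERE clause
```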
Data Flow Transforms

The data flow transforms control the flow of data in a data flow plan. Builder provides these transforms:
I
Join Transform The Join Transform produces a single result set by joining the columns from two data flows. (A result set contains the data retrieved.) Where the Union Transform combines two sets of rows one on top of the other, the Join Transform pairs the columns from one result set with the columns from the other result set. The joined columns form a new record structure. The Join Transform supports multiple key joins. It requires that the two result sets contain related key columns.
Union Transform The Union Transform produces a single result set from the rows in two data flows, including duplicate rows. The two sets of rows from the data flows are combined. The rows in each data flow must contain an equal number of columns, and each column must have identical attributes.
Splitter Transform Use the Splitter Transform to create two identical copies of a result set in a data flow. A Splitter Transform has one input connection and two output connections.
Conditional Splitter Transform The Conditional Splitter Transform lets you divide the output of the data flow into two sets using filter conditions. This transform has one input and two outputs. The results that pass through the top output connection match the specified condition.
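The difference between combining rows (union), pairing columns (join), and dividing a flow by a condition can be sketched in a few lines of Python. This is an illustration only, not how Builder implements these transforms. It assumes inner-join semantics for the join and assumes that the second splitter output carries the rows that do not match the condition; the sample values are invented.

```python
def union(flow_a, flow_b):
    """Stack two flows; duplicate rows are kept, and the columns must match."""
    if flow_a and flow_b and flow_a[0].keys() != flow_b[0].keys():
        raise ValueError("both flows must have the same columns")
    return flow_a + flow_b

def join(flow_a, flow_b, keys):
    """Pair rows whose key columns are equal (inner join on one or more keys)."""
    index = {}
    for row in flow_b:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)
    joined = []
    for row in flow_a:
        for match in index.get(tuple(row[k] for k in keys), []):
            # The joined columns form a new, wider record.
            joined.append({**match, **row})
    return joined

def conditional_split(flow, condition):
    """Return (rows that match the condition, remaining rows)."""
    matched = [r for r in flow if condition(r)]
    rest = [r for r in flow if not condition(r)]
    return matched, rest

accounts = [{"ACCOUNT_ID": 1, "AC_NAME": "Acme"},
            {"ACCOUNT_ID": 2, "AC_NAME": "Globex"}]
shipto = [{"ACCOUNT_ID": 1, "SHIPTO_NUM": 10},
          {"ACCOUNT_ID": 1, "SHIPTO_NUM": 11}]

stacked = union(accounts, accounts)            # four rows, duplicates included
paired = join(shipto, accounts, ["ACCOUNT_ID"])  # two rows, three columns each
top, bottom = conditional_split(paired, lambda r: r["SHIPTO_NUM"] > 10)
```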
Record Manipulation Transforms

The record manipulation transforms act as intermediate steps in a data flow plan. Builder provides these transforms:
Breakpoint Transform The Breakpoint Transform helps you find and isolate problems in a plan. You can use this transform to examine data on a record-by-record basis.
Concatenation Transform The Concatenation Transform lets you combine columns and typed text into one column.
Filter Transform The Filter Transform removes unwanted rows from the data flow. This filter works independently from the relational database. You can use multiple Filter Transforms in the data flow to display the plan results you want. Filter Transforms are linked with the AND operator.
General Filter Transform The General Filter Transform removes unwanted rows from the data flow. This transform can be used to specify multiple conditions, but it does not support the use of regular expressions in conditional statements. Use the Filter Transform to include regular expressions in conditional statements.
5-Way Router Transform The 5-Way Router Transform lets you specify up to five filters (conditions) at a time. It has one input stream and multiple output streams. The filters can be mutually exclusive, or rows can match more than one of the filter conditions.
Substring Transform The Substring Transform extracts a substring from a string value and places that substring in a new or existing column. You define the substring by specifying a start position and a length.
Search and Replace Transform The Search and Replace Transform searches a column for specified strings or substrings and replaces all occurrences of that string with different strings in a new or existing column.
SQL Command Transform The SQL Command Transform lets you execute customized SQL commands in a plan. It also lets you execute database tasks before or after the SQL commands you specify.
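As an illustration of routing, the following Python sketch sends each row to every output whose filter it satisfies, so filters do not have to be mutually exclusive. The function name, the ORDER_ID and AMOUNT columns, and the sample values are assumptions; only the limit of five filters comes from the description of the 5-Way Router Transform.

```python
def route(rows, filters):
    """Send each row to every output whose filter it satisfies."""
    if len(filters) > 5:
        raise ValueError("at most five filters are supported")
    outputs = [[] for _ in filters]
    for row in rows:
        for i, predicate in enumerate(filters):
            if predicate(row):
                # A row can land in more than one output stream.
                outputs[i].append(row)
    return outputs

sales = [
    {"ORDER_ID": 1, "CHANNEL_ID": 1, "AMOUNT": 1500},
    {"ORDER_ID": 2, "CHANNEL_ID": 2, "AMOUNT": 200},
    {"ORDER_ID": 3, "CHANNEL_ID": 1, "AMOUNT": 50},
]
large_orders, channel_one = route(sales, [
    lambda r: r["AMOUNT"] >= 1000,
    lambda r: r["CHANNEL_ID"] == 1,
])
```

Here order 1 appears in both outputs, order 3 appears only in the second, and order 2 matches neither filter.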
Column Manipulation Transforms

The column manipulation transforms act as intermediate steps in a data flow plan. Builder provides these transforms:
Add Columns Transform The Add Columns Transform adds string, date, or numeric columns to the data flow.
Aggregation Transform The Aggregation Transform provides a way to group information according to your needs and to generate statistics about that information. This transform provides the following functions, which you can apply to the input data: SUM, COUNT, AVG, MAX, MIN, STDDEV, and VARIANCE.
Column Select Transform The Column Select Transform removes specified input columns in a data flow and outputs the remaining columns.
Expression Calculator Transform The Expression Calculator Transform lets you choose from over 70 functions and operators to apply to columns in the data flow. The input columns you specify are passed through the expression, and the resulting values are displayed in a new column.
Rename Columns Transform The Rename Columns Transform lets you rename a column in a plan without affecting the part name or database column name. This transform is useful if you want to use name-based mapping to load data into a database in a data mart, and the part in the source data MetaView has a different name from the column in the target database. For example, to pass the new name to the SQL*Loader, add this transform to the data flow just before the SQL*Loader Transform. You can also use this transform to rename a calculated field.
Record Number Transform The Record Number Transform adds a column containing the record number to the data flow.
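To make the grouping statistics concrete, the following Python sketch computes the seven functions listed for the Aggregation Transform over one measure per group. It is an illustration, not Builder's implementation. It assumes that STDDEV and VARIANCE are sample statistics that return 0 for a single-row group; the AMOUNT column and the values are invented.

```python
import statistics
from collections import defaultdict

def aggregate(rows, group_by, measure):
    """Group rows by one column and compute summary statistics for a measure."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_by]].append(row[measure])
    result = []
    for key, values in sorted(groups.items()):
        many = len(values) > 1
        result.append({
            group_by: key,
            "SUM": sum(values),
            "COUNT": len(values),
            "AVG": statistics.mean(values),
            "MAX": max(values),
            "MIN": min(values),
            # Sample statistics need at least two values (assumption: 0 otherwise).
            "STDDEV": statistics.stdev(values) if many else 0,
            "VARIANCE": statistics.variance(values) if many else 0,
        })
    return result

rows = [
    {"CHANNEL_ID": 1, "AMOUNT": 10},
    {"CHANNEL_ID": 1, "AMOUNT": 20},
    {"CHANNEL_ID": 2, "AMOUNT": 5},
]
summary = aggregate(rows, "CHANNEL_ID", "AMOUNT")
```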
Sort Transforms
The sort transforms let you sort records in the data flow. Builder provides the following transforms:
Disk Sort Transform The Disk Sort Transform sorts large numbers of records in the data flow. It uses a small amount of memory and a set of temporary disk files. This transform is especially useful for population plans that process large amounts of data.
Memory Sort Transform The Memory Sort Transform sorts the records in a data flow by the specified columns, in ascending or descending order. It lets you analyze data in a result set.
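The trade-off between the two sort transforms can be illustrated with an external merge sort, a common technique for sorting with little memory and temporary disk files. The Python sketch below is an assumption about the general approach, not a description of how the Disk Sort Transform is implemented; the function name and chunk_size parameter are hypothetical.

```python
import heapq
import os
import pickle
import tempfile

def disk_sort(records, key, chunk_size=1000):
    """Sort an iterable with bounded memory: sort chunks, spill them, merge."""
    paths = []

    def spill(chunk):
        # Sort one chunk in memory and write it to a temporary run file.
        chunk.sort(key=key)
        fd, path = tempfile.mkstemp(suffix=".run")
        with os.fdopen(fd, "wb") as handle:
            for record in chunk:
                pickle.dump(record, handle)
        paths.append(path)

    def read_run(path):
        with open(path, "rb") as handle:
            while True:
                try:
                    yield pickle.load(handle)
                except EOFError:
                    return

    chunk = []
    for record in records:
        chunk.append(record)
        if len(chunk) >= chunk_size:
            spill(chunk)
            chunk = []
    if chunk:
        spill(chunk)

    try:
        # heapq.merge streams the sorted runs, one record per run at a time.
        yield from heapq.merge(*(read_run(p) for p in paths), key=key)
    finally:
        for path in paths:
            os.remove(path)

rows = [{"ITEM_ID": n} for n in (5, 3, 9, 1, 7, 2, 8)]
disk_sorted = list(disk_sort(rows, key=lambda r: r["ITEM_ID"], chunk_size=3))

# An in-memory sort is simpler when the whole result set fits in memory.
memory_sorted = sorted(rows, key=lambda r: r["ITEM_ID"], reverse=True)
```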
Sink Transforms
Oracle Data Mart Builder provides transforms to use as data sinks:
Grid Transform The Grid Transform is a sink that displays the results from a data flow in a spreadsheet form. The Grid Transform is the default sink for any plan you create. Builder automatically adds it to the data flow when you create a plan by dragging tables and columns into the Workspace. A plan can contain multiple data flows, each with its own Grid sink. A single data flow can also have more than one Grid sink.
Command Line Sink Transform The Command Line Sink Transform lets you pass data in a data flow to a command-line utility. This is useful if the utility that loads data into your data mart database runs from the command line. The Command Line Sink Transform includes Registry settings so that you can adjust the number of records to send to the load utility and change the integer size on the target machine.
Delimited Text File Sink Transform The Delimited Text File Sink Transform exports plan results to a text file as character-separated values. You provide a path and file name, and specify whether to include row headers.
Save to Table Transform The Save to Table Transform is a sink that saves the result set to a table in a specified database. You can create a new table or append the records to an existing table. You specify the database by selecting a BaseView.
SQL Command Sink Transform The SQL Command Sink Transform is similar to the SQL Command Transform except that you can use it as the data sink. This transform lets you execute customized SQL commands in a plan. It also lets you execute database tasks before or after the SQL commands you specify.
Terminal Sink Transform The Terminal Sink Transform lets you end a data flow plan without displaying results or taking further action.
Data Mart Population Transforms

Oracle Data Mart Builder provides transforms for generating keys or data and populating the star schema found in a typical data mart:
Key Generation Transform The Key Generation Transform generates a warehouse key for a dimension table. You also use this transform to refresh or update the data in a dimension table.
Key Lookup Transform The Key Lookup Transform populates one or more foreign key columns in a fact table using one or more natural keys from a dimension table. It lets you edit the SQL statement to customize a key lookup operation. It also lets you specify the number of database records to store in cache during the lookup operation.
Time Generation Transform The Time Generation Transform populates a time dimension table. A time dimension table makes it possible to analyze time data without using complex SQL calculations. The time dimension table is different from other dimensions because it is populated with only generated data. You do not load data from a source table into a time dimension table.
Time Lookup Transform The Time Lookup Transform reads an input datetime value and outputs that datetime value and the specified equivalent Julian date values. The results can be output to a table or to a Grid. You can use either the Key Lookup Transform or the Time Lookup Transform to populate a fact table with Julian date values.
Batch Loader Transform The Batch Loader Transform populates tables in an Oracle Data Mart database. The dialog box for the Batch Loader Transform allows you to specify the table name and batch size. You can also choose to load data into an existing table, create a new target table, or drop the existing table and create it again.
Batch Update Transform The Batch Update Transform lets you update rows of data in your data mart when there are changes to the source database. You can update the rows one table at a time.
SQL*Loader Transform The SQL*Loader Transform, which uses the SQL*Loader utility, provides the fastest method of populating a table, especially when you specify the use of direct path loading and turn off redo logging. See the Oracle8 Utilities manual for information about the SQL*Loader utility.
The SQL*Loader Transform lets you specify the type of load operation, specify control files, and execute SQL before or after the load operation. You can also choose to load data into an existing table, create a new target table, or drop the existing table and create it again.
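The ideas behind time generation, key generation, and key lookup can be sketched in Python as follows. This is a conceptual illustration, not Builder's implementation. The DAY_KEY, JULIAN_DAY, YEAR, MONTH, and DAY_OF_WEEK column names are assumptions, and the Julian value is computed as the standard Julian day number, which may differ from the Julian date values Builder produces.

```python
from datetime import date, timedelta

# Offset between Python's proleptic Gregorian ordinal and the Julian day number.
JULIAN_OFFSET = 1721425

def generate_time_dimension(start, end):
    """Build one generated row per calendar day from start to end, inclusive."""
    rows = []
    day = start
    while day <= end:
        rows.append({
            "DATE_DESC": day.isoformat(),
            "JULIAN_DAY": day.toordinal() + JULIAN_OFFSET,
            "YEAR": day.year,
            "MONTH": day.month,
            "DAY_OF_WEEK": day.isoweekday(),  # 1 = Monday ... 7 = Sunday
        })
        day += timedelta(days=1)
    return rows

def generate_keys(rows, key_column, first_key=1):
    """Assign a sequential warehouse (synthetic) key to each row."""
    return [{key_column: first_key + i, **row} for i, row in enumerate(rows)]

def build_lookup(dimension_rows, natural_keys, warehouse_key):
    """Cache natural-key to warehouse-key pairs for use while loading facts."""
    return {tuple(r[k] for k in natural_keys): r[warehouse_key]
            for r in dimension_rows}

days = generate_keys(
    generate_time_dimension(date(1999, 8, 30), date(1999, 9, 2)), "DAY_KEY")
day_lookup = build_lookup(days, ["DATE_DESC"], "DAY_KEY")
september_first_key = day_lookup[("1999-09-01",)]
```

The time dimension rows are generated rather than extracted, and the lookup dictionary plays the role of the cached dimension records that a key lookup uses to replace natural keys in fact rows.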
VBScript Transforms
You can use VBScript transforms to create custom sources, steps and sinks, or data flows. These transforms run VBScript, a subset of Visual Basic, and the code is executed on the server.
VBScriptCopy Transform The VBScriptCopy Transform copies the input record and lets you manipulate its data and its column attributes. Use this transform if you want to add or remove columns, or change any of the column attributes, such as size, type, and name.
VBScriptInplace Transform The VBScriptInplace Transform processes each record in place, which means that only the data of the input record can be altered. You cannot add or remove columns, or change any of the column attributes. For example, a column in the input record with a 10-character string cannot be output with more than 10 characters, although it can be output with fewer.
VBScriptSink Transform The VBScriptSink Transform outputs records to an external application, such as Microsoft Excel or Word. The target application must be OLE-compliant and installed on your computer and on the system where the Data Collection Agent is running.
VBScriptSource Transform The VBScriptSource Transform reads input data from an external application. The source application must be OLE-compliant and installed on your computer and on the system where the Data Collection Agent is running.
Transform Software Developer's Kit

Oracle Data Mart Suite provides the Oracle Data Mart Builder Transform Software Developer's Kit (SDK). The Transform SDK lets you create your own transforms to perform custom processes on data. For example, you can develop transforms to create legacy database data sources for data flow plans, to store the output of plans in a database, or to handle specialized data transformations that are unique to your organization. For more information, see Oracle Data Mart Builder Programmer's Guide.
Oracle Data Mart Builder Admin

Oracle Data Mart Builder Admin helps you administer the Builder environment. It lets you configure the repository, which is a database storing all Builder-specific metadata, and manage the Data Collection Agent. In the sample database that ships with the suite, a default repository named <default> is created for you. As the data in the source database changes in the course of normal business processing, you will want these changes reflected in the data mart. The Admin tool lets you schedule refreshes of the information in the data mart using the Windows NT scheduling service. See Updating Data in the Data Mart on page 8-28 for information on that topic.
Understanding Data Extraction and Oracle Data Mart Builder

Oracle Data Mart Builder can extract data from relational structures in most industry-standard relational databases or from flat files that are in a character-separated format or fixed-length format. For nonstandard data sources, you may need to write code or use third-party applications to read data and parse the data into a character-separated format or a relational database. The data extraction process requires you to describe the source and target schemas in terms that Oracle Data Mart Builder expects, then build a data flow plan to model the process of moving data from source to target.
Primary Data Sources and Targets

Data sources can include flat files, operational databases, data warehouses, and other data marts. If the data source is a relational database, you typically use the SQL Query Transform to build a query to extract the data. If your source data resides in a database to which Builder cannot connect (or cannot connect in real time), you can use any of the following methods:
Extract fixed-length data into a flat file and use the Fixed Length File Source Transform to load data into a data mart table.
Extract your data into a character-separated flat file and use the Delimited Text File Source Transform to load data into a data mart table.
Use a third-party extraction tool to read the data into a database type supported by Builder.
Data targets are data mart tables or the user's desktop.
Source Metadata: Providing Information About the Source

Metadata is data about data, expressed in terms expected by the particular set of tools used to access the data. The source metadata, the structure of source data, needs to be described and stored in a format understood by Oracle Data Mart Builder before Builder can extract the data. These descriptions of the physical source data (including tables, columns, and referential integrity relationships) constitute the source BaseViews. Builder can also create a business-oriented representation of source data by creating source MetaViews. MetaViews are defined in terms of underlying BaseViews, and one MetaView can consist of multiple BaseViews.
Capturing Changes in Source Data

The information in a data mart is, by nature, historical. Typically, the granularity of production data in a fact table is daily, weekly, or monthly. While a fact table is appended to regularly, dimension tables change slowly over time and are only updated with new or changed records. If you reload the data in a dimension table from the beginning every time you perform an extraction, you could lose all historical information. For example, if a customer is deleted from the source customer table, and you overwrite the data mart customer table with each extraction, you lose all facts for that customer. Ideally, you should extract only the source data that has changed since the last extraction. An important step in extracting data from an operational database is the process of change identification and capture. This process improves the extraction performance and preserves historical data. You can use a third-party tool to capture and extract changed data at the source, or you can add custom transforms to the data flow plans you use to populate your data mart.
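One common way to identify changed rows is a high-water mark: each extraction selects only the rows modified since the previous one. The Python sketch below illustrates that idea with SQLite standing in for the source and data mart databases. It is not a Builder feature. It assumes that the source table carries a last_modified column, which the sample schema may not have, and the sample values are invented.

```python
import sqlite3

def extract_changes(source, last_extracted):
    """Return source rows modified after the previous extraction."""
    cursor = source.execute(
        "SELECT account_id, ac_name, last_modified FROM account "
        "WHERE last_modified > ? ORDER BY last_modified",
        (last_extracted,),
    )
    return cursor.fetchall()

def apply_changes(mart, changes):
    """Update changed rows or insert new ones; never delete dimension rows."""
    for account_id, ac_name, _ in changes:
        updated = mart.execute(
            "UPDATE customers SET ac_name = ? WHERE account_id = ?",
            (ac_name, account_id),
        ).rowcount
        if updated == 0:
            mart.execute(
                "INSERT INTO customers (account_id, ac_name) VALUES (?, ?)",
                (account_id, ac_name),
            )
    mart.commit()

source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE account "
               "(account_id INTEGER PRIMARY KEY, ac_name TEXT, last_modified TEXT)")
source.executemany("INSERT INTO account VALUES (?, ?, ?)",
                   [(1, "Acme", "1999-07-01"), (2, "Globex", "1999-07-15")])
mart = sqlite3.connect(":memory:")
mart.execute("CREATE TABLE customers (account_id INTEGER PRIMARY KEY, ac_name TEXT)")

# Initial extraction loads everything.
apply_changes(mart, extract_changes(source, "1999-01-01"))

# Later, one customer is deleted at the source and another is renamed.
source.execute("DELETE FROM account WHERE account_id = 2")
source.execute("UPDATE account SET ac_name = 'Acme Corp', "
               "last_modified = '1999-08-10' WHERE account_id = 1")

changes = extract_changes(source, "1999-07-15")
apply_changes(mart, changes)
customers = mart.execute(
    "SELECT account_id, ac_name FROM customers ORDER BY account_id").fetchall()
```

Because the second extraction applies only the changed row, the customer deleted at the source remains in the data mart and its facts stay reachable.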
Maintaining Referential Integrity

By default, referential integrity relationships (joins in Builder terminology) in the source schema are reflected in the BaseView. When you extract data from the source tables, Builder uses any referential integrity relationships (foreign keys) in constructing the WHERE clause of the SQL query. This ensures that the data is joined correctly. The following example, taken from the GCC_DM BaseView, which you will create later in this chapter, shows three of the tables in the BaseView and how they are related:
When you use Builder to create a data flow plan, Builder constructs a SQL query that maintains the relationships. Let's say you create a data flow plan by selecting the following columns:
ACCOUNT_ID and AC_NAME from ACCOUNT
SEGMENT_ID and SEGMENT_DESC from SEGMENTS
SHIPTO_NUM and SH_DESCRIPTION from SHIPTO
Builder constructs a SQL query, shown in the following example, that maintains the referential integrity relationships by adding the predicate in the WHERE clause:
SELECT "SAMPLEOLTP"."ACCOUNT"."ACCOUNT_ID",
       "SAMPLEOLTP"."ACCOUNT"."AC_NAME",
       "SAMPLEOLTP"."SEGMENTS"."SEGMENT_ID",
       "SAMPLEOLTP"."SEGMENTS"."SEGMENT_DESC",
       "SAMPLEOLTP"."SHIPTO"."SHIPTO_NUM",
       "SAMPLEOLTP"."SHIPTO"."SH_DESCRIPTION"
FROM "SAMPLEOLTP"."ACCOUNT",
     "SAMPLEOLTP"."SEGMENTS",
     "SAMPLEOLTP"."SHIPTO"
WHERE "SAMPLEOLTP"."ACCOUNT"."ACCOUNT_ID" = "SAMPLEOLTP"."SHIPTO"."ACCOUNT_ID"
  AND "SAMPLEOLTP"."SEGMENTS"."SEGMENT_ID" = "SAMPLEOLTP"."ACCOUNT"."SEGMENT_ID"
Builder maintains the referential integrity relationships even if you do not include the foreign key columns in the data flow. If you want to define additional relationships, such as between tables from different schemas, you can use the BaseView Editor to add joins.
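To show how join metadata can drive the WHERE clause, the following Python sketch assembles a query of the same shape from a list of selected columns and a list of joins. It is a simplified illustration, not Builder's query generator; the build_query function and its argument layout are assumptions.

```python
def build_query(schema, columns, joins):
    """Build a SELECT from (table, column) pairs and a list of join pairs.

    joins is a list of ((table, column), (table, column)) tuples.
    """
    def qualify(table, column=None):
        parts = [schema, table] + ([column] if column else [])
        return ".".join(f'"{p}"' for p in parts)

    # Collect the tables in the order they first appear in the column list.
    tables = []
    for table, _ in columns:
        if table not in tables:
            tables.append(table)

    # Keep only the joins whose two tables both take part in the plan.
    predicates = [
        f"{qualify(*left)} = {qualify(*right)}"
        for left, right in joins
        if left[0] in tables and right[0] in tables
    ]

    sql = "SELECT " + ", ".join(qualify(t, c) for t, c in columns)
    sql += " FROM " + ", ".join(qualify(t) for t in tables)
    if predicates:
        sql += " WHERE " + " AND ".join(predicates)
    return sql

joins = [
    (("ACCOUNT", "ACCOUNT_ID"), ("SHIPTO", "ACCOUNT_ID")),
    (("SEGMENTS", "SEGMENT_ID"), ("ACCOUNT", "SEGMENT_ID")),
]
columns = [
    ("ACCOUNT", "ACCOUNT_ID"), ("ACCOUNT", "AC_NAME"),
    ("SEGMENTS", "SEGMENT_ID"), ("SEGMENTS", "SEGMENT_DESC"),
    ("SHIPTO", "SHIPTO_NUM"), ("SHIPTO", "SH_DESCRIPTION"),
]
query = build_query("SAMPLEOLTP", columns, joins)
```

The join predicates are added even when a plan omits a key column from its output, as long as both tables take part in the plan. In this sketch, SEGMENTS.SEGMENT_ID could be dropped from the column list and the predicate on it would still be generated.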
Understanding Data Transformation

Data transformation is the intermediate step in a data flow plan. Using Oracle Data Mart Builder, you can filter the data from the source and transform it before you load it into the target schema. You might need to perform calculations or value substitutions on your source data before loading it into a data mart. For such data filtering, you should use a combination of the built-in transforms provided by Oracle Data Mart Builder, because they are easy to understand and use and do not require any programming. You can modify SQL queries easily by using the SQL Filter tab of the Query Editor or the SQL Editor to add a WHERE predicate to a plan in the SQL Query Transform. In special cases where the built-in transforms cannot meet your objectives, you can develop a custom transform in a language like Visual Basic. You can use built-in transforms to generate warehouse keys or join columns from two data flows. For a list of the built-in transforms, see Built-in Transforms on page 5-4.
Understanding Data Transportation

To transport your data to the data mart database, you must create the target tables and populate them. You need to define the structure of the target tables before they can be populated. You can create tables in the data mart database before you load the data, or, if you use the Batch Loader Transform or SQL*Loader, you can have the population plans create the target tables for you. In either case, you must describe the structure of target data by creating a target BaseView and a target MetaView. When you create the target tables for the data mart database, you create the columns with the same structure as the columns in your source tables. You can use different names for the columns in the target tables, but the column types of the source and target tables must match. You can use partitioned or nonpartitioned target tables if you have the Oracle8 database server partitioning option installed. Builder loads the data into tables, taking advantage of the partitioning if it exists. In most cases, you do not need to specify the partitioning when you use Builder, unless you want to load only one partition at a time.
What Are the Steps in Populating a Data Mart?

Now, you are ready to populate a data mart. Here are the tasks you must perform:
Create BaseViews and MetaViews of the source data. Create a BaseView and a MetaView of your source data, so that you can create and run plans that extract data. A BaseView can include several tables. Categories represent the tables in the source BaseView and parts represent the columns.
Create a BaseView for the target data.
Create a BaseView of the target schema so that you can load extracted data into the data mart database. The BaseView for target data should contain only the tables that will be loaded.
Create a plan for the time dimension table. Use the Time Generation, Key Generation, and SQL*Loader Transforms to populate the time dimension. Records for the time dimension table are not derived from source data but are generated independently. When you need a new range of historical dates for the data in the fact table, you must append the new dates to the time dimension table.
Create plans for other dimension tables. A dimension table has a single primary key and contains detail information for columns in the fact table. Create a data flow plan for each dimension table with the SQL Query Transform, the Key Generation Transform if needed, and the SQL*Loader or Batch Loader Transform. You need to update a dimension table only when the source data for the dimension table has new or updated records.
Create a plan for the fact table. You populate a fact table with business transactions from tables in the relational database. Create a data flow plan using combinations of a SQL Query Transform, column manipulation transforms such as the Column Select Transform, and either the SQL*Loader Transform or the Batch Loader Transform. Typically, you update a fact table on a regular basis, usually during off-hours so that you do not affect server performance. In the fact table, you can replace natural keys from the source database with warehouse keys (synthetic keys) from the dimension tables when appropriate. You load quantitative or factual data from nonkey columns in the source database directly into the fact table. You must create all dimension tables before you create the fact table.
Execute the plans. You can view the data before loading the target database using a Grid Transform. A Grid displays a result set in a columnar display that you can format. This gives you the flexibility to design iteratively until you are satisfied with the transformed source data. Then, you can replace the Grid Transform with the SQL*Loader Transform to load the target.
Handle errors. You may encounter the following errors in loading your data mart:
Oracle errors Oracle Data Mart Builder displays errors specific to Oracle and automatically takes care of rollback and recovery of any pending database transaction. However, you should not confuse the logical unit of work in data mart context (that is, a plan) with the logical unit of work in database context (that is, a transaction). The plan may consist of loads to and from many tables in different source and target databases spanning several database transactions. You need to define a custom transform to take care of rollback and recovery of plans.
Data Mart Builder errors Data Mart Builder gives you some common error messages related to Repository usage, Data Collection Agents, and client/server connectivity. You can get the details of the error, including the cause of the error. The error detail might indicate the primary cause for errors that have multiple causes.
Global Computing Case Study: Populating the Data Mart

When you installed the Data Mart Suite, you installed the sample source and target database you use in the case study. The sample source database contains the source schema owned by the user sampleoltp. You have already designed the target schema in Chapter 3 and are now ready to populate the target schema objects from this source schema.
Starting Assumptions
This chapter assumes that you have successfully installed Oracle Data Mart Builder and the Data Collection Agent is running. Refer to the online Oracle Data Mart Builder documentation for details on how to verify that your machine is registered and the Data Collection Agent is running. The user dmadmin owns the default Builder repository. During the Data Mart Builder installation, the user dmadmin is mapped to the Data Mart Builder registered user called system. Typically, you perform the data mart population tasks logged in as a user created for this purpose using Oracle Data Mart Builder Admin. For this exercise, you log in to Builder as the registered user system with password manager rather than creating additional users.
What Happens in This Section?

In this section, you use Oracle Data Mart Builder to do the following:
Populate the dimension tables: DAYS, CHANNELS, CUSTOMERS, and PRODUCTS. Because the dimension tables are not very large, use the conventional path method of the SQL*Loader Transform.
Populate the fact table, SALES. This is a two-step process:
1. To minimize the impact on the production database, you download the data from the production database without applying any transforms and load it into a staging table. A staging table is a table that serves as a temporary holding area. It is stored on the data mart server but it is not part of the star schema.
2. You move the data from the staging table to the fact table. If you need to transform the data, do it during this step.
Because the staging table and fact tables are large, use the direct path load method of the SQL*Loader Transform. The SQL*Loader Transform uses the SQL*Loader utility of the Oracle8 database server. See the Oracle8 Utilities manual for information about the SQL*Loader utility.
Figure 5-1 shows the structure of the normalized source tables, including the columns, keys, and joins between tables.
Figure 5-1 Global Computing Company: OLTP Source Schema
Figure 5-2 shows the data for the PRODUCT and CHANNEL tables that is currently stored in flat files.
Figure 5-2 Global Computing Company: Flat File Sources
Figure 5-3 shows the structure of the target star schema for the Global Computing Company.
Figure 5-3 Global Computing Company: Target Star Schema
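The two-step fact load described in this section can be sketched as follows, with SQLite standing in for the Oracle data mart database. This is an illustration of the staging pattern only; in the case study you perform these steps with Builder plans and the SQL*Loader Transform. The customer_key, product_key, and amount columns and all sample values are assumptions.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE customers (customer_key INTEGER PRIMARY KEY,
                            account_id INTEGER, shipto_num INTEGER);
    CREATE TABLE products  (product_key INTEGER PRIMARY KEY, item_id INTEGER);
    CREATE TABLE stage1    (account_id INTEGER, shipto_num INTEGER,
                            item_id INTEGER, amount REAL);
    CREATE TABLE sales     (customer_key INTEGER, product_key INTEGER, amount REAL);
    INSERT INTO customers VALUES (100, 1, 10), (101, 1, 11);
    INSERT INTO products  VALUES (200, 7);
""")

# Step 1: copy the source rows into the staging table without transforming them.
source_rows = [(1, 10, 7, 250.0), (1, 11, 7, 99.5)]
db.executemany("INSERT INTO stage1 VALUES (?, ?, ?, ?)", source_rows)

# Step 2: move the staged rows into the fact table, replacing natural keys
# with warehouse keys from the dimension tables. The customer join uses two
# columns, so it pairs rows on both ACCOUNT_ID and SHIPTO_NUM.
db.execute("""
    INSERT INTO sales (customer_key, product_key, amount)
    SELECT c.customer_key, p.product_key, s.amount
    FROM stage1 s
    JOIN customers c ON c.account_id = s.account_id
                    AND c.shipto_num = s.shipto_num
    JOIN products  p ON p.item_id = s.item_id
""")
db.commit()

facts = db.execute("SELECT * FROM sales ORDER BY customer_key").fetchall()
```

Only step 1 touches the production data. Any transformation happens in step 2, on the data mart server.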
Logging In to Data Mart Builder

To begin, you invoke Oracle Data Mart Builder from the Oracle for Windows NT menu and log in with the user ID system and password manager.
Then, you see the main window of Oracle Data Mart Builder, as shown in the following figure:
Representing Source Data

The first step in populating the data mart is to create BaseViews and MetaViews that represent the source database schemas in the format expected by Oracle Data Mart Builder. For the case study, you store these representations in the same DMDB database that holds the target star schema. The BaseViews, MetaViews, and plans are stored in a repository.
Creating a BaseView and a MetaView of Source Data

In general, you need to create a BaseView and a MetaView of each database schema that contains your source data. In this exercise, you use the schema that belongs to the user sampleoltp. To create the source BaseView:
1. From the Tools menu, select BaseView Editor or click the BaseView Editor icon.
Builder displays the Define BaseView dialog box, as shown in the following figure. Note that for all dialog boxes, mandatory fields are indicated by red arrows.
2. Specify the following to create the BaseView of the source data:
BaseView Name: A unique name for the BaseView. Enter GCC_DM.
Database Type: Select ORACLE.
Data Dictionary Type: You can select either DATABASE or Oracle Data Mart Designer, depending on whether you are reading information directly from a source database or using information read by the Designer component of the suite. For this example, select DATABASE.
Service Name: The service name for the source database. In this case study, the same database holds both the source and target schemas. Enter dmdb.
Database: Because this exercise uses an Oracle source, leave this blank. For sources other than Oracle, you enter the database name.
User Name: The database user ID for the user accessing the source database. Enter sampleoltp.
Password: The password for the user accessing the source database. Enter sampleoltp.
Database User: This field fills automatically when you enter the User Name. It has the same value as User Name.
3. Check the box for Auto Create MetaView, which automatically creates the corresponding MetaView, and enter the name GCC_METAVIEW.
If you choose not to do this, you can create the MetaView manually after creating the BaseView.
4. To specify the tables to include in the BaseView and MetaView, click the Tables button.
5. In the Database Tables list box that appears, select the tables owned by sampleoltp. These table names are prefaced with the name SAMPLEOLTP.
6. Click OK to create the BaseView and MetaView.
Oracle Data Mart Builder creates a new BaseView named GCC_DM and a MetaView named GCC_METAVIEW. The Part Bin displays all tables in the GCC_METAVIEW. The tables are called categories; the columns are called parts. When you successfully create the BaseView, the BaseView Editor displays the tables in the BaseView, along with the referential integrity relationships, as shown in the following figure:
If you want to include tables from an Oracle Data Mart Designer module in your BaseView, select Oracle Data Mart Designer from the Data Dictionary Type list. In the Oracle Data Mart Designer login dialog box, enter the following information:
For User Name, enter yves.
For Password, enter yves.
For Service Name, enter dmdb.
For Application Name, enter OLTP_SOURCE.
For Application Version, enter 1.
For Database Definition, enter DEFAULT_DATABASE.
For Database User, enter yves.
Click OK to close the dialog box. By default, all tables that can be accessed by the user are displayed. This includes both tables owned by the user and tables on which the user has access permissions. You can choose to add only a subset of these tables to the BaseView by selecting the required subset as you did in the previous exercise. By default, referential integrity relationships (joins in Builder terminology) in the source schema are reflected in the BaseView. However, if you want to define additional relationships, such as between tables from different schemas, you can use the BaseView Editor to add joins.
Representing Target Data

Now, you create the BaseView that represents the target database schema in the format expected by Oracle Data Mart Builder. For the case study, you store this representation in the same DMDB database that holds the target star schema. The BaseView is stored in the repository.
Creating a BaseView for Target Data

To create the BaseView for the target data, take these steps:
1. Click the New BaseView button in the toolbar at the top of the BaseView Editor window.
2. Specify the following in the Define BaseView dialog box:
BaseView Name: A unique name for the BaseView. Enter GCC_STAR.
Database Type: Enter ORACLE.
Data Dictionary Type: Enter DATABASE.
Service Name: The service name for the source database. Enter dmdb because the same database holds both the source and target schemas.
Database: Leave this blank because the target is always an Oracle Data Mart.
User Name: The database user ID for the user who owns the tables in the target schema. Enter marty.
Password: The password for the user accessing the target database. Enter marty.
Database User: This field fills automatically when you enter the User Name. It has the same value as User Name.
Do not check the Auto Create MetaView box, because you do not need a MetaView for GCC_STAR.
3. The target BaseView should contain only tables that you want to load with data. Click the Tables button and, in the Database Tables list box that appears, select all tables owned by user marty, that is, all tables that begin with MARTY, except for MARTY.STAGE1. If some or all of the target tables do not exist yet (perhaps because you plan to create them with the SQL*Loader Transform), you can add the tables to the target BaseView later. From the BaseView Editor toolbar, select the Add Table icon.
4. Click OK to create the BaseView.
Creating a BaseView and MetaView for the Staging Area

Because this case study uses a staging table to minimize impact on the production database, you need to create a BaseView and MetaView for the staging area. To do so, take these steps:
1. Click the New BaseView button in the toolbar at the top of the BaseView Editor window.
2. Specify the following in the Define BaseView dialog box:
For BaseView Name, enter GCC_STAGING.
For Database Type, enter ORACLE.
For Data Dictionary Type, enter DATABASE.
For Service Name, enter dmdb.
For Database, leave blank.
For User Name, enter marty.
For Password, enter marty.
For Database User, accept the automatically generated value.
Check the Auto Create MetaView box and enter the name GCC_STAGING_MV.
3. The target BaseView should contain only tables that you want to load with data. Click the Tables button and, in the BaseView Tables list box that appears, select the following tables:
MARTY.CHANNELS
MARTY.CUSTOMERS
MARTY.DAYS
MARTY.PRODUCTS
MARTY.STAGE1
4. Click OK to create the BaseView and MetaView.
Now, you need to specify the join relationships between the staging table and the dimension tables in the BaseView:
1. In the BaseView Editor, select GCC_STAGING from the BaseView drop-down list.
2. Click DATE_DESC in the DAYS table and, while holding the mouse button, drag the column to the ORDERED_DATE column in the STAGE1 table and release the mouse button. Builder draws a line signifying a join between the two columns.
3. Click CHANNEL_ID in the CHANNELS table and, while holding the mouse button, drag the column to the CHANNEL_ID column in the STAGE1 table.
4. Click ITEM_ID in the PRODUCTS table and, while holding the mouse button, drag the column to the ITEM_ID column in the STAGE1 table.
5. The join between CUSTOMERS and STAGE1 consists of two columns from each table and is called a concatenated join. To create a concatenated join:
   a. Click SHIPTO_NUM in the CUSTOMERS table and, while holding the mouse button, drag the column to the SHIPTO_NUM column in the STAGE1 table.
   b. Click ACCOUNT_ID in the CUSTOMERS table and, while holding the mouse button, drag the column to the ACCOUNT_ID column in the STAGE1 table.
   c. Using the Control key and the mouse, select both joins between CUSTOMERS and STAGE1 and click the Concatenate join button (the third button from the right on the toolbar).
   The BaseView now looks like this:
   Without the explicit joins defined in the BaseView, you would have to use a Join Transform, which could slow plan processing, or manually add the join condition in the SQL Query Transform.
6. Close the BaseView Editor.
Extracting and Transforming the Source Data

After you create the source, staging, and target views, you create and run plans that extract and transform the data from the normalized structure in the source into a star schema structure in the target. The number of plans you create depends on the number of dimension tables in the target database.
Using the Data Flow Editor to Create and Run Data Flow Plans
Before you create the different plans to populate the data mart, you should understand how to use the Data Flow Editor to create and run a data flow plan.
Displaying the Plan and Results

In Oracle Data Mart Builder, you can view the data flow plan you create, as well as the result set this plan generates.
The parts are listed in the left pane (the Tree Control), the data flow plan is displayed in the upper right pane (the Data Flow Editor), and the result set is displayed in the lower right pane (the Workspace), as shown in the following figure:
Selecting the Tables to Use in the Plan

The Tree Control displays the Part Bin, which contains categories (tables) and parts (columns) belonging to a selected MetaView. You can click the searchlight button to display a list of other available MetaViews, then click the MetaView whose tables you want to use in building your data flow plan. If you drag and drop these tables and columns into the lower pane, the Workspace displays the column names in a spreadsheet. In the upper pane, the Data Flow Editor displays a SQL Query Transform that represents the data source and a Grid Transform that represents the data sink. You can double-click the SQL Query Transform to bring up the Query Editor. In the Query Editor, you can view the generated SQL and remove parts from the SQL query. With your cursor positioned on the SQL Query Transform, you can click the right mouse button to display a menu with more options. For example, you can rename the SQL Query to create a custom transform, or you can edit the generated SQL.
Adding Transforms to the Plan

You can add other transforms to the plan by clicking the Tool Bin icon to display the list of transforms in the Tree Control, then dragging a transform into the existing data flow plan. Position the transform icon between the correct input and output for what you want to do. Oracle Data Mart Builder automatically redraws the connectors (representing data flow between input and output) to accommodate the newly added transforms.
Creating Data Flow Plans to Populate the Target Star Schema

To populate the Global Computing star schema, you create the following plans:
   A plan to load the Time dimension table, DAYS
   One plan to populate each of the other dimension tables: CUSTOMERS, PRODUCTS, and CHANNELS
   A plan to load the staging table, STAGE1
   A plan to load the fact table, SALES
To guarantee proper key creation, you should populate the dimension tables before you populate the fact tables. For each plan, first use the Grid Transform as the sink to verify the data extracted. Then, if you are satisfied with the result set, replace the Grid Transform with the SQL*Loader Transform to load the table. This approach works well for the dimension tables, but because the staging and fact tables are relatively large, you might want to limit the number of rows displayed in the Grid for these tables. The transforms you include in each plan depend on the type of table you load. Our case study uses the following transforms:
   SQL Query
   Time Generation
   Key Generation
   Fixed Length File Source
   Delimited Text File Source
   Grid
   Rename Columns
   SQL*Loader
For most of the plans discussed in the next sections, you use the categories (tables) and parts (columns) of the MetaView GCC_METAVIEW in constructing the SQL Query to extract source data.
Loading the Time Dimension Table DAYS

You use the Time Generation and Key Generation Transforms to populate the time dimension table. Recall that the time dimension differs from other dimensions because it is populated only with generated data; you do not load data from a source table to populate the time dimension table. In this exercise, you create a time dimension for daily data. To populate the time dimension table, you create a data flow plan ending with the SQL*Loader Transform, as shown in the following figure:
Take these steps:

1. Click the Tool Bin icon to list the available transforms in the Tree Control, then drag the Time Generation Transform into the Data Flow Editor.
2. Double-click the Time Generation Transform to open the Time Generation dialog box, shown in the following figure. The dialog box is divided into input and output settings.
3. Input settings specify the start date and the duration. The time span for which data is loaded into the Global Computing data mart is January 1, 1993 to December 31, 1996. Select the Start Date option and set it to 1/1/93. Set the Duration to 1461 days (because 1996 is a leap year). One record is generated for each day of the duration value.
4. Select the output settings and enter the column names shown in the preceding figure. The column names must correspond to the column names in the target table MARTY.DAYS.
5. Click OK.
6. Drag the Key Generation Transform into the Data Flow Editor and drop it to the right of the Time Generation Transform. Oracle Data Mart Builder automatically draws connectors between transforms as needed.
7. Double-click the Key Generation Transform to open the Key Generation dialog box. To set the column to be populated with the warehouse key value, type DATE_ID in the Key Column box. Select Start Key Value At and enter the value as 1. Click OK.
   This adds a key column to the data flow plan and generates a warehouse key sequence for the DAYS table.
8. Drag the Grid Transform into the Data Flow Editor and drop it to the right of the Key Generation Transform to designate the data sink.
9. Click Update at the bottom of the Oracle Data Mart Builder window to populate the Grid.
   You can view the generated values in the Grid.
If you are satisfied with the results, replace the Grid Transform with the SQL*Loader Transform:
1. Delete the Grid Transform (right-click the Grid Transform and select Delete Step).
2. Drag the SQL*Loader Transform from the Tool Bin into the Data Flow Editor to the right of the Key Generation Transform.
3. Double-click the SQL*Loader Transform to open the dialog box. In the SQL*Loader tab, enter the following values:
   For BaseView, select GCC_STAR.
   For Table, select MARTY.DAYS.
   For Oracle Version, select Version 8.
   Because the table is already created, select Use Existing Table.
   For Update Method, select Insert.
   For Load Options, select Use Conventional Path Loader. Because the dimension tables are not very large, you do not use direct path load. For more information about when to use conventional path load or direct path load, see the Oracle8 Utilities manual.
   For Batch Size, specify 100. If you specify 0, the SQL*Loader Transform uses the SQL*Loader utility default of 64.
   For Log File, enter a full file specification, such as:
      c:\temp\days.log
   It is good practice to use a log file to record all errors for the load process.
4. Click OK to close the dialog box.
5. Click Update to run the plan and populate the DAYS table.
   Builder generates a warehouse key for the DATE_ID column and values for all specified output columns.
6. Save this plan so that you can run it again when you need to generate updated or additional records for the DAYS table. Right-click in the Data Flow Editor, select Rename plan and type a new name, and then select Save Plan from the File menu.
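The arithmetic behind the 1461-day duration, and the effect of chaining Time Generation with Key Generation, can be checked with a short script. This is only a sketch in Python, not what Data Mart Builder runs internally; the function name generate_days is hypothetical, and only the DATE_ID key and the calendar date are modeled (the other output columns come from the dialog box figure, which is not reproduced here).

```python
from datetime import date, timedelta


def generate_days(start, duration, first_key=1):
    """Yield one (DATE_ID, calendar date) pair per day.

    Mirrors the idea of Time Generation (one record per day of the
    duration) followed by Key Generation (keys ascending from first_key).
    """
    for offset in range(duration):
        yield first_key + offset, start + timedelta(days=offset)


start = date(1993, 1, 1)
end = date(1996, 12, 31)

# Both endpoints are loaded, so add 1 to the difference in days.
# 1993-1995 contribute 365 days each; 1996 is a leap year with 366.
duration = (end - start).days + 1

rows = list(generate_days(start, duration))
print(duration)            # number of records to generate
print(rows[0], rows[-1])   # first and last (DATE_ID, date) pairs
```

Starting the key at 1 means the last DATE_ID equals the duration, which is a quick sanity check after the load.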
Creating Plans for Other Dimension Tables

A dimension table has a single primary key, and contains detail information for columns in the fact table. You need to populate the following remaining dimension tables in the Global Computing star schema:
   CUSTOMERS (Customer Dimension)
   CHANNELS (Channel Dimension)
   PRODUCTS (Product Dimension)
Loading the Customer Dimension Table CUSTOMERS

To populate the customer dimension table, CUSTOMERS, take the following steps:
1. From the File menu, select New Plan.
2. If the parts are not displayed in the Tree Control, click the Part Bin icon. To list the parts, click the MetaView searchlight and select GCC_METAVIEW.
3. Double-click a category (table) to expand it and view the parts (columns). Drag the following parts into the Workspace (the lower pane):
   Categories (Tables)    Parts (Columns)
   SAMPLEOLTP.ACCOUNT     ACCOUNT_ID, AC_NAME
   SAMPLEOLTP.SEGMENTS    SEGMENT_ID, SEGMENT_DESC
   SAMPLEOLTP.SHIPTO      SHIPTO_NUM, SH_LOCTYPE_DESC, SH_DESCRIPTION
   A data flow plan appears in the Data Flow Editor. It consists of the SQL Query Transform and a Grid Transform.
4. Click the Tool Bin icon to display the transforms. Add the Key Generation Transform from the Tool Bin to the data flow plan so that it looks like the following figure:
5. Open the Key Generation Transform dialog box and type CUSTOMER_ID to set the column to be populated with the warehouse key value. Select Start Key Value At and enter the value as 1. Click OK.
   This adds a key column to the data flow plan and generates a warehouse key sequence for the CUSTOMERS table.
6. Because some of the columns in the target table have different names than the columns in the source table, drag the Rename Columns Transform into the Data Flow Editor to the right of the Key Generation Transform.
7. Open the Rename Columns Transform. In the New Column Name column, click twice on the following columns. Be sure to pause between clicks, so that the fields become editable, then enter the names as listed:
   Rename from:       Rename to:
   AC_NAME            ACCOUNT_DESC
   SH_DESCRIPTION     SHIP_TO_DESC
   SH_LOCTYPE_DESC    LOCTYPE_DESC
   The Rename Columns dialog box now looks like this:
8. Click OK to close the Rename Columns dialog box.
9. Select Update to populate the Grid.
10. If you are satisfied with the results, replace the Grid Transform with the SQL*Loader Transform:
   a. Delete the Grid Transform (right-click the Grid Transform and select Delete step).
   b. Drag the SQL*Loader Transform into the Data Flow Editor, to the right of the Rename Columns Transform.
11. Open the SQL*Loader dialog box and enter the following values:
   For BaseView, select GCC_STAR.
   For Table, select MARTY.CUSTOMERS.
   For Oracle Version, select Version 8.
   Because the table is already created, select Use Existing Table.
   For Update Method, select Insert.
   For Load Options, select Use Conventional Path Loader.
   For Batch Size, take the default. The SQL*Loader Transform uses the SQL*Loader utility default of 64.
   For Log File, enter a full file specification, such as:
      c:\temp\cust.log
12. Click OK.
13. Click Update to run the plan and load the CUSTOMERS table.
14. Save the plan by right-clicking in the Data Flow Editor, selecting Rename plan and typing a new name, and then selecting Save Plan from the File menu.
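To make the effect of the Key Generation and Rename Columns steps in the CUSTOMERS plan concrete, the following Python sketch applies the same two operations to an in-memory row. It is an illustration only, not Builder's implementation: the function names are hypothetical, and the sample values are invented. The column names and the rename mapping are the ones listed in the steps above.

```python
# Rename mapping taken from the Rename Columns step.
RENAMES = {
    "AC_NAME": "ACCOUNT_DESC",
    "SH_DESCRIPTION": "SHIP_TO_DESC",
    "SH_LOCTYPE_DESC": "LOCTYPE_DESC",
}


def add_warehouse_key(rows, key_column, first_key=1):
    """Prepend a sequential warehouse key to each row (Key Generation)."""
    for key, row in enumerate(rows, start=first_key):
        yield {key_column: key, **row}


def rename_columns(row, renames=RENAMES):
    """Return a copy of the row with source names replaced by target names."""
    return {renames.get(name, name): value for name, value in row.items()}


# Hypothetical source row; real values come from the SAMPLEOLTP tables.
source_rows = [
    {"ACCOUNT_ID": 500, "AC_NAME": "Acme", "SEGMENT_ID": 3,
     "SEGMENT_DESC": "Retail", "SHIPTO_NUM": 7,
     "SH_LOCTYPE_DESC": "Warehouse", "SH_DESCRIPTION": "Acme East"},
]

customers = [rename_columns(row)
             for row in add_warehouse_key(source_rows, "CUSTOMER_ID")]
print(customers[0])
```

Columns that are not in the mapping, such as ACCOUNT_ID and SHIPTO_NUM, pass through unchanged; these natural keys are what the staging table is later joined on.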
Loading the Product Dimension Table PRODUCTS

The product dimension table derives all values from the flat file PROD.FIX. Because the records in the file are fixed-length records, you use the Fixed Length File Source Transform. The data file is located in the following directory:
By default, <oracle_home> is located in the directory <drive>:\orant.
To populate the PRODUCTS table, take these steps:
1. From the File menu, select New Plan.
2. From the Tool Bin, select the Fixed Length File Source Transform and drag it into the Data Flow Editor.
3. From the Tool Bin, select the Grid Transform and drag it into the Data Flow Editor. Oracle Data Mart Builder creates the following data flow plan:
4. Double-click the Fixed Length File Source Transform to open the dialog box. In this box, you provide information about the data file and details about each field:
   For File Name on Server, enter <oracle_home>\datamart\prod.fix.
   Leave the User and Password boxes blank.
   Because the file is in text format, select the Text radio button for File Type.
   For Character Set, select ANSII.
5. For Sample Data File Name, enter a file name similar to the following:
      <oracle_home>\datamart\prodsamp.fix
   This file contains two rows of sample data that can help you fill in the position numbers of the fields.
6. Add information about each field in the Field List by clicking New.
7. In the Field Specification dialog box, enter the following information:
   For Column Name, enter ITEM_ID.
   For Input Data Type, select Character.
   For Start, enter 1; for End, enter 11.
   For Output Data Type, select STRING.
   For Length, accept the generated value of 11.
8. Click OK.
9. Repeat steps 6, 7, and 8 for the remainder of the fields, using the following information:
   Column Name   Input Data Type   Start   End   Output Data Type   Length
   PACKAGING     Character         12      31    STRING             20
   ITEM_DESC     Character         32      66    STRING             35
   ITEM_SOURCE   Character         67      96    STRING             30
   FAMILY_ID     Character         97      99    INTEGER
   FAMILY_DESC   Character         100     119   STRING             20
   CLASS_ID      Character         120     122   INTEGER
   CLASS_DESC    Character         123     142   STRING             20
10. Click OK to close the Fixed Length File Source dialog box.
11. Add the Key Generation Transform from the Tool Bin to the data flow plan so that it looks like the following figure:
12. Open the Key Generation Transform dialog box and type PRODUCT_ID to set the column to be populated with the warehouse key value. Select Start Key Value At and enter the Start Key Value as 1. Click OK.
   This adds a key column to the data flow plan and generates a warehouse key sequence for the PRODUCTS table.
13. Click Update to run the plan. Now you can view the generated keys, in column PRODUCT_ID, in the grid:
14. After you view the transformed data for PRODUCTS, replace the Grid sink with the SQL*Loader Transform to load the table.
15. Double-click the SQL*Loader Transform to open the dialog box. Enter the following values:
   For BaseView, select GCC_STAR.
   For Table, select MARTY.PRODUCTS.
   For Oracle Version, select Version 8.
   Because the table is already created, select Use Existing Table.
   For Update Method, select Insert.
   For Load Options, select Use Conventional Path Loader.
   For Batch Size, take the default. The SQL*Loader Transform uses the SQL*Loader utility default of 64.
   For Log File, enter a full file specification, such as:
      c:\temp\prod.log
16. Click OK.
17. Click Update to run the plan and populate the table.
18. Save the plan by right-clicking in the Data Flow Editor, selecting Rename plan and typing a new name, and then selecting Save Plan from the File menu.
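The field positions entered in the Field Specification dialog box describe how each fixed-length record is cut into columns. The following Python sketch shows the same slicing using the Start and End values from the steps above. It is a sketch under assumptions: the sample record is invented (it is not the contents of PROD.FIX), the function name parse_record is hypothetical, and trimming of padding blanks is assumed rather than documented behavior of the transform.

```python
# (column name, start, end, output type). Positions are 1-based and
# inclusive, as entered in the Field Specification dialog box.
FIELDS = [
    ("ITEM_ID", 1, 11, str),
    ("PACKAGING", 12, 31, str),
    ("ITEM_DESC", 32, 66, str),
    ("ITEM_SOURCE", 67, 96, str),
    ("FAMILY_ID", 97, 99, int),
    ("FAMILY_DESC", 100, 119, str),
    ("CLASS_ID", 120, 122, int),
    ("CLASS_DESC", 123, 142, str),
]


def parse_record(line):
    """Slice one fixed-length record into a dict of typed column values."""
    row = {}
    for name, start, end, convert in FIELDS:
        # Convert 1-based inclusive positions to a Python slice.
        raw = line[start - 1:end].strip()  # assumption: padding is trimmed
        row[name] = convert(raw)
    return row


# Hypothetical 142-character record built to match the layout.
sample = (
    "ITEM0000001".ljust(11) + "Box".ljust(20) + "Sample laptop".ljust(35)
    + "Vendor A".ljust(30) + "10".rjust(3) + "Portables".ljust(20)
    + "2".rjust(3) + "Hardware".ljust(20)
)

product = parse_record(sample)
print(len(sample))
print(product)
```

Each STRING length in the field table equals End minus Start plus 1, which is why the dialog box can generate the Length value automatically.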
Loading the Channel Dimension Table CHANNELS

You build the channel dimension table from the flat file channel.csv. Because the fields in the file are delimited by separators, you use the Delimited Text File Source Transform. The data file is located in the following directory:
To populate the CHANNELS dimension table, take these steps:

1. From the File menu, select New Plan.
2. From the Tool Bin, select the Delimited Text File Source Transform and drag it into the Data Flow Editor.
3. From the Tool Bin, select the Grid Transform and drag it into the Data Flow Editor. The following data flow plan appears:
4. Double-click the Delimited Text File Source Transform and enter the following information:
   For File Name on Server, enter <oracle_home>\datamart\channel.csv.
   For Character Set, select ANSII.
   For Separator Character, select the comma (,).
   For Quote Character, select the double quotation mark (").
   For Comment Character, select the number sign (#).
   Because the first line of the data file contains the column names, check the box Column Names in File.
   To have the transform automatically define the column names and data types, type the following file specification in the Sample Data File Name box:
      <oracle_home>\datamart\channel.csv
   Because the data file is small, you can use it as a sample file. If your data file is large, create a small subset of it to use as a sample file.
   Then, in the Auto-Populate Field List box, click Column Names and Data Types. Builder adds information about the columns to the Field List.
   If your data file does not contain the column names, you can add the names and data types manually.
5. Click OK.
6. Click Update to populate the Grid.
7. If you are satisfied with the results, replace the Grid Transform with the SQL*Loader Transform:
   a. Delete the Grid Transform (right-click the Grid Transform and select Delete step).
   b. From the Tool Bin, drag the SQL*Loader Transform into the Data Flow Editor.
8. Open the SQL*Loader dialog box and enter the following values:
   For BaseView, select GCC_STAR.
   For Table, select MARTY.CHANNELS.
   For Oracle Version, select Version 8.
   Because the table is already created, select Use Existing Table.
   For Update Method, select Insert.
   For Load Options, select Use Conventional Path Loader.
   For Batch Size, take the default. The SQL*Loader Transform uses the SQL*Loader utility default of 64.
   For Log File, enter a full file specification, such as:
      c:\temp\chan.log
9. Click OK.
10. Click Update to run the plan and load the table.
11. Save the plan by right-clicking in the Data Flow Editor, selecting Rename plan and typing a new name, and then selecting Save Plan from the File menu.
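The separator, quote, comment, and column-name settings chosen for the Delimited Text File Source Transform correspond to a conventional CSV reading pattern. The Python sketch below applies the same settings with the standard csv module. It is an approximation, not the transform's own parser: the function name read_delimited, the sample text, and the CHANNEL_DESC column are hypothetical, and unlike the Auto-Populate option it leaves every value as a string instead of inferring data types.

```python
import csv
import io


def read_delimited(text, separator=",", quote='"', comment="#"):
    """Parse delimited text whose first non-comment line holds column names."""
    # Drop comment lines before parsing. This simple filter assumes that
    # quoted fields do not span several lines.
    lines = [line for line in io.StringIO(text)
             if not line.lstrip().startswith(comment)]
    reader = csv.DictReader(lines, delimiter=separator, quotechar=quote)
    return list(reader)


# Hypothetical sample; the real file is channel.csv on the server.
sample = '''# hypothetical channel data
CHANNEL_ID,CHANNEL_DESC
1,"Direct"
2,"Catalog, mail order"
'''

rows = read_delimited(sample)
for row in rows:
    print(row)
```

The second data row shows why the quote character matters: the comma inside the quoted description is kept as data instead of being treated as a separator.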
Populating the Fact Table SALES

Now that you have loaded the dimension tables, you can load the data into the fact table. Typically, you populate a fact table with numeric metrics of the business or business transactions from tables in the OLTP database. Natural keys in the source database are replaced in the fact table with warehouse keys from the dimension tables. Quantitative or factual data from nonkey columns in the source database are loaded into the fact table. Table 5-1 shows the fields in the fact table and the source of the data, including the derived field, MARGIN.
Table 5-1 Fields in the Fact Table
   Fields in Fact Table SALES   Source of Data
   CHANNEL_ID                   Warehouse key lookup on CHANNELS and, indirectly, the flat file
   CUSTOMER_ID                  Warehouse key lookup on CUSTOMERS and, indirectly, SAMPLEOLTP.ORDER_HEADER.SHIP_TO_ID
   PRODUCT_ID                   Warehouse key lookup on PRODUCTS and, indirectly, SAMPLEOLTP.ORDER_HEADER.ITEM_ID
   DATE_ID                      Warehouse key lookup on DAYS and, indirectly, SAMPLEOLTP.ORDER_HEADER.ORDERED_DATE
   UNITS                        Populated from SAMPLEOLTP.ORDERLINE.UNITS
   SALES                        Populated from SAMPLEOLTP.ORDERLINE.SALES
   COST                         Populated from SAMPLEOLTP.ORDERLINE.COST
   MARGIN                       Derived as: SAMPLEOLTP.ORDERLINE.SALES - SAMPLEOLTP.ORDERLINE.COST
To minimize the impact on the production database, in this case study you download the data from the production database without applying any transforms and load it into a staging table. A staging table is a table that serves as a temporary holding area. It is stored on the data mart server, but it is not part of the star schema. Then, you move the data from the staging table to the fact table, using any transformations, such as calculated columns or warehouse keys, needed in the process. To populate the fact table using warehouse keys and a staging table, you take these steps:
1. Load the data, including the natural keys, into the staging table, STAGE1, from the source data.
2. Load the fact data from the staging table and the warehouse keys from the dimension tables into a plan to populate the fact table.
To load the data into the staging table, take these steps:
1. From the File menu, select New Plan.
2. Click the Part Bin icon to display the parts in the GCC_METAVIEW. Select the following columns from the Part Bin and drag them into the Workspace:
   ORDER_HEADER.ORDERED_DATE
   ORDER_HEADER.ACCOUNT_ID
   ORDER_HEADER.SHIPTO_NUM
   ORDER_HEADER.CHANNEL_ID
   ORDER_LINE.ITEM_ID
   ORDER_LINE.UNITS
   ORDER_LINE.SALES
   ORDER_LINE.COST
   The following data flow plan appears:
3. Create the calculated column MARGIN by taking these steps:
   a. Select the category SAMPLEOLTP.ORDER_LINE and right-click.
   b. Select Calculated Field. The Properties dialog box appears.
   c. Double-click Name, and enter MARGIN.
   d. Double-click Expression, and enter SALES-COST.
   e. Double-click Data Type, and select Long Floating Point Number.
   f. Close the Properties dialog box. From the Part Bin, drag the MARGIN part into the Workspace.
   You can create the calculated fields using the calculator. If you do so and you use the Batch Loader Transform, you must use the Rename Columns Transform to rename the column so that it can be seen in the Batch Loader. Place the Rename Columns Transform immediately before the Batch Loader Transform.
4. Because this is a relatively large table, limit the number of rows that Builder will display in the Grid by taking the following steps:
   a. Position your cursor in the Grid and right-click.
   b. Select Limit Rows Displayed.
   c. In the Maximum Result Rows to Display dialog box, click Limited and specify 20 for Rows. Click OK.
5. Click Update to populate the Grid.
6. If you are satisfied with the results, replace the Grid Transform with the SQL*Loader Transform:
   a. Delete the Grid Transform (right-click the Grid Transform and select Delete step).
   b. Click the Tool Bin icon to list the transforms and then drag the SQL*Loader Transform into the Data Flow Editor.
7. Open the SQL*Loader dialog box and enter the following values:
   For BaseView, select GCC_STAGING.
   For Table, select MARTY.STAGE1.
   For Oracle Version, select Version 8.
   Because the table is already created, select Use Existing Table.
   For Update Method, select Insert.
   Check the Use Direct Path Loader and the Unrecoverable boxes to improve performance.
   For Log File, enter a full file specification, such as:
      c:\temp\stage1.log
8. Click OK.
9. Click Update to run the plan and load the STAGE1 table.
10. Save the plan by right-clicking in the Data Flow Editor, selecting Rename plan and typing a new name, and then selecting Save Plan from the File menu.
Now, you create a plan that takes the warehouse keys from the dimension tables and fact data from the staging table to populate the SALES fact table:
1. From the File menu, select New Plan.
2. Click the Part Bin icon and select the GCC_STAGING_MV MetaView. Then, drag the following parts (which represent warehouse keys in the dimension tables) into the Workspace:
   PRODUCTS.PRODUCT_ID
   CUSTOMERS.CUSTOMER_ID
   DAYS.DATE_ID
   CHANNELS.CHANNEL_ID
3. Select the following parts from the staging table:
   STAGE1.UNITS
   STAGE1.SALES
   STAGE1.COST
   STAGE1.MARGIN
   Double-click the SQL Query icon and then select the SQL Viewer tab to see the syntax that Data Mart Builder uses to extract the data from the staging table. Builder creates the WHERE clause based on the joins you created in the BaseView in "Creating a BaseView and MetaView for the Staging Area" on page 5-27.
4. Click OK to close the window.
5. Because this is a relatively large table, limit the number of rows that Builder will display in the Grid by taking the following steps:
   a. Position your cursor in the Grid and right-click.
   b. Select Limit Rows Displayed.
   c. In the Maximum Result Rows to Display dialog box, click Limited and specify 20 for Rows. Click OK.
6. Click Update to run the plan. The Grid displays the transformed fact table data.
7. If you are satisfied with the results, replace the Grid Transform with the SQL*Loader Transform:
   a. Delete the Grid Transform (right-click the Grid Transform and select Delete step).
   b. Click the Tool Bin icon to list the transforms and then drag the SQL*Loader Transform into the Data Flow Editor.
8. Open the SQL*Loader dialog box and enter the following values:
   For BaseView, select GCC_STAR.
   For Table, select MARTY.SALES.
   For Oracle Version, select Version 8.
   Because the table is already created, select Use Existing Table.
   For Update Method, select Insert.
   Check the Use Direct Path Loader and the Unrecoverable boxes to improve performance.
   For Log File, enter a full file specification, such as:
      c:\temp\sales.log
9. Click OK.
10. Click Update to run the plan and load the SALES table.
   The SALES table is now populated with data from the source database, and its concatenated key is created from primary keys in the target dimension tables.
   Save the plan by right-clicking in the Data Flow Editor, selecting Rename plan and typing a new name, and then selecting Save Plan from the File menu. Name the plan SALES_PLAN.
When you repopulate the tables in the star schema, run the plans from this population example in the following order:
1. Day Dimension Plan
2. Customer Dimension Plan
3. Product Dimension Plan
4. Channel Dimension Plan
5. Staging Table Plan
6. Sales Fact Plan
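The fact-table plan in this order depends on every earlier plan because it replaces the natural keys held in STAGE1 with warehouse keys looked up in the dimension tables. The following Python sketch reproduces that lookup with an in-memory SQLite database. It is an illustration under stated assumptions: SQLite stands in for the Oracle data mart, the table definitions are reduced to the columns named in this chapter, the rows are invented, and the SELECT statement is written by hand to follow the joins defined in the GCC_STAGING BaseView (it is not the SQL that Builder generates).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE days      (date_id INTEGER PRIMARY KEY, date_desc TEXT);
CREATE TABLE channels  (channel_id INTEGER PRIMARY KEY);
CREATE TABLE products  (product_id INTEGER PRIMARY KEY, item_id TEXT);
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY,
                        account_id INTEGER, shipto_num INTEGER);
CREATE TABLE stage1 (ordered_date TEXT, account_id INTEGER, shipto_num INTEGER,
                     channel_id INTEGER, item_id TEXT,
                     units INTEGER, sales REAL, cost REAL, margin REAL);
CREATE TABLE sales  (product_id INTEGER, customer_id INTEGER, date_id INTEGER,
                     channel_id INTEGER,
                     units INTEGER, sales REAL, cost REAL, margin REAL);

-- Hypothetical dimension rows, loaded first so that the keys exist.
INSERT INTO days VALUES (1, '1993-01-01'), (2, '1993-01-02');
INSERT INTO channels VALUES (1);
INSERT INTO products VALUES (1, 'ITEM0000001');
INSERT INTO customers VALUES (1, 500, 7);
""")

# Staging load: natural keys plus measures, with MARGIN derived as SALES - COST.
order = ("1993-01-02", 500, 7, 1, "ITEM0000001", 5, 100.0, 60.0)
conn.execute("INSERT INTO stage1 VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
             order + (order[6] - order[7],))

# Fact load: join STAGE1 to each dimension; CUSTOMERS uses the two-column
# (concatenated) join on SHIPTO_NUM and ACCOUNT_ID.
conn.execute("""
INSERT INTO sales
SELECT p.product_id, c.customer_id, d.date_id, ch.channel_id,
       s.units, s.sales, s.cost, s.margin
FROM stage1 s, days d, channels ch, products p, customers c
WHERE d.date_desc   = s.ordered_date
  AND ch.channel_id = s.channel_id
  AND p.item_id     = s.item_id
  AND c.shipto_num  = s.shipto_num
  AND c.account_id  = s.account_id
""")

fact_rows = conn.execute("SELECT * FROM sales").fetchall()
print(fact_rows)
```

If a dimension table is empty or stale when this join runs, the matching staging rows silently drop out of the result, which is the practical reason for loading the dimensions before the staging and fact plans.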
Reenabling Constraints
When you use the SQL*Loader Transform in the direct path mode, the SQL*Loader disables referential integrity constraints. You must reenable the constraints to maintain the integrity of your data. To do so, take these steps:
1. From the Oracle Enterprise Manager menu, select Schema Manager.
2. Log in as user marty, password marty, and service dmdb.
3. Expand the folder Tables, then expand the folder marty and select SALES.
4. In the property sheet, select the Constraints tab. Schema Manager uses a check mark to indicate disabled constraints.
5. Click each check mark to change it into an X.
6. Click Apply.
Analyzing the Table SALES

To generate the statistics used by the cost-based optimizer, you analyze the table SALES. Invoke the SQL Worksheet application from the Oracle Enterprise Manager menu and log in as user marty with password marty and service dmdb.
In the lower window, type the statement shown in the following figure and select Execute from the Worksheet menu to execute the statement:
What Have You Learned and What Is Next?

In this chapter, you learned how to use Data Mart Builder to extract the data from the data source, map it to the data mart schema, transform it, and transport it into the data mart schema. There is one more step that this chapter did not cover: determining how the data in the data mart will be refreshed to reflect changes in the source data. "Updating Data in the Data Mart" on page 8-28 describes that step. In the next chapter, you will learn how to set up typical queries and reports so that your users can access the data easily.
6
The true measure of success of your data mart project is the value and utility of the data to the end users. Regardless of the technical expertise applied to the development of the data mart, if each end user cannot access the necessary information, the data mart is a failure. At this point, you have built the database, and you want to let users explore the data and get information to support their decisions. If you understood the business problems and have successfully translated that understanding to the star schema, it will be easy for users to create and generate meaningful reports. Most importantly, you want every user to feel as though the interaction with the data is in appropriate business terms, not in unfamiliar database or system terms. First, this chapter describes Oracle Discoverer, reviews the fundamentals of querying and reporting, and reviews some of the terms that Oracle Discoverer uses. Then, it steps through a typical exploration of your data.
Reviewing Your Data Model

In Chapter 3, you took careful stock of the business needs in order to create the star schema. The concepts, dimensions, and facts are the fundamental elements in reporting. The star schema that you used to build the database is specifically designed to facilitate report generation. Data is arranged in business terms for ad hoc access with the key numeric metrics or facts contained in the main table and the textual information (dimensions) in smaller tables describing the main table.
What Is the Role of Oracle Discoverer in the Data Mart?

Oracle Discoverer is an easy-to-use decision-support tool that lets you query, analyze, and report data according to your business requirements. Oracle Discoverer reduces the complexity of managing many tools by providing a single tool and one interface that supports rapid queries and reporting analysis. Oracle Discoverer is the query, reporting, exploration, and Web publishing tool that enables business users at all levels of the organization to gain immediate access to the information from data warehouses, data marts, or OLTP systems. Oracle Discoverer provides superior ease-of-use and unsurpassed performance to data access. Oracle Discoverer consists of three components, each with a unique role:
   User Edition
   Administration Edition
   End User Layer
Through these three components, Oracle Discoverer provides ad hoc reporting and analysis for the user (the User Edition), translates database code to business terms through a metalayer (the End User Layer), and provides an easy-to-use interface for managing user areas (the Administration Edition). Oracle Discoverer also provides built-in features to optimize response to user queries. Oracle Data Mart Suite also provides an Internet interface to Oracle Discoverer data: Discoverer Viewer for the Web. With Discoverer Viewer, users can view and manipulate reports and graphs using a Web browser interface.
Discoverer User Edition

Discoverer User Edition executes queries and generates reports and graphs. A wizard guides you through the process of defining each report and graph. The iterative process of ultimate report definition and generation is easily executed; you can preview your report layout before the query is executed and change it as needed. Each user accesses data through a business area that the administrator sets up for the specific reporting needs of each user. Discoverer User Edition provides read-only access to the database.
Discoverer Administration Edition

Discoverer Administration Edition enables you, as data administrator, to create, maintain, and administer data and the user's interaction with that data. Through Discoverer Administration Edition, you write to the End User Layer, specifying business areas for each logical function within the business. You can sculpt and format data to define a unique look for each user group. The responsibility of the data administrator is to hide the complexity of the database from the end user. Administrator functions also include:
   Defining business visualization of the data
   Controlling user access
   Defining drill paths
   Defining summary tables
End User Layer

So that end users can navigate the data easily, you should hide the complexity of the database and underlying structures. The End User Layer (EUL) translates database terms to business terms. The translations provided by the EUL enable end users to interact with the online data dictionary using terms related to their business function. As a result, the end users do not need to know anything about the database to access it and can focus on business issues. The business areas, which are created by the Oracle Discoverer administrator, are stored in the EUL tables.
Discoverer Viewer for the Web

Discoverer Viewer for the Web provides an Internet interface for viewing Discoverer reports and graphs through a Web browser. Because Discoverer Viewer displays data that is stored in the database, users see up-to-date data. Users can also drill up or down into the data and specify what data they want to see. Because Discoverer Viewer uses a Web browser interface, you deploy software on only one machine, but all users can access Discoverer Viewer.
Basic Steps in Designing a Business Area

Simply stated, a business area is a logical grouping of data that applies to specific data requirements of a user group. Although some of the data needed by different departments may be the same, the exact combination of tables and views for each department is usually distinct.
Creating and Managing a Business Area

Using Discoverer Administration Edition, you tailor the grouping of data to provide users with the proper access to the precise data they need for ad hoc queries, decision support, and presentation of results. Business areas are designed to meet the data needs of the users. Data comes from different tables or views and is mapped to folders and items, as shown in the following figure. Folders can be used in more than one business area and do not have to be related to each other. Business areas can be allocated to many user IDs or roles; a user ID or role may be granted access to many business areas.
Business areas include conditions, joins, calculated items, formatting, hierarchical structures, and other custom features. Thus, a business area is a cohesive set of folders, items, and other functions designed specifically for professional business people so they can more effectively use the data in their company databases. The time invested initially to create a well-designed business area will have a significant payback in reduced administration and maintenance, because you eliminate or greatly reduce the amount of time spent tailoring the environment for each user at the database level.
Managing End-User Access

The data administrator is responsible for assigning business area access and privileges to users. The data administrator can make business areas available to individual users or to all users.
The data administrator assigns privileges to users. Privileges include:

- Administration: Manage Scheduled Workbooks, Create Summary Tables, Create/Edit Business Areas, Format Business Areas, Set Privileges
- User: Collect Query Statistics, Create/Edit Queries, Drill Out, Grant Workbook Access, Save Workbooks to database, Item Drill, Schedule Workbooks
These privileges are independent of, but do not override, database privileges set directly through the database server.
Using Oracle Discoverer to Improve Query Performance

Oracle Discoverer provides many built-in functions that can improve the performance of queries.
Automatic Summary Redirection: Summary Tables

Administrators can use Oracle Discoverer to create and maintain pre-aggregated summaries in summary tables in the data mart. Summary tables enable rapid response to queries because the database system does not need to recalculate frequently used summaries for each query. Because summary tables greatly enhance database performance, deciding which summary tables to create is an important step in optimizing the database.
Using a wizard-based interface, the administrator specifies which facts and dimensions to summarize. If summary tables already exist, Oracle Discoverer can identify and use them. To help you decide which summary tables to build, Oracle Discoverer records query activity; you can view this record using the QSTATS workbook, which is provided in the <oracle_home>\discvr31 directory. Oracle Discoverer also simplifies the maintenance of summary tables created through Discoverer Administration Edition by automatically refreshing them at given intervals.
When a user summarizes detailed rows, Oracle Discoverer checks whether that information has already been summarized. If so, the query engine issues the appropriate SQL and automatically runs the query against the already summarized data. The user does not need to perform any action to redirect the query. This capability provides rapid response times and reduces the load on the server.
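The idea of a summary table and of redirecting a query to it can be sketched outside Oracle Discoverer. The following Python example uses the standard sqlite3 module as a stand-in database; the table names, the sample rows, the SUMMARIES registry, and the choose_table rule (use a summary when the requested grouping is a subset of its dimensions) are all assumptions made for illustration and are not Oracle Discoverer's actual algorithm.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (year INTEGER, item_id INTEGER, channel TEXT, amount REAL);
    INSERT INTO sales VALUES
        (1995, 1, 'Direct Sales', 100.0),
        (1995, 1, 'Internet',      50.0),
        (1995, 2, 'Internet',      70.0),
        (1996, 1, 'Direct Sales', 120.0),
        (1996, 2, 'Internet',      30.0);
    -- Pre-aggregated summary: total sales per year and item.
    CREATE TABLE sales_by_year_item AS
        SELECT year, item_id, SUM(amount) AS amount
        FROM sales GROUP BY year, item_id;
""")

# Hypothetical registry: summary table name -> dimensions it is grouped by.
SUMMARIES = {"sales_by_year_item": {"year", "item_id"}}


def choose_table(group_by):
    """Pick a summary table that can answer the grouping, else the detail table."""
    for table, dims in SUMMARIES.items():
        if set(group_by) <= dims:
            return table
    return "sales"


def total_sales(group_by, table=None):
    """Total the amount column by the given dimensions."""
    table = table or choose_table(group_by)
    cols = ", ".join(group_by)
    sql = f"SELECT {cols}, SUM(amount) FROM {table} GROUP BY {cols} ORDER BY {cols}"
    return conn.execute(sql).fetchall()


# Sales by year is answered from the summary and matches the detail result.
print(choose_table(["year"]), total_sales(["year"]))
print(total_sales(["year"], table="sales"))
```

A grouping by channel falls back to the detail table in this sketch, because the summary does not carry that dimension.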
Query Prediction
Most decision-support tools provide a query governor to cancel long-running queries. Oracle Discoverer extends this capability to give you what you really need: an estimate of the retrieval time before the query execution and the ability to cancel that query. The query predictor enables the user to make sensible decisions on whether to run a query immediately using the nonblocking capabilities, or refer it to a batch process for later calculation.
Query Governor
Oracle Discoverer lets you govern queries in the following ways:
- Voluntarily declining the query: If the query predictor predicts a time longer than a specified time, a dialog box asks if you want to continue and run your query. If you decline the query, your original data is untouched. If you had no data, you retrieve no data.
- Automatically terminating the query after a given time: If the query is not completed within a given time, the query is automatically canceled. In this case, your original data is unchanged. If you had retrieved no data previously, no data is displayed.
- Automatically canceling the query after returning a specified number of rows: Oracle Discoverer retrieves the number of rows up to the limit. Therefore, if you requested a tabular layout, you could get a partial list. In a crosstab, the results are calculated based on the partial results retrieved. This means that some aggregations could be incorrect. Oracle Discoverer notifies the user when this is the case.
Client-Side Cubic Cache

As data is retrieved from the server, Oracle Discoverer stores it in a client-side cubic cache (the Resultsbase). The Resultsbase enables extremely rapid multidimensional analysis with any number of dimensions and measures, rotation, drill down, drill up, and drill across functionality. An expert SQL query engine dynamically generates performance-optimized SQL queries. The client-side cubic cache makes many operations, such as a pivot or the removal of a label or a column, instantaneous. If the new result can be calculated without sending a new query to the server, the whole step of processing a query and retrieving data is bypassed, thereby reducing overall network traffic.
Global Computing Case Study: Creating the Metalayer

To give your end users access to the data, you must perform a few procedures using Discoverer Administration Edition. Because Oracle Discoverer contains many features that let you customize the reports that the end users see, you may want to do more than the minimum and make the business area more appealing to the user. This chapter continues to use the Global Computing Company case study as you proceed through the step-by-step process of providing access to the end user for ad hoc query and reporting.
Installing the End User Layer

First, you must install the End User Layer (EUL). The installation process for Oracle Data Mart Suite automatically created a public EUL for the Global Computing Company. You use this EUL when you work with Discoverer User Edition (see "Analyzing Your Data: Discoverer User Edition" on page 6-28). In this section, you create a private EUL for the Administration Edition. Private EULs are accessible only to the user who owns the EUL, unless you grant access to other users. Oracle Data Mart Suite recommends that you have only one public EUL for each database instance. Public EULs are accessible to all users. Administrators may want a private EUL on their systems as a development environment. You install an EUL from within the Administration Edition by taking these steps:
1. From the Program menu, select Oracle Discoverer, then select Administration Edition. If Oracle Discoverer displays the Welcome screen, select Start. (You can disable this Welcome window.)
2. Enter the username system and the password manager. If you install the EUL using an account that does not have the DBA role, as system does, you will not be able to create summary tables or schedule queries to run in batch mode.
3. For the database connection, enter dmdb.
4. Click Connect.
5. A dialog box asks Do you want to create an EUL now? Click Yes.
6. In the EUL Manager dialog box, select Create an EUL.
7. In the Create EUL Wizard, select Create a new user.
8. Clear the box for Grant access to PUBLIC. Remember that you are creating a private EUL in this exercise.
9. For User, enter samplestar to specify the owner of the EUL. Enter samplestar for the password and password confirmation.
10. Click Next.
11. For Default Tablespace, select USERDATA. For Temporary Tablespace, select TEMP.
12. Click Finish.
13. After the EUL is created, which may take several minutes, click OK. Then, Oracle Discoverer gives you the option of installing the tutorial data. Choose No. You can install it later if you wish.
14. In the next dialog box, choose Yes to connect to the EUL you just created.
The Load Wizard appears when the connection is completed. Note that after you create the EUL, you use the following to connect to the Administration Edition:
- Username: samplestar
- Password: samplestar
- Connect: dmdb
Creating a New Business Area

You create a working area for business users. This area contains the data that is interesting to a particular group, usually defined by a business function or department. In this exercise, you:
- Identify the location of the database objects that will be loaded into this business area.
- Identify specific objects from one or more databases to include in the business area.
- Identify user IDs and associated tables to group. You can specify access to private tables or public tables, meaning those tables that belong to a specific user ID or those that are available to every user ID.
Oracle Discoverer provides several options that let you filter the set of tables and views that are displayed. Typically, you use tables that are owned by someone other than the administrator. In this example, you use tables owned by samplestar. The Load Wizard takes you through the process:
1. In the Load Wizard, select Create a new business area.
2. To identify the location of the database objects that will be loaded into this business area, select On-line Dictionary. The on-line dictionary is the standard dictionary for all Oracle databases.
3. Click Next.
4. In Step 2 of the Load Wizard, you identify specific objects from the various databases to include in the business area. Select Default Database, then select user samplestar from the list.
This user owns the tables and views that are identical to the ones owned by user marty. These tables provide a consistent set of data, in case you made a mistake during the earlier exercises.
You can use the text box at the bottom of the window labeled Load user objects that match to select tables by pattern matching. If you do not remember the full name of the table, you can use Oracle wildcard characters. For example:
- To find all tables and views beginning with D, enter D%.
- To find all tables and views ending with AND, enter %AND.
- To find all tables and views beginning with A and having a four-letter name, enter A___ (in Oracle pattern matching, each underscore matches exactly one character).
You can also apply a filter that reduces the set of tables and views by choosing the Options button.
5. Click Next.
6. In Step 3 of the Load Wizard, expand the Schema objects list in the Available box by clicking the plus sign (+) next to SAMPLESTAR.
7. Select the tables CHANNELS, CUSTOMERS, DAYS, PRODUCTS, and SALES. You select them using one of the following methods:
- Click one of the items on the left and drag it to the right.
- Click one of the items on the left and click the arrow pointing right.
- Select multiple items by holding down the Control key and clicking each item, then drag the items to the right side.
Here's what the window should look like when you finish:
If you make a mistake, you can return an item by selecting the item and dragging it to the left.
8. Click Next.
9. In Step 4 of the Load Wizard, select the following options:
- Capitalize names: All folder names appear with initial capital letters; for example, Sales.
- Replace all underscores with spaces: All folder names appear with spaces rather than underscores. For example, EMPLOYEE_NAME changes to Employee Name.
- Remove all column prefixes: Removes prefix characters that are the same for all columns. For example, if a table name is EMP and its column names are EMP_Number, EMP_Name, and EMP_Address, the corresponding names in the business area become Number, Name, and Address. For this option to work, the prefix must be the same for all columns of the table, and the prefix must include an underscore (EMP_, not just EMP).
- Create joins from: Specifies how joins are created. Select Primary/foreign key constraints. With this option, joins are created if columns are related by a primary key and a foreign key constraint in the database dictionary. If you select Matching column names, joins are created between tables when the names of columns match.
- Date hierarchies using: All date items automatically use the date hierarchy specified in the drop-down list. Select Default Date Hierarchy from the drop-down list.
- Default aggregate on datapoints: All aggregates use this type as the default. Select SUM.
- List of values for items of type: Unique lists of values are generated for all items of the specified type. The following types are selected by default: character, integer, date, and all keys. Make sure the default types remain selected.
10. Click Next.
11. In Step 5 of the Load Wizard, enter the business area name, Marketing Strategy, in the Name box.
12. You can add a description to the business area to indicate the contents or use of the area. This description is displayed in the User Edition when users select a business area. Enter Global Sales for 1993-1996 in the Description field.
13. You have now completed selecting the tables and views for the business area. Click Finish. The Load Wizard creates the business area based on the information you supplied in the previous windows. As it does, it displays a progress bar. The resulting EUL contains all of the information that defines a business area.
When the Load Wizard has completed creating the business area, it displays a Work Area, including a Task List, as shown in the following figure. The Task List provides a checklist and reminder of activities for the administrator.
Notice the four tabs across the top of the Work Area. Although not mandatory for preparation of the business area for the users, each of the activities represented facilitates the interaction of the business user with the data. You want to make access to the data as easy as possible. After you complete the next step, "Granting Business Area Access", you have finished the process of defining a business area. However, if you prefer to make the business area more understandable, you can complete the tasks in "Optional Formatting" on page 6-17.
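Two of the Load Wizard behaviors used in this procedure, wildcard matching of object names and the cleanup of column names, can be sketched in a few lines of Python. This is an illustration only, not Oracle Discoverer's implementation: it assumes the standard Oracle pattern-matching wildcards (% for any run of characters, _ for exactly one character), and the function names are hypothetical.

```python
import re


def like_to_regex(pattern):
    """Translate a LIKE-style pattern (% = any run, _ = one character) to a regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def matching_objects(names, pattern):
    """Return the names that match the whole pattern."""
    rx = like_to_regex(pattern)
    return [n for n in names if rx.fullmatch(n)]


def common_prefix(columns):
    """Return the prefix (ending in an underscore) shared by every column, or ''."""
    first = columns[0]
    cut = first.find("_")
    if cut == -1:
        return ""
    prefix = first[: cut + 1]
    return prefix if all(c.startswith(prefix) for c in columns) else ""


def business_names(columns):
    """Drop a shared prefix, replace underscores with spaces, capitalize words."""
    prefix = common_prefix(columns)
    return [c[len(prefix):].replace("_", " ").title() for c in columns]


tables = ["CHANNELS", "CUSTOMERS", "DAYS", "PRODUCTS", "SALES"]
print(matching_objects(tables, "D%"))
print(matching_objects(tables, "D___"))
print(business_names(["EMP_Number", "EMP_Name", "EMP_Address"]))
print(business_names(["EMPLOYEE_NAME", "HIRE_DATE"]))
```

In the last call no prefix is removed, because the two columns do not share one; only the underscore and capitalization rules apply.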
Granting Business Area Access

You must grant access rights and privileges to the business area. You determine who can see and use data in the business area. Take the following steps:
1. From the Tools menu, choose Security.
2. Select the tab Business Area -> User. Through this tab, you give access rights to users for a particular business area.
3. From the Business area drop-down list, select Marketing Strategy.
4. From the Available users/roles list box, select the user IDs dmadmin and marty by holding down the Control key and clicking on each user ID and then clicking on the right arrow. (The user samplestar is already listed in the Selected users/roles list box.) You can select individual users or you can specify a role, such as SALESPERSON, which can be assigned to any user ID.
5. Click OK to save the changes and close the dialog box.
You can grant Oracle Discoverer privileges to users. To grant user marty the privilege to schedule workbooks, take these steps:
1. From the Tools menu, select Privileges.
2. Select the Privileges tab and select marty from the drop-down list.
3. In the Privileges list box, check the box for User Edition and then check Schedule Workbooks.
4. Click OK.
Optional Formatting
Oracle Discoverer lets you format your reports to make the business area easier for your users to use. Many dialog boxes have an OK button and an Apply button. Clicking Apply puts the changes into effect, but does not close the dialog box. You can continue making changes to other items using the same dialog box. Clicking OK applies the changes and closes the dialog box. To save changes automatically, select the business area, right-click, and select Properties. Then, check the box Automatically save changes after each edit.
Renaming Folders and Adding Descriptions

You can create meaningful names for each business folder. Folders are the basic elements that the end user sees when working with the business area. For this reason, each folder should have a meaningful name as well as a description that explains the primary use of the folder. To see the folders in the business area, click the plus sign (+) to the left of the Marketing Strategy business area. (When a business area is expanded, click the minus sign (-) to collapse it.)
In this exercise, you rename the Days folder in the Marketing Strategy business area. To rename the folder, take these steps:
1. Double-click the Days folder. The Folder Properties window appears.
2. Click in the Name field and enter the new name, Date of Sales.
3. Click in the Description field and enter the new description, Information about Date of Sale.
4. Click OK.
Renaming an Item
Database columns also often have cryptic names that do not have much meaning for the end user. You can change the names of the database columns to be more meaningful to the user. For example, to change the name of the column Date Desc Year as well as the columns Date Desc Quarter, Date Desc Month, and Date Desc Day, take the following steps:
1. Click the plus sign (+) next to the newly renamed Date of Sales folder. The list shows all items in the folder.
2. Double-click Date Desc Year to display its Item Properties window.
3. Click in the Name field and rename Date Desc Year to Year. Oracle Discoverer automatically changes the Heading entry to Year. Click OK.
4. Change the name and heading of three other items. To make the changes, double-click the item to display the Item Properties window, select the appropriate field, and enter each change:
- Date Desc Quarter to Quarter
- Date Desc Month to Month
- Date Desc Day to Day
5. Click OK when you finish making changes to each item's properties.
Hiding Items in the Business Area

You can hide the database objects that are irrelevant to the end user. End users do not need to see all items in a business area. For example, you may not want them to see primary and foreign keys and sensitive information, such as pay scale and time-in-service. Also, some items used in calculations need to be in the business area, but do not need to be displayed. Hidden items are not deleted items. Hidden items remain in the business area, but are not visible to the end user. Deleted items are removed from the EUL. In the Marketing Strategy business area, typical items that might be hidden are:
Dimension    Folder           Item
DAYS         Date of Sales    Date Id
CHANNELS     Channels         Channel Id
PRODUCTS     Products         Product Id, Family Id, Class Id
CUSTOMERS    Customers        Customer Id, Account Id, Segment Id
SALES        Sales            Date Id, Channel Id, Product Id, Customer Id
To hide Product Id, take the following steps:

1. Open the folder, Products, to display its items.
2. Double-click the item to be hidden, Product Id. Its Item Properties window appears.
3. In the Visible to User field, choose No from the drop-down list.
4. Click Apply to save the changes before editing the properties for the next item. Click OK to close the dialog box.
Formatting Currency
To apply the same property to more than one item, use the Shift or Control key with the mouse to select the items and display the property sheet. You can format the display of currency by taking these steps:
1. Expand the Sales folder, and using the Control key and the mouse, select the Sales, Cost, and Margin columns.
2. Right-click and select Properties.
3. Scroll until you see Format mask. For Format mask, enter L999G999G990D99. This mask precedes the value with the local currency sign and uses the default currency separators.
4. Click OK.
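To see what the mask L999G999G990D99 asks for, the following Python sketch formats a number with a currency symbol (L), group separators (G), a decimal character (D), and two decimal places, always showing at least one digit before the decimal character (the 0 in 990). The function name, the default US-style symbol and separators, and the handling of negative values are assumptions for illustration; the sketch does not reproduce Oracle's padding or overflow behavior.

```python
def format_currency(value, symbol="$", group=",", decimal="."):
    """Format value like the mask L999G999G990D99 (illustrative only)."""
    text = f"{abs(value):,.2f}"  # e.g. 1234567.5 -> '1,234,567.50'
    # Swap in the requested separators via a placeholder so they do not collide.
    text = text.replace(",", "\x00").replace(".", decimal).replace("\x00", group)
    sign = "-" if value < 0 else ""
    return f"{sign}{symbol}{text}"


print(format_currency(1234567.5))
print(format_currency(0.5))
print(format_currency(1234.5, symbol="€", group=".", decimal=","))
```

The third call shows why the mask uses L, G, and D instead of literal characters: the same mask can render with different symbols and separators depending on the locale settings.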
Creating a New Item for Calculations

You can define calculated items for use by end users. Typical business calculations include profits, profit margins, average revenues per month, expected sales, and percentage of profit. As the administrator, you can define the calculations that an end user is likely to need. This exercise shows how to produce a new item for the calculation of percentage profit:
1. Select the Sales folder.
2. Select the New Item button from the toolbar or select Create calculated items from the Administrator Task List. As an alternative, you can choose Item from the Insert menu. (If you try to insert an item without first selecting a folder, Oracle Discoverer prompts you to select a folder.) The New Item window appears. You can click the plus sign (+) to the left of the Sales folder to expand it. To create any calculation, you can do one of the following:
- Select the items from the column on the left and click the Paste button to paste them directly into the calculation.
- Type directly in the Calculation field.
To add an operator to the calculation, click the operator buttons below the Calculation field. To see a list of functions you can paste, click the Functions button.
3. Enter %margin in the Name field. This item calculates the percentage of profit. (Remember, the Margin column is defined as SALES - COST.)
4. Enter (Sales.Margin / Sales.Sales) * 100 in the Calculation field. Alternatively, you can double-click each field and each symbol to add it to the calculation.
5. Click OK to save the new calculation in the business area and close the dialog box.
Calculations follow the Oracle8 calculation standard syntax. For a full description of the Oracle8 calculation syntax, see the Oracle8 SQL Reference.
6. Open the Sales folder and double-click the new %margin item to open its Properties window. Set the Default position to Data point and the Default aggregate to SUM.
7. Click OK to close the Properties window.
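The arithmetic behind the %margin item can be checked with a small Python sketch. The two sample rows are invented for illustration. The sketch also shows that adding up row-level percentages gives a different number from the percentage computed on the totals; which of the two a report shows depends on how the tool combines the formula with the default aggregate, which this example does not attempt to model.

```python
# Invented sample rows; Margin is defined as SALES - COST, as in the exercise.
rows = [
    {"sales": 200.0, "cost": 150.0},
    {"sales": 100.0, "cost": 50.0},
]
for r in rows:
    r["margin"] = r["sales"] - r["cost"]


def pct_margin(sales, margin):
    """(Margin / Sales) * 100, the formula entered in the Calculation field."""
    return margin / sales * 100


# Percentage computed row by row.
per_row = [pct_margin(r["sales"], r["margin"]) for r in rows]

# Percentage computed once on the totals.
total_sales = sum(r["sales"] for r in rows)
total_margin = sum(r["margin"] for r in rows)
overall = pct_margin(total_sales, total_margin)

print(per_row, sum(per_row), round(overall, 2))
```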
Defining a Hierarchy
You can define a hierarchy to help users drill down to detail data. For example, you can define a hierarchy for the descriptive columns in the Products table:
1. From the Insert menu, select Hierarchy.
2. In the Hierarchy Wizard, select Item Hierarchy and click Next.
3. In Step 2 of the Hierarchy Wizard, expand the Products folder and select Item Desc, Family Desc, and Class Desc and move them to the box on the right.
4. To order the items in the correct hierarchy, select Products.Item Desc and click the Demote button twice. Select Products.Class Desc and click the Promote button. The dialog box should look like the following:
5. Click Next.
6. In the Name box, enter Product Hierarchy.
7. Click Finish.
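The effect of the Promote and Demote clicks in this exercise can be traced with a short Python sketch. It assumes that the three items start in the order in which they were selected and that each click moves an item one position; both points are assumptions about the dialog box, and the function names are hypothetical.

```python
def promote(levels, item):
    """Move item one position toward the top of the hierarchy."""
    i = levels.index(item)
    if i > 0:
        levels[i - 1], levels[i] = levels[i], levels[i - 1]


def demote(levels, item):
    """Move item one position toward the bottom of the hierarchy."""
    i = levels.index(item)
    if i < len(levels) - 1:
        levels[i + 1], levels[i] = levels[i], levels[i + 1]


def drill_down(levels, current):
    """Return the next, more detailed level, or None at the bottom."""
    i = levels.index(current)
    return levels[i + 1] if i + 1 < len(levels) else None


levels = ["Item Desc", "Family Desc", "Class Desc"]  # assumed starting order
demote(levels, "Item Desc")
demote(levels, "Item Desc")
promote(levels, "Class Desc")
print(levels)
print(drill_down(levels, "Class Desc"))
```

Under these assumptions the resulting order runs from the most general level (Class Desc) through Family Desc to the most detailed level (Item Desc), which is the path a user follows when drilling down.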
Creating a Join
Data analysis often needs information that resides in more than one folder. For the analysis to occur, the folders must be linked by a join condition. Joins are part of the design of the database and business area. Joins defined in the database become part of the business area when it is created, but the administrator can also create business area joins that are known to Oracle Discoverer. For example, an administrator could create outer joins or non-equijoins. End users cannot create joins in the business area, although they can choose between joins if more than one exists between two folders. To create a join, take the following steps:
1. Select the Products folder.
2. From the Insert menu, choose Join. The New Join dialog box appears.
3. From the Master Folder drop-down list, select Products.Product ID.
4. The Operator drop-down list shows the operators that you can use. Select the equal sign (=) from the list. The equal sign represents an equijoin, which combines rows that have equivalent values for the specified items.
5. From the Detail Folder drop-down list, open the Sales folder and choose Product ID.
The master folder has a one-to-many relationship with the detail folder. That is, for each item in the Products folder, there are many entries (rows) in the Sales folder.
6. If you were creating a new join, you would also name and describe this join and then click OK. However, because this join already exists as part of the EUL, click Cancel.
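The master-detail equijoin described in these steps corresponds to an ordinary SQL join on equal key values. The Python sketch below uses the standard sqlite3 module as a stand-in database; the column names and sample rows are invented for illustration and are not the case study's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Master: one row per product.
    CREATE TABLE products (product_id INTEGER PRIMARY KEY, item_desc TEXT);
    -- Detail: many sales rows per product.
    CREATE TABLE sales (
        product_id INTEGER REFERENCES products(product_id),
        amount REAL
    );
    INSERT INTO products VALUES (1, 'Laptop'), (2, 'Monitor');
    INSERT INTO sales VALUES (1, 100.0), (1, 250.0), (2, 80.0);
""")

# Equijoin: combine rows whose product_id values are equal.
rows = conn.execute("""
    SELECT p.item_desc, COUNT(*), SUM(s.amount)
    FROM products p
    JOIN sales s ON p.product_id = s.product_id
    GROUP BY p.item_desc
    ORDER BY p.item_desc
""").fetchall()

print(rows)
```

Each product row in the master table matches one or more rows in the detail table, which is the one-to-many relationship the join expresses.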
Defining a New Item Class

You can define a new item class. Consider defining an item class for the following purposes:
- To allow the user to see a listing of the unique data values in an item.
- To sort values in a nonstandard fashion. Standard sorts include alphabetical, numeric, and date; a nonstandard sort might be by region: North, South, East, West.
- To allow the user to see related information by hyperdrilling. Hyperdrilling allows jumping from summarized data to detail information. It differs from direct drilling, which allows end users to drill up or down to different levels of summarized data. (Hyperdrilling is also known as drill to detail.)
When you create a business area and choose to automatically generate lists of values, the Load Wizard creates item classes for you.
The following steps show how to create a new item class for drilling into related detail information from summary data, and for viewing a list of unique values from the Item Desc column:
1. To start the Item Class Wizard, from the Insert menu, choose Item Class.
2. Click the check boxes for List of values, Alternative sort, and Drill to detail. List of values is a list of unique values currently in the database column that is associated with this item. Alternative sort lets you sort using a sequence provided by another item. Drill to detail lets a user drill from summary data into detail rows of an associated table for related information.
3. Click Next.
4. To create a list of detail values from a database column, open the Products folder and select Item Desc. Click Next.
5. To create an alternate sort, select Item Id. The report will be sorted by the data in the Item Id column. Click Next.
6. To select the items that will use this item class, open the Products folder, select Item Source and click the right arrow to move it into the Selected items box. Item Desc is already listed in the box. Items that use the same item class define a relationship between those items. The relationship can be used to determine summary-to-detail drills.
7. Click Next.
8. Enter a name for this new item class, Product Items class.
9. Click Finish to create the item class.
If a list of values is available for a column, a plus sign (+) appears next to the item in the data tab in the work area. The Item Desc item contains a list of values sorted in an alternate order. To see the list of values, click the plus sign (+) and click Yes to the informational message. When you create an item class that includes an alternate sort order, a one-to-one relationship must exist between the column used for the list of values and the column used for the sort order. In addition, the two items must be in the same folder.
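The one-to-one requirement for an alternate sort can be made concrete with a short Python sketch. The sample descriptions and IDs are invented, and the function names are hypothetical; the sketch only illustrates the rule stated above, not how Oracle Discoverer enforces it.

```python
def is_one_to_one(pairs):
    """True if each value maps to exactly one sort key and vice versa."""
    by_value, by_key = {}, {}
    for value, key in pairs:
        if by_value.setdefault(value, key) != key:
            return False
        if by_key.setdefault(key, value) != value:
            return False
    return True


def list_of_values(pairs):
    """Unique values ordered by their sort key instead of alphabetically."""
    if not is_one_to_one(pairs):
        raise ValueError("value and sort key must be in a one-to-one relationship")
    return [value for value, key in sorted(set(pairs), key=lambda p: p[1])]


# (Item Desc, Item Id) pairs as they might appear in detail rows.
rows = [("Monitor", 10), ("Laptop", 20), ("Cable", 30), ("Laptop", 20)]
print(list_of_values(rows))
print(sorted({desc for desc, _ in rows}))
```

The first line printed follows the Item Id sequence; the second shows the standard alphabetical order for comparison.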
Creating a Condition
A condition limits the view of the data, restricting it to a predefined set of data. To create a condition that limits the data to the Internet and Direct Sales channels, take these steps:
1. Expand the Channels folder and choose the item Desc.
2. From the Insert menu, choose Condition or from the Administration Task List, select Create condition.
3. Clear the check box for Generate name automatically. Then, enter Internet and Direct Sales Channels in the Name field.
4. From the Type drop-down list, select Optional. This lets the user choose whether or not to use the condition in a query.
5. Choose the IN operator from the Condition drop-down list.
6. To limit Desc to only Direct Sales and Internet, select Select Multiple Values from the Values drop-down list. Then, from the Values box, click Direct Sales and Internet.
7. Click OK and click OK again to close the New Condition dialog box.
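The behavior of an optional IN condition can be sketched in Python. The dictionary layout, the apply_condition function, and the channel names other than Direct Sales and Internet are assumptions made for illustration.

```python
# Hypothetical representation of the condition defined in the steps above.
CONDITION = {
    "name": "Internet and Direct Sales Channels",
    "item": "Desc",
    "values": {"Direct Sales", "Internet"},
    "optional": True,
}


def apply_condition(rows, condition, enabled=True):
    """Keep rows whose item value is IN the condition's value list.

    An optional condition is skipped when the user does not enable it.
    """
    if condition["optional"] and not enabled:
        return list(rows)
    return [r for r in rows if r[condition["item"]] in condition["values"]]


channels = [
    {"Desc": "Direct Sales"},
    {"Desc": "Internet"},
    {"Desc": "Catalog"},   # invented channel name
    {"Desc": "Reseller"},  # invented channel name
]
print(apply_condition(channels, CONDITION))
print(len(apply_condition(channels, CONDITION, enabled=False)))
```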
Creating a Summary Table

You can use Oracle Discoverer to create summary tables, which hold data that has already been aggregated and joined. Create summary tables to hold the results of frequently run queries that may take a long time to complete their initial run. You want to create a summary table that stores the total sales for each product for each of the years 1993 to 1996. The table is the result of a query run against the table SALES. To create the summary table, take these steps:
1. From the Insert menu, select Summary.
2. In the Summary Wizard, click From items in the End User Layer. Click Next.
3. From the Available items box, select the following and move them to the Selected Items box:
- From the Sales folder, Sales
- From the Date of Sales folder, Year
- From the Products folder, Item ID
Click Next.
4. Select Year and Item Id for the Combinations.
5. Click Next.
6. Select the time and date when you want to run the summary.
7. Click Finish. In the dialog box that asks Do you want to refresh this summary folder immediately?, click Yes.
When the summary table is created, you can exit Discoverer Administration Edition.
Analyzing Your Data: Discoverer User Edition

Through Discoverer Administration Edition, you created metadata for the end user. You use Discoverer User Edition to create reports and analyze your data. This section reviews the types of reports that Oracle Discoverer can generate. "Creating a Report" on page 6-35 describes the details of how you create a report.
Types of Reports
Data selection and format are the keys to understanding the results. You, the business user, know the questions to ask. Understanding and utilizing the results becomes the challenge, and often it is facilitated or hampered by the structure of the report output. First, you need to decide how you want the data displayed. Oracle Discoverer provides four report formats: table, page-detail table, crosstab, and page-detail crosstab. In addition, it lets you convert the reports to a graphical format.
Table Format
The table format provides a complete list of the data. For example, use the table format to list the total sales by account name by year.
Page-Detail Table Format

The page-detail table format adds a paging dimension, where one dimension becomes the header for the table. For example, you could add the customer's city to the page dimension. This allows users to cycle through the data, viewing their customers in each city, one city at a time.
The following page-detail table adds Packaging to the page dimension. Each page reflects the packaging option.
Crosstab Format
The crosstab format is most commonly used to analyze data. In the crosstab format, dimensions appear along either axis, and the facts appear in the intersection. You can arrange multiple dimensions along either axis by grouping one within another. The size of a crosstab report depends on the number of distinct values in each dimension. For example, gender has two values, male and female, and year has only as many values as there are years in the data. The result is a very compact way of displaying the data. The following chart shows the sales by year by product class by channel:
The key difference between crosstab format and table format is that table format has one column for each dimension and each fact, but crosstab format displays facts in the intersection of the values of the dimensions located around the axes of the page.
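The difference between the two layouts can be illustrated with a short Python sketch that turns table-format rows (one column per dimension and fact) into a crosstab grid (facts at the intersection of dimension values). The sample rows and the crosstab function are invented for illustration.

```python
from collections import defaultdict

# Table format: one column for each dimension (year, channel) and the fact (sales).
table_rows = [
    ("1995", "Direct Sales", 100.0),
    ("1995", "Internet", 50.0),
    ("1996", "Direct Sales", 120.0),
    ("1996", "Internet", 30.0),
    ("1996", "Internet", 20.0),
]


def crosstab(rows):
    """Place summed facts at the intersection of row and column dimension values."""
    cells = defaultdict(float)
    for row_key, col_key, fact in rows:
        cells[(row_key, col_key)] += fact
    row_keys = sorted({r for r, _ in cells})
    col_keys = sorted({c for _, c in cells})
    grid = [[cells.get((r, c), 0.0) for c in col_keys] for r in row_keys]
    return row_keys, col_keys, grid


row_keys, col_keys, grid = crosstab(table_rows)
print(col_keys)
for key, line in zip(row_keys, grid):
    print(key, line)
```

Five table rows collapse into a two-by-two grid, which is why the crosstab is the more compact of the two formats.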
Page-Detail Crosstab Format

The page-detail crosstab format is also called master-detail crosstab format. It is a crosstab where one or more dimensions are placed at the top, above the crosstab. For each value in that dimension, you get a separate page. Essentially, this dimension becomes the Z axis, the third dimension. The page-detail crosstab format is like stapling all these pages together. The user can easily step through the pages by clicking on the pop-up list box.
Graphical Format
A picture is worth a thousand words. Regardless of how good your report is, often a chart or graph can communicate the meaning in the data more effectively. Oracle Discoverer can instantly represent your reported data in a graphical format through various chart types, such as bar graph, area graph, and line-bar. You can display the graph and report simultaneously as you explore your data. The graph dynamically reflects data or orientation changes you make in the report.
Modifying Formats
Oracle Discoverer permits easy switching between various data report formats. Try different reporting formats by using the buttons on the toolbar to change to table or crosstab format.
Regardless of how you display the data, it is often easy to miss the outliers in your data. An outlier is an unusually large or small value. Scanning large lists of data can be difficult and error-prone. Oracle Discoverer can highlight these outliers with exception formatting. For any fact, you can define a range of values or a condition such as greater than, and choose a different format to highlight values that meet this condition. The different format options include font, style, and color.
After you create a crosstab, you may want to change the orientation of the data for greater readability. You can do this by pivoting, which is moving a column heading to become a row heading, or vice versa. Pivoting aids in analyzing your data and makes the layout of the report so flexible that it really does not matter where you start. You can also pivot within the same axis. For example, instead of showing the data by year, subgrouped by quarter, you can pivot to show quarter, subgrouped by year.
Drilling Down and Drilling Up

A drill-down is an advanced analysis feature. Drill-down allows display of a lower level of detail. For example, if you are looking at Total Sales for a product for all of the United States and you want more detail, you can drill down to show sales by North, South, East, and West regions. An additional drill-down gives even more detail, such as states or cities. Drilling requires that a hierarchy be defined. For example, the relationships among United States, Region, and State must be specified. The data administrator manages this definition through Discoverer Administration Edition. A drill-up (or roll up or collapse) is the reverse of a drill-down. It is the return to the higher summary level of data within the hierarchy. Using the geographic example, states drill up to regions that drill up to Total United States.
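As a rough sketch of why drilling needs a defined hierarchy, the following Python models the geographic example. The level names, sample rows, and helper functions are hypothetical and are not part of Discoverer; they only show that drilling down and drilling up amount to re-aggregating the same data at an adjacent level.

```python
from collections import defaultdict

# Hypothetical hierarchy levels, from most summarized to most detailed.
HIERARCHY = ["country", "region", "state"]

# Hypothetical detail rows; names and amounts are illustrative only.
sales = [
    {"country": "United States", "region": "East", "state": "New York", "sales": 50},
    {"country": "United States", "region": "East", "state": "Maine", "sales": 10},
    {"country": "United States", "region": "West", "state": "California", "sales": 70},
]

def summarize(rows, level):
    """Total sales grouped by every hierarchy level down to and including `level`."""
    depth = HIERARCHY.index(level) + 1
    totals = defaultdict(int)
    for r in rows:
        key = tuple(r[name] for name in HIERARCHY[:depth])
        totals[key] += r["sales"]
    return dict(totals)

def drill_down(level):
    """Return the next, more detailed level (or the same level at the bottom)."""
    i = HIERARCHY.index(level)
    return HIERARCHY[min(i + 1, len(HIERARCHY) - 1)]

def drill_up(level):
    """Return the next, more summarized level (or the same level at the top)."""
    i = HIERARCHY.index(level)
    return HIERARCHY[max(i - 1, 0)]

level = "country"
by_country = summarize(sales, level)   # one total for the United States
level = drill_down(level)              # move from country to region
by_region = summarize(sales, level)    # one total per region
```

Without the ordered list of levels, the code would have no way to know that regions sit below the country and above the states, which is the role the hierarchy definition plays for the data administrator.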

Investigating Your Data

Ad hoc querying is an iterative process. Oracle Discoverer wizards aid in the creation and refinement of queries. To investigate your data, you:
- Build a table or crosstab using the wizard
- Modify a query layout using the wizard
- View the results in the results window
- Modify the results in the results window or using the wizard
- Do it again as needed
Besides the wizards, Oracle Discoverer has a unique set of toolbars designed to make your journey through your data easy and efficient. All functionality offered through a toolbar is also available through the menu.
The following figure shows the toolbar of Discoverer User Edition. The buttons unique to Oracle Discoverer are labeled.
To display all toolbars, select them from the View menu.
Where Do You Begin?

The Workbook Wizard steps you through the selection and formatting of the data you wish to investigate. The process follows this order:
1. Select the report output format.
2. Select the items to include in the query. Think of the query as "Give me [fact] by [dimension] by [dimension]."
   First, select the facts, then select the dimensions. Start with a few dimensions and add more as needed. It is usually easier to start with a little data and drill down into more data rather than going the other way.
3. Review the layout created by Oracle Discoverer. You can rearrange the data in the report before the query executes.
4. Apply any filters or conditions (scope) or select only the data that you want to view. For example, to study 1996 sales activity, limit the Days dimension to 1996.
5. Add any calculated items to your query. Oracle Discoverer displays all items available for calculation. Select the items, create the expressions, and format the output.
6. Click Finish. Oracle Discoverer executes your query and returns your requested data in a report format.
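The "fact by dimension by dimension" way of thinking about a query, together with an optional filter, can be sketched as a small Python function. The function name `give_me`, its parameters, and the sample rows are hypothetical; this is an analogy for the wizard steps, not Discoverer's implementation.

```python
from collections import defaultdict

def give_me(rows, fact, by, where=None):
    """Sum `fact` for each combination of the `by` dimensions, after filtering."""
    totals = defaultdict(int)
    for r in rows:
        # The optional `where` callable plays the role of a condition (scope).
        if where is None or where(r):
            totals[tuple(r[d] for d in by)] += r[fact]
    return dict(totals)

# Hypothetical sample data.
rows = [
    {"year": 1995, "channel": "Catalog", "sales": 120},
    {"year": 1996, "channel": "Catalog", "sales": 100},
    {"year": 1996, "channel": "Catalog", "sales": 5},
    {"year": 1996, "channel": "Internet", "sales": 90},
]

# "Give me sales by year by channel", limited to 1996.
sales_1996 = give_me(rows, "sales", by=["year", "channel"],
                     where=lambda r: r["year"] == 1996)
```

Starting with one or two dimensions in `by` and adding more later mirrors the advice to begin with a little data and drill into more.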
Global Computing Case Study: Accessing the Data

You are the new marketing manager at Global Computing. You must find opportunities to support new business and to expand upon current trends. You need to put together a marketing plan for the year. You have looked at reports, but now you have a data mart that holds Global Computing sales for the past four years. You decide to explore any information that the data may impart using Oracle Discoverer. To begin, log on and connect to the database:
1. From the Oracle Discoverer program group, select User Edition.
2. If the Welcome screen appears, select Start.
3. For Username, enter dmadmin. For Password, enter manager.
4. For Connect, enter dmdb.world.
5. Click Connect. Discoverer User Edition connects to the database.
Creating a Report
When you execute a query, Discoverer stores the output report in a worksheet. One or more worksheets and their queries comprise a workbook. When you log in to Oracle Discoverer, the Workbook Wizard starts. With the help of the wizard, you can create a workbook for your reports:
1. From the Workbook Wizard window, choose Create a new workbook.
2. To choose the report output format, select Table.
3. Click Next. The wizard displays Step 2, in which you select the items to include in the query.
4. From the Available drop-down list, select Marketing Strategy as the business area.
5. Select the following items by opening each folder (click the +), then dragging the items from the Available box on the left to the Selected box on the right:
   - From the Sales folder, select Sales.
   - From the Date of Sales folder, select Year.
   - From the Channels folder, select Desc.
6. Click Next to review the layout created by Oracle Discoverer:
7. Looking at the order in which Oracle Discoverer has arranged your selected data, you decide that it would be helpful to have Year in the first column, Desc in the second column, and Sales SUM in the third column. Drag the Sales SUM header over the Desc header, so that the order is Year, Desc, Sales SUM.
8. The next steps of the wizard let you define conditions, a sort order, and a calculation, but skip these steps for this exercise. At this point, you just want to see some data, so you click Finish.
The query predictor estimates the amount of time it will take to execute the query. At first, the estimate may be much higher than the actual time. The query predictor refines its estimate as it progresses in executing the query. The report looks similar to the following figure:
You have your first report! You can make it more informative by reformatting it and adding subtotals.
Reformatting a Report
You can make your report easier to read and more informative by changing the format and adding subtotals. To reformat the width of the columns, position your cursor in the column header between columns, so that it displays arrows pointing in both directions. Drag your cursor to make the column wider or narrower. You can also reformat the width of columns by selecting Columns and then selecting Auto-Size from the Format menu.
You decide to add subtotals for each year:

1. To suppress the redundant display of the year, select Year and click the Group Sort button from the toolbar.
2. To add the subtotals, select the column header Year and click the Sum Total button (the first button on the Analysis toolbar). Oracle Discoverer adds subtotals automatically. The subtotal inherits the format of the column.
3. To format the subtotals for clarity, take these steps:
   a. Select a Subtotal value.
   b. From the Format menu, select Data.
   c. Select the Font tab, select Bold, and click OK.
Your report should look like the following screen:
Immediately, you notice the trend in overall sales. Catalog sales peaked in 1994 and are now on a downward trend. Growth has slowed in the Direct Sales channel over the last year though the trend is still upward. And Internet sales have exploded!
You ask, "How is our volume? Are our unit sales behaving in the same fashion and following the same trend? Are we still making money or is our profit eroding at the same rate as sales?" To answer these questions, you need additional data points: Units and Margin.
1. From the Sheet menu, choose Edit Sheet. (Alternatively, you can select the Edit Sheet button from the toolbar.)
2. From the Edit Sheet Wizard, select the tab Select Items.
3. Click the plus sign (+) next to Sales.
4. Add Cost and Margin from the Available box to the Selected box by selecting them and clicking on the right arrow or dragging them. Click OK.
5. To remove the yearly sums, click the row label for the row SUM and press the Delete key. (The row label is the gray column to the left. In this case, it shows the number 3.)
Quite a lot of data appears on the screen. Although the report is legible and informative, it may be more effective if it is presented in a different format. Crosstab reports are useful when analyzing data, for example, when you want to know how products are sold during one year, with the data divided by quarters. If you want to view the information by channel, you can create a page-detail crosstab report.
Creating a Page-Detail Crosstab Report

To create a page-detail crosstab report, take these steps:
1. From the Sheet menu, select Duplicate as Crosstab (or select the Duplicate as Crosstab button from the toolbar). A message reports that no data can be displayed because there are no rows. Click OK.
2. The Duplicate as Crosstab window appears, with a default format for the report. You can move items within the sheet by clicking on the label, such as Year, and dragging the item to the desired location, such as to the side.
3. Create pages by clicking in the check box for Show Page Items.
4. Drag the Desc into the Page Items.
5. Drag Year from the top axis to the left axis.
6. To view only 1995 and 1996, select the Conditions tab. In this step, you put a condition on time where the year equals 1995 or 1996. In Oracle Discoverer language, this is: Year IN 1995,1996.
   - Select Year from the View Condition For drop-down list.
   - Click New.
   - Clear the Generate name automatically box.
   - For Name, enter 95-96.
   - From the Item drop-down list, select Year.
   - From the Condition drop-down list, select IN.
   - In the Value box, enter 1995,1996. Alternatively, you can use Select Multiple Values from the drop-down list.
7. Click OK, then click OK again.
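The 95-96 condition defined in these steps behaves like a set-membership test, comparable to `WHERE year IN (1995, 1996)` in SQL. The Python below is a hedged illustration only; the helper `year_in` and the sample rows are hypothetical.

```python
def year_in(allowed):
    """Build a condition that keeps rows whose Year is one of `allowed`."""
    allowed = set(allowed)
    return lambda row: row["Year"] in allowed

# Hypothetical stand-in for the condition named 95-96.
cond_95_96 = year_in([1995, 1996])

# Hypothetical sample rows covering four years.
rows = [{"Year": y, "Sales SUM": s}
        for y, s in [(1993, 10), (1994, 20), (1995, 30), (1996, 40)]]

# Only rows that satisfy the condition remain in the result.
kept = [r for r in rows if cond_95_96(r)]
```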
When your results appear in the report, notice that you can cycle through the Channel Desc in the page axis.
At this point, the report shows the years as row headers. You can drill down to provide more detailed information.
Drilling Down
To provide more detailed information for your report, you can drill down to quarters, by using one of the following methods:
- Highlight the column or row by selecting the gray bar above or to the right of the column or row. From the Sheet menu, select Drill. Then, select Drill Up/Down and select an item.
- Use the right mouse button to right-click the column or row. From the pop-up menu, select Drill. Then, select Drill Up/Down and select an item.
- Highlight the column or row. Move the cursor to the triangle next to 1995 until it appears as a magnifying glass and then click. Choose an item from the drop-down list.
For this exercise, use the first option:
1. Select the gray bar above the column Year. All of the years are highlighted. From the Sheet menu, choose Drill. Then, select Drill Up/Down and select Quarter in Year.
2. Click Options. Select Expanded to include new item, which adds the Quarters to the sheet. Replaced with new item replaces the selected item (Year), with the drilled values (Quarters).
3. Click OK and click OK again.
4. When you see the chart, you decide to rearrange the columns. Rearrange the columns by selecting the column title Cost SUM and dragging it over Sales SUM.
5. When you cycle in the page axis to Desc: Catalog, the chart looks like the following:
Creating Graphs
Whenever you are investigating trends, a graph can capture and communicate them very effectively.
To reformat the report into a graph, click the Graph button. Oracle Discoverer displays the New Graph Wizard.
The wizard guides you in formatting your graph:

1. Choose the type of graph you want, such as a bar graph, area graph, or pie chart. Choose Bar and click Next.
2. Choose how you want to format the graph. Choose 3D and click Next.
3. Click the Show Legend check box. You can title the top, left, bottom, and right axes, but for now, leave them blank. Click Next. You can also format titles by positioning the cursor over the chart and clicking the right mouse button or by using the text button.
4. From the Graph Series By section, select Column. You can also specify maximums and minimums for the Y Axis Scale and you can pivot the axis by right-clicking and selecting Series by row (or column). Take the defaults.
5. Click Finish.
A graph of the data appears. To see the full legends at the bottom of the graph, position your cursor at the bottom line of the graph and pull it up.
Now, the graph looks like this:
You can format the graph in many different ways, including:

- To change colors, drag and drop colors to different areas of the graph from the rainbow color bar across the top of the screen.
- To display the value of a particular data point on the bottom margin, move the cursor to that data point.
- To add text, click the letter button.
The graphing capabilities of Oracle Discoverer are flexible and extensive. When you reorient the underlying report and rotate the dimension values around the axis of the spreadsheet, the chart automatically reflects those modifications.
In our Global Computing scenario, the chart shows the sales trend very clearly over the last two years, highlighting the downward trend in the Catalog channel. You want to leverage the growth in sales through the Internet, while taking advantage of catalog sales. To develop an effective strategy, you need to know more about who is buying through the Internet and what they are buying. Is Global Computing making money from those items? In other words, how is the margin by product? Are there particular industries or market segments in which the company should invest? Are there any segments that Global Computing should completely ignore?
Now that you have been introduced to the basic capabilities of Oracle Discoverer, you can continue to explore the data and reach your own conclusions as you build the marketing plan.
Drilling to Detail
Sometimes you want to look at a very low level of detail. For example, once you have identified a customer whose purchases have decreased, you might want to call the purchasing agent. You may need to access the specific account information, such as the purchasing manager and the correct phone number. Oracle Discoverer allows you to drill to this detail.
Drilling Out
Oracle Discoverer lets you drill out, calling different applications to display various supporting documents. You can view the Bill of Materials for a product by looking at a Microsoft Word document. Similarly, you can navigate to your online Web page where you can see how the product is viewed on the Web, or to any other Web site on the Internet.
Custom Formatting
You can alter the format of the heading and the data that appears in the report.
To rename the sheet, double-click the sheet tab.
To format the headings, take these steps:
1. Select a range of columns by highlighting the heading of the first column, holding the Shift key, and then selecting the last column in the range.
2. From the Format menu, select Headings.
3. In the Font tab, select Font Times New Roman, Style Bold, and Size 14. Select a shade of dark red.
4. Select the Background Color tab and select the white box.
5. In the Alignment tab, select Wrap text.
6. Click OK to apply the formats and to close the dialog box.
To format the data, including font style, color, and size, take these steps:
1. Select the data.
2. From the Format toolbar, select Data.
3. Set the Font to Arial, the Style to Bold, and the size to 12.
4. Choose navy blue as the text color.
5. Select the Background Color tab and choose a light gray.
Publishing Your Work for Others to View

You can export worksheets and workbooks to a variety of formats for others to view. You can publish Oracle Discoverer query results on the Web and use them in other applications.
Exporting to Excel
An Excel button resides on the toolbar in the reporting module. By clicking on it, you copy the data from the sheet you are viewing into Microsoft Excel.
Exporting to HTML
From any worksheet you can export a sheet or your whole workbook to the Web by creating a hypertext markup language (HTML) file. To do so, take the following steps:
1. From the File menu, select Export.
2. Choose All Sheets.
3. Choose the Hyper-Text Markup Language (*.htm) export format.
4. Select a directory in which to save the export files.
5. Select Finish.
The resulting files are named using the following convention:

workbook_name_sheet_name.htm
If you have a Web browser installed on your machine, you can view these files by double-clicking on them from your browser or by typing the path and the file name in the Address or Location line of your Web browser. The files are linked to each other, letting you click from one sheet to another. You should apply word wrap to all of your headings if you use Netscape Navigator as a Web browser.
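If you want to predict which files an All Sheets export will produce, the naming convention above can be sketched as follows. The helper name is hypothetical, and the sketch assumes that workbook and sheet names are used unchanged; whether Discoverer alters spaces or special characters in the names is not stated here.

```python
def export_file_names(workbook_name, sheet_names):
    """Build one .htm file name per sheet as workbook_name_sheet_name.htm."""
    # Assumption: names are joined as-is, with no character substitution.
    return [f"{workbook_name}_{sheet}.htm" for sheet in sheet_names]

# Hypothetical workbook with two sheets.
names = export_file_names("Workbook1", ["Sheet1", "Sheet2"])
```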
Exporting to Other File Formats

Similarly, you can export to other file formats, such as Lotus 1-2-3 and Adobe Acrobat PDF.
Scheduling a Workbook
You can schedule a workbook to run at regular intervals. For example, to take advantage of the lighter load on weekends, you can schedule a workbook to run every Saturday at 11:00 PM. To schedule Sheet1 of Workbook1, take these steps:
1. Open the workbook from the User Edition.
2. From the File menu, select Schedule.
3. In the Schedule Workbook Wizard, select the sheet you want to schedule, Sheet1.
   You can select all of the sheets in the workbook by clicking Select All.
4. Enter the time and date you want the sheet to run.
5. To specify how often to run this worksheet, select Repeat Every. Enter 1 in the first box and select Weeks in the second box.
6. Click Next.
7. Enter a name, Sheet1, and a description, Weekly Run - Sheet 1, for the scheduled worksheet.
8. Enter how long you want to save the results of the run.
9. Click Finish.
Because a scheduled report runs on the server, you do not need to leave your client machine running overnight (or whenever you schedule the report to run). The results of running the scheduled report are saved on the server and, therefore, are available when you log on to the server and start Discoverer. You can view the scheduled workbooks and the results of those workbooks by selecting Open from the File menu and selecting Scheduling Manager. Then, select the workbook you want to view.
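To check which dates a "Repeat Every 1 Weeks" schedule starting on a Saturday at 11:00 PM will cover, a small calculation such as the following can help. It is a sketch using Python's standard `datetime` module; the function names are hypothetical and it does not interact with Discoverer's scheduler.

```python
from datetime import datetime, timedelta

def next_saturday_11pm(now):
    """Return the first Saturday 11:00 PM at or after `now`."""
    days_ahead = (5 - now.weekday()) % 7  # Monday is 0, Saturday is 5
    candidate = (now + timedelta(days=days_ahead)).replace(
        hour=23, minute=0, second=0, microsecond=0)
    if candidate < now:
        # It is already past 11:00 PM on a Saturday, so use next week.
        candidate += timedelta(weeks=1)
    return candidate

def weekly_runs(start, count, every_weeks=1):
    """Return `count` run times beginning at `start`, repeating every N weeks."""
    step = timedelta(weeks=every_weeks)
    return [start + i * step for i in range(count)]

# Hypothetical "current time" used to pick the first run.
first = next_saturday_11pm(datetime(1999, 8, 2, 9, 0))
runs = weekly_runs(first, 3)
```

The first value is what you would enter as the start time and date in the wizard; the remaining values are the runs that the weekly repeat setting implies.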

This chapter demonstrated some of the features of Oracle Discoverer, both for administrators and end users. The data administrator is responsible for hiding the complexity of the database from the end user through the creation and design of a business area. In this chapter, the administrator learned how to:
- Define the business visualization of the data
- Control user access
- Define drill paths
- Define summary tables
The end user learned how to:
- Define and obtain business-specific data from the database
- Develop ad hoc report queries and reports
- Change reports to focus on the data relevant to the business
- Schedule workbooks to run at regular intervals
Chapter 7 shows you how to use Oracle Reports to generate centralized reports for your end users. As the end users become more familiar with the first implementation of the data mart, they will make new requests for expansion and enhancements. You need to develop a strategy to handle increased demand and to manage the ongoing maintenance requirements of the data mart. Chapter 8 discusses managing the data mart. In a short time, the data from the internal systems may not be adequate to meet the user needs. Chapter 9 touches on the value of external data in the data mart and some of the issues in integration of this data.
7
Report Creation
In Chapter 6, you learned how to use Oracle Discoverer to access the database. Oracle Discoverer is an end-user query and analysis tool that provides powerful data exploration using drill-anywhere capabilities, pivoting, and charting. Oracle Discoverer is highly suited to ad hoc querying. Users can generate ad hoc queries against the data and change the focus of the reports to meet their current needs. However, you may not always need the ad hoc querying and customization abilities of Oracle Discoverer. For example, perhaps you run a sales report at the end of each month. Although the results change from month to month, the question you ask each month is the same. That is, the report structure is fixed. Some consumers of the report are not technologically sophisticated users, but they want up-to-date and attractively formatted reports. For this type of application, Oracle Reports is the tool of choice. This chapter shows how you can quickly create a report using Oracle Reports.
What Is the Role of Oracle Reports in the Data Mart?

Oracle Reports provides an easy-to-use, productive approach to developing and delivering sophisticated reports in a timely manner. The features of Oracle Reports include:
- A query builder for the graphical specification of SQL statements
- Online tutorials and wizards that guide users through report design
- A what-you-see-is-what-you-get (WYSIWYG) report previewer that lets you edit reports in place
- An integrated chart builder to graphically represent report data
- Comprehensive help
These features dramatically reduce the learning curve for new developers and increase the productivity of experienced staff members, enabling them to quickly and easily create complex, sophisticated reports.
Scalability
Oracle Reports takes advantage of the scalability of the Network Computing model. The powerful Reports Server helps you to easily deploy your applications in a multitier environment. Oracle Reports lets you deliver on the promise of the Internet, the intranet, and the extranet and provide the extended reach of those technologies to your users, customers, and suppliers. With Oracle Reports, you can disseminate your database information:
- In a secure environment that leverages the security of the database
- Using a Web browser, a thin client, or an ActiveX control
- Using industry-standard formats like HTML, HTML Cascading Style Sheets (CSS), Adobe PDF, PCL, PostScript, and ASCII
Components of Oracle Reports

Oracle Reports provides a complete environment for the construction and deployment of sophisticated reports. The components include:
- Report Builder: An easy-to-use, wizard-driven tool for building sophisticated database reports
- Graphics Builder: A declarative tool for the graphical display of the report data
- Reports Runtime and Graphics Runtime: Runtime engines for client/server deployment
- Reports Server: A server that enables multitier deployment of reports

Global Computing Case Study: Building a Report

Before you begin, turn for a moment to Chapter 2. In that chapter, you examined the information requirements of the departments that will use the data mart. Among other goals, the Sales and Marketing department wanted to identify the best-selling and worst-selling products in the Global Computing Company's inventory. In this chapter, you will learn how to build this report using the Report Wizard and the Live Previewer.
Invoking Report Builder

To invoke Report Builder, select Report Builder from the Developer/2000 R2.1 program group from the Windows NT Program menu. It is not necessary to connect to the Oracle database immediately. Report Builder prompts you to enter the connect information when it needs to access the database. You have not reached that point yet. When you invoke Report Builder, it displays a dialog box that asks where you want to start:
Choose Use the Report Wizard and click OK.
Building a Report with the Report Wizard

The Report Wizard guides you step-by-step in creating a report. It uses what you specify to create a data model and layout for the report.
When Report Builder starts the Report Wizard, you see the following screen:
To build a report with the Report Wizard, take the following steps:
1. Bypass the opening screen by clicking Next.
2. In the next screen, enter a title for your report. Enter Best & Worst Sellers.
3. In the same screen, you can also choose among various report formats. When you select a style, the wizard displays a preview of that style on the left side of the screen. Choose Tabular and click Next.
4. The next screen prompts you for the SQL SELECT statement that will form the basis of the report. You can enter the statement directly if you know the syntax, or you can build the statement using the built-in Query Builder.
   For this exercise, click Query Builder.
5. Because the Query Builder needs to connect to the database to provide you with a list of database objects that you can access, it displays the Connect dialog box. Connect using the following:
   - Username: samplestar
   - Password: samplestar
   - Database: dmdb
   Click Connect.
6. After you connect to the database, the Query Builder displays the Select Data Tables dialog box, which shows the list of tables to which you have access. To identify the best-selling and worst-selling products, you need to include the PRODUCTS and SALES tables. Double-click on PRODUCTS and SALES.
   You can also include a table by selecting it and then clicking Include.
7. Click Close.
Now, the Query Builder should look like the following screen:
The line connecting the two tables indicates that the Query Builder is aware of the join columns that link the two tables, based on the foreign key constraint definition stored in the database. To identify the best-selling and worst-selling products, you need to calculate the sum of the number of units sold and the sum of the amount of the sales for each product. To add columns showing the sums to the query, follow these steps:
1. Click in the SALES table box to highlight the box.
2. In the Query Builder toolbar, click the Define Column button.
3. In the Defined Columns field of the Define Column dialog box, type sum_units.
4. Place the cursor in the Defined as box and click Paste Function.
5. In the Paste Function dialog box, select the SUM() function.
6. Click OK.
7. In the Define Column dialog box, click Paste Column.
8. In the Paste Column dialog box, select the column UNITS.
9. Click OK. Now, the Define Column dialog box should look like this:
   As an alternative to the previous steps, you can type the expression SUM(UNITS) in the Defined as field.
10. Click OK.
11. Repeat steps 2 through 10 to define a new column to show the sum of sales. Define the column sum_sales to be SUM(SALES).
When you complete the steps, you should see the two sum columns in the SALES table:
Next, you define the columns by which the query will be sorted. Take the following steps:
1. In the toolbar, click the Sort button.
2. Double-click the ITEM_DESC column to move it to the Sorted Columns box. Then, click the ITEM_DESC column to activate the Sorting Order section. Click Ascending.
3. Click OK.
Next, you select the columns that you want in your query. You select a column by double-clicking it.
1. Select the following columns:
   - ITEM_DESC
   - sum_units
   - sum_sales
   The UNITS and SALES columns are checked automatically when you check the sum_units and sum_sales columns:
2. You can inspect the SELECT statement that will be generated by clicking the Show SQL button on the Query Builder toolbar.
3. Click Close to close the Show SQL dialog box. Then, exit the Query Builder by clicking OK.
At this point, the Report Wizard displays the SQL Query Statement box, which shows the SELECT statement that you just generated.
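The exact statement that Query Builder generates is not reproduced in this text, but a query of the same general shape can be tried out with Python's built-in `sqlite3` module. This is a sketch under stated assumptions: SQLite stands in for the Oracle database, the join column `product_key` is a hypothetical name, and the sample rows are invented.

```python
import sqlite3

# In-memory SQLite database standing in for the data mart (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product_key INTEGER PRIMARY KEY, item_desc TEXT);
    CREATE TABLE sales (product_key INTEGER REFERENCES products,
                        units INTEGER, sales REAL);
    INSERT INTO products VALUES (1, 'Laptop'), (2, 'Modem');
    INSERT INTO sales VALUES (1, 2, 4000.0), (1, 1, 2000.0), (2, 5, 500.0);
""")

# One row per product, with the sums computed by the database and the
# result sorted by ITEM_DESC in ascending order, as chosen in the Sort dialog.
query = """
    SELECT p.item_desc,
           SUM(s.units) AS sum_units,
           SUM(s.sales) AS sum_sales
    FROM products p
    JOIN sales s ON s.product_key = p.product_key
    GROUP BY p.item_desc
    ORDER BY p.item_desc ASC
"""
report_rows = conn.execute(query).fetchall()
```

Because the SUM expressions are part of the SELECT statement, the result set already contains one aggregated row per product when it reaches the reporting tool.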
Finishing the Report

To finish the report, take these steps:
1. Click Next to continue.
2. In this step, you choose the columns that you want to display in the report. Select all of the columns by double-clicking each column name to move it to the Displayed Fields box.
3. Click Next.
4. Now you can choose the columns for which you want to calculate totals. However, if you select any columns in this dialog box, the calculation will be performed by the Reports engine. First, the Reports engine will retrieve all of the required data, and only then will it perform the calculation. For large amounts of data, it is far more efficient to perform the calculation on the server. For this reason, you defined the summary columns in the Query Builder in the previous exercise. The calculation is performed as part of the SELECT statement, so that the result set returned by the query already contains the calculated data. Skip this step by clicking Next.
5. The next screen gives you the opportunity to modify the column labels and the column widths. Make the following changes for the ITEM_DESC column:
   a. Position your cursor in the Labels column and change the ITEM_DESC label to Item Description.
   b. Position your cursor in the Width column and change the width from 10 to 35.
6. Click Next.
7. Click Predefined Templates and choose Confidential Heading Landscape.
8. Click Next. Then, click Finish to finish the report.
The wizard produces a report that it displays in the Live Previewer:
Before you proceed, save your work by choosing Save from the File menu of Report Builder. Save the report as bwsell.rdf.
Formatting the Report in the Live Previewer

The Live Previewer is a WYSIWYG editor that lets you modify and format the report while viewing the report output. You can format the report directly in the Live Previewer in the same way that you format a document in a word processing program. For example, you can bold or italicize text, and you can format numbers to represent dollar figures. To modify the report, take these steps:
1. To make the title larger, click the title and then, from the font size drop-down list in the toolbar, select 18.
2. To give more emphasis to the column headings, click each column heading and click the Bold button.
3. To format the Sum Sales column, select the Sum Sales column and take these steps:
   a. From the toolbar, click the Currency button.
   b. From the toolbar, click the Add decimal place button twice.
   c. From the toolbar, click the End justify button.
4. If the document is not company confidential, select the red box containing the words Company Confidential - Internal Distribution Only and click the Clear button on the toolbar.
5. Save the report again.
Here is the report again, after the formatting has been applied to it in the Live Previewer:
Making Changes to the Report

Take a good look at the report that you produced. The report displays the data that you need to determine how the products are selling, but it still is not easy to tell at a glance which are the best-selling products and which are the worst-selling products. In retrospect, you should have sorted the report by sum_sales and not by ITEM_DESC. To make this change, you use the Report Wizard. For these steps to work, the report must be open and selected in the Object Navigator. If it is not, you can open the report definition file by choosing Open from the File menu.
1. From the Tools menu, select Report Wizard.
2. Select the Data tab, which shows the SELECT statement on which the report is based.
3. Click Query Builder and click the Sort button on the toolbar.
4. From the Sorted Columns box, select ITEM_DESC and click Remove.
5. From the Available Columns box, select sum_sales and click Copy. Then, click the sum_sales column to activate the Sorting Order section. Click Ascending.
6. Click OK and then click OK again to close the Query Builder.
7. Click Finish to see the changed report in the Live Previewer.
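The effect of these steps on the report query can be pictured with a short sketch. The report's actual SELECT statement is not reproduced here, so the table MARTY.PRODUCTS and the join on PRODUCT_ID are assumptions made only for illustration. The point is the ORDER BY clause, which now names sum_sales instead of ITEM_DESC.

```sql
-- Hypothetical report query: MARTY.PRODUCTS and the join column are assumed.
SELECT p.item_desc,
       SUM(s.sales) sum_sales
FROM   marty.sales s,
       marty.products p
WHERE  s.product_id = p.product_id
GROUP  BY p.item_desc
-- Ascending order lists the worst-selling products first.
ORDER  BY sum_sales ASC;
```

If you prefer to see the best-selling products at the top of the report, a descending sort on the same column gives the reverse order.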
Deploying a Report
At this point, you can disseminate your report. If you want to publish it on the Web, Oracle Reports provides a Web Wizard that helps you prepare your report for the Web. To experiment with this feature, click on the Web Wizard button in the Live Previewer toolbar.
It is not necessary to invoke Report Builder just to run a report. For a client/server installation, all that you need is the report definition file (with the .rdf extension), Reports Runtime, and SQL*Net access to the database. For a multitier configuration, you must install the Reports Server on an application server machine. See the Oracle Reports Server Installation and Configuration Guide for details.

In this chapter, you learned how to use Oracle Reports to create reports for your end users. As the end users become more familiar with the first implementation of the data mart, they will make new requests for expansion and enhancements. You need to develop a strategy to handle increased demand and manage the ongoing maintenance requirements of the data mart. Chapter 8 discusses managing the data mart. In a short time, the data from the internal systems may not be adequate to meet the user needs. Chapter 9 touches on the value of external data in the data mart and some of the issues in integration of this data.
8
Manage
At this point, you have built the data mart and learned how to set up and use end-user tools to access the data mart. This chapter discusses how to manage the data mart.
In this chapter, you use Oracle Enterprise Manager to create user accounts and to perform other maintenance operations. You use Oracle Data Mart Builder Admin to manage the repository and Oracle Data Mart Builder to refresh the data in the data mart. You use the scheduling facility of Oracle Data Mart Builder to schedule refreshes of the data to reflect changes in the source database.
Data Mart Maintenance Issues

Your data mart is populated and running and meeting the performance expectations of your users. However, over time, performance characteristics may degrade as a result of additional users and more data. You need to have a strategy to maintain a consistent level of performance and availability. You also need a process to maintain currency and accuracy of the data, ensuring that changes to the source data are reflected in the data mart on a regular basis. The activities in the maintenance of the data mart include:
- Managing users and ensuring the security of the data mart
- Managing database backup and recovery
- Managing performance
- Managing growth
- Creating a summary table
- Managing the ETT environment, including managing the repository and defining a schedule for refreshing the data
- Refreshing (updating) data
The next sections discuss how you perform these activities.
Managing Users and Security

Your security policies should enable you to specify which users can access which data, to set resource usage limits for each user, and to easily add or drop a user's access to data. In contrast to most transaction processing applications, the data in the data mart may require varying degrees of security. For example, you might extract data from a payroll application, and although you may store only summarized data in the data mart, the information could still be sensitive. Therefore, you may need to control access to data at the column or table level.
You can manage security at these levels of granularity using the facilities built into the Oracle8 database server. The Oracle8 database server manages database security through users and schemas. Every Oracle8 database has a valid list of users. To access the database, each user must connect through a database application, providing a valid user name and password.
The access rights of a user are controlled by the user's security profile. The profile includes:
- Whether user authentication information is maintained by the database or the operating system
- A list of tablespaces accessible to the user and the associated quota (the amount of space the user can use in a tablespace)
- The user's resource profile (limits that dictate the amount of system resources available to the user)
- The privileges and roles needed to provide the user with the right level of access to data mart objects and operations
Let us look at each of these aspects of managing database security.
User Authentication
The Oracle8 database server provides a facility to centralize password maintenance in the operating system to allow users to connect to the database without specifying a user name and password. Alternatively, Oracle8 can authenticate users by using information stored in the database. When Oracle8 uses database authentication, it checks for a valid database user ID and its associated password before it establishes a connection. Oracle8 stores and maintains the passwords in the data dictionary in an encrypted format.
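The two authentication styles can be sketched in SQL as follows. The user names and the password are hypothetical. The second statement assumes that the operating system account is named jsmith and that the OS_AUTHENT_PREFIX initialization parameter has its default value of OPS$.

```sql
-- Database authentication: Oracle8 checks the password held in the data dictionary.
CREATE USER report_user IDENTIFIED BY report_pw;

-- Operating system authentication: the database trusts the operating system login,
-- so the user connects without supplying a database password.
CREATE USER ops$jsmith IDENTIFIED EXTERNALLY;
```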
User Quotas and Space Requirements

User quotas limit the resources that are available to a user. The user's default tablespace is where objects are placed when a user creates objects such as tables. You set a user's default tablespace when you create the user ID. Each user also needs a workspace on the disk where temporary segments required for performing large sorts can be placed. This is the user's temporary tablespace. You should specify the temporary tablespace when you create the user ID. You can change these options at any time.
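As a minimal sketch, the following statements change these settings for an existing user and then list the quotas in effect. The user name report_user and the 50 MB quota are assumptions chosen for illustration; USERDATA and TEMP are the tablespaces used elsewhere in this chapter.

```sql
-- Assign tablespaces and cap the space the user may consume in USERDATA.
ALTER USER report_user
  DEFAULT TABLESPACE userdata
  TEMPORARY TABLESPACE temp
  QUOTA 50M ON userdata;

-- Review the quotas currently assigned to the user.
SELECT tablespace_name, bytes, max_bytes
FROM   dba_ts_quotas
WHERE  username = 'REPORT_USER';
```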
Resource Profiles
You can limit the amount of system resources available to the user. By explicitly specifying the resource limits for each user, you can prevent excessive consumption by one or more users to the detriment of others. You manage resource limits with user profiles. A profile is a set of resource limits identified by a specific name. You can create profiles for each class of user on your system and assign a profile when you create a user ID.
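A profile for one class of user might look like the following sketch. The profile name, the user name, and every limit value are assumptions, not recommendations. Note that the database enforces these limits only when the RESOURCE_LIMIT parameter is TRUE.

```sql
-- Units: CPU_PER_CALL is in hundredths of a second;
-- CONNECT_TIME and IDLE_TIME are in minutes.
CREATE PROFILE mart_analyst LIMIT
  SESSIONS_PER_USER 2
  CPU_PER_CALL      6000
  CONNECT_TIME      480
  IDLE_TIME         30;

-- Assign the profile to a (hypothetical) existing user.
ALTER USER report_user PROFILE mart_analyst;

-- Turn on enforcement of resource limits for the running instance.
ALTER SYSTEM SET resource_limit = TRUE;
```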
Roles and Privileges

A database privilege is the right to perform a specific action, such as executing a particular type of SQL statement, accessing another user's objects, creating a table, or selecting rows from another user's table. You can explicitly grant privileges to users so that they can accomplish tasks required for their jobs. You should grant only the privileges required. For example, most end users need the right to access all data mart tables, but do not need the right to drop these tables.
You can create a role, which is a group of privileges identified by a specific name, and grant the role to all users who need that particular set of privileges. For example, you can grant the privileges to select rows from the table SALES to the role MARKETING_MANAGER. You can grant any user this role and grant this user the right to, in turn, grant the role to other users.
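The MARKETING_MANAGER example can be written as three statements. This is a sketch that assumes the SALES table belongs to the MARTY schema, as in the case study, and that a user named report_user (hypothetical) already exists.

```sql
-- Create the role and give it read access to the SALES table.
CREATE ROLE marketing_manager;
GRANT SELECT ON marty.sales TO marketing_manager;

-- Grant the role to a user; WITH ADMIN OPTION lets that user
-- grant the role to other users in turn.
GRANT marketing_manager TO report_user WITH ADMIN OPTION;
```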

Global Computing Case Study: Adding Users and Roles

You use Oracle Security Manager to manage users, roles, and profiles. From the Oracle Enterprise Manager menu, select Security Manager and log in as user marty, password marty, and service dmdb. Refer to "Logging In to Oracle Enterprise Manager Components" on page 4-28 if you need more information. When you log in successfully, the Oracle Security Manager window appears.
The Users, Roles, and Profiles folders are displayed in a tree view in the window on the left.
Creating a New User ID

To create a new user ID, take the following steps:
1. From the User menu, choose Create.
2. In the Create User dialog box, for Name, enter the name of the user ID to be created. The name can contain up to 30 characters. Enter newuser.
3. For Profile, select the profile that you want to assign to the user. Select DEFAULT. Security Manager assigns the DEFAULT profile if you do not select a profile.
4. Authentication can occur externally, that is, through the operating system, or internally, through Oracle8. To require the user to enter a password, select Password from the Authentication drop-down list. Enter the password newuser and confirm the password.
5. Specify the tablespaces for the new user, using the following:
   - For Default tablespace, select USERDATA.
   - For Temporary tablespace, select TEMP.
6. Select the Roles/Privileges tab.
7. In the Privilege Type drop-down list, select Roles.
8. You want to assign the CONNECT and RESOURCE roles to the new user. From the Available box, select the role RESOURCE and click the down arrow. By default, the role CONNECT is listed in the Granted box.
9. To provide the user with the ability to grant the privileges the user has just received to other users, check Admin Option for each role.
10. To view the SQL that is automatically generated by the Security Manager, click Show SQL.
11. From the Privilege Type drop-down list, select System Privileges. The available system privileges are displayed in the Available box.

12. Select the following system privileges, then click the down arrow:
   - CREATE TABLE
   - CREATE VIEW
   - SELECT ANY TABLE
   When you grant the RESOURCE role, the privilege UNLIMITED TABLESPACE is granted.
13. Select the Object Privileges tab.
A tree view of all objects is displayed in the Objects scrollable list box.
14. Click the plus sign (+) to the left of the MARTY object to expand the schema and then expand Tables.

15. To associate privileges with certain objects for the new user, select the object and then select from the privileges that are displayed in the Available Privileges list. To assign ALTER privileges for the SALES table, select SALES, select ALTER, and click the down arrow.
16. Repeat the previous step to assign the DELETE, INDEX, INSERT, SELECT, and UPDATE privileges on the SALES object to the new user as shown in the following figure:
17. Click Create to create the new user ID with all selected privileges.
You can also assign privileges by dragging Object and System Privileges over the user name using the tree view on the opening screen of Security Manager.
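For comparison, the choices made in the steps above correspond roughly to the following statements. This is a hand-written sketch, not a copy of what Show SQL displays, so the exact text that Security Manager generates may differ.

```sql
-- Steps 1 to 5: name, profile, password authentication, and tablespaces.
CREATE USER newuser IDENTIFIED BY newuser
  DEFAULT TABLESPACE userdata
  TEMPORARY TABLESPACE temp
  PROFILE default;

-- Steps 6 to 9: roles, with the Admin Option checked for each.
GRANT connect, resource TO newuser WITH ADMIN OPTION;

-- Steps 11 and 12: system privileges.
GRANT CREATE TABLE, CREATE VIEW, SELECT ANY TABLE TO newuser;

-- Steps 13 to 16: object privileges on the SALES table in the MARTY schema.
GRANT ALTER, DELETE, INDEX, INSERT, SELECT, UPDATE ON marty.sales TO newuser;
```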
Creating a New Role

To create a new role, take the following steps:
1. From the Role menu, choose Create or select the Roles folder and click the large plus sign (+) located on the menu bar.
2. From the General tab of the Create Role dialog box, enter the name of the new role, NEWROLE.
3. For Authentication, select None.
4. Select the Roles/Privileges tab. From the Privilege Type drop-down list, select Roles.
5. From the Available box, select the role CONNECT, then click the down arrow. Select the role RESOURCE, then click the down arrow.
6. Click Create to create the new role.

Managing Database Backup and Recovery

As the data mart becomes an important business resource, availability becomes increasingly important. You need policies to make sure that the data is backed up and can be quickly recovered in the event of system failure. This section provides some background on the types of failure and then gives an overview of the available backup options. Because a complete discussion or demonstration of database backup and recovery procedures is outside the scope of the cookbook, you should read the Oracle8 Administrator's Guide for details.
Types of Database Failure

The most common types of database failure are instance failure and media (disk) failure.
Instance Failure
Database instance failure occurs when a problem arises that prevents the Oracle database instance (the background processes and memory structures) from continuing to work. For example, a power outage or a software problem, such as an operating system crash, may cause this failure. After the cause of instance failure is resolved and you restart the database, the Oracle8 database server automatically recovers all data without any further intervention on your part.
Media Failure
Media failure is a physical problem in reading or writing physical files needed for normal database operation. A common cause of media failure is a disk head crash resulting in the loss of all files on a disk drive. Typically, you need to intervene to recover the database if this type of failure occurs. The recovery procedure you use depends on the type of file affected by media failure.
Structures Used for Database Recovery

The following sections describe the different structures that are important for database recovery.
Database Backup Files

A backup consists of a set of operating system copies of the physical files that constitute an Oracle database. The files in the backup can include datafiles, redo logs, and control files.
Redo Logs
At least two redo log groups are present for each Oracle database. These are separate from the Oracle datafiles. Datafiles store the actual data, whereas redo logs are filled with redo entries. Redo entries record data that can be used to reconstruct all changes made to the database. These redo entries are representations of database changes in a special format. There are two categories of redo logs:
- Online redo logs: Every Oracle database has an online redo log. The online redo log, which is maintained by the Oracle background process LGWR, records all changes made in the database. The online redo log consists of two or more preallocated files that can be reused in a circular fashion. In the sample database installed with Oracle Data Mart Suite, the redo log files are logdmdb1.ora, logdmdb2.ora, logdmdb3.ora, and logdmdb4.ora.
- Archived redo logs: You can configure a database to archive files of the online redo log when the log files are full. Consider the sample database. When logdmdb1.ora fills up, Oracle8 writes redo entries to logdmdb2.ora. When logdmdb2.ora in turn fills up, Oracle8 writes redo entries to logdmdb3.ora, then to logdmdb4.ora. When logdmdb4.ora fills up, Oracle8 reuses logdmdb1.ora, overwriting its entries. If the database is configured to archive redo log files, an operating system copy is made of each redo log before it is reused. By archiving filled redo log files, older redo log information is preserved for more extensive database recovery operations, while the preallocated online redo logs continue to be used to store the most current changes.
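To see the circular reuse of the online redo log in your own database, you can query two dynamic performance views. The sketch below assumes you are connected as a user with access to the V$ views, such as system.

```sql
-- List the files that make up each online redo log group.
SELECT group#, member
FROM   v$logfile
ORDER  BY group#;

-- Show which group is being written now (STATUS = 'CURRENT')
-- and whether each group has been archived.
SELECT group#, sequence#, status, archived
FROM   v$log
ORDER  BY group#;
```

Running the second query at intervals shows the CURRENT status moving from one group to the next as each file fills.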
Control File
The control file stores the status of the physical structure of the database. Certain status information in the control file, such as the names of datafiles and the name of the online log that has the most current changes, is needed by Oracle8 to perform database recovery.
Backing Up a Database
You can back up a database using different modes and different types of backup.
Database Modes: ARCHIVELOG and NOARCHIVELOG

A database can operate in two distinct modes: NOARCHIVELOG mode, where the archiving of the redo log is disabled, and ARCHIVELOG mode, where the redo log is archived before being reused. NOARCHIVELOG mode only protects the database from instance failure, that is, situations such as a power outage that prevents the Oracle database instance from functioning. ARCHIVELOG mode permits complete recovery from disk failure as well as instance failure, because all changes made to the database are permanently saved in the archived redo log.
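The following sketch shows how you might check the current mode and switch to ARCHIVELOG mode. It assumes that you are connected with administrator privileges and, for the ALTER DATABASE statements, that the database has been shut down cleanly and then mounted without being opened.

```sql
-- Check the current mode; the result is ARCHIVELOG or NOARCHIVELOG.
SELECT log_mode FROM v$database;

-- With the database mounted but not open (for example, after STARTUP MOUNT),
-- enable archiving and then open the database.
ALTER DATABASE ARCHIVELOG;
ALTER DATABASE OPEN;
```

Changing the mode does not by itself copy filled log files. The database must also be set up to archive them, automatically or manually, as described in the Oracle8 Administrator's Guide.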
Types of Database Backups

- Full and partial backups: A full backup is an operating system backup of all datafiles and the control file in the database. A partial backup is any level short of a full backup, such as a backup of all files belonging to a single tablespace.
- Offline backups: An offline backup is a copy of all physical datafiles taken when the database is shut down and all datafiles are closed. This term is also used to refer to backups of tablespace files taken when the database is open but the tablespace is offline. The operating system provides a backup utility for offline backups. In an NT environment, for example, you use the NT Backup Manager to perform the offline backups.
- Online backups: If a database is operating in ARCHIVELOG mode, you can perform online backups. You make a physical copy of any datafile when the database is still open, and the associated tablespaces and the specific datafiles are online and currently in normal use. To do this, you need to issue specific commands to let Oracle8 know that the datafile is being placed in backup mode, take an operating system copy, and then issue a command to mark the end of the online backup of the datafile. For further details on the procedure for online backups, refer to the Oracle8 Administrator's Guide.
- Control file backups: A control file backup is a form of a partial backup. Because a control file keeps track of the physical structure of the database, you should back up the control file each time you make a structural change to the database. In general, it is a good idea to include control file backups in your normal backup schedule.
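The commands mentioned for online backups and control file backups can be sketched as follows. The example assumes that the database is in ARCHIVELOG mode and uses the USERDATA tablespace from the case study. The backup file name is hypothetical, and the copy itself is made with an operating system utility, not with SQL.

```sql
-- Mark the start of the online backup for the tablespace.
ALTER TABLESPACE userdata BEGIN BACKUP;

-- At this point, copy the tablespace's datafiles with an operating system utility.

-- Mark the end of the online backup.
ALTER TABLESPACE userdata END BACKUP;

-- Back up the control file to a (hypothetical) backup location.
ALTER DATABASE BACKUP CONTROLFILE TO 'd:\backup\control_dmdb.bkp';
```

Keep the interval between BEGIN BACKUP and END BACKUP short, because the tablespace generates more redo while it is in backup mode.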
The available backup options depend on whether you are using ARCHIVELOG mode or NOARCHIVELOG mode. If most of the data is refreshed on a regular basis, you can choose to use NOARCHIVELOG mode for your data mart. If your database is in NOARCHIVELOG mode, you should regularly perform full, offline backups of the database. If your database is in ARCHIVELOG mode, you can use a combination of full offline backups and partial online backups. Because you can choose to back up only a few files at a time during each partial backup, a complete backup set can be composed of individual datafile backups taken at different points in time. These will not be consistent with each other, but if you store all archived redo logs since the earliest partial backup represented in the backup set, Oracle8 can make the datafiles consistent at recovery time.
Recovering a Database
Oracle8 automatically performs instance recovery whenever the database starts. Recovery from media failure, however, requires manual intervention. Recovery from media failure can take two forms, depending on the archiving mode of the database.
- If the database uses NOARCHIVELOG mode, recovery from a media failure is a simple restoration of the most current full backup. All work performed after the full backup must be redone manually, if desired, after the damaged database is restored.
- If the database uses ARCHIVELOG mode so that its online redo log is always archived, recovery from a media failure involves reconstructing the database to a specific state before the media failure occurred. The recovery procedure that you use depends on the type of file that was affected by the media failure. For further details, refer to the Oracle8 Administrator's Guide.
Backing Up and Recovering Read-Only Tablespaces

If you have historical data that is not likely to be updated or refreshed, you can make the tablespaces containing this data read-only. You need to back up a read-only tablespace only once, after making it read-only. As long as the tablespace remains read-only, you do not need to perform any further backups on the tablespace. If you change a read-only tablespace to be read/write, you must resume your normal backups of the tablespace. No recovery is ever needed on read-only tablespaces after instance recovery. If you need to perform media recovery on a tablespace that is currently read-only, you only need to restore the last backup of the tablespace, and you do not need to apply any redo information.
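A minimal sketch of the statements involved appears below. The tablespace name sales_history is hypothetical, because the sample database does not define a separate tablespace for historical data.

```sql
-- Make the tablespace read-only, then take its one-time backup.
ALTER TABLESPACE sales_history READ ONLY;

-- Confirm the change; the STATUS column shows READ ONLY for the tablespace.
SELECT tablespace_name, status
FROM   dba_tablespaces;

-- Later, if the data must be refreshed, return the tablespace to read/write
-- and resume normal backups of it.
ALTER TABLESPACE sales_history READ WRITE;
```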

Managing Data Mart Performance

Often, the data mart is built rapidly without the luxury of the numerous levels of analysis and planning you find in a conventional system life cycle. As a result, you may not have done a great deal of performance analysis and capacity planning before building the data mart. Even if you had, it is often difficult to predict workloads until the system is built and in use. If the data mart is a success, the number of users accessing the system is likely to increase beyond the point anticipated when it was initially built.
As these issues imply, managing the performance of the data mart is an active, ongoing process and not always a precise, scientific one. Therefore, the following sections offer guidelines rather than take you through a step-by-step process. Because a detailed discussion of the general principles is outside the scope of the cookbook, the sections highlight some issues and refer you to Oracle8 Tuning for details.
In its simplest terms, acceptable performance for a data mart means that the data mart is able to meet all business requirements with acceptable response times. The remainder of this section touches on several aspects of tuning your data mart for acceptable and, preferably, optimal performance. Those aspects are:
- Capacity planning
- Physical database configuration and layout
- Initialization parameters for a data mart
Capacity Planning
When you configure a data mart, you should ensure that all components, such as CPU, memory, and I/O, are effectively used. Generally, memory usage and the I/O subsystem should be sufficient to keep the CPU busy.
How Many CPUs Do You Need?

The CPU sizing for a data mart is a function of the database size and the user workload. The database size affects the number of CPUs you need for a variety of reasons. As the data volume increases, more administrative operations, such as loading data, rebuilding indexes, and backup and restore, must be executed in parallel to complete within finite batch windows. Parallel operations are CPU-intensive. For parallelism of operations to produce performance improvements, you should run your data mart on a server with multiple CPUs. You can add CPU resources to a system as the workload increases, but when adding CPUs, it is important to understand that you are unlikely to achieve linear scalability of performance on all operations.
Memory Requirements
Memory is used for a number of distinct tasks by any database application. If memory resources are insufficient for any of these tasks, the bottleneck causes the CPUs to work at lower efficiency and system performance to drop. Data mart applications tend to use memory heavily. Furthermore, memory usage increases in proportion to the number of users concurrently accessing the data mart, the use of parallelism, and the number of sorts and hash join operations. The memory available for queries in a data mart environment comes from process memory. Process memory is the memory used by the actual processes on the machine. If you are executing operations in parallel using several query server processes, each of these might use a significant amount of memory for sorts, hash joins, and other memory-intensive operations. The following table lists the memory usage per user for some typical operations in a data mart:
Memory Utilization       Type of Operation
Low (100 KB to 1 MB)     Table scans, index lookups, index nested loop joins, sum or average operations with no GROUP BY clause or very few groups
High (1 MB to 10 MB)     Large sorts, sort merge joins, index creation, GROUP BY or ORDER BY clause returning a large number of rows, hash joins
The memory available to processes comes from the virtual memory on the system, which is somewhat more than available physical memory. If the sum of all active memory usage exceeds the available physical memory on the system, the operating system may need to store some of the memory pages on disk. This is called paging. Paging can degrade performance if memory is too oversubscribed. Generally, you should not exceed 20% oversubscription of physical memory. If paging occurs, you need either to scale back memory usage by processes or to add more physical memory. Keep in mind the trade-offs: there are physical limits to the amount of memory you can add, but scaling back on per-process memory usage can significantly degrade performance.
Requirements for the I/O Subsystem

The I/O subsystem can be compared to a pump that pumps data to the CPUs to enable them to execute workloads. The I/O subsystem is also responsible for data storage. The main components of an I/O subsystem are arrays of disk drives controlled by disk controllers. It is important to consider performance requirements when you size the I/O subsystem, rather than sizing based only on storage requirements. This is because while disk drives have increased in size, the throughput, the rate at which the disk drive pumps data, has not increased in proportion. In sizing calculations, you use the following factors as input:
- The size of the database
- The number of CPUs on the system
- An initial estimation of the workload on the data mart server
- The rate at which the disk can pump data
- Space needed to stage data prior to load
- Space needed for index creation and sort activities
Tuning Physical Database Configuration

Once you have determined how much disk space to configure, you need to lay out the database so that you can realize the full potential of the I/O subsystem. If you intend to use parallelism, you should spread out all datafiles accessed in parallel across multiple disk drives and controllers, using at least as many disks as the degree of parallelism. The process of spreading database files across multiple disk drives is called striping. You can perform striping at the level of the operating system or the disk hardware. You can also stripe manually, by using SQL*Loader to load data from multiple loader sessions into different files of the tablespace.
Stripe across at least as many disks as there are CPUs in the system. When a file is striped across disks, chunks of the file in units called stripe size are allocated to each disk in round-robin fashion. Appropriate stripe sizes for table data are 128 KB, 256 KB, 1 MB, and 5 MB, depending on the database block size and the size of the table. For a typical data mart where the database size is perhaps 20 GB, appropriate stripe sizes are 1 MB and under.
Striping affects media recovery. If you lose a disk, you lose access to all objects stored on that disk. If an object is striped over multiple disks, the loss of any disk tends to affect more objects than when files are not striped. If you use striping, you should plan your backup and disk mirroring strategies with this factor in mind.
Setting Initialization Parameters

Many initialization parameters affect the performance of data mart applications. For best results, start with an initialization file that is appropriate for your workload. The next sections describe some initialization parameters that are important for data mart applications. For further details on these parameters and how they affect query performance, read Oracle8 Tuning.
Parameters Affecting Resource Consumption

This list shows some of the parameters that affect system resource usage in a data mart environment. Any recommended numerical values assume that your database size is 20 GB or under and the configuration of your system is around 4 CPUs and 256 MB memory.
- DB_BLOCK_BUFFERS determines the size of the database buffer cache and is specified in units of database block size. A reasonable starting value is 1000.
- SORT_AREA_SIZE specifies the amount of memory to allocate for sort operations. The cumulative sort area used by all server processes adds up fast because each server can allocate this amount of memory to perform a sort. If memory is abundant on your system, set the parameter SORT_AREA_SIZE to a large value. This benefits the performance of sort operations because sorts are more likely to complete in memory. However, if the amount of memory is a concern, reduce the SORT_AREA_SIZE so that paging does not result. A sample range of starting values is 256 KB to a few megabytes.
- HASH_AREA_SIZE indicates the memory used by each process for hash joins. Start with an initial value of a few megabytes and adjust the value according to your workload.
- PARALLEL_MAX_SERVERS specifies the maximum number of query servers that can be started on the system. When setting this parameter, remember that most queries need at most twice the degree of parallelism attributed to any table in the query.
- PARALLEL_MIN_SERVERS specifies the number of query servers to be started and reserved for parallel query operations when the instance starts.
- SHARED_POOL_SIZE is used primarily for storing shared SQL statements and some internal structures. However, when you use parallelism, the shared pool also provides space for a pool of message buffers that the query servers use to communicate with each other. A reasonable starting value is about 20 MB. Increase the value if you significantly increase the number of query servers that are active on your system. If you are not using parallel query in your data mart application, ignore this parameter.
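Before you change any of these parameters, it helps to see their current values. The query below is a sketch that assumes access to the V$PARAMETER view. The two ALTER SESSION statements use illustrative values (1 MB and 4 MB) taken from the ranges suggested above and affect only the current session.

```sql
-- Display the current settings of the resource-related parameters.
SELECT name, value
FROM   v$parameter
WHERE  name IN ('db_block_buffers', 'sort_area_size', 'hash_area_size',
                'parallel_max_servers', 'parallel_min_servers',
                'shared_pool_size')
ORDER  BY name;

-- Try different per-process memory limits for one session before
-- changing the initialization file. Values are in bytes.
ALTER SESSION SET sort_area_size = 1048576;
ALTER SESSION SET hash_area_size = 4194304;
```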
Parameters Enabling New Features

Set the following parameters to use the latest available functionality:
- ALWAYS_ANTI_JOIN determines how the NOT IN operator is evaluated. The recommended value is HASH, which enables the most efficient execution path.
- HASH_JOIN_ENABLED enables hash joins when you set it to TRUE.
- STAR_TRANSFORMATION_ENABLED enables star transformation when you set the value to TRUE. Use the STAR_TRANSFORMATION hint to make the optimizer use the best plan in which the transformation has been used.
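As a sketch, you can enable two of these features for a single session and then request the star transformation with a hint. The query uses the SALES and DAYS tables from the case study, and the filter on 1996 is an arbitrary example. The hint is only a request, so the optimizer may still choose a different plan, particularly for a query that joins a single dimension table as this one does.

```sql
-- ALWAYS_ANTI_JOIN is not set here; this sketch assumes it is set
-- to HASH in the initialization file.
ALTER SESSION SET hash_join_enabled = TRUE;
ALTER SESSION SET star_transformation_enabled = TRUE;

-- Ask the optimizer to consider the star transformation for this query.
SELECT /*+ STAR_TRANSFORMATION */
       d.year_number,
       SUM(s.sales) total_sales
FROM   marty.sales s,
       marty.days d
WHERE  s.date_id = d.date_id
AND    d.year_number = 1996
GROUP  BY d.year_number;
```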
Parameters Related to I/O

The following parameters influence the input and output on your system:
- DB_BLOCK_SIZE is the default block size. The value used for the DMDB database is 8 KB, which should be appropriate for most data mart applications.
- DB_FILE_MULTIBLOCK_READ_COUNT determines how many database blocks are read with a single operating system call. In general, use the formula 64 KB / DB_BLOCK_SIZE. For example, specify 8 if the DB_BLOCK_SIZE is 8 KB.
- HASH_MULTIBLOCK_IO_COUNT controls the number of blocks a hash join operation should read and write concurrently. The suggested value is 4.
- SORT_DIRECT_WRITES specifies how sorts execute. When you set it to AUTO, sorts execute very efficiently.
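The formula for DB_FILE_MULTIBLOCK_READ_COUNT can be evaluated directly against the block size of your database. This sketch assumes access to the V$PARAMETER view; the column alias is chosen only for readability.

```sql
-- 64 KB is 65536 bytes. With the 8 KB (8192-byte) block size used for
-- the DMDB database, this query returns 8.
SELECT 65536 / TO_NUMBER(value) suggested_read_count
FROM   v$parameter
WHERE  name = 'db_block_size';
```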
Managing Growth
Because data marts often store historical data, the size of the database can grow over time. The administrative operations of loading, purging, and indexing data also become more resource-intensive. You need to manage growth by adding space to the database and adding objects like rollback segments. You do this using Oracle Enterprise Manager.
Global Computing Case Study: Adding a Datafile to an Existing Tablespace

One way to add space to the database is by adding a datafile. To add a datafile to an existing tablespace, choose Storage Manager from the Oracle Enterprise Manager menu. Log in as user system, password manager, and service dmdb. To create a new datafile, do the following:
1. Select the Datafiles object folder. This displays all datafiles in the database, as shown in the following figure:
2. Choose Create from the Datafile menu to display the Create Datafile dialog box.
3. Enter the name of the datafile, specifying the complete path. The path you specify should be similar to the example shown:
   d:\orant\database\marty_dmdb2.ora
4. Using the drop-down list, select the tablespace, USERDATA, with which to associate this datafile.
5. Select Online to indicate that this file should be available for user access.
6. For File Size, enter 150 in the text box. Specify the unit as megabytes by clicking M.
7. Select the Auto Extend tab and specify 10M as the Increment.
8. For Maximum Extent, click Unlimited.
9. Click Create to create the new datafile.
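The settings chosen in these steps correspond roughly to a single ALTER TABLESPACE statement. The sketch below uses the example path shown in step 3 and is not a copy of the SQL that Storage Manager generates. The follow-up query simply confirms that the new file is present.

```sql
-- Add a 150 MB datafile that grows in 10 MB increments without an upper limit.
ALTER TABLESPACE userdata
  ADD DATAFILE 'd:\orant\database\marty_dmdb2.ora' SIZE 150M
  AUTOEXTEND ON NEXT 10M MAXSIZE UNLIMITED;

-- List the datafiles that now belong to the tablespace.
SELECT file_name, bytes
FROM   dba_data_files
WHERE  tablespace_name = 'USERDATA';
```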

Decision-support applications often require that large amounts of data be summarized or rolled up into smaller tables for use with ad hoc queries. This allows you to compute data once and reuse it many times. After you have populated the database, you can create a summary table. In this exercise, you create a summary table that stores the total sales for each product for each of the years 1993 to 1996. This table is the result of a query against the table SALES. If your users issue such a query often, consider creating a summary table that represents the result set so that the data mart does not need to compute the result set again and again. The fastest way to create this summary table is to use the Parallel CREATE TABLE . . . AS SELECT (PCTAS) feature, described in Parallel CREATE TABLE. . . AS SELECT Statements on page 4-21. You create the table in parallel using the defining query while recoverability is turned off.
To create a summary table, you use Schema Manager. From the Oracle Enterprise Manager menu, invoke Schema Manager. Log in as user system, password manager, and service dmdb. Then, take these steps:
1. From the Object menu, select Create.
2. From the Create Object window, select Table and click OK.
3. Select Create Table Manually.
4. In the Create Table dialog box, enter ANNUAL_SALES for Name and select MARTY from the Schema drop-down list. Select <default> from the Tablespace drop-down list.
5. Click the Define Query radio button, then enter the following query in the query box:
   SELECT SUM(SALES) ANNUAL_SALES, PRODUCT_ID, YEAR_NUMBER YEAR
   FROM MARTY.SALES, MARTY.DAYS
   WHERE SALES.DATE_ID = DAYS.DATE_ID
   GROUP BY PRODUCT_ID, YEAR_NUMBER
   Now, the dialog box looks like this:
6. Click Create to create the table.
7. In the Object menu, expand Tables, expand MARTY, and select ANNUAL_SALES.
8. Select the Options tab. Then, select the Parallel check box. Select Value and specify 4 as the degree of parallelism. The degree of parallelism should be at least the number of CPUs on your machine and, at most, the number of disks across which the table is spread.
9. Click Apply.

Managing the ETT Environment

You can manage the ETT environment using Oracle Data Mart Builder Admin. This section provides a brief overview of the components of the tool and then looks at the different aspects of ETT administration:
- Managing the repository
- Backing up the repository
- Cleaning up the repository
- Managing Data Collection Agents
- Scheduling data updates (refreshes)
Using Oracle Data Mart Builder Admin

To start Oracle Data Mart Builder Admin, select it from the Oracle for Windows NT menu of the Windows NT Program menu. You see the following screen:
The major components of the screen are:
- Title Bar: Displays the name of the application.
- Menu Bar: Lets you view online help or exit Oracle Data Mart Builder Admin.
- Configuration Tab: Displays repositories and agents with their associated items in the tree view.
- Tree Control: Displays item categories and existing named items. Double-click an item category, or click the plus sign (+) to expand or the minus sign (-) to collapse the list. When you select an item in the tree control, its properties are displayed in the property sheet area on the right side of the window.
- Property Sheet Area: Lets you view and set properties of repository items.
When you try to expand one of the items in the tree view, Builder Admin displays a login box. Log in as user system with password manager.
Managing the Repository

The repository is a relational database that you use to store information, such as BaseViews, MetaViews, and Plans, related to Oracle Data Mart Builder. As the data mart administrator, you create the required repository accounts that determine who can create and access BaseViews, MetaViews, and Plans using Oracle Data Mart Builder. Oracle Corporation does not recommend using database tools to directly alter a repository database because unforeseen consequences, like items disappearing, could result. As the data mart administrator, you should periodically run the Clean Up Repository utility to remove the repository objects deleted by users. You can configure it to run the cleanup on a schedule. In addition, you should back up each data mart repository that you manage.
Registering a Repository
When you install Oracle Data Mart Builder, it automatically builds one repository, which is always named <default>. When you start Oracle Data Mart Builder Admin, the <default> repository is already registered. You may want to register a new repository. For example, to access a repository on another system, you must add an entry to the Oracle TNSNAMES.ORA file and register the repository. To register a repository, take the following steps:
1. From the tree view, select the Repositories tab, select the Manage Repositories tab and click Register New. The Register Repository dialog box appears.
2. For the Name, enter New_Repository.
3. For User Name, enter dmadmin.
4. For Password, enter manager.
5. For Service Name, enter dmdb.
6. Click Register.
The new repository is displayed, as shown in the following figure:
Backing Up the Repository

As the Oracle Data Mart Builder administrator, you should create a repository backup scheme for each data mart you manage. How frequently you back up the repository depends on the amount of data it contains, how often the data changes, and how difficult it would be to restore. In general, you should back up the repository at least once a week. One way to back up the repository is to export it to a file. Oracle Data Manager, which is a component of Oracle Enterprise Manager, lets you export a database, a schema, or individual tables. For example, to export the Data Mart Builder repository used in this book, you choose the user dmadmin from the Oracle Data Manager tree and then select the tables beginning with SARP_. To export the Oracle Designer repository as well as the Data Mart Builder repository, select all of the tables owned by dmadmin.
Cleaning Up the Repository

When a user deletes any Oracle Data Mart Builder item, such as a Plan, the deleted item disappears from the user's Workspace, Staging Area, or Bin. Although the user can no longer view the item, it remains in the repository with a programmatic bit set to mark it for removal. When you run the Clean Up Repository utility, it searches the repository for marked items and deletes them. Only marked items that are saved to disk are deleted. Cleaning up reduces the amount of database space the repository occupies and increases performance for end users. As the data mart administrator, you should periodically run the Clean Up Repository utility to remove the repository objects deleted by users. To schedule the repository cleanup:
1. Select the Repositories tab, then select the Clean Up Repository tab.
2. Click Add New. The Schedule Repository Clean Up window is displayed:
3. Select the Time at which the repository cleanup should start.
4. In the Weekly section, select On Every, and then select one of the seven days.
   Alternatively, you can schedule a monthly cleanup. To do so, select one of the radio buttons in the Monthly section and select a date from the drop-down list.
5. In the Remove Deleted section, select all options.
6. Select the check box for Stop if object not deleted.
7. Enter the name of the Log File and click OK to schedule the cleanup.
8. If you are not logged in to the repository, Builder Admin prompts you for a password. Enter manager.
Builder displays the job in the Scheduled Clean-Up Jobs list. Now, you can exit from Oracle Data Mart Builder Admin.
Managing Data Collection Agents

Oracle Data Mart Builder is an agent-based application. An agent is a special program designed to economically perform a particular type of work. Agents typically save time. They use fewer network resources than individually packaged requests because the entire program is executed locally. Agents associated with Oracle Data Mart Builder run as Windows NT services, which simplifies management and configuration tasks. In Oracle Data Mart Builder, you can easily register and configure new agents. You can configure an agent to start automatically at boot time, or you can manually start or stop agents from Oracle Data Mart Builder. Oracle Data Mart Builder retrieves data through the Data Collection Agent, which brokers all interaction between database servers and client applications in the data mart. The Data Collection Agent packages a request, dynamically constructs the correct type of SQL code, locates the source database on the network, submits the request, and returns results to the client application. If the Data Collection Agent is not started on the agent machine, Oracle Data Mart Builder cannot run. For more information about the Data Collection Agent, see the Oracle Data Mart Builder documentation.

Updating Data in the Data Mart

To keep the data up-to-date, you need to determine how you will refresh the data in the data mart to reflect changes in the source data. You can update the fact table or the dimension tables.
Updating Data in the Fact Table

Fact tables usually contain transactional data with an associated time stamp (from the Time dimension table) for each row. For example, in the SALES fact table, the DATE_ID column is populated indirectly from the DATE_DESC column of the DAYS dimension table. Other keys are populated from the other dimension tables. The columns UNITS, SALES, COST, and MARGIN are populated directly from the source data.
Usually, you want to get the most recent set of data from the source and append it to the bottom of the existing fact table. You do not want to alter the existing rows, because you want to preserve that data. For example, the SALES fact table contains three years of data. You may want to append new data from the source to the SALES fact table monthly. In addition, you may want to remove the data for the oldest month to maintain exactly three years of data in the fact table.
To update the fact table, you must isolate the newest set of data from the source data by querying the source data based on the time stamp of the transaction, in this case, the ORDERED_DATE column of the SAMPLEOLTP.ORDER_HEADER table. You filter the data by comparing the time stamps and add only data newer than the last update to the table.
The simplest way to start is to make a copy of the plan that loads the SALES fact table:
1. Start Data Mart Builder and log in using User system, Password manager, and Connect as dmdb.
2. Click the Plan Bin icon.
3. Select the plan SALES_PLAN and, from the File menu, select Open Plan SALES_PLAN.
4. From the File menu, select Save Copy of Plan SALES_PLAN.
Then, you modify the plan:

1. Select Copy of SALES_PLAN and, from the File menu, select Open Plan Copy of SALES_PLAN.
2. Right-click the SQL Query Transform and select SQL Editor.
3. Add sampleoltp.order_header to the list of tables in the FROM clause.
4. Add the following to the WHERE clause:
   AND "SAMPLEOLTP"."ORDER_HEADER".ORDERED_DATE > '01-JUN-1998'
5. Save the plan by right-clicking in the Data Flow Editor, selecting Rename plan and typing a new name, and then selecting Save Plan from the File menu.
You can run the plan immediately or schedule it to run on a regular basis, as explained in Scheduling Data Refreshes on page 8-39.
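The time-stamp filter used in this procedure can be sketched in Python as follows. This is a minimal illustration, not the SQL that Data Mart Builder generates: it assumes an in-memory SQLite table standing in for SAMPLEOLTP.ORDER_HEADER, stores ORDERED_DATE as ISO date text (so that string comparison matches date order), and uses invented sample rows and a hypothetical ORDER_ID column.

```python
import sqlite3


def rows_newer_than(conn, last_update):
    """Return source orders whose ORDERED_DATE is strictly after last_update.

    last_update is an ISO date string (YYYY-MM-DD); a bound parameter is used
    instead of pasting the date into the SQL text.
    """
    return conn.execute(
        "SELECT order_id, ordered_date FROM order_header "
        "WHERE ordered_date > ? ORDER BY ordered_date",
        (last_update,),
    ).fetchall()


# Hypothetical source rows; only orders after 1 June 1998 should be picked up.
src = sqlite3.connect(":memory:")
src.executescript(
    """
    CREATE TABLE order_header (order_id INTEGER PRIMARY KEY, ordered_date TEXT);
    INSERT INTO order_header VALUES
        (1, '1998-05-30'), (2, '1998-06-01'), (3, '1998-06-02'), (4, '1998-06-15');
    """
)
new_rows = rows_newer_than(src, "1998-06-01")
print(new_rows)
```

Because the comparison is strict, a row stamped exactly at the last update date is not loaded a second time; whether that boundary is right for a given source depends on how precisely its time stamps are recorded.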
Updating Dimension Tables

Data in dimension tables needs to be updated (refreshed) regularly as the source data changes. You need to add new rows and make changes to existing rows. In addition, you may want to filter the data, transforming it so that it meets the criteria of the data mart schema. Updating dimension tables requires careful planning and vigilance, particularly if you use warehouse (synthetic) keys. This section describes one method of updating dimension tables that uses the strengths of Oracle8 database server and Data Mart Builder and minimizes the impact on the production database.
In real life, companies sometimes have a customer table containing tens of thousands of customers. It is unlikely that all of the customer data is modified between two updates to the data mart. Usually, only ten percent or less of the customer information changes, either adding new customers or modifying the information about existing customers. The exercise in this section uses these assumptions in updating the customer dimension.
The exercise uses a flat file that contains the updated (new and modified) data. Because each production system is different, there are many correct ways to extract the updated data from the production database. One method is to include a Date Modified column in the CUSTOMER table in the production database. This exercise assumes that, for existing customers, only the column SHIPTO.SH_DESCRIPTION from the production system (SHIP_TO_DESC in the dimension table) contains modifications. Some customers' offices are now in different cities. You could use a similar plan to update more than one column.
This scenario illustrates the simplest way to refresh the CUSTOMER dimension table. A data file contains a week of updated data. This file contains two types of records:
- Records that describe new customers added to the SHIPTO table during the previous week
- Records that record any modifications to the data about existing customers
The records identify the customers by the natural keys ACCOUNT_ID and SHIPTO_NUM.
The records in the data file do not indicate whether a record is new or updated. Data Mart Builder determines this by looking up the ACCOUNT_ID and SHIPTO_NUM for each record in the CUSTOMERS table. If the data does not exist, the customer is a new customer. The data flow plan accomplishes the following tasks:
1. Reads the records from the data file containing the new and updated data.
2. Determines whether the record refers to an existing customer or a new customer.
3. When a record refers to an existing customer, adds the corresponding warehouse key to the record. When the record refers to a new customer, generates a new warehouse key and adds it to the record.
4. In one path in the data flow plan, it loads the new customers into the CUSTOMERS dimension table. In the other path, it updates the modified customer records in the CUSTOMERS dimension table.
In the data flow plan shown, the upper branch loads new customers and the lower branch updates existing customers whose data has been modified.
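The logic of the plan described above (key lookup on the natural keys, a split on whether a warehouse key was found, key generation for new customers, then an insert branch and an update branch) can be sketched in Python as follows. This is an illustration of the technique only, not how Data Mart Builder implements its transforms. The function name refresh_customers, the in-memory SQLite table, and the sample rows are assumptions made for the sketch; the column names follow the text.

```python
import sqlite3


def refresh_customers(conn, records):
    """Apply new and modified customer records to the customers dimension.

    records: iterable of (account_id, shipto_num, ship_to_desc) tuples.
    Returns (number_inserted, number_updated).
    """
    # Key lookup: natural key -> warehouse key for every existing customer.
    lookup = {
        (acct, ship): cust_id
        for cust_id, acct, ship in conn.execute(
            "SELECT customer_id, account_id, shipto_num FROM customers"
        )
    }
    # Key generation starts from the next available warehouse key.
    next_key = conn.execute(
        "SELECT COALESCE(MAX(customer_id), 0) + 1 FROM customers"
    ).fetchone()[0]

    new_rows, changed_rows = [], []
    for acct, ship, desc in records:
        cust_id = lookup.get((acct, ship), 0)  # 0 marks a new customer
        if cust_id == 0:  # upper branch: generate a warehouse key
            new_rows.append((next_key, acct, ship, desc))
            next_key += 1
        else:  # lower branch: existing customer, only the description changes
            changed_rows.append((desc, cust_id))

    conn.executemany(
        "INSERT INTO customers (customer_id, account_id, shipto_num, ship_to_desc) "
        "VALUES (?, ?, ?, ?)",
        new_rows,
    )
    conn.executemany(
        "UPDATE customers SET ship_to_desc = ? WHERE customer_id = ?",
        changed_rows,
    )
    conn.commit()
    return len(new_rows), len(changed_rows)


# Hypothetical dimension contents and weekly update records.
dim = sqlite3.connect(":memory:")
dim.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, account_id TEXT,
                            shipto_num INTEGER, ship_to_desc TEXT);
    INSERT INTO customers VALUES (1, 'A100', 1, 'Boston'), (2, 'A200', 1, 'Denver');
    """
)
counts = refresh_customers(dim, [("A100", 1, "Chicago"), ("A300", 2, "Austin")])
final_rows = dim.execute("SELECT * FROM customers ORDER BY customer_id").fetchall()
print(counts, final_rows)
```

The sketch keeps the whole lookup table in memory, which corresponds loosely to caching the lookup table; for a very large dimension, a per-record lookup query would trade speed for memory.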
Because this plan performs many different tasks, use a Grid Transform as a sink for each stage of the plan. Doing so allows you to verify the results at each stage. You create the data flow plan by taking the following steps:
1. From the Data Mart Builder File menu, select New Plan.
2. From the Tool Bin, select the Delimited Text File Source Transform and drag it into the Data Flow Editor.
3. From the Tool Bin, select the Grid Transform and drag it into the Data Flow Editor.
4. Double-click the Delimited Text File Source Transform and enter the following information:
   - For File Name on Server, enter <oracle_home>\datamart\customer.csv.
   - For Character Set, select ANSI.
   - For Separator Character, select the comma (,).
   - For Quote Character, select the double quotation mark (").
   - For Comment Character, select the number sign (#).
   - Because the first line of the data file contains the column names, check the box Column Names in File.
   - To have the transform automatically define the column names and data types, type the following file specification in the Sample Data File Name box:
     <oracle_home>\datamart\customer.csv
     Because the data file is small, you can use it as a sample file. If your data file is large, create a small subset of it to use as a sample file.
   Then, in the Auto-Populate Field List box, click Column Names and Data Types. Builder adds information about the columns to the Field List.
5. Click OK.
6. Drag the Key Lookup Transform into the Data Flow Editor and place it to the right of the Delimited Text File Source Transform. Double-click the transform and enter the following information:
   - For BaseView, select GCC_STAR.
   - For Table, select MARTY.CUSTOMERS.
   - If your system can tolerate the memory usage, check the box for Cache Whole Lookup Table.
7. In the Generated Keys section, click New.
8. In the Generated Key dialog box, enter the following information:
   - For Lookup Column, select CUSTOMER_ID.
   - For Place Result In, click the New Field button to create a new field for the warehouse key.
   - For Result Field, enter CUSTOMER_ID.
   Click OK.
9. In the Natural Keys section, specify the two natural key columns. First, for Input Table Column, select ACCOUNT_ID. For Lookup Table column, select ACCOUNT_ID. Click ADD. Then, for Input Table Column, select SHIPTO_NUM. For Lookup Table column, select SHIPTO_NUM. Click ADD.
   When the Key Lookup Transform finds the natural key values, it writes the corresponding warehouse key to the CUSTOMER_ID column. Otherwise, it returns a zero, which represents a new customer. Click OK.
10. To divide the data flow into two branches, one for new customers and one for existing customers, drag the Conditional Splitter Transform from the Tool Bin into the Data Flow Editor, to the right of the Key Lookup Transform.
11. Double-click the transform to open the dialog box. Enter the following values:
   - For Column, select CUSTOMER_ID.
   - For Operator, select is equal to.
   - For Type, select number.
   - For Value, enter 0.
   If the value is 0, the customer is a new customer. Thus, the upper branch of the data flow plan is the new customer branch and the lower branch is the updated customer branch.
12. Click OK.
13. Drag the Key Generation Transform from the Tool Bin into the Data Flow Editor. Place it to the right of the Conditional Splitter, on the upper branch of the plan. This transform generates a new warehouse key for new customers. Double-click the transform and enter the following values:
   - For Key Column, enter CUSTOMER_ID.
   - Click Start Key Value From Next Available Value In.
   - For BaseView, select GCC_STAR.
   - For Table, select MARTY.CUSTOMERS.
   - For Key Column, select CUSTOMER_ID.
14. Click OK.
15. To remove the duplicate key column, drag the Column Select Transform from the Tool Bin to the right of the Key Generation Transform. Open the dialog box for the transform and deselect the first CUSTOMER_ID. (The second CUSTOMER_ID column is the one generated by the Key Generation Transform.)
16. Click OK.
17. At this point, you can test the plan by clicking Update. However, even if you are satisfied with the results, do not replace the Grid Transform yet! You need to verify the results of the lower branch first.
18. To verify the results of the lower branch, which loads the modified customers, drag the Grid Transform into the Data Flow Editor, so that it connects to the lower branch of the Conditional Splitter. Test the plan by clicking Update.
19. If you are satisfied with the results shown in both Grids, delete the Grid Transforms by right-clicking each Grid Transform and selecting Delete Step.
20. To load the new customers into the CUSTOMERS dimension table, drag the SQL*Loader Transform into the Data Flow Editor to the right of the Column Select Transform. Then, double-click the transform to open the dialog box. Enter the following values:
   - For BaseView, select GCC_STAR.
   - For Table, select MARTY.CUSTOMERS.
   - For Oracle Version, select Version 8.
   - Because the table is already created, select Use Existing Table.
   - For Update Method, select Append.
   - For Load Options, select Use Conventional Path Loader because the new data is fairly small.
   - For Batch Size, take the default. The SQL*Loader Transform uses the SQL*Loader utility default of 64.
   - For Log File, enter a full file specification, such as:
     c:\temp\update_cust.log
   Click OK. Do not click Update yet. The upper branch, which loads the new customers, is complete. The data flow plan looks like the following:
21. To load the modified data into the CUSTOMERS table, drag the Batch Update Transform into the Data Flow Editor, so that it connects to the lower branch of the Conditional Splitter. The Batch Update Transform lets you update specific columns or all columns in a table.
22. Double-click the transform to open the dialog box. Enter the following values:
   - For BaseView, select GCC_STAR.
   - For Table, select MARTY.CUSTOMERS.
   - For Select Columns From, click Target.
   - From the Update Columns box, select CUSTOMER_ID and click the left arrow to move it into the Index columns box. This indicates that the table has a unique index on the column CUSTOMER_ID. Because you know that the only data that has been modified in the source data is the SHIP_TO_DESC column, select all other columns and click Remove to remove them from the list of columns to be updated.
   - For Log File, enter a file specification.
   Click OK.
23. Now the data flow plan is complete. Select Update to load the data into the CUSTOMERS dimension table.
24. Save the plan by right-clicking in the Data Flow Editor, selecting Rename plan and typing a new name, and then selecting Save Plan from the File menu.
The CUSTOMERS dimension table holds both the new data and the modified data. The scenario described in this section works well for typical data marts when the following conditions exist:
- The modified and new data can be delivered periodically in a flat file.
- The number of total records represents a relatively small fraction of the total number of rows in the dimension.
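The flat-file settings chosen for the Delimited Text File Source Transform in the procedure above (comma separator, double quotation mark as quote character, number sign for comments, column names in the first line) can be mimicked with Python's standard csv module. The sketch below is an assumption-laden illustration: the function name read_delimited and the sample contents are invented, and treating every line that begins with the comment character as a comment is an assumption about how such a setting behaves.

```python
import csv
import io


def read_delimited(text, separator=",", quote='"', comment="#"):
    """Parse delimited text whose first non-comment line holds the column names."""
    # Skip comment lines before handing the remaining lines to the CSV parser.
    lines = (ln for ln in io.StringIO(text) if not ln.lstrip().startswith(comment))
    reader = csv.DictReader(lines, delimiter=separator, quotechar=quote)
    return [dict(row) for row in reader]


# Hypothetical contents of a weekly customer update file.
sample = (
    "# weekly customer extract (hypothetical contents)\n"
    "ACCOUNT_ID,SHIPTO_NUM,SHIP_TO_DESC\n"
    'A100,1,"Chicago, IL"\n'
    "A300,2,Austin\n"
)
rows = read_delimited(sample)
print(rows)
```

The quote character matters for values such as "Chicago, IL", which contain the separator; without quoting, that record would be split into four fields instead of three.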
Scheduling Data Refreshes

Oracle Data Mart Builder supports both time-based and event-based scheduling of data flow plans, which allows you to schedule a refresh of your data. Time-based scheduling uses the NT Scheduler, whereas event-based scheduling is programmatically controlled. Windows NT allows an external event to initiate a data flow plan. You can use an easy point-and-click interface to schedule jobs. This cookbook focuses on the Windows NT Scheduler.
Scheduling a Plan
You can schedule a plan to be executed on the server at a defined time. To schedule a plan, take these steps:
1. In Builder, click the Clock icon. The Scheduler window is displayed, with a calendar showing the current month. The current date and time are selected by default. You can select the time in 10-minute increments. If any plans are scheduled for the date selected, the date is marked with a clock icon and the time that the plan is scheduled to run is marked with a plan icon and asterisk.
2. Select the date on which you want the plan to execute.
3. From the Plan Bin, select the plan you want to schedule and drag it to the time that you want it to execute. An icon of the plan with an asterisk appears. At the same time, the clock icon is displayed by the date, as shown in the following figure:
4. From the Frequency drop-down list, you can select how often you want this plan to be executed. You have the following options:
   - Today only
   - Every day
   - Every week
   - Every month
5. You can specify that the Scheduler notify the user upon completion of a scheduled plan. From the Notify by drop-down list, you can select the following notification options:
   - Email
   - Logfile
   - Email & logfile
   - Do not notify
6. From the Run result drop-down list, you can select either Run plan only or Save plan as snap. Save plan as snap saves the results of the plan as a snap view. Select Run plan only. Click OK.
7. To inquire about the plans that have been scheduled on a certain date or time, right-click the date. A window, Scheduled Plans For This Day, lists all scheduled plans, along with the date and time information.
Deleting a Scheduled Run of a Plan

To delete a scheduled run of a plan, you use Data Mart Builder Admin. Take these steps:
1. Invoke Data Mart Builder Admin.
2. Expand the folder Plans. In the Enter Repository Password dialog box, enter manager.
3. Select the plan that is scheduled to run.
4. From the properties sheet, select the Scheduling tab.
5. Select the run that you want to delete and click Unschedule.
9
External Data
The Value of External Data
Many companies are investing in data marts so that they can access critical customer and product information and thus improve their competitive advantage in the marketplace. Understanding the key customer characteristics that influence a purchase, the frequency of purchases, and spending potential is essential to making sound customer investment decisions. Most data marts are built using data from internal operational systems, which may contain details of the transaction such as customer name, address, product number, date, and dollar amount of the purchase, but do not include any of the characteristics such as demographic, socioeconomic, or lifestyle information that allow more intimate analysis of the customer. To include such information in the data mart, you need to contact an external data provider. Such external data lets you perform analyses such as:
- Determining your most profitable customer segments (for example, by age, income, marital status)
- Developing more targeted win-back strategies
- Analyzing receivables by income levels
- Analyzing product sales by region, household, age, and so on
- Developing loyalty programs for your most profitable customers
- Analyzing company financials by age, income, and so on
- Developing faster to-market strategies for your products
This chapter looks at the offering from one external data provider, Acxiom Corporation, to illustrate external data elements. As part of the Oracle Data Mart Suite offering, Oracle Corporation has alliances with Acxiom to provide its line-of-business customers with access to extensive customer information and a line of related service offerings.
InfoBase Package for the Oracle Data Mart Suite

Acxiom provides the InfoBase Premier package, which contains consistent, predictable household-level data collected from multiple leading providers of consumer information, resulting in high coverage and accuracy. This section offers a brief overview of Acxiom's external data offering. The data elements included in the InfoBase Premier package are:
- Age, which is collected primarily through questionnaires, driver's licenses, and voter registration records. It is reported in 2-year increments that can be added together to create appropriate age bands useful for the data mart.
- Marital Status, which represents whether the household consists of a married couple or singles. Marital status is derived through questionnaires, title codes, or the presence of a male adult and a female adult with the same surname.
- Children's Age Ranges, which is banded to represent the presence of one or more children in the following life stages: toddlers, preschool, elementary, high school, or post-high-school. Data is determined by questionnaires, capture of product purchase data for children, observation, insurance records, and other sources.
- Occupation, which represents categories of occupational information to separate employed individuals (such as managers, service workers, and professionals) from nonemployed individuals (such as homemakers, retired individuals, and students). Data is determined primarily by questionnaire information and by title codes such as Dr.
- Estimated Income, which represents an estimated amount of annual income for the household. Some data may be self-reported through questionnaires, but some is estimated from a variety of other data elements such as occupation, home ownership, home value, automobile ownership, and automobile value. The precision of the income amount is not exact. Rather, it is a close approximation of the probable category.
- Credit Card Indicator, which indicates the presence of a type of credit card used in the household. Data is derived from actual credit transactions and questionnaires.
- Home Owner/Renter, which indicates whether the primary dwelling is owned or rented. Data is compiled from county tax assessor information, deed transfers, and questionnaires.
- Length of Residence, which indicates the length of time that an individual has lived at a particular address. In some cases, this indicates the length of time since the name first appeared on the database compiling the information.
- Dwelling Size, which indicates a single-family unit or multifamily unit, such as an apartment, townhouse, or duplex.

How to Get the External Data

You can get the external data from Acxiom Corporation in more than one way:
- Through the Acxiom Data Network
  This is the easiest way to get the data and is well-suited for those with record sets of one million or less. It is also appropriate for those with larger record sets who have never used external data and wish to experiment with a subset of their data. The Acxiom Data Network is an online, on-demand, around-the-clock system that is available through the Internet or a private network connection. To use the Acxiom Data Network, your data must:
  - Include the columns First Name, Middle Initial (can be blank), Last Name, Street Address, and Zip Code
  - Be stored in an Open Database Connectivity (ODBC) format system, such as the Oracle database server
  To get the data, take these steps:
  1. Visit the Acxiom Corporation Web site at the following URL:
     http://www.acxiom.com/adn
  2. Apply for an account, following the directions on the Web site.
  3. When you receive your account, visit the Web site and proceed through the initial download and installation process.
- By contacting Acxiom by phone (appropriate for larger record sets):
  1. Get an extract of complete name, complete address, and a unique record identifier to match to your data mart file. The records have name, address, city, and postal code in fixed fields. Preferred media options are mainframe media like 9-track tapes and PC media like 3.5-inch diskettes. Acxiom has a preferred layout that you should use.
  2. Contact Acxiom at (501) 336-3600 and request an order form.
  3. Ship your extract tape with your completed order form to the address listed. Acxiom Corporation will return the original flat record with InfoBase Premier data appended in a standard format to the end of the record.
For technical requirements and frequently asked questions (FAQs), please contact your Acxiom data consultant.
10
Summary
Congratulations! You have successfully walked through the process, from start to mart! The data mart for the Global Computing Company is up and running. Now you can apply what you have learned in this cookbook to building your own data mart. As mentioned before, a critical factor in the success of the data mart project is the resources that are available. This chapter discusses the project team and project planning and provides a checklist of activities and deliverables to get you started.
Project Team
Although this cookbook cannot estimate how many people you need to implement your data mart, it can give you an idea of the kind of skill sets you need to have available. Minimally, you will need expertise to fill the roles listed in the following table. One individual may fill multiple roles, so the actual size of the team may be only two or three people.
Title/Role: Project manager/data mart manager
Responsibility: Educating the end user and management on the application, benefits, and limitations of the data mart; defining budgets and schedules; keeping the project on track; motivating and managing the team.

Title/Role: Database administrator
Responsibility: Designing the database, administering user access and security, and monitoring and tuning the database for query performance.

Title/Role: Data administrator
Responsibility: Administering the database and identifying data sources.

Title/Role: Business analyst/decision-support analyst
Responsibility: Assisting in specifying the business requirements and helping the end users to find the right information. This person provides the day-to-day interface with the end users and knows their information issues better than anyone else on the team.

Title/Role: System programmer/analyst
Responsibility: Loading data, ensuring data quality, and developing systems to monitor that the data has been updated as expected.

Title/Role: Data mart owner (vice president or equivalent of End-User Department)
Responsibility: Approving the ultimate design of the data mart.

Title/Role: End-User Department manager/analyst
Responsibility: Fine-tuning the specific requirements at the data level.
Project Plan
While your team is gathering requirements from the business user, you, the project manager, need to develop your project plan. Because this is a project of short duration, you do not need a long, complicated plan. The project plan includes a list of key deliverables with scheduled dates and responsible parties. This is a powerful vehicle to lead you through the step-by-step development of the data mart. You should publish a plan so that all parties know and understand what your team will be doing, what their involvement is, and when various key tasks should be completed. This will allow you, your team, and senior management to understand and monitor your progress.
As you develop your project plan, make sure that you build in sufficient slack. Murphy's law often holds: what should not happen usually does, and what should happen often takes longer than expected. You should also allocate time in your plan for documentation. There are many tools for managing projects, and any good tool will work for this effort.
Checklists
You can use the following checklists to keep track of your project. The tasks listed are mirrored both by the chapters in this cookbook and the lesson plans of the instructor-led course.
Requirements and Design                                   Date Completed
    Gather business requirements
    Prepare requirements document
    Identify data mart architecture
    Create list of user data elements
    Identify sources
    Classify data for target database
    Create data model
    Develop technical architecture
    Identify existing environment
    Define hardware, software requirements
    Manage metadata
    Define loading strategy (ETT considerations)
Construct                                                 Date Completed
    Define high-level system requirements
    Map logical design to physical database
    Perform capacity planning
    Determine performance criteria
    Define index strategy
    Create physical data mart database
    Define optimization strategy
    Create indexes
    Cost-based or rule-based optimizer and hints
    Gather statistics
    Parallel processing
    Parallel query
    Star query processing
    Partitioning
Extraction, Transformation, Transportation                Date Completed
    Specify data from data sources
    Define criteria for data selection
    Capture source metadata
    Develop data filtering process
    Design data flow plan
    Plan data extraction
    Bulk data loading
    Change data capture
    Develop data transforms
    Data integration
    Data aggregation and summarization
    Data volume and population
    Develop customized plug-ins
    Populate database (data transportation)
    Capture target metadata
    Manage metadata
    Develop a plan to keep synchronized with source changes
Access                                                    Date Completed
    Define End User Layer (EUL)
    Metadata definition
    Build and manage end-user work area
    Define hierarchies
    Define access paths
    Optimize for performance
    Set up query predictor
    Set up query governor
    Utilize cache
    Manage user access roles and security
    Define query and reports
    Custom formatting
    Drill down/drill up
    Drill to RDBMS
    Publish and subscribe
    Integrate with desktop
    Integrate with network
Manage                                                    Date Completed
    Database management
    Security access and control for metadata
    Security access and control for data mart
    Database backup strategy for metadata
    Database backup strategy for data mart
    ETT administration
    Manage repository
    Manage security
    Manage and schedule process
    Implement backup and recovery strategy
    Data Updates
    Implement updates to fact table
    Implement updates to dimension tables
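If you prefer to track the Date Completed column electronically, a structure as simple as the following would do. This is a minimal sketch in Python, which is not part of the Oracle Data Mart Suite. The task names are copied from the Manage checklist, while the completion date and the helper names `mark_completed` and `outstanding` are hypothetical.

```python
from datetime import date

# Task names copied from the Manage checklist; None means "not completed yet".
checklist = {
    "Manage": {
        "Manage repository": None,
        "Manage security": None,
        "Implement updates to fact table": None,
    }
}


def mark_completed(checklist, phase, task, when):
    """Record the Date Completed for one task, rejecting unknown names."""
    if task not in checklist[phase]:
        raise KeyError(f"{task!r} is not on the {phase} checklist")
    checklist[phase][task] = when


def outstanding(checklist, phase):
    """Return the tasks in a phase that have no completion date yet."""
    return [task for task, when in checklist[phase].items() if when is None]


# Hypothetical completion date, for illustration only.
mark_completed(checklist, "Manage", "Manage security", date(1999, 11, 5))
print(outstanding(checklist, "Manage"))
```

Rejecting unknown task names keeps a typing mistake from silently adding a task that is not on the published checklist.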
We hope that this cookbook was helpful and informative. Good luck with your data mart project!
Index
Numerics
5-Way Router Transform, 5-6
A
access, 6-1, 7-1 by users, 8-2 granting, 4-3, 6-15 read-only, 6-3 simultaneous, 4-3 access paths, 4-15, 4-16 full table scans, 4-16 index scans, 4-16 access rights, 8-2 accessing step, 1-6 Add Columns Transform, 5-7 Admin component of Data Mart Builder, 5-12 administration task lists, 6-15 agents, 8-28 Aggregation Transform, 5-7 allocating disk space, 4-8 ALWAYS_ANTI_JOIN parameter, 8-19 ANALYZE command, 4-18, 5-53 analyzing data, 6-28 analyzing fact tables, 5-53 applications changing, 3-41 freezing, 3-40 unfreezing, 3-41 archived redo log files, 8-12 ARCHIVELOG mode, 8-13, 8-14 attributes, 3-9 creating, 3-28, 3-41, 3-44 authenticating users, 8-3, 8-5 average (AVG) function, 5-7 AVG function, 5-7
B
backing up databases, 8-11, 8-13 backing up repositories, 8-26 backing up tablespaces, 8-15 backup files, 8-12 BaseViews, 1-13, 5-2 adding target tables to, 5-26, 5-27 creating, 5-22 for source data, 5-22 for target data, 5-25 source, 5-13 target, 5-16 Batch Loader Transform, 5-10 Batch Update Transform, 5-10, 8-37 bitmap indexes, 4-12, 4-25 blocks database reading, 8-19 default size, 8-19 Breakpoint Transform, 5-6 B-tree indexes, 4-12 buffer cache, 8-18 Builder See Oracle Data Mart Builder business areas adding descriptions, 6-14 calculated items, 6-21 creating, 6-10
creating conditions, 6-26 defining, 6-4 formatting, 6-17 granting access, 6-15 hiding items, 6-19 identifying sources, 6-11 joins, 6-23 naming, 6-14 renaming columns, 6-18 renaming folders, 6-17 selecting tables, 6-12 business modeling, 1-13 business requirements defining, 3-3
C
calculated columns creating, 5-48 calculated items business areas, 6-21 capacity planning, 8-16 Cartesian product, 4-25 case study accessing data, 6-35 adding datafiles, 8-20 adding users and roles, 8-4 building reports, 7-2 creating summary tables, 8-21 creating the metalayer for, 6-8 designing data marts, 3-18 information requirements, 2-2 introduction, 2-1 managing databases, 4-28 populating data marts, 5-18 user accounts, 2-5 categories in Data Mart Builder, 5-16, 5-24 charts, 6-32 checklists, 10-1, 10-2 clients definition of, 4-5 client/server architecture, 4-5 client-side cubic cache, 6-8 collapse, 6-33
column definitions creating, 3-46 column manipulation transforms, 5-7 Column Select Transform, 5-7 using, 8-35 columns calculated, 5-48 renaming, 5-7 Command Line Sink Transform, 5-8 composite keys, 3-14 concatenated joins, 5-28 concatenated keys, 3-14 Concatenation Transform, 5-6 concurrency, 4-3 Conditional Splitter Transform, 5-6, 8-34 conditions creating, 6-26 connectors, 5-3, 5-31 constellation schemas, 3-13 constraints reenabling, 5-53 SQL*Loader Transform and, 5-53 constructing step, 1-5 control files, 4-5, 8-13 backups, 8-14 copying tables, 3-50 cost-based optimizer, 4-17 COUNT function, 5-7 CPUs database size and, 8-16 Create EUL Wizard, 6-9 CREATE TABLE statement parallel, 4-21, 8-21 creating custom transforms, 5-12 creating indexes, 4-23 creating roles, 8-4, 8-9 creating summary tables, 6-27, 8-21 creating table definitions, 3-46 creating tablespaces, 4-32 creating time dimension tables, 5-32 creating user IDs, 8-5 crosstab format reports, 6-30 creating, 6-40 currency formatting, 6-21
D
data accessing, 4-16 analyzing, 6-33 extracting, 5-13, 5-29 investigating, 6-33 loading parallel, 4-22 transforming, 5-29 updating, 8-28, 8-29 data blocks, 4-8, 4-11 Data Collection Agent, 5-3, 8-28 managing, 8-28 data definition language (DDL), 4-15 See also DDL scripts data extents, 4-11 data filtering, 5-15 data flow, 5-2 dividing, 8-34 Data Flow Editor, 5-3, 5-29, 5-30 data flow plans, 5-3 adding transforms, 5-31 creating, 5-29 debugging, 5-6 deleting scheduled run, 8-41 parts of, 5-3 renaming, 5-52 running, 5-29 saving, 5-52 scheduling, 8-39 data flow transforms, 5-5 data generation transforms, 5-9 data manipulation language (DML), 4-15 data manipulation transforms, 5-5, 5-6, 5-7 Data Mart Builder See Oracle Data Mart Builder Data Mart Designer, 3-9 creating applications, 3-21, 3-33 creating users, 3-19 Design Editor, 3-2 Entity Relationship Diagrammer, 3-2 overview, 3-2 Server Generator, 3-2 setting up, 3-19
data marts access, 6-1 read-only, 6-3 building, 1-3 definition of, 1-1 dependent, 1-2 designing, 3-1 differences from data warehouse, 1-2 growth of, 8-20 independent, 1-2 logical design, 3-7 maintaining, 8-1 managing, 1-13, 8-1 performance, 8-15 physical design, 3-7 populating, 5-16 schemas, 3-13 data modeling, 1-11 data protection, 4-2 data refreshes, 8-28 deleting scheduled run, 8-41 scheduling, 8-39 data requirements case study, 3-23 gathering, 3-10 data security, 4-3, 8-2 data sinks, 5-3 data sources, 5-13 identifying, 3-11 data transformation, 5-15 data transportation, 5-16 data warehouses definition of, 1-1 differences from data marts, 1-2 database buffer cache, 8-18 database configuration, 8-17 Database Design Transformer foreign key constraints and, 3-32 using, 3-46 database failure, 8-11 database instances, 4-4 database size CPUs and, 8-16 database users for case study, 2-5
databases backing up, 8-11, 8-13 backup files, 8-12 backup modes, 8-13 increasing size, 4-30 instance failure, 8-11 media failure, 8-12 privileges, 8-4 recovery, 8-11, 8-14 structures used for, 8-12 roles, 8-4 shutting down, 4-35, 4-36 size of disk size and, 8-17 starting up, 4-35, 4-38 types of failure, 8-11 datafiles, 4-5, 4-8 adding, 8-20 creating, 4-32 DB_BLOCK_BUFFERS parameter, 8-18 DB_BLOCK_SIZE parameter, 8-19 DB_FILE_MULTIBLOCK_READ_COUNT parameter, 8-19 DDL See data definition language (DDL) DDL scripts generating, 3-52 debugging plans, 5-6 default tablespaces, 8-3 degree of parallelism, 4-20, 4-21 default, 4-22 setting, 4-22 deleting items in repositories, 8-27 Delimited Text File Sink Transform, 5-9 Delimited Text File Source Transform, 5-5 using, 5-44, 8-32 dependent data marts, 1-2 design logical, 3-7, 3-9 physical, 3-7, 3-46 Design Editor displaying design, 3-47 using, 3-33 Designer See Data Mart Designer
designing data marts, 3-1 steps in, 3-1 designing step, 1-4 dimension tables, 4-23 creating plans for, 5-36 populating, 5-36 refreshing, 8-30 updating, 8-30 dimensions, 3-10, 3-13 defining entities as, 3-30 direct path load, 4-22, 5-52 Discoverer See Oracle Discoverer disk failure, 8-12 recovering from, 8-14 Disk Sort Transform, 5-8 disk space allocating, 4-8 increasing, 4-8 distributed processing, 4-4 DMDB sample database, 2-4, 4-28 listing tablespaces in, 4-31 structures, 2-7 tablespaces, 2-4 user accounts, 2-5 DML See data manipulation language (DML) drill to detail, 6-24 drilling down, 6-33, 6-47 in reports, 6-42 drilling out, 6-47 drilling up, 6-33
E
Edit Sheet Wizard, 6-40 End User Layer (EUL), 6-4 installing, 6-8 end-user access managing, 6-5 entities, 3-9 creating, 3-28 Entity Relationship Diagrammer, 3-9 symbols in, 3-41, 3-42 using, 3-27, 3-41
entity-relationship diagrams creating, 3-30 reading, 3-31 errors handling, 5-17 estimating size of data mart, 3-16, 3-54 estimating statistics, 4-18 ETT environment managing, 8-23 ETT process, 5-1 EUL See End User Layer (EUL) exception formatting, 6-32 execution plans, 4-15 exporting data, 6-48 repositories, 8-26 Expression Calculator Transform, 5-7 extents, 4-11 external data accessing, 9-1 extracting data, 5-13 source, 5-29 extraction-transformation-transportation (ETT) process See ETT process
F
fact tables, 4-23 analyzing, 5-53 loading data, 5-52 populating, 5-47 updating data in, 8-29 facts, 3-10, 3-14 defining entities as, 3-30 file specifications, 2-7 Filter Transform, 5-6 filtering data, 5-6, 5-15 Five-Way Router Transform, 5-6 Fixed Length File Source Transform, 5-5 using, 5-40 flat files extracting data from, 5-5, 5-40, 5-44 foreign keys constraint definitions creating, 3-46 logical design and, 3-32 maintaining relationships of, 5-14 formats of reports, 6-28, 7-4 formatting business areas, 6-17 currency, 6-21 freezing applications, 3-40 full backup, 8-13 functions in aggregation transform, 5-7
G
General Filter Transform, 5-6 generated keys See warehouse keys generating DDL scripts, 3-52 generating execution plans, 4-15 generating statistics, 5-53 Global Computing Company See case study granting privileges, 8-4 granularity, 3-12, 3-14, 3-25 Graphics Builder, 7-2 Graphics Runtime, 7-2 graphs, 6-32, 6-45 Grid Transform, 5-8 using, 5-31 grids, 5-3
H
hash joins, 4-17 blocks and, 8-19 enabling, 8-19 memory and, 8-18 HASH_AREA_SIZE parameter, 8-18 HASH_JOIN_ENABLED parameter, 8-19 HASH_MULTIBLOCK_IO_COUNT parameter, 8-19 hierarchies, 6-33 defining, 6-22 Hierarchy Wizard, 6-22 hints optimizer, 4-21, 4-22 STAR_TRANSFORMATION, 4-25 HTML files exporting data to, 6-48, 7-2 hyperdrilling, 6-24 hypertext markup language exporting data to, 6-48
I
independent data marts, 1-2 index scans, 4-16 indexes, 4-12 bitmap, 4-12, 4-25 B-tree, 4-12 parallel creation of, 4-23 partitioned, 4-25 information requirements, 2-2, 2-3 initialization parameters setting, 8-18 init.ora parameter file, 4-6, 4-18 instance failure, 8-11 Instance Manager See Oracle Instance Manager instances database, 4-4 Internet viewing data from, 6-4 I/O subsystem, 8-17 DB_BLOCK_SIZE parameter, 8-19 Item Class Wizard, 6-25 item classes defining, 6-24
J
Join Transform, 5-5 joins, 4-16, 5-25 Cartesian product, 4-25 concatenated, 5-28 creating, 6-23 creating in Builder, 5-28 maintaining, 5-14 Oracle Reports and, 7-7 star, 4-24
K
Key Generation Transform using, 5-33, 5-37, 5-42, 8-35 warehouse keys generating, 5-9 Key Lookup Transform, 5-9 using, 8-33 keys, 3-14, 4-24 composite, 3-14 concatenated, 3-14 lookup, 8-33 maintaining referential integrity, 5-14 natural, 3-50, 5-17 primary, 3-14 synthetic, 3-15 warehouse, 3-15, 3-41, 5-17 generating, 5-33, 5-37, 5-42, 8-31, 8-35 resolving during update, 8-30
L
Live Previewer, 7-14 Load Wizard, 6-11, 6-14 loaders, 5-3 loading data, 5-17, 5-31 Batch Loader Transform, 5-10, 8-37 fact table, 5-52 from flat files, 5-5 parallel direct path load, 4-22 SQL*Loader Transform and, 5-10, 5-34, 5-38 direct path load, 5-49, 5-52 time dimension tables, 5-32 transforms and, 5-9 log files, 8-12 logical design, 3-7, 3-9 creating, 3-27 final model of case study, 3-45 transforming into physical, 3-46
M
maintaining data partitioning and, 4-27 managing data marts, 1-13, 8-1 managing step, 1-7 many-to-one relationships, 3-31 master-detail crosstab format reports, 6-31 MAX function, 5-7 maximum (MAX) function, 5-7 measures, 4-24 media failure, 8-11, 8-12 recovering from, 8-14 striping and, 8-18 memory allocating for sorts, 8-18 requirements, 8-16 shared, 4-4 memory process, 8-16 Memory Sort Transform, 5-8 metadata, 3-18 business, 3-18 capturing, 3-33 source, 5-13 technical, 3-18 MetaViews, 1-13, 5-2 creating, 5-22 for source data, 5-23 for target data, 5-25 source, 5-13 target, 5-16 Microsoft Excel exporting to, 6-48 MIN function, 5-7 minimum (MIN) function, 5-7
N
natural keys, 5-17 nested loop joins, 4-17 Network Computing model Oracle Reports and, 7-2 New Graph Wizard, 6-45 NOARCHIVELOG mode, 8-13, 8-14 NOT IN operator evaluating, 8-19
O
offline backups, 8-13 OLTP source schema, 5-19
online backups, 8-13 online redo log files, 8-12 optimizer, 4-15 cost-based, 4-17 hints, 4-21 STAR_TRANSFORMATION, 4-25 rule-based, 4-17 star queries and, 4-25 OPTIMIZER_MODE parameter, 4-18 Oracle Backup Manager, 4-28 Oracle Data Collection Agent See Data Collection Agent Oracle Data Manager, 4-29 exporting data, 8-26 Oracle Data Mart Builder, 1-9, 5-2, 8-23 Admin component, 5-12 components of, 5-2 errors, 5-17 invoking, 5-21 logging in, 5-21 registered users, 2-6 repository, 2-6 Transform Software Developers Kit, 5-12 Oracle Data Mart Builder Admin, 5-12 logging in, 8-24 Oracle Data Mart Designer See Data Mart Designer Oracle Data Mart Suite, 1-7 components of, 1-8 Oracle Discoverer, 1-10, 6-2 components of, 6-2 logging in, 6-35 Oracle Discoverer Administration Edition, 6-4 connecting to, 6-10 Oracle Discoverer End User Layer, 6-4 Oracle Discoverer User Edition, 6-3, 6-28 toolbar, 6-34 Oracle Discoverer Viewer, 6-4 Oracle Enterprise Manager, 1-9, 8-20 components of, 4-28 logging in, 4-28 usage conventions, 4-30 Oracle Instance Manager, 4-29 logging in, 4-35 Oracle Reports, 1-10, 7-1
components of, 7-2 features, 7-1 formats for, 7-2 formatting, 7-14 modifying, 7-16 multitier architecture, 7-18 Oracle Schema Manager logging in, 4-29 Oracle Security Manager logging in, 4-29, 8-4 Oracle server, 4-4 Oracle Software Manager, 4-29 Oracle SQL Worksheet, 4-29 logging in, 4-29 Oracle Storage Manager, 4-29 adding space to databases, 4-30 logging in, 4-29, 8-20 Oracle Web Application Server, 1-10 Oracle8 Enterprise Edition, 1-9 outliers, 6-32
P
page-detail crosstab format reports, 6-31 page-detail crosstab report creating, 6-40 page-detail table format reports, 6-29 paging, 8-17 parallel CREATE TABLE . . . AS SELECT (PCTAS) statement, 4-21, 8-21 parallel direct path load, 4-22 parallel index creation, 4-23 parallel processing, 4-18 degree of, 4-20, 4-21 setting, 4-22 operations, 4-21 SQL statements, 4-21 parallel query, 4-19 PARALLEL_MAX_SERVERS parameter, 8-19 PARALLEL_MIN_SERVERS parameter, 8-19 parallelism degree of, 4-21, 4-22 tuning for, 8-17 parameter file init.ora, 4-6
parameters initialization, 8-18 Part Bin, 5-30 partial backups, 8-13 partitioned indexes, 4-25 partitioned tables, 4-25, 5-16 parts in Data Mart Builder, 5-16, 5-24 passwords for user accounts, 2-5 PCTAS See parallel CREATE TABLE . . . AS SELECT (PCTAS) statement performance, 8-15 bitmap indexes and, 4-12 capacity planning, 8-16 CPUs, 8-16 direct path load and, 4-22 initialization parameters, 8-18 I/O subsystem, 8-17 memory requirements, 8-16 parallel index creation, 4-23 parallel processing and, 4-18 parallel query and, 4-19 partitioning and, 4-27 star query optimization, 4-25 tuning database, 8-17 physical design, 3-7 creating, 3-46 creating from logical, 3-46 displaying, 3-47 physical memory, 8-17 pivoting reports, 6-32 plans See data flow plans populating dimension tables, 5-36 time, 5-32 populating fact tables, 5-47 populating step, 1-6 populating target schema, 5-31 population steps in, 5-16 population transforms, 5-9 primary keys, 3-14 constraint definitions
creating, 3-46 dimension tables, 5-17 fact table, 5-52 maintaining relationships of, 5-14 primary unique identifiers creating, 3-41, 3-43 privileges, 4-3, 6-5 granting, 8-4 to roles, 8-10 Oracle Discoverer and, 6-16 process modeling, 1-12 processes memory, 8-16 Oracle server, 4-4 parallel, 4-18 project planning, 10-2 checklists, 10-3 summary, 10-1 project team estimating size of, 10-1
Q
QSTATS workbook, 6-7 query parallel, 4-19 star, 4-24 Query Builder, 7-5 query coordinator, 4-20 query editor, 5-30 query governor, 6-7 query performance improving, 6-6 query prediction, 6-7 query servers, 4-20 specifying maximum, 8-19 specifying minimum, 8-19 quotas, 4-10 definition of, 4-10 for users, 8-3
R
record manipulation transforms, 5-6 Record Number Transform, 5-8
recoverability index creation and, 4-23 table creation and, 4-21 recovering databases, 8-11, 8-14 structures used for, 8-12 recovering tablespaces, 8-15 redo entries, 8-12 redo log files, 4-5, 8-12, 8-13 referential integrity constraints reenabling, 5-53 referential integrity relationships in source schema, 5-25 maintaining, 5-14 refreshing data, 8-28 dimension tables, 8-30 fact tables, 8-29 scheduling, 8-39 relationships creating, 3-32, 3-43 defining, 3-30 mandatory, 3-31 many-to-one, 3-31 removing duplicate column, 8-35 removing rows from data flow, 5-6 Rename Columns Transform, 5-7 using, 5-38 renaming columns, 5-7, 5-38, 6-18 folders, 6-17 Report Builder, 7-2 invoking, 7-3 using, 7-3 Report Wizard, 7-16 using, 7-3 reports, 1-10, 7-1 adding subtotals to, 6-39 changing, 7-16 creating, 6-28, 6-35, 7-3 crosstab format, 6-30 creating, 6-40 customizing, 6-47, 7-12, 7-13 deploying, 7-18 drilling, 6-33 formatting, 6-17, 6-38, 6-47, 7-4, 7-14 changing width, 6-38
currency, 6-21, 7-15 graphical format, 6-32, 6-45 headings formatting, 6-47 hiding items, 6-19 master-detail crosstab format, 6-31 modifying formats, 6-32 page-detail crosstab format, 6-31 creating, 6-40 page-detail table format, 6-29 pivoting, 6-32 previewing, 7-14 renaming, 6-47 table format, 6-29 templates, 7-13 types of, 6-28 Reports Runtime, 7-2, 7-18 Reports Server, 7-2 repositories, 5-12 backing up, 8-26 cleaning up, 8-27 Data Mart Builder, 2-6, 5-12 deleting items in, 8-27 exporting, 8-26 managing, 8-25 registering, 8-25 Repository Administration Utility creating users, 3-19 Repository Object Navigator creating applications, 3-33 requirements definition, 3-3 resource profiles, 8-3 reverse-engineering source data, 3-33 roles assigning, 8-6 assigning privileges, 8-10 creating, 8-4, 8-9 roll-up, 6-33 data, 4-21 row sources, 4-15 rowid, 4-16 rule-based optimizer, 4-17
S
sample databases DMDB, 2-4 source data, 2-7 Save to Table Transform, 5-9 scalability, 3-16 scans full table, 4-16 index, 4-16 Schedule Workbook Wizard, 6-49 scheduling plans, 8-39 deleting run, 8-41 scheduling workbooks, 6-49 Schema Manager See Oracle Schema Manager schema objects, 4-11 schemas constellation, 3-13 snowflake, 3-13 star, 1-8 scope of data mart defining, 3-2 scripts generating, 3-52 Search and Replace Transform, 5-7 searchlight button, 5-30 security business areas and, 6-15 data, 4-3 managing, 8-2 user profile, 8-3 using Oracle Security Manager, 8-4 segments, 4-11 server definition of, 4-5 Server Generator using, 3-52 SGA, 4-4 shared global area (SGA), 4-4 SHARED_POOL_SIZE parameter, 8-19 shutting down databases, 4-36 sink transforms, 5-8 sinks, 5-3 size of data mart, 3-16
snowflake schemas, 3-13 sort alternate, 6-24 transform for, 5-8 SORT_AREA_SIZE parameter, 8-18 SORT_DIRECT_WRITES parameter, 8-19 sorting values, 6-24, 7-9, 8-19 sort-merge joins, 4-17 source data capturing changes in, 5-14 reverse-engineering, 3-33 source metadata, 5-13 source transforms, 5-4 Splitter Transform, 5-6 SQL Command Sink Transform, 5-9 SQL Command Transform, 5-7 SQL Editor, 5-15 SQL Query Transform, 5-4 using, 5-50 SQL scripts generating, 3-52 SQL statements optimization, 4-15 processing, 4-15 SQL Text Filter, 5-15 SQL Worksheet, 4-29 using, 5-53 SQL*Loader Transform, 5-10 append option, 8-36 using, 5-31, 5-34, 5-38 direct path load, 5-49, 5-52 SQL*Loader utility direct path load, 4-22 staging tables, 5-19, 5-47 adding to physical design, 3-50 standard deviation (STDDEV) function, 5-7 star joins, 4-24 star queries, 4-24 optimizing, 4-25 star schemas, 1-8, 3-13, 4-23 designing, 3-14 populating, 5-9 target, 5-20 star transformation, 4-25 STAR_TRANSFORMATION hint, 4-25
STAR_TRANSFORMATION_ENABLED parameter, 4-25, 8-19 starting up databases, 4-38 statistics estimating, 4-18 generating, 4-18 STDDEV function, 5-7 storage management, 4-2 Storage Manager See Oracle Storage Manager storing data, 4-1 striping disks, 8-17 media recovery and, 8-18 Substring Transform, 5-6 subtotals adding to reports, 6-39 SUM function, 5-7 summary redirection, 6-6 summary tables, 6-6 creating, 6-27, 8-21 Summary Wizard, 6-27 sums of columns, 7-7 synthetic keys, 3-15 See also warehouse keys
T
table format reports, 6-29 tables adding to BaseViews, 5-26, 5-27 database, 4-11 dimension, 4-23 fact, 4-23 loading, 5-31 partitioned, 4-25, 5-16 staging, 5-19, 5-47 summary, 6-6, 6-27 creating, 8-21 tablespaces, 4-8 backing up, 8-15 controlling availability of data in, 4-10 creating, 4-32 default user, 8-3, 8-5 displaying, 4-31 in DMDB, 2-4 recovering, 8-15 temporary, 8-3, 8-5 target star schema, 5-20 populating, 5-31 target tables, 2-7, 5-13 technical requirements defining, 3-4 templates for reports, 7-13 temporary tablespaces, 8-3 Terminal Sink Transform, 5-9 throughput, 8-17 time dimension tables creating, 5-32 generating, 5-10 populating, 5-32 Time Generation Transform, 5-10 using, 5-32 Time Lookup Transform, 5-10 Tool Bin, 5-31 Data Mart Builder, 5-4 Transform Software Developers Kit, 5-12 transforming data, 5-15, 5-29 transforms, 5-3, 5-15 built-in, 5-4 column manipulation, 5-7 customized, 5-11, 5-12 data flow, 5-5 data generation, 5-9 data manipulation, 5-5, 5-6, 5-7 population, 5-9 record manipulation, 5-6 sink, 5-8 source, 5-4 Visual Basic and, 5-11 transporting data, 5-16 Tree Control, 5-30 tuning databases, 8-17
U
unfreezing applications, 3-41 Union Transform, 5-5 updating data, 8-28 Batch Update Transform, 5-10, 8-37 dimension tables, 8-30 fact tables, 8-29 user accounts for sample databases, 2-5 users authenticating, 8-3, 8-5 creating ID, 8-5 managing, 8-2 profiles, 8-3 quotas, 8-3 resource profiles, 8-3 security profile, 8-3
V
VARIANCE function, 5-7 VBScript transforms, 5-11 VBScriptCopy Transform, 5-11 VBScriptInplace Transform, 5-11 VBScriptSink Transform, 5-12 VBScriptSource Transform, 5-12 versions creating in Designer, 3-40 virtual memory, 8-17 Visual Basic transforms for, 5-11
W
warehouse keys, 3-15, 5-17 generating, 5-9, 5-33, 5-37, 5-42, 8-31 resolving during update, 8-30 Web exporting data to, 6-48, 7-2, 7-18 viewing data from, 6-4 Web servers, 1-10 Web Wizard, 7-18 wizards Create EUL, 6-9 Edit Sheet, 6-40 Hierarchy, 6-22 Item Class, 6-25 Load, 6-11 New Graph, 6-45 Report, 7-3, 7-16 Schedule Workbook, 6-49 Summary, 6-27 Web, 7-18 Workbook, 6-34, 6-35 Workbook Wizard, 6-34, 6-35 workbooks creating, 6-35 QSTATS, 6-7 scheduling, 6-49 Workspace, 5-30

