
DB2 Warehouse

Version 9.5



DB2 Warehouse Tutorial

SC18-9801-04


Note: Before using this information and the product it supports, read the information in Notices on page 121.

This edition applies to Version 9.5 of the DB2 Warehouse products and to all subsequent releases and modifications
until otherwise indicated in new editions.
Copyright International Business Machines Corporation 2007. All rights reserved.
US Government Users Restricted Rights Use, duplication or disclosure restricted by GSA ADP Schedule Contract
with IBM Corp.

Contents
DB2 Warehouse Tutorial, Version 9.5 . . 1
Introduction to the DB2 Warehouse Tutorial . . . . 1
Running the tutorial in a Windows client-server
environment . . . . . . . . . . . . . 7
Running the tutorial in a Linux client-server
environment . . . . . . . . . . . . . 8
Running the tutorial in a mixed client-server
environment . . . . . . . . . . . . . 8
Optional: Introduction to the Design Studio . . . . 9
Lesson 1: Design Studio perspectives, views,
editors, and projects . . . . . . . . . . 10
Lesson 2: Customizing the Design Studio . . . 12
Module 1: Designing the physical data model for
your data warehouse . . . . . . . . . . . 13
Lesson 1: Creating a data design project in the
Design Studio . . . . . . . . . . . . 14
Lesson 2: Creating a physical data model based
on the DWESAMP database . . . . . . . . 14
Lesson 3: Adding foreign key constraints to the
tables in the MARTS schema . . . . . . . 15
Lesson 4: Validating your physical data model . 16
Lesson 5: Updating the DWESAMP database
with changes from the data model . . . . . . 17
Module 2: Designing applications to build a data
warehouse . . . . . . . . . . . . . . . 20
Optional: Start the tutorial here . . . . . . . 20
Lesson 1: Setting up the warehouse building
environment . . . . . . . . . . . . . 22
Lesson 2: Designing a data flow that loads a
warehouse table . . . . . . . . . . . . 24
Lesson 3: Modifying a data flow that loads a
dimension table in a data mart . . . . . . . 33
Module 3: Deploying and running an application
that loads a data mart . . . . . . . . . . . 35
Lesson 1: Designing the control flow for the data
mart . . . . . . . . . . . . . . . . 36
Lesson 2: Preparing a data warehouse application
for deployment . . . . . . . . . . . . 38
Lesson 3: Deploying the application that loads
the MARTS tables . . . . . . . . . . . 39
Lesson 4: Running and monitoring a process in a
data warehouse application . . . . . . . . 42
Module 4: Designing OLAP metadata . . . . . 43
Optional: Start the tutorial here . . . . . . . 44
Lesson 1: Creating a complete cube model . . . 45
Lesson 2: Adding a hierarchy to the Time
dimension . . . . . . . . . . . . . . 50
Lesson 3: Creating a cube . . . . . . . . . 51
Lesson 4: Deploying your OLAP metadata to the
DWESAMP sample database . . . . . . . 53
Lesson 5: Creating MQT recommendations using
the Optimization Advisor wizard . . . . . . 53

Lesson 6: Deploying your recommended materialized query tables (MQTs) . . . . . 55
Lesson 7: Adding the cube to the cube server . . 57
Module 5: Creating Alphablox reports based on IBM
cubes . . . . . . . . . . . . . . . . 58
Lesson 1: Setting up the Alphablox environment 59
Lesson 2: Creating an analytic application . . . 60
Lesson 3: Building Alphablox queries with Blox
Builder . . . . . . . . . . . . . . . 61
Lesson 4: Building Alphablox reports with Blox
Builder . . . . . . . . . . . . . . . 64
Lesson 5: Customizing your queries and reports 66
Lesson 6: Creating and deploying an application
to the WebSphere server . . . . . . . . . 70
Module 6: Creating a mining model . . . . . . 71
Optional: Start the tutorial here . . . . . . . 72
Lesson 1: Creating a business intelligence project
in the Design Studio for data mining . . . . . 73
Lesson 2: Creating a mining flow in the Design
Studio . . . . . . . . . . . . . . . 74
Lesson 3: Defining mining steps for mining flows 74
Lesson 4: Running and viewing the mining
model . . . . . . . . . . . . . . . 81
Module 7: Using Miningblox to create a mining Web
application . . . . . . . . . . . . . . 82
Lesson 1: Creating the Miningblox Sample project 83
Lesson 2: Creating the Miningblox application . . 86
Lesson 3: Creating the Alphablox data source . . 88
Lesson 4: Deploying the data warehouse
application . . . . . . . . . . . . . 89
Lesson 5: Deploying the Web application . . . 90
Lesson 6: Customizing your application . . . . 93
Module 8: Combining text analysis and OLAP. . . 95
Optional: Start the tutorial here . . . . . . . 96
Lesson 1: Understanding the data used in this
module . . . . . . . . . . . . . . . 97
Lesson 2: Using the Text Analysis tools . . . . 98
Lesson 3: Building a star schema . . . . . . 105
Lesson 4: Defining an OLAP model for the star
schema and deploying it to Cubing Services . . 108
Lesson 5: Creating an Alphablox report . . . . 110
Summary . . . . . . . . . . . . . . . 114
Glossary . . . . . . . . . . . . . . . 114

Notices . . . . . . . . . . . . . . 121
Trademarks . . . . . . . . . . . . . . 123

Contacting IBM . . . . . . . . . . . . 125
Product Information . . . . . . . . . . 125
Accessible documentation . . . . . . . . 125
Comments on the documentation . . . . . 126

DB2 Warehouse Tutorial, Version 9.5


Learn how to build and deploy an end-to-end business intelligence solution by
using DB2 Warehouse, Version 9.5. This tutorial introduces you to the highlights
of DB2 Warehouse so that you can start using the product quickly and easily.
Throughout the tutorial, you work on developing, deploying, and administering an
analytics-based warehousing solution for the fictional retail company, JK
Superstore.
Using a client-server Windows or Linux configuration, follow this tutorial and
learn how to:
v Design and update a physical data model for a warehouse database
v Design applications for building warehouse and mart tables by using SQL-based
data flows and control flows
v Deploy and run warehouse building applications in the Administration Console
v Design a complete cube model and deploy performance-enhancing MQTs
v Design a Blox Builder report that is based on OLAP metadata
v Design a mining model that analyzes purchasing trends
v Create a mining web application with Miningblox
v Analyze unstructured data by using text analysis tools, an OLAP cube, and an
Alphablox report
You can complete most individual modules in approximately 60 to 90 minutes. You
can complete the entire tutorial in about 11 hours.

Introduction to the DB2 Warehouse Tutorial


In this tutorial, you design and deploy a business intelligence solution that
expands the capabilities of the DB2 data warehouse for a fictional company named
JK Superstore.
Business scenario
JK Superstore is a retail enterprise with a chain of department stores. Its product
lines include clothing, shoes, beauty, home furnishings, and electronics. The
company has grown steadily for the last several years, plans to extend business
into new markets, and wants to ensure that it maintains its profitability during the
expansion.
DB2 Warehouse is an enterprise product that can help JK Superstore by providing,
for example:
v Graphical tools for designing data movement from OLTP sources to data
warehouse and data mart tables in a DB2 database
v Optimized data analysis through the intelligent use of aggregated data
v An easy-to-build interface for designing customizable reports on sales volumes
and other critical measures
v Graphical data mining tools that help analysts reveal buying patterns, such as
sets of products that tend to be bought at the same time

Note: Module 8: Combining text analysis and OLAP on page 95 uses a different
scenario to show how the IT department at JK Superstore uses unstructured text
analysis to study trends in the IT job market.
The JK Superstore data warehousing team is committed to consolidating the
company's data into a DB2 database and to building a new warehouse that provides a
consistent data source for analysis and reporting. The technical team's data
architect has designed the data warehouse in the DWH schema, and a data mart
for analysis in the MARTS schema.
The DWH schema contains the transactional data for the JK Superstore retail chain.
Figure 1 shows the physical data model of the DWH schema.

Figure 1. Physical data model of the DWH schema

The following table describes the nine tables that are in the physical model:
Table 1. Description of the tables in the physical data model of the DWH schema

ITM_TXN (Item transaction)
    The item transaction table contains individual transactions that record the transfer of a single product item (or multiple identical product items) from JK Superstore to a customer, or vice versa. For example, one transaction can represent a single barcode scan at a checkout or a refund for a returned product.

MKT_BSKT_TXN (Market basket transaction)
    The market basket transaction table groups the individual transactions that occur together as one event. For example, one market basket transaction can represent the collection of items purchased by a single customer at one time.

OU (Organization unit)
    The organization unit table contains the individual stores that are part of the JK Superstore retail chain.

PD (Product)
    The product table identifies goods and services that can be offered or sold by the JK Superstore.

PD_X_GRP (Products by group)
    The products by group table defines the relationship between a product and a product grouping. Products might be listed multiple times if they belong to more than one product group or if they are listed in historical product groupings that are still maintained.

GRP (Group)
    The group table identifies a specific grouping of products that is of interest to JK Superstore. A group can be composed of other groups. A group can be created for marketing, management, control, or reporting purposes.

IP (Involved party)
    The involved party table contains people that are related to the business activities of JK Superstore, such as store managers and distribution contacts.

MSR_PRD (Measurement period)
    The measurement period table records the intervals of time at which measurements are captured in the warehouse.

CL (Customer)
    The customer table contains anonymous customer IDs. For the purposes of this tutorial, these IDs are necessary for mining.


The MARTS schema contains the aggregated data for the JK Superstore retail chain
that is required for sales and pricing analysis. Figure 2 shows the physical data
model of the MARTS schema.

Figure 2. Physical data model of the MARTS schema

The following table describes the four tables that are in the physical model of the
MARTS schema:
Table 2. Description of the tables in the physical data model of the MARTS schema

PRCHS_PRFL_ANLYSIS (Purchase profile analysis)
    The purchase profile analysis table is the fact table. It contains sales metrics for products and market baskets that are purchased by customers.

STORE (Store)
    The store table is a dimension table. It corresponds to the organization unit table in the data warehouse.

TIME (Time)
    The time table is a dimension table. It corresponds to the measurement period table in the data warehouse.

PRODUCT (Product)
    The product table is a dimension table. It corresponds to the product table in the data warehouse.
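The MARTS tables form a star schema: the PRCHS_PRFL_ANLYSIS fact table carries foreign keys that reference the three dimension tables. As a rough sketch only, a typical analysis query joins the fact table to its dimensions. The measure column SALES_AMT is hypothetical, and the sketch assumes that the foreign key columns in the fact table use the same names as the dimension keys (PD_ID, STR_IP_ID, TIME_ID).
SELECT p.PD_ID, t.TIME_ID, SUM(f.SALES_AMT) AS TOTAL_SALES
  FROM MARTS.PRCHS_PRFL_ANLYSIS f
  JOIN MARTS.PRODUCT p ON f.PD_ID = p.PD_ID
  JOIN MARTS.TIME t ON f.TIME_ID = t.TIME_ID
 GROUP BY p.PD_ID, t.TIME_ID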

This tutorial shows you how to use the main DB2 Warehouse features to
implement an end-to-end business intelligence solution for JK Superstore.

Learning objectives
The tutorial has the following learning objectives:
v Design and update a physical data model for a warehouse database
v Design applications for building warehouse and mart tables by using SQL-based
data flows and control flows

v Deploy and run warehouse building applications in the Administration Console
v Design a complete cube model and deploy performance-enhancing MQTs
v Design a Blox Builder report that is based on OLAP metadata
v Design a mining model that analyzes purchasing trends
v Create a mining web application with Miningblox
v Analyze unstructured data by using text analysis tools, an OLAP cube, and an
Alphablox report

Time required
The complete tutorial should take approximately 11 hours to finish.
However, you can work on specific modules individually rather than complete the
entire tutorial from start to finish. Most of the modules take approximately 60 to 90
minutes each to complete, depending on your familiarity with database,
warehousing, and business intelligence concepts and practices. If you are
experienced in these areas, the following table shows the estimated amount of time
that is required to complete each module.
Table 3. Time required to complete each module

Optional: Introduction to the Design Studio (20 minutes)
Module 1: Designing the physical data model for your data warehouse (60 minutes)
Module 2: Designing applications to build a data warehouse (120 minutes)
Module 3: Deploying and running an application that loads a data mart (60 minutes)
Module 4: Designing OLAP metadata (100 minutes)
Module 5: Creating Alphablox reports based on IBM cubes (90 minutes)
Module 6: Creating a mining model (75 minutes)
Module 7: Using Miningblox to create a mining Web application (60 minutes)
Module 8: Combining text analysis and OLAP (90 minutes)

You can start this tutorial from the optional introduction module, Module 1,
Module 2, or Module 4. To start from Module 2 or Module 4, complete the Start
the tutorial here lesson at the beginning of those modules.
You can also skip all of the lessons in Module 6 by completing a shortcut lesson at
the beginning of the module.
To see the results of most of the tutorial lessons, you can open the completed
sample projects in the Design Studio. From the File menu, select New > Example >
Data Warehousing Examples and complete the wizard. You can also access this
wizard directly from the Design Studio Welcome view.


Skill level: Moderate


This tutorial assumes some conceptual knowledge of data warehousing and
business intelligence, but does not assume any knowledge of the DB2 Warehouse
implementation of those concepts. To familiarize yourself with DB2 Warehouse
terminology, see the Glossary on page 114.

Audience
This tutorial covers a wide range of data warehousing and BI features, including
SQL warehousing flows, OLAP metadata and summary tables, mining models,
Miningblox applications, Alphablox reports, and unstructured text analysis. Many
enterprises divide the design and administration tasks and the domain areas (SQL
warehousing, OLAP, mining, reporting) among multiple people. Some of the
lessons in this tutorial might not apply to you. However, each lesson should apply
to someone on your team.

System requirements
To complete this tutorial from end to end, you must install several server, client,
and documentation components of DB2 Warehouse on one or more systems.
DB2 Warehouse server components
v DB2 Enterprise Server Edition, Version 9.5
v Intelligent Miner
v WebSphere Application Server
v Cubing Services
v IBM Alphablox
v Administration Console
DB2 Warehouse client components
v IBM Data Server Client
v Design Studio
  – Intelligent Miner plug-ins
  – SQL Warehousing Tool plug-ins
  – Cubing Services plug-ins
  – IBM Alphablox Blox Builder plug-ins
v Intelligent Miner Visualization
Documentation
v DB2 Warehouse Samples and Tutorial

Prerequisites
The tutorial setup scripts, which create the DWESAMP sample database, are
certified to run on a Windows or Linux instance of DB2. Complete the tutorial
by using one of the following client-server configurations:
v A Windows-only configuration, with the Design Studio and the runtime
environment installed on two separate Windows computers
v A Windows-to-Linux configuration, with the Design Studio installed on a
Windows client and the runtime environment installed on a 64-bit Linux server.

v A Linux-only configuration, with the Design Studio and the runtime
environment installed on two separate Linux computers (a 32-bit client and a
64-bit server)
Before you start the tutorial, follow the setup instructions for your configuration:
Windows
    Running the tutorial in a Windows client-server environment on page 7
Linux
    Running the tutorial in a Linux client-server environment on page 8
Mixed
    Running the tutorial in a mixed client-server environment on page 8
Restriction: The tutorial is not intended to be run on a single client computer.

Expected results
If you finish the entire tutorial, you will have a complete working DB2 data
warehouse that is optimized for analysis. You will also have two web-based
reports. Each module lists the expected results to help you track your learning
progress when you finish each module.

Running the tutorial in a Windows client-server environment


You can run the tutorial in a Windows client-server environment after you follow
these setup instructions.
To set up the Windows environment for the tutorial:
1. Install the DB2 Warehouse components on two separate Windows computers:
v Install the client components and the Samples and Tutorial on the Windows
client.
v Install the server components on the Windows server and run the DB2
Warehouse Configuration Tool, as directed in the DB2 Warehouse Installation
Guide, Version 9.5.
2. Optional: If you intend to complete Module 3: Deploying and running an
application that loads a data mart on page 35, copy the following SQL scripts
from the client to an accessible directory on the server:
v C:\Program Files\IBM\dwe\samples\data\emptyMartsTables.sql
v C:\Program Files\IBM\dwe\samples\data\countMartTables.sql
3. Check that you have SYSADM authority on the DB2 database server. You need
this authority to create and modify the sample database. Throughout this
tutorial, the db2admin user ID is assumed to have SYSADM authority.
4. From the server, create the DWESAMP database manually. If the database
already exists, drop the database first.
db2 drop database DWESAMP
db2 create database DWESAMP

5. From the client, catalog the remote DWESAMP database as a local database
with exactly the same name. You must use uppercase letters for the name
DWESAMP. Use the CATALOG TCPIP NODE and CATALOG DATABASE
commands, as described in the DB2 Information Center and illustrated in the
sample commands after this procedure.
6. Verify that the cataloged DWESAMP database is visible from the client:
db2 list database directory

7. From the client, create and load the tables in the DWESAMP database:

a. Open a DB2 Command Window.


b. Go to the directory where the setup script was installed. The default
installation path to the setup script is: C:\Program Files\IBM\dwe\
samples\data\setupdwesamp.bat. Do not try to run the script without
going to the data directory first. For information about what this script
does, see the Readme.txt file in the data directory.
c. Type setupdwesamp.bat -r DWESAMP db2admin password
When the script is complete, a list of row counts is displayed for the loaded
tables.
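The catalog commands in step 5 might look like the following example. The node name TUTNODE, the host name dweserver, and the port number 50000 are placeholders; substitute the values for your DB2 server (50000 is the default DB2 instance port).
db2 catalog tcpip node TUTNODE remote dweserver server 50000
db2 catalog database DWESAMP at node TUTNODE
db2 terminate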

Running the tutorial in a Linux client-server environment


You can run the tutorial in a Linux client-server environment after you follow
these setup instructions.
To set up the Linux environment for the tutorial:
1. Install the DB2 Warehouse components on two separate Linux computers:
v Install the client components and the Samples and Tutorial on the 32-bit
Linux client.
v Install the server components on the 64-bit Linux server and run the DB2
Warehouse Configuration Tool, as directed in the DB2 Warehouse Installation
Guide, Version 9.5.
2. Optional: If you intend to complete Module 3: Deploying and running an
application that loads a data mart on page 35, copy the following SQL scripts
from the client to an accessible directory on the server:
v /opt/IBM/dwe/samples/data/emptyMartsTables.sql
v /opt/IBM/dwe/samples/data/countMartTables.sql
3. Check that you have SYSADM authority on the DB2 database server. You need
this authority to create and modify the sample database. Throughout this
tutorial, the db2inst1 user ID is assumed to have SYSADM authority. Run the
tutorial under the db2inst1 user ID, which is the DB2 instance owner that is
created when DB2 Warehouse is installed.
4. From the server, create the DWESAMP database manually. If the database
already exists, drop the database first.
db2 drop database DWESAMP
db2 create database DWESAMP

5. From the client, catalog the remote DWESAMP database as a local database
with exactly the same name. You must use uppercase letters for the name
DWESAMP. Use the CATALOG TCPIP NODE and CATALOG DATABASE
commands, as described in the DB2 Information Center.
6. From the client, create and load the tables in the DWESAMP database:
a. Go to the following directory: /opt/IBM/dwe/samples/data
b. Type ./setupdwesamp.sh -r DWESAMP db2inst1 password
When the script is complete, a list of row counts is displayed for the loaded
tables. For detailed information about what this script does, see the
readme_linux.txt file in the data directory.

Running the tutorial in a mixed client-server environment


You can run the tutorial with a Windows client and a 64-bit Linux server after you
follow these setup instructions.

To set up the mixed client-server environment for the tutorial:


1. Install the DB2 Warehouse components on two separate computers:
v Install the client components and the Samples and Tutorial on the Windows
client.
v Install the server components on the 64-bit Linux server and run the DB2
Warehouse Configuration Tool, as directed in the DB2 Warehouse Installation
Guide, Version 9.5.
2. Optional: If you intend to complete Module 3: Deploying and running an
application that loads a data mart on page 35, copy the following SQL scripts
from the client to an accessible directory on the server:
v C:\Program Files\IBM\dwe\samples\data\emptyMartsTables.sql
v C:\Program Files\IBM\dwe\samples\data\countMartTables.sql
3. Check that you have SYSADM authority on the DB2 database server. You need
this authority to create and modify the sample database. Throughout this
tutorial, the db2inst1 user ID is assumed to have SYSADM authority.
4. From the server, create the DWESAMP database manually. If the database
already exists, drop the database first.
db2 drop database DWESAMP
db2 create database DWESAMP

5. From the client, catalog the remote DWESAMP database as a local database
with exactly the same name. You must use uppercase letters for the name
DWESAMP. Use the CATALOG TCPIP NODE and CATALOG DATABASE
commands, as described in the DB2 Information Center.
6. From the client, create and load the tables in the DWESAMP database:
a. Open a DB2 Command Window.
b. Go to the directory where the setup script was installed. The default
installation path to the setup script is: C:\Program Files\IBM\dwe\
samples\data\setupdwesamp.bat. Do not try to run the script without
going to the data directory first. For information about what this script
does, see the Readme.txt file in the data directory.
c. Type setupdwesamp.bat -r DWESAMP db2inst1 password
When the script is complete, a list of row counts is displayed for the loaded
tables.

Optional: Introduction to the Design Studio


You use the Design Studio for many of the lessons in this tutorial. By taking a few
minutes to become familiar with the Design Studio, you can be more productive
and complete the tasks in this tutorial more quickly.
You complete the following lessons in this module:
v Design Studio perspectives, views, editors, and projects
v Customizing the Design Studio

Learning objectives
After completing the lessons in this module you will:
v Be familiar with perspectives, views, editors, and projects in general
v Be familiar with the BI and Blox Builder perspectives, including the Data Project
Explorer, Database Explorer, and Properties views
v Understand how to customize your view of the Design Studio

Time required
This module should take approximately 20 minutes to complete.

Lesson 1: Design Studio perspectives, views, editors, and projects
Each window in the Design Studio contains one or more perspectives. Perspectives
contain views and editors, and control what the Design Studio displays in certain
menus and tool bars.
When you open the Design Studio, it prompts you to specify your workspace. The
workspace is the directory where your tutorial work is stored. Select the default
location for now. You can change the location of the workspace later as required.
After you specify the workspace, the Design Studio displays the Welcome view,
which includes links to the DB2 Warehouse documentation, tutorials, and sample
files. To open the Welcome view at any other time, click Help > Welcome.
Perspectives
The Design Studio opens the BI perspective by default after you close the Welcome
view. This visual component of the Design Studio defines the initial set and layout
of the business intelligence views. These views show various resources that you
can use to build information warehouses, and enable warehouse-based analytics
such as OLAP and data mining. You can switch perspectives in the Design Studio,
depending on the task at hand. To work with the tutorial, you use the BI
perspective and the Blox Builder perspective.
Views
A view is a visual component of the Design Studio that typically contains trees,
access to editors, or a display of properties for the active editor. You use views to
navigate the Data Project Explorer and Database Explorer information trees, open
the related editors, or display and review properties. When you modify the
information in a view, the Design Studio saves it immediately.
The BI and Blox Builder perspectives include several different views. The options
and controls that you can work with differ among views. To open a closed or
hidden view, click Window > Show View and select that view.
The Navigator and Outline views do not apply to your work in the tutorial. You
do most of your work in the following views in the BI perspective:
Data Project Explorer
This view is open by default in the upper, left area of the Design Studio.
This hierarchical tree displays the metadata that you use in the tutorial to
build flows and applications. You can navigate the related projects and
objects that you create. You work with this view most often to select and
modify the contents of a physical data model, such as tables and columns.
This view is not live and does not provide direct access to underlying
databases.
Database Explorer
This view is open by default in the lower, left area of the Design Studio. It
is a hierarchical tree of the live databases that you can explore, connect to,
create models from, and ultimately modify. To do any of the tutorial tasks
that actually modify a database, you must have a DB2 user account that
includes the appropriate authority and privileges. The DB2 databases (and
aliases) that exist in your local catalog are listed automatically in the
Database Explorer. You can set up connections to other databases as
needed.
Properties
This view, and others such as Data Output and Problems, is open by
default in the lower, right area of the Design Studio. Each of these views
has a title tab that you click to make it active, which brings it to the
foreground. You can use the Properties view to define and modify many of
the objects that you create. To open the Properties view when it is closed
or hidden, click Window > Show View > Properties.
Tip: If you cannot find an option or control that you expected to work with,
ensure that you have opened the correct view. Notice that the Data Project
Explorer options and controls differ from those in the Database Explorer.
Editors
An editor is a visual component of the Design Studio that you typically use to
browse or modify a resource, such as an object in a project. When you use an
editor to modify an object, you must explicitly save the changes, because the
Design Studio does not automatically save them.
The Design Studio displays the appropriate editor for the object type that you are
working with; for example, different editors are available for physical data models
and control flows. An editor typically has an associated custom palette. The Design
Studio opens the editor in the upper, right area of the canvas by default, and opens
its palette on the right side of the canvas.
Projects
A project is a set of objects that you create in the Design Studio as part of the data
transformation or warehouse building processes. You can build projects in the
Design Studio, and test their validity without impacting the database. Each project
that you build is represented by an icon in the Data Project Explorer, where you
can expand it, explore its contents, and access editors that enable you to work with
it. You create different types of objects according to the type of project you are
building.
You save your project files in the workspace directory of your file system.
Integration with Concurrent Versions System (CVS), which is an open source
version control and collaboration tool, allows you to share Design Studio projects,
and work with them in a coordinated development team environment.
To work with this tutorial, you primarily use the following project types:
Data design project (OLAP)
You use a data design project for database design and information
integration. A data design project can include physical data models, OLAP
objects, and scripts.
Data warehouse project
You use a data warehouse project for designing and building the
warehouse. This project type can include SQL warehousing objects such as
physical data models, data flows, control flows, and mining flows.

Blox Builder project
You use a Blox Builder project to design IBM Alphablox applications,
which include reports and queries.
Tip: At various places in this tutorial, you are asked to work with virtual table
names that include a numeric identifier. The virtual table names that you see
might use a different identifier; this is expected, because the identifiers depend on
where you place the Design Studio operators on the canvas.
For example, when the tutorial refers to the virtual tables DATA_016.MKT_BSK_TXN_ID
and LOOKUP_016.MKT_BSK_TXN_ID, the names that you see might be
DATA_094.MKT_BSK_TXN_ID and LOOKUP_094.MKT_BSK_TXN_ID.

Lesson checkpoint
In the Design Studio, you work with perspectives, which provide a variety of
useful views and editors.
You learned about the following concepts:
v The Design Studio BI and Blox Builder perspectives and the views and editors
that you use in this tutorial
v Projects that you create in this tutorial, such as data design and data warehouse
projects

Lesson 2: Customizing the Design Studio


In this lesson, you customize the layout and size of the Design Studio windows to
optimize the visual work area.
As explained in Lesson 1, the Design Studio has many views and editors that you
use for the various tasks involved in building an information warehouse that is
optimized for analytics. However, you rarely use all of the available views and
editors at the same time. It is easier to work in the Design Studio if you know how
to customize the size and layout of its views and editors. In this lesson, the term
window can be either an editor or a view.
Use the following methods to customize your Design Studio layout:
Maximizing and minimizing windows
To complete certain tasks in the tutorial, you work almost exclusively with
one window or view. You might want to expand the window to the full
size of the Design Studio.
v To maximize a window, double-click the tab at the top of the window
that specifies the window's name, such as Properties. To return the
window to the size it was before you maximized it, double-click the tab
again.
v To minimize certain windows, click the minimize icon in the upper
right corner of that window. The minimize option is not available on all
Design Studio windows. To restore the window after you minimize it,
click the restore icon.

Rearranging dockable windows


You can rearrange the position of windows within the Design Studio. To
move a window, click its title bar and drag the window across the Design
Studio. When you drag the window, a rectangular highlight indicates
where it will dock. Release the mouse button when you have placed the
window in a suitable location.
To reset a perspective to its original layout, click Window > Reset
Perspective and then click OK.
Closing and opening windows
To close an open view, click the close button on the right side of its title
bar. To open a closed view or show a hidden one, click Window > Show
View and select that view.

Lesson checkpoint
To use the Design Studio more efficiently, you can customize the arrangement of
the views and editors that you use.
You learned how to:
v Maximize and minimize windows
v Rearrange dockable windows
v Close and open windows

Module 1: Designing the physical data model for your data warehouse
In this module, you use the Design Studio to import the existing physical data
model for the new JK Superstore data warehouse and marts and complete the
design of the MARTS schema. You also update the DWESAMP database with the
MARTS schema changes.
In the previous module, you became familiar with the Design Studio GUI and
navigation features that you use in this tutorial. In this module, you learn how to
use the Design Studio to complete the following lessons:
v Creating a data design project
v Creating a physical data model based on the DWESAMP database
v Adding foreign key constraints to the tables in the MARTS schema
v Validating your physical data model
v Updating the DWESAMP database with changes from the data model

Learning objectives
After you complete the lessons in this module, you will be able to:
v Create a data design project in the Design Studio
v Reverse engineer a physical data model based on a database
v Use the editor to modify a schema by adding constraints
v Analyze a physical data model to ensure its validity
v Deploy your updated physical database design to a database

Time required
This module should take approximately 60 minutes to complete.


Prerequisites
You must have the Design Studio installed and you must meet all of the
prerequisites that are described in Introduction to the DB2 Warehouse Tutorial
on page 1.

Lesson 1: Creating a data design project in the Design Studio


In this lesson, you create a project in the Design Studio so that you can work on
your physical data model. This kind of model includes the objects and attributes
that correspond with entities in the DB2 catalog, such as tables, columns, and
views.
You create projects in the Design Studio so that you can work on different parts of
your business intelligence solution. Like other project types, you navigate your
data design project in the Data Project Explorer, which is the working space where
you build and test your solution. To work specifically on your physical data
model, you need to create a data design project.
To create a data design project, complete the following steps:
1. Open the Design Studio:
Windows
    Click Start > All Programs > IBM DB2 Warehouse V9.5 > DB2WCOPY01 >
    Design Studio, where DB2WCOPY01 is the default name of the DB2 copy.
Linux

a. Set the DB2 profile by typing: . ~db2inst1/sqllib/db2profile


b. Run /opt/IBM/dwe/Client/DesignStudio
If prompted, select the default workspace.
2. In the Design Studio, click File > New > Project.
3. In the New Project wizard, expand the Data Warehousing folder, select Data
Design Project (OLAP), and click Next.
4. In the New Data Design Project wizard, type Tutorial - Data Model for the
project name, and click Finish.
The Design Studio displays your Tutorial - Data Model project icon in the Data
Project Explorer.

Lesson checkpoint
You learned how to create a data design project, which you can use for physical
data modeling.

Lesson 2: Creating a physical data model based on the DWESAMP database
In this lesson, you create a physical data model so that you can explore it and
change its schemas. You create this model by reverse engineering (copying) the
existing DWESAMP database model.
To create a physical data model that is based on the DWESAMP database:
1. Connect to the locally cataloged DWESAMP database.
a. In the Database Explorer, expand the Connections folder to view the
existing databases.


b. Right-click the DWESAMP database and click Reconnect.


c. When prompted, type your DB2 user name and password and click OK.
2. Use the New Physical Data Model wizard to create the physical data model.
Right-click the Data Models folder in the Data Project Explorer and click New >
Physical Data Model.
3. On the Model File page, specify the following selections:
a. Change the File name field to DWESampleTutorial.
b. Check that the Version field is set to V9.5.
c. Select Create from reverse engineering and click Next.
4. On the Source page, check that the Database option is selected and click Next.
5. On the Select Connection page, select Use an existing connection and select
DWESAMP from the Existing connections list. Click Next.
6. On the User Information page, type your database username and password.
Click Next.
7. On the Schema page, select the check boxes for the DWH and MARTS schemas.
Click Next.
8. On the Database Elements page, ensure that the following items are selected,
and then click Next. The Routines option is not selected by default, so you
need to select it.
v Tables
v Indexes
v Triggers
v Routines
v Sequences
v Table spaces
9. On the Options page, select the Generate Diagrams - Overview check box to
create schema diagrams and make sure that the Infer implicit relationships
check box is not selected. Click Finish.
The DWH and MARTS schemas are now in the physical data model in your data
design project. You can explore the schemas and review them. In the next lessons,
you modify the schemas.

Lesson checkpoint
You created a physical data model in your data design project. This model is based
on an existing database.
You learned how to create a new physical data model by reverse engineering the
data model of an existing database.

Lesson 3: Adding foreign key constraints to the tables in the MARTS schema
In this lesson, you complete the new physical data model by adding constraints
between the fact table and each of the dimension tables in the MARTS schema.
The existing MARTS schema includes the tables, but does not specify the
constraints between the primary keys of the dimension tables and the
corresponding foreign keys in the fact table.
You can use the diagram editor to add the constraints in the Design Studio. To add
the constraints:

1. Open the diagram of the MARTS schema.


a. In the Data Project Explorer, expand the Diagrams folder for the MARTS
schema. If you cannot see the Diagrams folder, reopen the physical data
model that you created in the previous lesson (DWESampleTutorial.dbm)
and expand the tree.
b. Double-click the MARTS diagram icon to open the Filters page of the
Properties view. The diagram includes the PRCHS_PRFL_ANLYSIS,
PRODUCT, STORE, and TIME tables, which are not connected by
constraints.
c. Select the Show key and Show non-key check boxes in the Compartment
display options list.
Tip: To more easily work with the diagram, double-click the MARTS tab at
the top of the diagram to expand it to full screen size. Right-click the
diagram and click Arrange all to reformat the table layout in the diagram.
2. Identify the primary keys for each table.
a. Right-click the PRODUCT dimension table in the diagram and select Show
Properties View. On the Columns page, select the Primary key check box
for the PD_ID column.
b. Specify the following primary keys for the remaining tables in the same
way.
v Select STR_IP_ID as the primary key for the STORE dimension table.
v Select TIME_ID as the primary key for the TIME dimension table.
3. Review the implicit relationships between the tables:
a. In the diagram editor, right-click in a blank area of the canvas.
b. Select Show Implicit Relationships.
The diagram displays the relationships between the foreign keys in the fact
table and the corresponding primary keys in the dimension tables.
4. Save your physical data model by clicking File > Save All.

Lesson checkpoint
You added the necessary foreign key constraints between the fact table and each of
the dimension tables.
You learned how to:
v Work with an overview schema diagram
v Add constraints using the diagram editor

Lesson 4: Validating your physical data model


In this lesson, you validate the physical data model to ensure that you did not
introduce problems. You should validate the physical data model after you change
it and before you deploy those changes to the target database.
To validate your physical data model:
1. In the Data Project Explorer, open the Data Models folder, right-click the
MARTS schema icon, and select Analyze Model to open the Analyze Model
wizard.
2. On the Model Analysis Rules page, ensure that the set of physical data model
rules are selected in the Rules categories list. Click Finish.


3. In the Problems view, review any errors or warnings that resulted from the
data model analysis. In this case, the analysis process should not result in
errors or warnings.

Lesson checkpoint
You learned how to use the Analyze Model wizard to validate your model against
a set of rules.

Lesson 5: Updating the DWESAMP database with changes from the data model
In the previous lessons, you updated the design of the DWESAMP database model
in your data design project. In this lesson, you propagate the changes that you
made in your project to the DWESAMP database by comparing and synchronizing
the project and the database, and then generating and running a delta DDL script.
The delta DDL script updates the DWESAMP database with the changes to the
data model that you made in the previous lesson.
You can use the compare feature to compare your database model with the source
database and select the changes that you want to generate in a DDL script.
To update the DWESAMP database with the changes in your physical data model:
1. Right-click the MARTS schema in the Data Project Explorer and click Compare
With > Original Source.
2. In the Structural Compare window, review a summary of the changes that you
made to the MARTS schema and synchronize those changes with the target
database.
a. Expand each of the tables in the Item list to view the primary keys and
foreign keys that you added to the physical data model. The window shows
that these keys do not exist in the DWESAMP:DWESAMP.MARTS original
source list. See Figure 3 on page 18.


Figure 3. Structural Compare window before synchronizing

b. Ensure that Schema is selected in the Structural Compare window and click
the Copy from Left to Right icon in the Property Compare toolbar.
The window shows that all of the changes now exist in the
DWESAMP:DWESAMP.MARTS original source list. See Figure 4 on page 19.


Figure 4. Structural Compare window after synchronizing

3. Generate and run the delta DDL script.


a. Click the Generate Right Delta DDL icon in the Property Compare toolbar.
b. In the Generate DDL wizard, browse for the Tutorial - Data Model folder
and type constraintsddl.sql as the file name. Click Run DDL on server
and then click Next.
c. On the Select Connection window, select Use an existing connection and
specify DWESAMP. If you are prompted for a user name and password,
you need to Open a new connection to DWESAMP. Click Next.
d. On the Summary page, click Finish.
Your DWESAMP database is now updated with the changes that you made to
your physical data model. If the changes are not visible, refresh the connection in
the Database Explorer.
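As a sketch of what the generated constraintsddl.sql script contains, the delta DDL consists of ALTER TABLE statements similar to the following. The Design Studio chooses its own constraint names, and the foreign key column names shown assume that the fact table columns match the dimension key names.
ALTER TABLE MARTS.PRODUCT ADD PRIMARY KEY (PD_ID);
ALTER TABLE MARTS.STORE ADD PRIMARY KEY (STR_IP_ID);
ALTER TABLE MARTS.TIME ADD PRIMARY KEY (TIME_ID);
ALTER TABLE MARTS.PRCHS_PRFL_ANLYSIS ADD FOREIGN KEY (PD_ID) REFERENCES MARTS.PRODUCT (PD_ID);
ALTER TABLE MARTS.PRCHS_PRFL_ANLYSIS ADD FOREIGN KEY (STR_IP_ID) REFERENCES MARTS.STORE (STR_IP_ID);
ALTER TABLE MARTS.PRCHS_PRFL_ANLYSIS ADD FOREIGN KEY (TIME_ID) REFERENCES MARTS.TIME (TIME_ID);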

Lesson checkpoint
You generated and ran a DDL script to update the DWESAMP database with the
database model changes that you made in the previous lesson.
You learned how to:
v Compare a database model to a source database
v Generate a delta DDL script that you run to propagate changes from a design
project to a database


Module 2: Designing applications to build a data warehouse


In this module, you use the Design Studio to design and run SQL-based data flows
that help populate and maintain data warehouse tables with the data for
meaningful JK Superstore analytics.
In the previous module, you created a data design project that contains a physical
data model. You use this data design project for the lessons in this module:
v Setting up the warehouse building environment
v Designing a data flow that loads a warehouse fact table
v Modifying a data flow that loads a data mart dimension table

Learning objectives
After you complete the lessons in this module you will be able to:
v Create a data warehouse project that references an existing data design project
v Connect to a DB2 database from the Design Studio and check the contents of
tables
v Design data flows that use various SQL warehousing operators:
  – File import and export
  – Table source
  – Bulk load target
  – Table join, union, key lookup
  – Select list, distinct, order by
v Stage intermediate results of a data flow by using a data station operator
v Define variables for operator properties, such as file names and table names
v Use the SQL Condition Builder and Expression Builder tools to expedite data
flow design
v Validate data flows and generate their SQL code
v Run data flows directly from the Design Studio and supply run time values for
variables

Time required
This module should take approximately 120 minutes to complete.

Prerequisites
Check that your client computer contains the samples directory that supports this
tutorial:
Windows

C:\Program Files\IBM\dwe\samples
Linux

/opt/IBM/dwe/samples

Optional: Start the tutorial here


You can skip the previous modules and start the tutorial here by completing a few
short steps. If you already completed the previous modules, do not complete these
steps; go to Lesson 1.
If you did not complete the earlier modules and want to start here, you need to
complete these steps.
To start the tutorial here:
1. Create the appropriate version of the DWESAMP database by opening a DB2
Command Window and running the following script:
Windows

C:\Program Files\IBM\dwe\samples\data\setupsqw.bat
Linux

/opt/IBM/dwe/samples/data/setupsqw.sh
For information about what the script does, see the Readme.txt or
readme_linux.txt file in the data directory. For general tutorial setup
information, see one of the following procedures:
v Running the tutorial in a Windows client-server environment on page 7
v Running the tutorial in a Linux client-server environment on page 8
v Running the tutorial in a mixed client-server environment on page 8
2. Open the Design Studio:
Windows
    Click Start > All Programs > IBM DB2 Warehouse V9.5 > DB2WCOPY01 >
    Design Studio, where DB2WCOPY01 is the default name of the DB2 copy.
Linux

a. Set the DB2 profile by typing: . ~db2inst1/sqllib/db2profile


b. Run /opt/IBM/dwe/Client/DesignStudio
3. Create a data design project called Tutorial - Data Model.
a. In the Design Studio, click File > New > Project.
b. In the New Project wizard, expand the Data Warehousing folder, select
Data Design Project (OLAP), and click Next.
c. In the New Data Design Project wizard, type Tutorial - Data Model for the
project name, and click Finish.
The Design Studio displays your Tutorial - Data Model project icon in the Data
Project Explorer.
4. Create a physical data model with reverse engineering.
a. Use the New Physical Data Model wizard to create the physical data model.
Right-click the Data Models folder in the Data Project Explorer and click
New > Physical Data Model.
b. On the Model File page, specify the following selections:
1) Change the File name field to DWESampleTutorial.
2) Check that the Version field is set to V9.5.
3) Select Create from reverse engineering and click Next.
c. On the Source page, check that the Database option is selected and click
Next.


d. On the Select Connection page, select Use an existing connection and select
DWESAMP from the Existing connections list. Click Next.
e. On the User Information page, type your database username and password.
Click Next.
f. On the Schema page, select the check boxes for the DWH and MARTS
schemas. Click Next.
g. On the Database Elements page, click Next.
h. On the Options page, click Finish.
The DWH and MARTS schemas are now in the physical data model in your
data design project.

Lesson checkpoint
You completed the prerequisite steps for starting the tutorial here. You can now
continue with Lesson 1.

Lesson 1: Setting up the warehouse building environment


In this lesson, you set up the warehouse building environment by creating a data
warehouse project, accessing the metadata for source and target tables, sampling
data in a live DB2 database, and creating some variables that you can use when
you design data flows, control flows, and mining flows.
The Design Studio provides a flexible environment for designing and testing BI
applications. For example, file import and export operators in data flows accept
either fixed values (actual file names) or variables as properties. Other operators
also support variables, such as variables for schema names and table names. By
setting variables for these resources, you defer the definition of certain properties
until a later phase in the life cycle of the application. For example, you might not
know which database schema will be used at run time, or you might want to build
a more flexible data flow that runs in different environments.
Variables can be used in combination with fixed values. At the end of this lesson,
you will define two directory variables that point to directories on your system
where specific files are stored. You will use these variables in combination with
fixed file names when you build your first data flow.
To set up the warehouse building environment:
1. Create a new data warehouse project that references the data design project
that you created earlier.
a. Select File > New > Data Warehouse Project.
b. Type dwhproj in the Project name field and click Next.
c. Select the data design project that you created in the previous module and
click Finish.
The new project is displayed in the Data Project Explorer. The project contains a
set of named folders.
2. View the source and target metadata for your data warehouse project by
completing the following steps:
a. Open the Data Models folder inside the new project and verify that the
DWESampleTutorial.dbm file is visible. Your project has a link to this physical
data model because of the reference to the data design project. The physical
model contains the metadata for the source and target tables that you will
use to build data flows.


b. Open the model file and browse the tables in the DWH and MARTS
schemas.
3. In the Database Explorer, verify that you have a connection to the DWESAMP
database. This connection allows you to interact with the live database and see
a sample of the data in the tables.
4. Expand the DWESAMP database tree. Go to Schemas > DWH > Tables,
right-click the PD table, and select Data > Sample Contents. A subset of the
rows in the table is displayed in the Data Output view. (An equivalent SQL
query is shown after this procedure.)
5. Create a variable group that contains two variables. These variables define
directories, which contain files that you need to reference when you build data
flows and control flows later in this tutorial:
a. In the Data Warehousing menu, select Manage Variables. The Manage
Variables window opens.
b. Select the dwhproj project that you created earlier in this lesson and click
Next.
c. Click New on the left side of the window to create a new variable group.
d. Name the group datadirs.
e. Click New on the right side of the window to create a new variable in the
datadirs group.
f. In the Variable Information window, define the new variable as follows and
click OK:
v Name: tempdir
v Type: Directory
v Current value:
Windows

C:\temp
Linux

/tmp
Important: Make sure that this directory exists on the client computer. If
the directory does not exist, create it.
v Final phase for value changes: EXECUTION_INSTANCE
g. Click New on the right side of the window to create a second new variable
in the datadirs group.
h. Define the second variable as follows and click OK:
v Name: sample_datadir
v Type: Directory
v Current value:
Windows

C:\Program Files\IBM\dwe\samples\data
Linux

/opt/IBM/dwe/samples/data
Important: If necessary, adjust the current value to match the path to the
DB2 Warehouse installation directory on your client computer.
v Final phase for value changes: EXECUTION_INSTANCE
i. Close the Manage Variables window.
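The Data > Sample Contents action in step 4 is roughly equivalent to running a query such as the following against the DWESAMP database, where the row limit is arbitrary:
SELECT * FROM DWH.PD FETCH FIRST 50 ROWS ONLY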


Lesson checkpoint
You learned how to:
v Create a data warehouse project that references a data design project
v View the source and target metadata for your project
v Connect to a DB2 database and see a sample of its contents
v Create variables that can be used to design flexible flows

Lesson 2: Designing a data flow that loads a warehouse table


In this lesson, you create a new data flow that defines the process for loading the
ITM_TXN fact table in the data warehouse. Data flows are graphical models that
translate your SQL-based transformation requirements into repeatable warehouse
building processes.
The DWESAMP database is a DB2 database that contains a schema named DWH,
which consists of several data warehouse tables. Two of these tables are fact tables:
ITM_TXN and MKT_BSKT_TXN. In this lesson you load the ITM_TXN fact table
by designing and running a data flow in the Design Studio. To save time in this
module, you will use scripts to build the other tables in the data warehouse; in a
production environment, you would use data flows to build these tables also.
The purpose of the data flow that you will design is to load two similar input files
that contain fact data for different time periods. The files are checked for
duplicates, then merged with an SQL UNION expression and stored in a staging
table. Then the data in the staging table is fed into a key lookup operation, which
matches the fact records with key values in two referenced tables before loading
the fact table.
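Conceptually, the SQL that the first piece of the flow generates resembles the following sketch. The staging table name and the names that stand for the imported file rows are placeholders; the actual names are determined by the operators that you place on the canvas and by the data station operator.
INSERT INTO DWH.ITM_TXN_STAGE
  SELECT DISTINCT * FROM IMPORT_FILE_1
  UNION
  SELECT DISTINCT * FROM IMPORT_FILE_2
The second piece of the flow then joins the staged rows to the two referenced warehouse tables to look up the matching key values before it bulk loads the ITM_TXN table.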
You will create the data flow from beginning to end, using a series of SQL
warehousing operators that select source data from flat files, transform the data,
and bulk load the resulting rows into the target table. In production warehousing
environments, this process might be performed by an ETL tool such as IBM
WebSphere DataStage. This lesson illustrates an end-to-end warehouse building
scenario that uses only the native capabilities of DB2 Warehouse. This lesson
explains how to use the SQL Warehousing Tool to populate a table from external
file sources. Typically the SQL Warehousing Tool is used for in-database flows to
populate analytic structures such as aggregates and data mining tables.
You will build the flow in two pieces. The first piece ends with a data station
operator, which represents a staging point in the flow where the results of the
union operator are stored in a persistent table in the DWESAMP database. This
staging table represents a reliable recovery point in the data flow. The following
figure shows the first piece of the data flow.


Figure 5. First piece of the data flow that you will build in this lesson

The second piece of the data flow moves data from the data station through
another series of transformations and finally loads the ITM_TXN table. The
following figure shows the second piece of the data flow.

Figure 6. Second piece of the data flow that you will build in this lesson

In the tutorial, these two pieces form one data flow, but in practice you might
choose to create two data flows instead. The data station operator represents a
persistent target table that could mark the end of the first data flow. That target
table could easily be used as a source table for the second data flow. (If you build
two data flows instead of one, you can use a control flow to run them in
sequence.)
The following instructions assume that you are designing your first complex data
flow in the Design Studio. However, the instructions do not explain very basic
tasks such as how to place operators on the canvas and how to connect them. If
you are unfamiliar with these basic tasks, play the Show Me viewlet for creating a
data flow before proceeding with the lesson. You can launch this viewlet from the
Design Studio by selecting Help > Welcome > Overview > Tour the Design Studio >
SQL Warehousing Demonstrations > Designing a data flow that loads a
warehouse table.
To design and test-run the data flow that loads the ITM_TXN table:


1. Right-click the Data Flows folder in your data warehouse project and select
New > Data Flow.
2. Name the data flow dwh-fact, select Work against data models (Offline) as the
working mode for the flow, and click Finish. The data flow editor opens.
3. In the Properties view underneath the empty canvas, type DWH in the SQL
Execution Schema field and select DWESAMP from the SQL execution database
list. The SQL execution database must be a DB2 database. This database runs
the SQL code that the data flows generate and need not be the same as the
databases where data is extracted and loaded.
4. Leave the two table space fields blank and accept the default setting for the
Use DB2 Data Partitioning Feature (DPF) option. Because the DWESAMP
database is not partitioned, this option will be ignored.
5. Build part 1 and part 2 of the data flow by following the next two procedures
in the tutorial. You need at least one hour to build this flow from beginning to
end.
v Designing the ITM_TXN data flow (part 1)
v Designing the ITM_TXN data flow (part 2) on page 29

Lesson checkpoint
This lesson covered the end-to-end design process for a complex data flow that
loads a warehouse fact table.
You learned how to:
v Create a new data flow
v Define properties for various SQL warehousing operators:
File import and export
Table source
Bulk load target
Key lookup
Union
Distinct
v Use a data station operator to define a staging point in a data flow
v Create a new table as part of the data flow, add it to the physical model, and
run its DDL script
v Use operator variables that can be replaced with actual values at run time
v Use the SQL Condition Builder to expedite the definition of conditions and
expressions
v Validate and run a data flow directly from the Design Studio

Designing the ITM_TXN data flow (part 1)


The design of the ITM_TXN data flow is broken into two logical and manageable
parts. In the first part, you build the flow from importing the source files to
populating a staging table with the results of a union operation.
This data flow loads two similar input files that contain fact data. The files are
checked for duplicates, then merged with an SQL UNION expression and stored in
a staging table. Part 2 of this lesson explains how to do some more work on the
data in the staging table before loading the ITM_TXN table.


The following figure shows how the first piece of the data flow should look when
it is complete:

Figure 7. First piece of the data flow that you will build

Note: The instructions in these lessons assume that you will use the Properties
view to define the specific details for each operator rather than the wizard pages
that open when you drag certain operators to the canvas. You can close the wizard
pages by clicking Finish without defining any properties. By using the Properties
view below the canvas, you will be able to see the properties and the highlighted
piece of the data flow at the same time.
To build the first piece of the ITM_TXN data flow:
1. Define two file import operators in the same way. These operators read data
from flat files, based on the format that you specify.
a. Drag two file import operators to the left side of the empty canvas.
b. On the General page of the Properties view for the first file import operator,
click the icon next to the File name field and select Use Variable.
c. Click the push button, select the sampledir variable, and click Replace. The
variable string ${datadirs/sample_datadir} is displayed in the File name field.
d. Append the following directory and file name to the sampledir variable
string: /sqw/DWH_ITM_TXN_1.txt.
e. Repeat the process for the second file import operator, but append
/sqw/DWH_ITM_TXN_2.txt to the sampledir variable string. The File name
fields for the file import operators should read as follows:
${datadirs/sample_datadir}/sqw/DWH_ITM_TXN_1.txt
${datadirs/sample_datadir}/sqw/DWH_ITM_TXN_2.txt

The File name values will look the same on Windows and Linux platforms
because you are using a variable for the platform-specific path to the
samples directory. The forward slash character (/) works on both platforms.
f. In the File location list, accept the default entry (Client).
g. On the File Format page, click Load from File Format and browse to the
sample fileformat file:
Windows

C:\Program Files\IBM\dwe\samples\data\sqw\dwh_itm_txn.fileformat
Linux

/opt/IBM/dwe/samples/data/sqw/dwh_itm_txn.fileformat
Repeat this step for the second file import operator.
h. Do not change the list of selected columns in the Column Select page; all
columns are selected by default.
i. Do not define any properties on the Advanced Options and Partition
Options pages.
Tip: Use the default names of all of the operators when you build this data
flow. You do not need to use the Label field in the General page to rename the
operators. In some cases, if you rename the operators, it is difficult to identify
the source of virtual table columns when they pass through the data flow.
2. Define two distinct operators in the same way:
a. Drag two distinct operators to the canvas and connect each file import
operator to a distinct operator.
b. On the Column Select page of the Properties view, make sure that only the
following three columns are selected for both distinct operators:
v MKT_BSKT_TXN_ID
v PD_ID
v ITM_TXN_TMS
c. Do not define any properties on the Staging Table Settings page.
3. Define two file export operators in the same way:
a. Drag two file export operators to the canvas and connect the discard port of
each distinct operator to a different file export operator.
b. Define the same file format for both file export operators:
1) On the General page, click the icon next to the File name field and
select Use Variable.
2) Click the push button next to the File name field, select the
tempdir variable, and click Replace.
3) In the File name field, append /discard1.txt to the variable string.

4) Repeat the variable selection process for the second file export operator,
but append /discard2.txt to the variable string.
5) On the File Format and Advanced Options pages, accept the default
settings.
4. Define a union operator:
a. Drag a union operator to the canvas and connect the result ports of the two
distinct operators to the input1 and input2 ports of the union operator.
b. On the Set Details page of the union operator, select UNION (not
UNION_ALL).
5. Drag a data station operator to the canvas but do not define any of the
operator's properties.
6. Connect the result port of the union operator to the input port of the data
station operator.
7. In the Properties view, define the data station operator:
a. Set the station type to PERSISTENT_TABLE.
b. Type ITM_TXN_STAGE in the Table name field and DWH in the Schema name
field.

c. Select the Automatically create staging table check box but do not select
any of the other check boxes on the General page. Selecting the Delete all
rows check box is recommended only if you intend to run the data flow
multiple times and you want to clear the contents of the staging table after
each run. If the staging table is not empty when you run the data flow, the
performance will degrade significantly. One disadvantage to selecting this
option is that you will not be able to inspect the contents of the staging
table after each run.
d. Do not define any properties on the Staging Table Settings page.
8. Save your work.
Complete the data flow by following part 2 of this lesson.

Designing the ITM_TXN data flow (part 2)


The ITM_TXN data flow is broken into two logical and manageable parts. The
second part explains how to build the flow from the data station staging table
through to the final bulk load into the target table.
This piece of the data flow introduces several design techniques that are frequently
used in warehouse building operations:
Lookup operators
Key lookups match key values from intermediate result sets with valid keys
that already exist in referenced tables. The Design Studio provides two
types of lookup operators: key lookup and fact key replace. These
operators are similar to joins, and their properties consist of conditions that
define the matching criteria between the incoming data and one or more
lookup tables.
SQL Condition Builder or Expression Builder
To expedite the process of declaring join conditions, lookup conditions, and
other expressions, many operators contain links to the SQL Condition
Builder or Expression Builder. These Design Studio interfaces provide
context-sensitive lists of input columns that help you complete the
properties for an operator efficiently and accurately. The builder windows
also contain supported functions and SQL predicates that you can use in
any valid expression or condition.
Note: The SQL Condition Builder and SQL Expression Builder interfaces
are specific to the DB2 Warehouse Design Studio. The DB2 SQL builder is a
separate interface.
Bulk load
This data flow ends with a bulk load into the target table. The DB2 bulk
load utility is intended for intensive replace or insert mode loads into
database tables. The data flow palette also contains a basic table target
operator, which generates SQL insert, update, and delete statements.
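To relate the lookup and load operators described above to the SQL that they stand for, the following is a rough sketch of the key lookup in this flow, written as joins against the two lookup tables. The staging and lookup table names are the ones used in this lesson, but the column list of ITM_TXN is simplified here, and the INSERT is only a stand-in for the DB2 load utility that the bulk load target operator actually invokes; the generated code will differ.

   -- Keep only the staged rows whose keys exist in both referenced tables,
   -- then load them into the fact table
   INSERT INTO DWH.ITM_TXN (MKT_BSKT_TXN_ID, PD_ID, ITM_TXN_TMS)
      SELECT S.MKT_BSKT_TXN_ID, S.PD_ID, S.ITM_TXN_TMS
      FROM DWH.ITM_TXN_STAGE S
           JOIN DWH.PD P ON S.PD_ID = P.PD_ID
           JOIN DWH.MKT_BSKT_TXN M ON S.MKT_BSKT_TXN_ID = M.MKT_BSKT_TXN_ID;

Rows that fail the lookup conditions go to the discard port of the key lookup operator, which in this flow is written to a file by a file export operator.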
The following figure shows how the second piece of the data flow should look
when it is complete:


Figure 8. Second piece of the data flow that you will build in this lesson

To complete the ITM_TXN data flow:


1. Drag a key lookup operator to the canvas.
2. Connect the result port of the data station operator to the data port of the key
lookup operator.
3. Add a lookup port to the key lookup operator by clicking the icon below the
existing lookup port.
4. Define two table source operators:
a. Drag two table source operators to the canvas and place them to the left of
the key lookup operator.
b. Define the first table as DWH.PD and the second table as DWH.MKT_BSKT_TXN.
Pick the table names from the Table Selection window by clicking the
push button next to the Source database table field.
c. Do not change the value in the Location field, which defaults to SQL
execution database.
d. Accept the defaults on the Select List and Where Condition pages.
5. Connect the PD table source operator to the first key lookup port (lookup)
and the MKT_BSKT_TXN table source operator to the second port (lookup1).
6. Define the properties of the key lookup operator.
a. On the Condition List page, define the lookup condition for the first
lookup port: DATA_016.PD_ID = LOOKUP_016.PD_ID. Click inside the Value
field, then click the
push button. The SQL Condition Builder
opens. Create the condition in the SQL Text field by selecting the two
PD_ID columns from the Inputs list and the equals sign from the
Operations list. The virtual table names (DATA_016, LOOKUP_016) might be
different in your data flow; they depend on the precise order in which you
place the operators on the canvas.
b. In the same way, define the lookup condition for the second lookup port:
DATA_016.MKT_BSKT_TXN_ID = LOOKUP1_016.MKT_BSKT_TXN_ID.
c. On the Select List page, delete all of the columns from the Result Columns
list, then use the right arrow button to move all of the DATA columns back
into the list. Do not move any LOOKUP columns into the select list.
d. Do not define any properties on the Staging Table Settings page.


The following figure shows what the Result Columns list will look like after
you add the DATA columns back into the list:

Figure 9. Result Columns list

7. Define a file export operator:


a. Drag a file export operator to the canvas. Connect the discard port of the
key lookup operator to the input port of the file export operator.
b. Define the File name field by selecting the tempdir variable and
appending /discards_itm_txn.txt to the variable string.
c. On the File Format and Advanced Options pages, accept the default
settings.
8. Drag a bulk load target operator to the canvas and define its properties:
a. On the General page, click the push button next to the Load into database
table field, select DWH.ITM_TXN, and set the Load Mode value to REPLACE.
b. On the Advanced Options and Partition Options pages, accept the default
settings.
9. Connect the key lookup operator to the bulk load target operator:
a. Connect the match output port of the key lookup operator to the input
port of the bulk load target operator.
b. On the Select Column Connections window that opens, select Connect by
Name and click OK.
c. Expand the match and input column lists in the key lookup and bulk load
target operators. Verify that all of the columns from the match port are
connected to the corresponding columns of the input port. The following
figure shows the columns connected by name.


Figure 10. Bulk load target operator with the columns connected by name

10. Save and validate the completed flow.


a. Select the data flow by clicking anywhere in the white space of the data
flow editor, then select Data Flow > Validate.
b. Optional: Generate the code for the data flow and inspect it by selecting
Data Flow > Generate Code.
11. Run the data flow.
a. In the Database Explorer, reconnect to the DWESAMP database if you are
not already connected.
b. Select the data flow in the canvas, then select Data Flow > Execute. The
Flow Execution window opens.
c. Accept the default run profile, execution schema (DWH), and execution
database (DWESAMP).
d. Open the Variables page and check the file names for the two file export
operators. You can use the default file names for this run. You can also
leave the properties as is on the Diagnostics and Resources pages.
e. Click Execute, then click Run in Background. While the flow is running,
you can check its progress by going to the Execute Status view below the
canvas. When you select an entry in this view, the end of the associated
log file is displayed in the Tail Log view. When execution is complete, the
Execution Result window is displayed, which also contains log
information.
f. Check that the table is loaded by refreshing the DWESAMP database in the
Database Explorer, right-clicking the DWH.ITM_TXN table, and selecting
Data > Sample Contents.
Go back to the overview of lesson 2 to review your progress before you continue
with lesson 3.


Lesson 3: Modifying a data flow that loads a dimension table in a data mart

In this lesson, you modify and complete a complex data flow that loads the STORE
dimension table in the data mart. The STORE dimension in the data mart is based
on data from tables that are already built in the warehouse.
The data flows that build the STORE and PRODUCT dimensions in the MARTS
schema are very similar. The data in these tables comes from warehouse data that
represents store and product information in terms of different levels or hierarchies.
For simplicity and efficiency in reporting and analytical applications, this data is
flattened or denormalized before being loaded into the MARTS tables.
To transform the data, the data flow selects rows from two warehouse tables and
joins them recursively. Each new join receives information at the next level. When
all of these intermediate results are joined, the final result is a set of rows that
contain all of the information about a particular product or store. The result does
not just contain the information that is relevant at a given level.
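The following rough sketch shows the general shape of this kind of flattening, using the "memberkey" and "parentkey" columns that appear in the join conditions later in this lesson. The table name and the level aliases are hypothetical; the actual flow joins the OU and IP warehouse tables and also selects from the CL and MKT_BSKT_TXN tables.

   -- Each self-join walks one level up the hierarchy; the select list emits one
   -- flattened (denormalized) row that carries the keys of every level
   SELECT store."memberkey"    AS store_key,
          district."memberkey" AS district_key,
          region."memberkey"   AS region_key
   FROM   DWH.OU store
          JOIN DWH.OU district ON store."parentkey"    = district."memberkey"
          JOIN DWH.OU region   ON district."parentkey" = region."memberkey";

In the Design Studio, each of these joins is a separate join operator in the data flow, which is why the sample flow contains a chain of joins back to the same source tables.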
Because this flow is complex and requires a series of joins back to the same two
source tables, this lesson does not explain how to build the flow from beginning to
end. Instead, the lesson explains how to complete a data flow that already exists in
a sample project that you import at the beginning of the lesson. This flow consists
of a series of joins that involve the OU and IP tables; source data is also selected
from the CL and MKT_BSKT_TXN tables. Your task is to update the final join
operator and add two new operators that end the flow. The result is a complete,
working data flow that inserts rows into the STORE table.
The following figure shows what the last three operators will look like after you
complete the data flow.

Figure 11. Final three operators in the data flow that you are building

To complete the data flow:


1. Import a sample data warehouse project that contains the data flows for the
MARTS schema.
a. From the File menu, select New > Example > Data Warehousing Examples >
SQL Warehousing Sample - Partial.
b. Accept the default project name (SQWSamplePartial) and click Finish. The
sample project is imported into your workspace.
c. Right-click the SQWSamplePartial project and select Project References.


d. Select the Tutorial - Data Model data design project that you worked with
earlier in the tutorial and click OK. The sample data warehouse project is
linked to the data design project.
2. Select Data Warehousing > Manage Variables and make sure that the tempdir
and sample_datadir variables are set correctly for your platform and
installation. These two variables belong to the datadirs variable group. By
default, the correct variable values are set for the Windows platform. You might
also need to adjust the sample_datadir path to match the actual path to the
installation directory on your client computer.
v tempdir
Windows

C:\temp
Linux

/tmp
v sample_datadir
Windows

C:\Program Files\IBM\dwe\samples\data
Linux

/opt/IBM/dwe/samples/data
3. Navigate to the Data Flows folder and open the marts-store data flow.
4. On the General page of the Properties view for the data flow, make sure that
the database schema is set to MARTS and that the execution database is set to
DWESAMP. Leave the two table space fields blank and accept the default setting
for the Use DB2 Database Partitioning Feature (DPF) option. Because the
DWESAMP database is not partitioned, this option will be ignored.
5. On the Condition page of the Properties view for the final join operator in the
data flow, add the following condition to the end of the syntax in the Join
condition field: AND IN_034.OU_TP_ID = IN1_034.CL_ID. The virtual table names
might be different in your data flow. The complete set of join conditions is:
IN3_034."parentkey" = IN2_034."memberkey" AND
IN4_034."parentkey" = IN3_034."memberkey" AND
IN5_034."parentkey" = IN4_034."memberkey" AND
IN_034.OU_IP_ID = IN5_034."memberkey" AND
IN_034.OU_IP_ID = IN6_034.OU_IP_ID
AND IN_034.OU_TP_ID = IN1_034.CL_ID

6. Define an order by operator.


a. Drag an order by operator to the canvas.
b. Connect the inner join output port of the final join operator to the input
port of the order by operator.
Tip: For join operators, the port that you choose for the connection to the
next operator determines the type of join that is performed (inner or outer).
By connecting different operators to different join output ports, you can use
a single join operator to generate multiple SQL join statements.
c. In the Sort Key Ordering page of the Properties view, define the STR_IP_ID
column as the sort column for the order by operator.
7. Define a table target operator.
a. Drag a table target operator to the canvas.
b. Set the SQL operation to Insert.
c. Accept the default commit interval.

d. Set the location to SQL Execution Database.
e. Set the target database table to MARTS.STORE.
f. Do not check the NOT LOGGED INITIALLY check box.
g. Connect the result port of the order by operator to the input port of the
target table operator.
h. Map the source columns to the target table as shown in the following
figure:

Figure 12. Result Columns list shows the mapping of columns in the source table to columns in the target table

8. Save and validate the completed flow by selecting the data flow in the canvas
and selecting Data Flow > Validate.
9. Run the data flow.
a. Select the data flow in the canvas and select Data Flow > Execute. The Flow
Execution window opens.
b. Accept the default run profile, execution schema (MARTS), and execution
database (DWESAMP).
c. Click Execute. After a short time, an Execution succeeded message is
displayed.

Lesson checkpoint
In this lesson, you modified and completed an existing data flow by adding a
series of operators and defining their properties.
You learned how to:
v Define table joins and other SQL warehousing operators
v Build onto an existing data flow

Module 3: Deploying and running an application that loads a data mart


In this module, you embed data flows inside a control flow and deploy a data
warehouse application to the WebSphere Application Server environment. You then
run, schedule, and manage this application by using the DB2 Warehouse
Administration Console.
This module consists of the following lessons:
v Designing a control flow that loads the data mart
v Preparing a data warehouse application for deployment
v Deploying a new application
v Running, scheduling, and monitoring processes and activities

Learning objectives
After you complete the lessons in this module you will know how to:
v Design control flows that contain the following operators:
Data flow
Parallel container
Execute command
E-mail
v Create a deployment package for a data warehouse application (deployment
preparation)
v Deploy, run, and manage an application in the WebSphere Application Server
environment by using the Administration Console

Time required
This module should take approximately 60 minutes to complete.

Prerequisites
Complete Module 2: Designing applications to build a data warehouse on page
20.

Lesson 1: Designing the control flow for the data mart


In this lesson you build a control flow that consists of four data flows that load
tables in the MARTS schema, with the first three flows running in parallel. The
control flow also includes logic for running DB2 SQL scripts.
In the first warehouse building module, you learned how to create data flows and
run them individually from the Design Studio. In practice, you might want to
schedule some data flows to run concurrently while others can start only when
previous loads have finished. You might also want to add some processing rules to
ensure that the success or failure of an activity or process is handled appropriately.
A control flow consists of a set of operators that defines this logic, including the
ability to specify activities that can be run in parallel. In this case, the first three
data mart tables (the dimension tables) can be loaded in parallel, but the fourth
table (the fact table) cannot be loaded until the first three loads are complete. At
the end of the flow, an execute command operator runs a DB2 SQL script to verify
the results of the final load into the fact table.
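The tutorial does not show the contents of that verification script, but a script of this kind typically just reports row counts for the loaded tables. The following is a minimal sketch, using hypothetical table names where the tutorial does not give them; the countMartTables.sql file shipped with the samples may look different.

   -- Report how many rows each data mart table received
   SELECT 'MARTS.TIME' AS TABLE_NAME, COUNT(*) AS ROW_COUNT FROM MARTS.TIME
   UNION ALL
   SELECT 'MARTS.PRODUCT', COUNT(*) FROM MARTS.PRODUCT
   UNION ALL
   SELECT 'MARTS.STORE', COUNT(*) FROM MARTS.STORE;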
The following figure shows how the control flow should look when it is complete.


Figure 13. The completed control flow

To design a control flow that loads the data mart in parallel:


1. Select the SQWSamplePartial data warehouse project that you used to work
with the MARTS data flows, create a new control flow, and name it
marts-flow.
2. Drag a command operator to the canvas and define the following properties
on the General page of the Properties view:

a. In the Command type list, select DB2 SQL Script.
b. Set the SQL script location field to the sample_datadir variable, then
append /emptyMartsTables.sql to the variable string:
${datadirs/sample_datadir}/emptyMartsTables.sql
c. Set the DB2 Connection field to DWESAMP.
d. On the Diagnostics page, accept the default levels for logging and tracing.
3. Connect the start operator to the command operator. Use the blue arrow
connectors.
4. Drag a parallel container operator to the canvas and define its properties:
a. Drag three data flow operators into the parallel container.
b. Using the On Success link, connect the command operator to the parallel
container.
c. In the General page of the Properties view for the parallel container, set
the logging option to Use a separate log file for each activity and make
sure the execution option is set to Parallel.
5. Define the three data flow operators that you dragged into the parallel
container.
a. In turn, select each data flow operator inside the parallel container to
display the Properties view for that operator.
b. Use the General page to select the three MARTS dimension table data
flows in these operators:
v marts-time
v marts-product
v marts-store
6. Drag a fourth data flow operator to the canvas and define it as marts-fact.
7. Connect the On Success link on the right side of the parallel container to the
input port of the marts-fact data flow operator.

8. Drag an email operator to the canvas and connect the On Failure link of the
parallel container to the email operator.
9. Define the properties of the email operator:
a. Using fixed values, type your own e-mail address for both the sender and
recipient. The default for these fields is Use Variable, so start by changing
the fields to Use Fixed Value.
b. In the Subject field, type One of the dimension table data flows failed.
10. Drag a second email operator to the canvas and connect the On Failure link
of the marts-fact data flow operator to the email operator.
11. Define the properties of the email operator:
a. Using fixed values, type your own e-mail address for both the sender and
recipient.
b. In the Subject field, type The fact table data flow failed.
12. Define a second command operator.
a. Drag a command operator next to the marts-fact data flow operator and
use the On Success link to connect the operators.
b. On the General page of the properties view, set the Command Type value
to DB2 SQL Script.
c. Set the SQL script location field to the sample_datadir variable, then
append /countMartTables.sql to the variable string:
${datadirs/sample_datadir}/countMartTables.sql
d. Set the DB2 Connection field to DWESAMP.
e. On the Diagnostics page, accept the default levels for logging and tracing.
13. Save and validate the control flow by selecting the control flow in the canvas
and selecting Control Flow > Validate. You should not see any errors.
To test control flows before using them in a production environment, you can run
or debug them directly in the Design Studio by selecting Control Flow > Execute
or Control Flow > Debug. In this case, subsequent lessons explain how to run a
control flow by deploying a data warehouse application to the WebSphere
environment and using the Administration Console to start control flow processes.

Lesson checkpoint
This lesson explained the end-to-end design process for a control flow that loads a
data mart.
You learned how to connect and define the following control flow operators:
v Parallel container
v Data flow
v Command
v Email

Lesson 2: Preparing a data warehouse application for deployment

In this lesson, you create a deployment package that contains the control flow that
you designed in the previous lesson. Deployment preparation is a task that you do
in the Design Studio and is a prerequisite to deploying an application, which you
do in the Administration Console.
Deployment preparation is required for all warehouse building tasks that you want
to run and manage in the WebSphere environment. Deployment preparation uses a
data warehouse application, which is based on a data warehouse project and contains
one or more control flows. After you select the control flows that you need,
generate code, and package the results in a zip file, the zip file is ready for
deployment to the WebSphere Application Server. In this lesson, you will prepare
to deploy a simple application that consists of one control flow.
In addition to selecting flows, you need to define the resources and variables that
the application will use. These attributes represent the application profile. The
deployment preparation wizard has three sections: (1) create and save the profile,
(2) proceed with code generation, and (3) generate the final deployment package (a
zip file).
To prepare a data warehouse application for deployment:
1. Right-click SQWSamplePartial in the Data Project Explorer and select New >
Data Warehouse Application. The Data Warehouse Application Deployment
Preparation wizard opens.
2. In the Project Selection page, select SQWSamplePartial and click Next.
3. Define the application profile, then generate the code and deployment package.
a. Type the profile name marts_load_profile, then click Next.
b. Move the marts_flow control flow to the Selected Control Flows list, then
click Next.
c. Click Next until you reach the Code Generation page. You can ignore the
intermediate pages. Optionally, browse the contents of the generated-code
folder on the Code Generation page.
d. Click Next to go to the Package Generation page, then specify a local
directory where you want to save the deployment zip file, such as C:\temp
on Windows platforms or /tmp on Linux platforms.
e. Click Finish to generate the package and complete the wizard.
4. Verify that the deployment zip file was created by checking the directory that
you specified.

Lesson checkpoint
This lesson showed how to prepare a data warehouse application for deployment.
You learned how to:
v Define an application profile
v Generate the code for an application
v Generate the deployment package for an application (a zip file that you can
deploy to the WebSphere environment)

Lesson 3: Deploying the application that loads the MARTS tables

In this lesson, you deploy the data warehouse application that defines the load
processes for the tables in the data mart. Application deployment is a task that you
do in the DB2 Warehouse Administration Console.
This lesson introduces the DB2 Warehouse Administration Console. You use the
console to deploy and manage the data warehouse applications that you prepared
for deployment in the Design Studio. Deployment is a kind of installation process,
in which you take the zip file from the deployment preparation process and install


the contents of the file on the computer where the WebSphere Application Server is
running. Deployed applications are visible and executable from the Administration
Console.
Before you deploy an application, you must define data sources that are referenced
as source and target databases inside data flows. In this lesson, you need to define
the DWESAMP database as a data source.
If global security is configured on your application server, you can do certain
console tasks only if you are logged in as a user with the appropriate role-based
privileges. The console supports three different roles: administrator, manager, and
operator.
To deploy the control flow application that loads the MARTS tables:
1. Ensure that the WebSphere Application Server software is running on the
application server computer. To start the server:
Windows

Click Start > Programs > IBM WebSphere Application Server V6 >
Profiles > dwe > Start the server.
Linux

a. Source the appropriate db2 profile. For example:
~db2inst1/sqllib/db2profile
b. Start the server as root: /opt/IBM/dwe/appServer/bin/./startServer.sh server1
2. Open a browser and start the Administration Console by going to:
http://myappsvr:portnum/ibm/console/, where myappsvr is the host name or IP
address of the computer where WebSphere Application Server is installed and
portnum is the port number that is assigned to the WebSphere profile. The
default port number for a clean installation of WebSphere Application Server is
9060. If this port number does not work, check the value of the WC_adminhost
entry in the following file: %WAS_ROOT%\profiles\dwe\properties\portdef.props
3. Log into the console. If global security is configured for your application server,
log in as a DB2 Warehouse user with the administrator role. The administrator
role is required for some of the steps in this lesson. If global security is not
configured, you do not need a role-based login account. In both cases, the
Welcome page is displayed in the default browser on your computer.
4. In the View list, select DB2 Warehouse and expand the navigation tree. The
tree now shows only the DB2 Warehouse-related functions.
5. Select Common > Resources > Manage Data Sources and click the Create
button.
6. Define the DWESAMP database resource.
a. In the Database Display Name field, enter dwesamp.
b. Clear the Managed by WAS check box and click Next.
c. In the JNDI Name field, enter jdbc/dwesamp.
d. In the Database Name and DB Alias fields, enter dwesamp.
e. In the Host Name field, enter the IP address or full name of the computer
where the data source is located. If the computer is the same as the server
where WebSphere Application Server is running, enter localhost.
f. In the Port Number field, accept the default value of 50000.
g. Click Next.

h. Enter a valid user ID and password.


i. In the Access Type list, select Public.
j. Click the Test Connection button. Assuming that the connection is
successful, click Finish. The Manage Database Resources page is displayed
and shows the new data source in the list.
7. Deploy the new application:
a. Select SQL Warehousing > Warehouse Applications > Deploy Warehouse
Applications.
b. Specify the location of the zip file that you created during deployment
preparation in the Design Studio. You can either browse to the file or type
its full path. Then click Next.
c. Review the summary information for your application and click Next.
d. On the General page, enter the following application home, log, and
working directories. The console creates these directories if they do not
exist; you do not need to create them manually.
Windows

v Application Home Directory: C:\DWEapps\appHome


v Log Directory: C:\DWEapps\appLogs
v Working Directory: C:\temp
Linux

v Application Home Directory: /home/db2inst1/DWEapps/appHome
v Log Directory: /home/db2inst1/DWEapps/appLogs
v Working Directory: /tmp
e. Accept the default values for the other fields on the General page and click
Next.
f. On the Data Sources page, which automatically displays the DWESAMP
data source (jdbc/dwesamp in the Runtime JNDI Name field), click Next
again.
g. No system resources are required for this application, so click Next.
h. If necessary, change the current value of the ${datadirs/sampledir} variable
to point to the server location of the emptyMartsTables.sql and
countMartTables.sql scripts. You copied these scripts from the client to the
server when you completed the setup instructions for the tutorial. See
Introduction to the DB2 Warehouse Tutorial on page 1.
i. Click Finish.

The new application is deployed and displayed in the list of applications on the
Manage Warehouse Applications page.
8. Click the underlined application name to display the properties for the
deployed application.

Lesson checkpoint
This lesson showed how to deploy a data warehouse application.
You learned how to:
v Start the DB2 Warehouse Administration Console and navigate to the Common
and SQL Warehousing pages.
v Create a data source that is required by an application.
v Deploy an application that is based on a deployment package that you created
in the previous lesson.

v View the application and its properties.

Lesson 4: Running and monitoring a process in a data warehouse application

In this lesson, you use the DB2 Warehouse Administration Console to schedule the
processes that make up a data warehouse application. You also monitor the
statistics and logs that are generated at run time.
A data warehouse application is a wrapper for a set of processes and activities. In
the WebSphere runtime environment, processes are equivalent to control flows, and
activities are equivalent to data flows and other control flow operations. You
cannot run or schedule an entire application; you need to run its processes
individually. You can run processes in two ways: by starting them (to run now) or
by scheduling them (to run later, either once or at intervals). Running a process
implies running all of its activities; you cannot start or schedule activities to run
independently. Activities that were placed inside a parallel container in a control
flow run in parallel when you schedule the process for that control flow.
To run and monitor a process:
1. From the Administration Console, select SQL Warehousing > Processes > Run
Processes. A list of processes is displayed. Each process is equivalent to a
control flow.
2. Select the marts_flow process.
3. Click the Schedule button to specify a start date and time when the process
will run.
4. On the Create Schedule page, specify MARTS_SCHED as the schedule name.
5. In steps 1 and 2 of the wizard, schedule the process to repeat daily for seven
days. Schedule the first instance of the process to start a few minutes later than
the current time.
6. In step 3 of the wizard, click the Create button to create a process profile. A
process profile defines the values that will be used for variables when
scheduled processes run.
a. In the Create Process Profile page, specify marts_profile as the new profile
name, select the application that you deployed in the previous lesson, and
click Next.
b. Click Next again to go to the Instance Variables page of the wizard. Inspect
the current values of the variables and click Finish without making any
changes. The Create Schedule page opens again, with the new profile
available for selection.
7. Select the process profile that you just created and click Finish. A list of
scheduled processes is displayed.
8. After waiting for a few minutes, check the results of your first scheduled run.
a. Select SQL Warehousing > History and Statistics > View Statistics for
Process Instances.
b. Click the name of the process instance that you scheduled. The properties of
the scheduled run are displayed, including its start time, finish time, and
elapsed time.
9. Check the log files for the marts_flow process that you ran.
a. Select Common > Logs and Traces > View SQW Process Logs. A list of
processes and associated HTML log files is displayed.


b. Click the underlined log file for the marts_flow process. The log file
includes a history of the runtime output for the process.

Lesson checkpoint
This lesson explained how to run and monitor a process in a data warehouse
application.
You learned how to:
v View the processes and activities that make up an application
v Schedule a process to run on a fixed schedule
v View the statistics and log entries for the first run of the scheduled process

Module 4: Designing OLAP metadata


In this module, you create OLAP metadata that describes your data in a
multidimensional model. The OLAP metadata can also be used to create
recommendations for materialized query tables (MQTs), which contain
preaggregated data. In DB2 Warehouse, the MQTs provide improved performance
of queries that are based on the same cube model.
DB2 Warehouse stores information about your relational data in metadata objects
that provide a new perspective of your data. Some metadata objects act as a base
to directly access relational data. Other metadata objects describe relationships
between the base metadata objects. All of the metadata objects can be grouped by
their relationships to each other into a metadata object called a cube model.
Essentially, a cube model represents a particular grouping and configuration of
relational tables.
A cube, which is derived from a cube model, precisely defines an OLAP cube and
contains a subset of metadata objects that are based on the metadata objects in the
cube model. Cubes give the JK Superstore the ability to organize the data for quick
retrieval and analysis. You can slice a cube to get sales for all the years, one year,
one quarter, or one month, for example. MQTs give the JK Superstore the ability to
store aggregated data instead of computing these CPU-intensive groups at runtime.
This module contains the following lessons:
v Creating a complete cube model
v Adding a hierarchy to the Time dimension
v Creating a cube
v Deploying your OLAP metadata to the DWESAMP sample database
v Creating MQT recommendations using the Optimization Advisor wizard
v Deploying your recommended MQTs
v Adding the cube to the cube server

Learning objectives
After completing the lessons in this module you will know how to perform the
following tasks:
v Import OLAP metadata
v Add a hierarchy to a dimension
v Create a cube
v Deploy OLAP metadata to a database

v Create MQT recommendations


v Deploy MQT recommendations to a database
v Add the cube to the cube server

Time required
This module should take approximately 100 minutes to complete.

Optional: Start the tutorial here

You can skip the previous modules and start the tutorial here by completing a few
short steps. If you already completed the previous modules, do not complete these
steps; go to Lesson 1.
If you did not complete the earlier modules and want to start here, you need to
complete these steps.
To start the tutorial here:
1. Create the appropriate version of the DWESAMP database by opening a DB2
Command Window and running the following script:
Windows

C:\Program Files\IBM\dwe\samples\data\setupolapandmining.bat
Linux

/opt/IBM/dwe/samples/data/setupolapandmining.sh
For information about what the script does, see the Readme.txt or
readme_linux.txt file in the data directory. For general tutorial setup
information, see:
v Running the tutorial in a Windows client-server environment on page 7
v Running the tutorial in a Linux client-server environment on page 8
v Running the tutorial in a mixed client-server environment on page 8
2. Create a data design project called Tutorial - Data Model.
a. In the Design Studio, click File > New > Project.
b. In the New Project wizard, expand the Data Warehousing folder, select
Data Design Project (OLAP), and click Next.
c. In the New Data Design Project wizard, type Tutorial - Data Model for the
project name, and click Finish.
The Design Studio displays your Tutorial - Data Model project icon in the Data
Project Explorer.
3. Create a physical data model with reverse engineering.
a. Use the New Physical Data Model wizard to create the physical data model.
Right-click the Data Models folder in the Data Project Explorer and click
New > Physical Data Model.
b. On the Model File page, specify the following selections:
1) Change the File name field to DWESampleTutorial.
2) Check that the Version field is set to V9.5.
3) Select Create from reverse engineering and click Next.


c. On the Source page, check that the Database option is selected and click
Next.
d. On the Select Connection page, select Use an existing connection and select
DWESAMP from the Existing connections list. Click Next.
e. On the User Information page, type your database username and password.
Click Next.
f. On the Schema page, select the check boxes for the DWH and MARTS
schemas. Click Next.
g. On the Database Elements page, click Next.
h. On the Options page, click Finish.
The DWH and MARTS schemas are now in the physical data model in your
data design project.

Lesson checkpoint
You completed the prerequisite steps for starting the tutorial here. You can now
continue with Lesson 1.

Lesson 1: Creating a complete cube model


In this lesson you build a cube model for OLAP analysis that describes
relationships in the relational data. With the cube model, you can define the
aspects of the data that are important for the analytic needs of JK Superstore.
Cube models are usually built to represent a relational star schema or snowflake
schema, but you can also model and optimize additional schemas such as virtual
marts and degenerate dimensions. You can use the Design Studio to create your
OLAP metadata by creating the metadata, importing the metadata, or a
combination of both.
A star schema has a fact table at the center and one or more dimension tables that
are joined to the fact table. A snowflake schema is an extension of a star schema
such that one or more dimensions are defined by multiple tables. A cube model
that is based on a simple star schema is built around a central facts object. The
facts object contains a set of measures that describe how to aggregate data from the
fact table across dimensions. Figure 14 on page 46 shows how measures and a facts
object relate to relational data.


Figure 14. Facts object. How a facts object and measures relate to relational data
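
To make the relationship concrete, the following rough sketch shows the kind of SQL that a measure such as a sales total implies: the fact table is joined to dimension tables and the fact column is aggregated across them. All table and column names here are illustrative only; they are not the actual columns of the tutorial's MARTS schema.

   -- A measure is an aggregation of a fact column across one or more dimensions
   SELECT p.PRODUCT_NAME, t.CALENDAR_YEAR, SUM(f.SALES_AMT) AS SALES_AMOUNT
   FROM   MARTS.FACT f
          JOIN MARTS.PRODUCT p ON f.PRODUCT_KEY = p.PRODUCT_KEY
          JOIN MARTS.TIME    t ON f.TIME_KEY    = t.TIME_KEY
   GROUP BY p.PRODUCT_NAME, t.CALENDAR_YEAR;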

Dimensions are connected to the facts object in a cube model like the dimension
tables are connected to the fact table in a star schema. Columns of data from
relational tables are represented by attributes that are organized to make a
dimension.
Figure 15 on page 47 shows how dimensions are built from relational tables.
Hierarchies store information about how the levels within a dimension are related
to each other and are structured. A hierarchy provides a way to calculate and
navigate across the dimension. Each dimension has a corresponding hierarchy that
contains levels that correspond to one or more columns within a table. In a cube
model, each dimension can have multiple hierarchies.


Figure 15. Dimension. How dimensions are built from relational tables

All of the dimensions are connected to a facts object in a cube model that is based
on a star schema or snowflake schema. Joins can connect tables to create a facts
object or a dimension. In a cube model, joins can connect facts objects to
dimensions. The dimensions reference their corresponding hierarchies, levels,
attributes, and related joins. Facts objects reference their measures, attributes, and
related joins. Figure 16 on page 48 shows how the metadata objects are related to
each other in a cube model and map to a relational snowflake schema.


Figure 16. Cube model. How metadata objects fit together and map to a relational snowflake
schema

To create a cube model that is based on your relational schema, you can use the
Quick Start wizard, which creates the metadata objects that the wizard can
logically infer from the schema. You specify the fact table, and the wizard detects
the corresponding dimensions, joins, and attributes. After you complete the Quick
Start wizard, you need to add calculated measures, hierarchies, and levels to the
cube model so that the cube model is complete and can be optimized.
You can also import existing metadata. You might have existing metadata that was
previously created using the Design Studio, DB2 Cube Views, or another OLAP
tool that provides a bridge to the OLAP metadata in DB2 Warehouse (or DB2 Cube
Views at the Version 8.1, FixPak 10 level).
JK Superstore has metadata already defined in another OLAP tool that you can
import into DB2 Warehouse and optimize.
To import a cube model and its corresponding metadata:
1. Set the preferences for the metadata database connection:
a. Click Window > Preferences. The Preferences dialog opens.
b. In the Preferences dialog, click Data Warehousing > Repository.
c. Specify the following settings:


v Enter DWECTRL in the Database name field.


v In the Host field, enter the IP address or full name of the computer where
the data source is located. If the computer is the same as the one where
the server components are installed, accept the default value of localhost.
v In the Port number field, accept the default value of 50000.
d. Click Apply.
e. When prompted, type your user ID and password and click OK.
2. In the Data Project Explorer, expand the data design project called Tutorial -
Data Model that you created in Module 1: Designing the physical data model
for your data warehouse on page 13.
3. Import the OLAP metadata from an XML file that was already exported from
the OLAP tool by using a bridge to DB2 Warehouse.
a. Click File > Import to open the Import wizard.
b. On the Import page, select Data Warehousing > OLAP Metadata and click
Next.
c. On the Import file and target page, browse to select the following XML file:
Windows

C:\Program Files\IBM\dwe\samples\OLAP\partialSample.xml
Linux

/opt/IBM/dwe/samples/OLAP/partialSample.xml
d. In the Import into list, select the DWESAMP (Tutorial - Data
Model/DWESampleTutorial.dbm) project database. Click Next.
e. On the Import OLAP Objects page, make sure that Replace existing objects
is selected, and click Finish.
4. Browse the OLAP metadata that you imported into your project.
a. In the Data Project Explorer tree, expand Tutorial - Data Model > Data
Models > DWESampleTutorial.dbm > DWESAMP > MARTS > OLAP
Objects > Cube Models to view the Purchase Profile Analysis cube model
and related metadata that you imported.
b. Expand the Purchase Profile Analysis cube model to view the Purchase
Profile Analysis facts object and the Product, Store, and Time dimensions.
c. Select the Purchase Profile Analysis facts object and view the Measures page
of the Properties view. The measures in the following table were imported
for your JK Superstore analytical needs.
Table 4. Measures that you imported
Average Item Price Sold: The average price of the product items that were purchased by customers
Average Product Book Price: The average book price of the product items that were purchased by customers
Average Profit Amount Per Item: The average amount of profit made on the product items that were purchased by customers
Cost of Goods Sold (COGS): The total monetary value of all costs to JK Superstore associated with products
Number of Items: The count of individual product items
Product Book Price Amount: The price set by JK Superstore as the expected sales price for the products
Profit Amount: The excess of income over expenses
Profit Margin Percentage: The amount, expressed as a percentage, of profit made per each unit of currency sold
Sales Amount: The monetary value of purchases made by customers

d. In the Data Project Explorer view, expand the Time dimension information
folder. You can see the following two objects:
PRCHS_PRFL_ANLYSIS-TIME
The dimension-to-facts join
Time

The dimension link to the Time dimension

Dimensions can be shared across cube models, so dimensions are shown in
the Shared Dimensions folder under the OLAP Objects folder. When a
cube model uses the dimension, a dimension link is shown in the
dimension information folder that points to the shared dimension.
e. Expand the Time dimension link and then expand the Levels folder and the
Hierarchies folder. You can see that one hierarchy, the Fiscal Year Hierarchy,
is defined.
Additional levels are defined in the Time dimension to describe the
calendar year in addition to the fiscal year. Calendar year information is an
important definition for many business processes but is missing from the
current OLAP metadata.
5. Click File > Save All to save the OLAP metadata updates in your project.
In the next lesson, you will create a second hierarchy for the Time dimension to
describe the calendar year that will be used in JK Superstore reports.

Lesson checkpoint
In this lesson, you imported metadata from an XML file.
You learned how to:
v Import your OLAP metadata
v Navigate your OLAP metadata in the Data Project Explorer

Lesson 2: Adding a hierarchy to the Time dimension


In this lesson, you add the Calendar Year hierarchy to the set of OLAP metadata
that you imported in Lesson 1.
To add the Calendar Year hierarchy to the Time dimension:
1. In the Data Project Explorer, expand the Time dimension link under the Time
dimension information folder. Right-click the Hierarchies folder and click Add
Hierarchy. Ensure that you can see the Properties view. If you cannot see the
Properties view, click Window > Show View > Properties.
2. On the General page, specify Calendar Year Hierarchy as the name for the
hierarchy.
3. On the Levels page, specify the levels to include in the hierarchy.

a. To select levels, click the add level icon in the toolbar above the
Levels table.
b. Select the following levels and use the move up and move down icons in
the toolbar to arrange the levels in the correct order. Then click OK.
Calendar Year Level
Calendar Year-Quarter Level
Calendar Year-Quarter-Month Level
Day of Calendar Month Level
c. Select the All level check box and type All Time (Calendar) as the name
for the all level. The all level is an optional level that exists at the top of the
hierarchy and has one member that represents the aggregation of the
members of lower levels.
4. On the Type/Deployment page, ensure that Balanced / Standard is selected.
5. Click File > Save All to save the OLAP metadata that you defined.

Lesson checkpoint
In this lesson, you updated the Time dimension by adding a second hierarchy that
defines time in terms of the calendar year. The Calendar Year Hierarchy will be
important for JK Superstore's reports.
You learned about dimensions, hierarchies, and levels, and you learned how to
create hierarchies and levels.

Lesson 3: Creating a cube


In this lesson, you create a cube called Price Analysis that includes the metrics and
dimensions that are needed by the JK Superstore business analysts. Cubes are a
critical component of your OLAP metadata and are used by reporting applications,
like Alphablox.
You can reuse the components of a cube model to create more precise cubes for
specific applications. A cube is the most precise metadata object and is the closest
object to an OLAP conceptual cube. A cube is a specific instance or subset of a
cube model. A cube has a specific set of similar but more restrictive metadata
objects that are derived from the parent cube model, including cube dimensions,
cube hierarchies, cube levels, and a cube facts object. A cube can have only one
cube hierarchy defined for each cube dimension, but a dimension can have many
hierarchies that are defined for the cube model. Because of this structural
difference between a cube and a cube model, you can retrieve most cubes with a
single SQL statement.
To create the Price Analysis cube:
1. In the Data Project Explorer, expand the Cubes folder under the Purchase
Profile Analysis cube model for the MARTS schema. Right-click the Cubes
folder and click Add Cube.
2. Specify Price Analysis as the name of the cube on the General page of the
Properties view and press the Tab key. The Label field is also updated with the
name, Price Analysis.
3. Add cube dimensions to the cube.
a. On the Dimensions page of the Properties view, click the Add cube
dimensions icon in the toolbar to select dimensions from the cube
model to include in the cube as cube dimensions.

b. In the Available Dimensions window, click Select All and click OK.
c. On the General page of the Cube Dimensions, type the label for the
dimension in the Label field. The value for the label is the name value
without the phrase (Price Analysis). For example, if the name value is
Product (Price Analysis), then the label value is Product. The Alphablox
report will fail if the label value is incorrect.
You can expand the Price Analysis cube in the Cubes folder to see the cube
facts and the cube dimensions.
4. Specify a cube hierarchy for the cube dimension.
a. On the Cube Hierarchy page of the Properties view, click the
push button to specify the cube hierarchy.
b. Select the Calendar Year Hierarchy for the Time cube dimension.
5. Ensure that all of the levels in each cube hierarchy are included by opening the
Levels page of the Properties view for the cube hierarchy. To include a level,
select the check box for the level.
6. Add the measures that you want to include to the cube facts object:
a. Select the cube facts object and open the Measures page of the Properties
view.
b. Click the Add Measure icon in the toolbar to select measures from
the cube model's facts object to include in the cube facts object. Select the
following measures:
v Number Of Items
v Product Book Price Amount
v Sales Amount
v Average Item Price Sold
v Average Product Book Price
v Profit Amount
v Average Profit Amount Per Item
v Profit Margin Percentage
Click OK to add the measures to the cube facts object.


c. Specify Sales Amount as the default measure by clicking the check box in
the Default column for the Sales Amount measure.
7. Click File > Save All to save the OLAP metadata that you defined.
You have a complete cube that is ready to be used by Alphablox to develop
multidimensional analytical applications.

Lesson checkpoint
In this lesson, you created the Price Analysis cube that can be used for Alphablox
analytics applications for JK Superstore.
You learned about cubes and you learned how to:
v Create a cube
v Add cube dimensions to a cube including cube hierarchies and cube levels
v Add measures to a cube facts object

Lesson 4: Deploying your OLAP metadata to the DWESAMP sample database

In this lesson, you validate the model, and then deploy your metadata to a
database.
Prerequisite: To deploy OLAP metadata to a database, you must be connected to
the target database.
To deploy your metadata from your project file to your database:
1. Validate the cube model to ensure that all of your OLAP metadata objects are
defined correctly and can be optimized. You can validate the Purchase Profile
Analysis cube model in the same way that you validated the MARTS schema in
Lesson 4: Validating your physical data model on page 16.
2. To deploy your OLAP metadata to the DWESAMP database, right-click the
Purchase Profile Analysis cube model in the Data Project Explorer, select
Deploy to database and click OK. The Deploy OLAP Objects wizard opens.
3. On the Specify the Target Database page, select the DWESAMP database, and
then click Finish.
Your OLAP metadata now exists in the DWESAMP database. You can now use the
Optimization Advisor wizard to create MQT recommendations.

Lesson checkpoint
In this lesson, you successfully validated and deployed your OLAP metadata into
the DWESAMP database.
You learned how to use the Deploy OLAP Objects wizard to deploy your OLAP
metadata to the sample database.

Lesson 5: Creating MQT recommendations using the Optimization Advisor wizard

In this lesson, you use the Optimization Advisor wizard to create SQL scripts that
can build a set of recommended materialized query tables (MQTs) for a cube
model. MQTs aggregate commonly accessed data to speed up query performance
for applications that issue SQL-based or MDX-based OLAP-style queries, such as
Alphablox.
MQTs are also known as summary tables. In this module, the terms MQT and
summary table are used interchangeably.
You can complete expensive calculations (aggregations) and joins that might be
needed to answer a query and store that data in an MQT. When you run queries
that can use the precomputed data, the DB2 optimizer will reroute the queries to
the MQT. A query does not need to match the precomputed calculations exactly. If
you use simple analytics like SUM and COUNT, the DB2 optimizer can
dynamically aggregate the results from the precomputed data. Many different
queries can be satisfied by one MQT. Using MQTs can dramatically improve query
performance for queries that access commonly used data or that involve
aggregated data over one or more dimensions or tables.
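As an illustration, a deferred-refresh MQT that precomputes sales by quarter and
store might look like the following. This is only a sketch with assumed table and
column names; the actual CREATE statements for this tutorial are generated for you
by the Optimization Advisor later in this lesson:

-- Sketch only: assumed fact and dimension names, not the generated MARTS MQTs
CREATE TABLE MARTS.SALES_BY_QTR_MQT AS (
    SELECT T.CAL_QTR_ID,
           S.STORE_ID,
           SUM(F.SALES_AMT) AS SALES_AMT,
           COUNT(*)         AS ROW_CNT
    FROM   MARTS.SALES_FACT F,
           MARTS.TIME_DIM   T,
           MARTS.STORE_DIM  S
    WHERE  F.TIME_ID  = T.TIME_ID
    AND    F.STORE_ID = S.STORE_ID
    GROUP BY T.CAL_QTR_ID, S.STORE_ID
)
DATA INITIALLY DEFERRED REFRESH DEFERRED;

-- Populate the MQT; the optimizer can then reroute matching queries to it.
-- (Rerouting to REFRESH DEFERRED MQTs also requires SET CURRENT REFRESH AGE ANY.)
REFRESH TABLE MARTS.SALES_BY_QTR_MQT;

Queries that group by quarter, or by year (which the optimizer can derive by further
aggregating the quarters), can then be answered from the MQT instead of the fact table.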
You expect that the new reports that you are developing will be in high demand
by the JK Superstore business analysts, so you want to do everything that you can
to ensure optimal performance. You do not yet have a workload, but you can
create MQTs based on the OLAP model that you completed in the previous
lessons.
To optimize the cube model, use the Optimization Advisor wizard in the Design
Studio:
1. In the Database Explorer, expand the tree view for the DWESAMP database by
clicking Schemas > MARTS > OLAP Objects > Cube Models. In the Cube
Models folder, right-click the Purchase Profile Analysis cube model and click
Optimization Advisor. The cube model is validated against the metadata rules.
If any part of the cube model is not valid, you will receive an error and you
will need to modify the cube model so that it is valid before you can optimize
it.
2. On the Target Queries page of the Optimization Advisor wizard, specify the
target of the queries that you want to optimize the Price Analysis cube for. The
target of queries is used to improve the optimization results.
a. Click the
push button to open the Optimization Slices window.
b. On the Optimization Slices window, create an optimization slice for the
Price Analysis cube.
1) Click the Add slice icon (
) in the toolbar to add a new optimization
slice to the cube.
2) For the new slice that appears in the table, specify a level for each
hierarchy. Click the level in the cube dimension column and select the
appropriate level from the list:
Time cube dimension
Select the Calendar Year level.
Product cube dimension
Select any level.
Store cube dimension
Select any level.
3) Click OK.
c. Click OK and then click Next.
3. On the Summary Tables page, specify that you want deferred update summary
tables. Specify the table space in which to store the summary tables and
summary table indexes and click Next. You can accept the default
USERSPACE1 table space settings.
4. On the Limitations page, select Do not specify a disk space limit and Do not
specify a time limit to provide unlimited time and disk space for the
Optimization Advisor. Specify that you want to allow data sampling. The more
space, information, and time that you specify, the more significantly your
performance results will improve. Click Start Advisor to start the Optimization
Advisor. After the Optimization Advisor completes its recommendations, click
Next.
Note: The Optimization Advisor might take several minutes to create its
recommendations.
5. On the SQL Scripts page, type unique file names into the following fields:
Windows
       SQL script to create the summary tables: C:\dwetutorial\olap\createmqts.sql
       SQL script to refresh the summary tables: C:\dwetutorial\olap\refreshmqts.sql
       Note: On Windows platforms, you need to create the c:\dwetutorial\olap
       directory manually.
Linux
       SQL script to create the summary tables: /tmp/dwetutorial/olap/createmqts.sql
       SQL script to refresh the summary tables: /tmp/dwetutorial/olap/refreshmqts.sql
       Note: On Linux platforms, you need to create the /tmp/dwetutorial/olap
       directory manually, as user db2inst1.
6. Click Finish to save the recommended SQL scripts with the file names that you
specified.

Lesson checkpoint
In this lesson, you used the Optimization Advisor wizard in the Design Studio to
create summary table recommendations. The recommended summary tables can
dramatically improve the performance of OLAP-style queries from your Alphablox
analytics applications and other applications that exchange OLAP metadata with
DB2 Warehouse.
You learned about summary tables, and you learned how to create summary table
recommendations.

Lesson 6: Deploying your recommended materialized query


tables (MQTs)
In this lesson, you use the DB2 Warehouse Administration Console to run the SQL
script to create the recommended MQTs that you designed in the Design Studio.
You can use the Administration Console to complete the following tasks:
v Connect to a database
v
v
v
v
v

Import and export OLAP metadata for your database


View (but not modify) the OLAP metadata in the database
Run the Optimization Advisor wizard to create MQT recommendations
Deploy the recommended MQTs to the database
Visually review the usage history of the MQTs

In this lesson, you will learn how to use the Administration Console to connect to
a database that is a new data source for WebSphere Application Server. You will
also learn how to verify that the database is enabled for OLAP and deploy your
recommended MQTs.
If global security is configured on your application server, you can do certain
console tasks only if you log in as a user with the appropriate role-based privileges.
The console supports three different roles: administrator, manager, and operator.
To deploy your recommended summary tables, use the Administration Console:
1. Ensure that the WebSphere Application Server software is running on the
application server computer. To start the server:


Windows
       Click Start > Programs > IBM WebSphere Application Server V6 >
       Profiles > dwe > Start the server.
Linux
       a. Source the appropriate db2 profile. For example:
          ~db2inst1/sqllib/db2profile
       b. Start the server as root: /opt/IBM/dwe/appServer/bin/./startServer.sh server1
2. Open a browser and start the Administration Console by going to:
   http://myappsvr:portnum/ibm/console/, where myappsvr is the host name or IP
   address of the computer where WebSphere Application Server is installed and
   portnum is the port number that is assigned to the WebSphere profile. The
   default port number for a clean installation of WebSphere Application Server is
   9060. If this port number does not work, check the value of the WC_adminhost
   entry in the following file: %WAS_ROOT%\profiles\dwe\properties\portdef.props
3. Log into the console. If global security is configured for your application server,
   log in as a DB2 Warehouse user with the administrator role. The administrator
   role is required for some of the steps in this lesson. If global security is not
   configured, you do not need a role-based login account. In both cases, the
   Welcome page is displayed in the default browser on your computer.
4. In the View list, select DB2 Warehouse and expand the navigation tree. The
   tree now shows only the DB2 Warehouse-related functions.
5. Select DB2 Warehouse > Common Resources > Manage Data Sources.
6. Select the DWESAMP database and click the Test Connection button. If the test
   connection is successful, you are ready to deploy the recommended MQTs to
   the DWESAMP database. If the test connection failed or you started the tutorial
   from Module 4, go to Step 7 to define the DWESAMP database resource.
7. Optional: If you are starting the tutorial from Module 4 or the test connection
   failed in Step 6, complete this step to define the DWESAMP database resource.
   a. In the Database Display Name field, enter dwesamp.
   b. Clear the Managed by WAS check box and click Next.
   c. In the JNDI Name field, enter jdbc/dwesamp.
   d. In the Database Name and DB Alias fields, enter dwesamp.
   e. In the Host Name field, enter the IP address or full name of the computer
      where the data source is located. If the computer is the same as the server
      where WebSphere Application Server is running, enter localhost.
   f. In the Port Number field, accept the default value of 50000.
   g. Click Next.
   h. Enter a valid user ID and password.
   i. In the Access Type list, select Public.
   j. Click the Test Connection button. Assuming that the connection is
      successful, click Finish. The Manage Database Resources page is displayed,
      including the new data source in the list.
8. Map the cube model to the database.
a. From the Administration Console, select DB2 Warehouse > Cubing Services >
Manage OLAP Metadata.
b. Click the Purchase Profile Analysis cube model link.

c. Click the Database mapping tab.


d. Select jdbc/DWESAMP from the Database resource list.
e. Click Save.
9. Run the script to deploy the MQTs to the DWESAMP database.
a. Click the check box next to the Purchase Profile Analysis cube model and
click Run SQL Script.
b. On the Run SQL Script page, browse to the SQL script that will create the
recommended summary tables:
Windows

C:\dwetutorial\olap\createmqts.sql
Linux

/tmp/dwetutorial/olap/createmqts.sql
c. Click Run Script. The SQL Script Results page displays.
d. On the SQL Script Results page, download the execution log for the
summary table scripts. Open the execution log to ensure that the summary
tables were created successfully.
e. Click Finish.

Lesson checkpoint
In this lesson, you created the recommended summary tables in the DWESAMP
database. The recommended summary tables contain pre-aggregated tables that
help business reports run more quickly and can be used by OLAP-style queries
that are issued by users who are using Alphablox multidimensional reports.
You learned how to:
v Test the database connection
v Deploy recommended MQTs to a database

Lesson 7: Adding the cube to the cube server


In this lesson, you can add the cube to the cube server to run business analysis on
your data.
In this lesson, you will learn how to use the Administration Console to start the
cube server and add the Price Analysis cube to the cube server.
To start the cube server and add the Price Analysis cube:
1. Start the DWEREPOS cube server:
a. From the Administration Console, select DB2 Warehouse > Cubing Services >
Manage Cube Servers.
b. Click Create and enter the following information:
v In the Cube server name field, enter DWEREPOS.
v In the Port number field, enter an available port that also has the next
two consecutive ports available.
Important: The cube server uses three consecutive available ports, such
as 9080, 9081, 9082. Port numbers that are assigned to different cube
servers must not overlap.
c. Click Save.
d. Select the check box to the left of the DWEREPOS cube server and click the
Start button.
2. Add the Price Analysis cube to the DWEREPOS cube server:


a. Select the DWEREPOS link from the list of cube server names. The
DWEREPOS - Properties page opens.
b. Click the Cubes tab.
c. Click Add.
d. Select the Price Analysis cube from the list and click Add.
e. Select the check box to the left of the Price Analysis cube and click the Start
button.
The Price Analysis cube that you created is ready for testing and developing
analytic applications.

Lesson checkpoint
In this lesson, you started the cube server, added a cube to the cube server, and
started the cube. In Module 5, you learn to use Alphablox Blox Builder to create a
reporting application that retrieves and displays data from your cube.
You learned how to:
v Start the cube server in the Administration Console
v Add a cube to the cube server
v Start the cube

Module 5: Creating Alphablox reports based on IBM cubes


Blox Builder is an Alphablox tool that can be used to create customized analytic
applications in a code-free environment. In this module, you use the Alphablox
Blox Builder tool to create analytic reports to display multidimensional data from
IBM DB2 Warehouse Cubing Services. The JK Superstore will create and run
reports to analyze the sales data from its business, based on the data that you set
up in the previous four modules.
With Blox Builder, you can create custom analytic applications in either of two
ways:
v Application developers can use the Blox Builder user interface in Design Studio
to design reports, queries, and applications. The Blox Builder user interface
provides a rich palette of reusable Blox and additional components and
interactions used in creating both simple and highly customized analytic
applications.
v Java developers can use reports created by Blox Builder in their custom J2EE
applications in the same way they currently use Alphablox Blox components.
A Blox is an Alphablox software component, using Web and Java technologies, to
build analytic applications. A Blox component contains a Blox object, such as a
DataBlox or a PresentBlox.
Blox Builder is an interface that you can use to develop, create, and design analytic
applications without writing Java code or creating JSP pages. The analytic
applications contain analytic reports that you also create in Blox Builder. Reports
can contain HTML elements, such as images and form fields, and Alphablox
components, such as the PresentBlox and the DataBlox.
This module consists of the following lessons:
v Setting up the Alphablox environment

v Creating an analytic application
v Building Alphablox queries with Blox Builder
v Building Alphablox reports with Blox Builder
v Customizing your queries and reports
v Creating and deploying an application to the WebSphere server

Learning objectives
After you complete the lessons in this module, you will understand basic concepts
about Alphablox and Blox Builder and know how to:
v Start Alphablox
v Create an IBM Cubing Services Adapter DataSource
v Create a project
v Create an application
v Create a report
v Create a query
v Customize a report
v Preview your report
v Add your reports to an application navigation
v Preview the application
v Deploy the application

Time required
This module should take approximately 90 minutes to complete.

Prerequisites
Ensure that the following prerequisites are met:
v Module 4 has been completed successfully
v You have the Blox Builder plug-ins installed in the Design Studio

Lesson 1: Setting up the Alphablox environment


In this lesson, you will set up the Alphablox environment.
You will need to connect to and start Alphablox. You will also create an IBM
Cubing Services Adapter DataSource.
1. After you have installed Alphablox, take note of the following:
v The default username is: admin, or the username you designated.
v The default password is: password, or the password you specified. Be sure to
change the password as soon as possible.
v The default port is: 9080
v The Telnet console port is: 23
2. Start Alphablox:
a. Start the application server to start up Alphablox. For WebSphere running
on Windows, you can select Start > All Programs > IBM WebSphere >
Start the Server.
b. [Apache Tomcat 5.5 with Alphablox 8.4.1] Start the application server to
start up Alphablox.
c. [Apache Tomcat 3.2.4 with Alphablox 8.4] On Windows, select Start > All
Programs > Alphablox > Startup Alphablox.
3. Log into the Alphablox home page as the admin user by entering the following
URL in a browser window: http://<hostname:portnumber>/AlphabloxAdmin/
home/ where <hostname:portnumber> represents the name of the server and port
number on which Alphablox runs.
4. Create an IBM Cubing Services Adapter DataSource:
Note: Your queries will need a DataSource defined under Alphablox to connect
to a cube.
a. Click the Administration tab and then click the Data Sources link.
b. Click the Create button.
c. From the Adapter menu, select the adapter named IBM Cubing Services
Adapter.
d. Type ACS_External in the Data Source Name text box.
e. Optional: Enter a description in the Description text box.
f. Ensure that your Cubing Services Server name points to the host name or IP
address of your IBM cube server. Your IBM cube server should be running
on the WebSphere server.
g. Ensure that you include the port number for the IBM cube server, as
discussed in the previous module. See Module 4: Lesson 7: Starting the cube
server for more information.
h. Enter your DB2 user name and password.
i. Specify a number in the Maximum Rows and the Maximum Columns text
boxes. The values limit the number of rows or columns returned for queries
entered through this data source. The default values are 1000.
j. Click the Save button to save the data source.

Lesson checkpoint
You learned how to:
v Start Alphablox
v Create an IBM Cubing Services Adapter DataSource

Lesson 2: Creating an analytic application


In this lesson, you will create an analytic application using the Blox Builder
perspective from the Design Studio.
Before you can create a report, application, or query, you must create a Blox
Builder project in the Design Studio. Your workspace can contain multiple projects.
A Blox Builder project contains top-level folders for each of the three types of
fundamental Blox Builder objects: Queries, Reports, and Applications. A Blox
Builder project can contain multiple Blox Builder applications.
Your applications, queries, and report files are stored locally in the Blox Builder
tooling project. These files will only be available on the server when you
specifically deploy them. Once the files have been deployed, the server will have a
copy of the deployed files.
To create a project and application:
1. Create a project:
a. Click File > New > Project. The New Project window displays.

b. Expand Blox Builder and select Blox Builder Project. The Blox Builder
Project window displays.
c. Name your project BloxBuilderProject in the Project name field. You can
check the Use default location box to store your project in your workspace,
or uncheck the option to type or browse to a location to save your project.
Click Finish. A window displays asking if you want to change to the Blox
Builder perspective. Alphablox is the specialization of the Blox Builder
perspective. Click Yes. Your BloxBuilderProject displays in the Blox
Builder Project Explorer.
2. Create an application:
Note: Your reports will be displayed in the navigation of an application.
a. In the Blox Builder Project Explorer, expand BloxBuilderProject.
b. Right-click on the Applications folder and select New > Application. The
New Application wizard displays.
c. Type priceAnalysis for the name of your application in the Application
name field. The name of your application reflects the name of the folder
that contains your application when you export it to the Blox Builder server.
For the display name type Price Analysis. Click Finish. The new
priceAnalysis application is displayed under the Applications folder in the
Blox Builder Project Explorer.

Lesson checkpoint
You learned how to:
v Create a project
v Create an application

Lesson 3: Building Alphablox queries with Blox Builder


In this lesson, you use Blox Builder to create a query.
If you plan to use a query in more than one report, or plan to use multiple queries
within a single report, you can store your query in a query definition file and
reference the query when you want to use it. A query contains the query string,
data, and connection information such as the name of the data source and the
query. A query definition also defines the query's unique ID, display name, and
description. A query can then be accessed from any report and you can override
the properties in a query in the report.
For example, you have three reports that use the same query. Instead of typing the
query for each report, you can store the query as a query object in a separate file.
In each report, you reference the query by using the query ID.
You will create multiple queries to show sales by time periods:
1. Create the most basic query:
a. In the Blox Builder Project Explorer, expand BloxBuilderProject.
b. Right-click on the Queries folder and select New > Query. The New Query
wizard displays.
c. Name your query Basic. Click Finish. Your new query object displays
under the Queries folder in the Blox Builder Project Explorer. The Query
editor displays the query object. Query names are case sensitive.
d. Type select from [Price Analysis] in the Text Query area.

e. In the Properties view, click the Data Datasource tab and type ACS_External
for the DataSourceName.
f. Click the button that is second to the right. The Alphablox Server
Configuration window displays.
g. Provide the following:
v A name for the server configuration
v IP address of the Alphablox server or DNS name of the Alphablox server
v 9080 as the port number for Alphablox
v The username and password
Click Save to save this configuration.
h. Click Test Server Configuration. A window will display informing you if
your connection to the server was successful. Click Save Configuration.
Click Next.
i. In the Properties tab, select ACS_External as your DataSourceName, enter
your user name and password, and click Finish. You might be prompted to
log into Alphablox; after you log in, the Query Designer page opens.
j. Select the name of your cube from the drop-down list and click Run Default
Query.
k. Click Apply Query Update.
l. Click Preview Query to preview your query. The Alphablox Server
Configuration window displays. You should already have your server
configuration information filled out.
m. Click Finish to display the Query Tools page. Click Close Window to close
the Query Tools page.
n. Save your query.
2. Create a Stores By Time query:
a. In the Blox Builder Project Explorer, expand BloxBuilderProject.
b. Right-click the Queries folder and select New > Query. The New Query
wizard displays.
c. Name your query Stores By Time. Click Finish. Your new query object
displays under the Queries folder in the Blox Builder Project Explorer. The
Query editor displays the query object.
d. In the Properties view, click the Data Datasource tab and type ACS_External
for the DataSourceName.
e. Click the button that is second to the right. The Alphablox Server
Configuration window displays. You should already have your server
configuration information filled out. Click Next.
f. In the Properties tab, type ACS_External for the DataSourceName and click
Finish. You might be prompted to log into Alphablox; after you log in, the
Query Designer page opens.
g. Select Price Analysis from the Cubes drop-down list and click Run Default
Query.
h. A single cell of data returned by the default query appears in the grid.
Perform the following actions in the Grid user interface to generate the
query:
v In the DataLayout panel, drag the Store dimension to Row axis.
v Right-click on the All Stores member on the Row axis and choose
Expand All to see all descendants of stores.
v Drag the Time dimension to the Column axis.

v Double-click the All Time member to retrieve its children.


i. Click Apply Query Update. Your query output should look similar to the
following:
SELECT
DISTINCT( Distinct(Hierarchize(
{[Price Analysis].[Time].[All Time (Calendar)],
AddCalculatedMembers([Price Analysis].[Time].[All Time (Calendar)].children)}
)))
ON AXIS(0),
DISTINCT( Distinct(Hierarchize({[Price Analysis].[Store].[All Stores],
AddCalculatedMembers(Descendants([Price Analysis].[Store].[All Stores],
[Price Analysis].[Store].[All Stores].level,AFTER))})) )
ON AXIS(1)
FROM [Price Analysis]
WHERE
(
[Price Analysis].[Measures].[Sales Amount],
[Price Analysis].[Product].[All Products]
)

j. Click Preview Query to preview your query. The Alphablox Server
Configuration window displays. You should already have your server
configuration information filled out.
k. Click Finish to display the Query Tools page. Click Close Window to close
the Query Tools page.
l. Save your query.
3. Create a Stores By Quarter query:
a. In the Blox Builder Project Explorer, expand BloxBuilderProject.
b. Right-click on the Queries folder and select New > Query. The New Query
wizard displays.
c. Name your query Stores By Quarter. Click Finish. Your new query object
displays under the Queries folder in the Blox Builder Project Explorer. The
Query editor displays the query object.
d. In the Properties view, click the Data Datasource tab and type ACS_External
for the DataSourceName.
e. Click the button that is second to the right. The Alphablox Server
Configuration window displays. You should already have your server
configuration information filled out. Click Next.
f. In the Properties tab, type your DataSourceName, for example ACS_External
and click Finish. You might be prompted to log into Alphablox; after you
log in, the Query Designer page opens.
g. Select the name of your cube Price Analysis from the Cubes drop-down
list and click Run Default Query.
h. A single cell of data returned by the default query appears in the grid.
Perform the following actions in the Grid user interface to generate the
query:
v In the DataLayout panel, drag the Store dimension to Row axis.
v Right-click on the All Stores member on the Row axis and choose
Expand All to see all descendants of stores.
v Drag the Time dimension to the Column axis.
v Right-click the All Time member and choose Member Filter. Click
Remove All. Select a quarter under 2003 and click Add. Click OK.
i. Click Apply Query Update. Your query output should look similar to the
following:

SELECT
DISTINCT( {[Price Analysis].[Time].[All Time].[2003].[1]} ) ON AXIS(0)
, DISTINCT( Distinct(Hierarchize({[Price Analysis].[Store].[All
Store],Descendants([Price Analysis].[Store].[All
Store],[Price Analysis].[Store].[All Store].level,AFTER)})) ) ON AXIS(1)
FROM [Price Analysis]
WHERE
(
[Price Analysis].[Measures].[Product Book Price Amount],
[Price Analysis].[Product].[All Product]
)

where 1 was the quarter selected in the previous step.


j. Click Preview Query to preview your query. The Alphablox Server
Configuration window displays. You should already have your server
configuration information filled out.
k. Click Finish to display the Query Tools page. Click Close Window to close
the Query Tools page.
l. Save your query.

Lesson checkpoint
You learned how to:
v Create three separate queries: a simple basic query, a query that shows Stores By
Time, and a query that shows Stores By Quarter.

Lesson 4: Building Alphablox reports with Blox Builder


In this lesson, you use Blox Builder to create and design an Alphablox report that
displays a grid and a chart showing sales by time periods.
A report may contain multiple visual and non-visual components. A DataBlox is a
non-visual component, whereas a PresentBlox, text, and buttons are visual
components. You will create a simple report containing only a DataBlox and
PresentBlox.
Blox Builder enables report properties to be overridden, so a report can function like a
template. In a later lesson, you will use the report that you create in this lesson to
create multiple reports in your application navigation.
A report creates an analytic view that defines:
v Data sets and queries to be used in the view
v Components in the report (a PresentBlox, a button, and a select list are all
examples of components provided by Blox Builder for use in reports)
v Interactions between components in a report. For example, your query could be
driven by a users selection in a calendar control.
v The layout of the components within the displayed page
The report is stored as a report definition in Blox Builder (a series of XMI
files that Blox Builder operates on while interacting with the user interface) and a
layout file that controls where those components are displayed in the browser. A
report definition is uniquely defined by its report ID. A report ID can contain
alphanumeric, slash, dash, space, and underscore characters. The report definition
and layout files are stored in the Alphablox repository.
Components have properties that can be set through the properties sheet or
through Live Layout. For example, a DataBlox component contains properties for

the query string and the name of the data source that the DataBlox connects to. In
the next lesson, we will show how these component properties can be driven by
property references instead of hard coded values as we are doing here.
When you add a report to an application, you can override the values of a reports
properties so that the report acts like a template. For example, you can create a
report that displays the sales data for a certain time period. A custom report
property contains the value that determines the time period. In an application, you
can set the report property to different values to display reports for different time
periods.
You use the DataBlox component to access the query. You can also override
properties that you defined in the query by setting properties in the DataBlox
component.
To create an Alphablox report, you will use a PresentBlox and a DataBlox:
1. Create a report:
a. In the Blox Builder Project Explorer, expand BloxBuilderProject.
b. Right-click the Reports folder and select New > Report. The New Blox
Builder Report wizard displays with fields that you can customize.
c. Type PresentAndData for the name of your report in the Report name field.
The Blox Builder Report overview displays on the canvas. There are three
tabs, Overview, Model, and Layout, that display at the bottom of the canvas
view. Click Finish. The new PresentAndData report is displayed under the
Reports folder in the Blox Builder Project Explorer.
Note: Initially, the folder will contain the report, internal files used by the
interface, and the generated XML and HTML files used by the server to
display the report. By default, the internal files will be hidden from view.
For more complex reports, you can add your own images, localized resource
files, customized HTML layouts and other assets associated with the report.
Files added by the developer will always be visible in the folder.
2. Add a PresentBlox component:
Note: The PresentBlox component combines several Blox in one. It provides
you with simultaneous chart and grid views of the same data in the same
window space. The PresentBlox component has a graphical user interface that
can nest ChartBlox, GridBlox, PageBlox, ToolbarBlox, and DataLayoutBlox
within a single presentation. Application assemblers use PresentBlox properties
to tailor how these Blox will appear. Blox properties can be set through either
Live Layout, which uses the Blox's interface for setting properties on the Blox,
or by setting them manually. Because we are setting only two properties, we
will set them manually.
a. Click the Model tab. A categorization of the reusable Blox components
displays in a palette view.
b. Drag and drop a PresentBlox component from the palette onto the canvas.
In the Properties view, located below the canvas, click the Present property
tab and provide the following values:
Divider location
Type 0.55.
Divider location provides a line between the chart and the grid in a
PresentBlox. A valid value is anything from 0 - 1, where 0.5 will
divide the chart and grid down the middle.
splitPaneOrientation
Select HORIZONTAL from the list.
splitPaneOrientation controls whether you want the chart and grid
on top of each other, or if you want them side by side.
3. Add a DataBlox component:
Note: A DataBlox component offers the following functionality:
v Provides a representation of a data set (in grid form), either relational or
multidimensional, for the assembler to access
v Enables application scripting (such as executing a query)
v Serves as a data source for other Blox (such as ChartBlox or GridBlox)
a. Drag and drop a DataBlox component from the palette onto the canvas.
4. Select Connection on the palette. Connect the DataBlox to the PresentBlox
component. Select and drag your cursor from the DataBlox port inside the
DataBlox component to the DataBlox port inside the PresentBlox component.
Connecting the two components through the DataBlox port ensures that the
components can communicate with each other.
5. Click the Save button to save your changes to the report. Ensure that you check
the Problems tab for any errors. You can click on a problem in the Problems
tab to view detailed information.

Lesson checkpoint
You learned how to:
v Create a report

Lesson 5: Customizing your queries and reports


In this lesson, you will parameterize your query and customize a report to use
different queries through the use of Blox Builder property references and
expressions.
Property references are similar to variables with scope and additional functionality.
Property references can have expressions that act like functions and transform the
value that the property reference refers to. For example, you can use an expression
to format a date before you display it.
Property references have a scope and optional default value and type. When a
property reference is encountered, if it does not already exist, it will be created.
When the property reference is created, it will have the specified context (for
example, the current query, report, or application) and if a default value was
specified in the reference, it will be applied.
The most basic form of a property reference is: ${scope:name} as in
${report:myInteger}. The complete form is ${scope:name=(type)defaultValue}.
The following example creates an integer with report scope and a default value of
5: ${report:myInteger=(Integer)5}.
In this lesson, you begin by creating a user property for the current year and
quarter. Because the current year and quarter are user properties, the properties
can be accessed from all component types and persists for the users session. We
will create these properties in the following query.
You will parameterize the query by replacing the member [n] with an expression
for the current quarter using the Property Reference Wizard.

Because queries live outside of applications and reports, when you preview a
query from the query designer, if it contains any property references, you will need
to supply those values at runtime.
To create property references and parameterize your queries and reports:
1. Create the currentQuarter property reference based on the quarter of the
current day:
Note: You will need two property references: one that you will use in the
query for the current quarter of the year and another one that you will use in
the report. The property reference that you will create for the report is also the
query ID used in the report.
a. Double-click the priceAnalysis application.
b. Click the Custom tab in the Properties view.
c. Click the Add button to create the property reference. The Create Custom
Property dialog opens.
d. For the Name field, type currentQuarter.
e. For the Type field, select Integer.
f. Click the icon button to the right of the Value field. The Property Value
Editor opens.
1) Click the Property Reference wizard button to bring up the Property
Reference wizard.
2) In the Scope tab, select System and click Next.
3) In the Property Name tab, select the existing dateTime property
reference and click Next.
4) In the Default Property Reference tab, leave the setting at Do not add a
default value to the reference and click Next.
5) In the Expression tab, select the datePart expression from the list.
6) In the Arguments table, click within the Value section and select
QUARTER. Click Finish.
7) In the Property Value Editor dialog, you should now see:
${system:dateTime}.datePart(QUARTER). Click OK. The same value
appears in the Value field on the Create Custom Property dialog.
8) In the Scope field, type application. Click OK.
9) Save your application.
2. Create the queryId property reference and set its value to Stores By Time:
a. Double-click the PresentAndData report. Click the Custom tab in the
Properties view.
b. Click the Add button to create the property reference.
c. For the Name field, type queryId.
d. For the Type field, select String.
e. For the Value field, type Stores By Time.
f. For the Scope field, select Report.
g. Click OK.
h. Save your report.

3. Parameterize the Stores By Quarter query:


Note: You will use the currentQuarter property reference to parameterize this
query. This query will always run for the current quarter.
a. Double-click the Stores By Quarter query.


b. In the Text Query field, delete the quarter number in the time member.
Replace the quarter number with the currentQuarter property:
${application:currentQuarter}. Your resulting query will have a time
member that looks something like the following:
DISTINCT( {
[Price Analysis].[Time].[All Time(Calendar)].[2003].[${application:currentQuarter}]
} ) ON AXIS(0)

c. Click Finish. Save the query.


4. Run the Stores By Quarter query:
a. Double-click the Stores By Quarter query.
b. Click the Query Runtime Configuration button. The Alphablox Server
Configuration window displays. You should already have your server
configuration information filled out. Click Next.
c. In the Query Tools Information window, select the ACS_External data
source, enter your user name and password, and click Next.
d. Select the cell under the user input column. Enter a quarter value from 1 to
4.
e. Click Finish.
f. Click Preview Query to preview the query.
5. Export the queries:
Note:
In order for a query to be used by a report, it needs to be exported to the
server that will be running the report.
a. Right-click on the Queries folder in the Blox Builder Project Explorer and
select Export. The Blox Builder Export wizard displays.
b. Select Alphablox Server Repository (live deployment). Click Next.
c. Assuming that you already saved a server profile to export to, click Next. If
not, see the following for more information: Lesson 3: Building Alphablox
queries with Blox Builder on page 61.
d. Select the workspace items that you want to export. All queries should be
selected by default.
e. Click Finish. The queries have now been exported to your Alphablox
instance.
f. Verify that the queries have been exported by opening a Web browser and
going to: http://server:portnumber/AlphabloxTooling/queries/
6. Parameterize your report:
Note: Reports can be used as templates. You will create a report property to be
used for the query so that each report link can have a different query that
underlies it. A report link is a report in the navigation that a user sees.
a. Under the Reports folder in the Blox Builder Project Explorer, double-click
the PresentAndData in the Blox Builder Project Explorer to open the report.
b. Select the DataBlox component in your report.
c. In the Data Query tab in the Properties view, click the icon to the right of
the QueryId property value field. The Property Value Editor is displayed.
d. Click Property Reference Wizard. The Property Reference wizard opens.
e. In the Scope tab, select Report. Click Next.

f. In the Name tab, click the list and select the queryId report property. Click
Finish.
g. In the Property Value Editor dialog, you should see the following value:
${report:queryId}. Click OK. The same value should also appear in the
QueryId field in your Properties view.
h. Save your report.
7. Preview the report:
a. In the Blox Builder Project Explorer view, right-click the PresentAndData and
select Preview Report. The Preview Report wizard opens.
b. In the Name Server Configuration field, your previously saved named
server configuration appears. Click Finish. Connecting to a server enables
you to preview, trace, deploy, and import your report. By default, your
report displays with the data from the Stores By Time query because you
set the value of the queryId property reference to Stores By Time earlier in
this lesson.
You have customized your query and report.

Lesson checkpoint
You learned how to:
v Create property references and use expressions
v Customize a query by parameterizing it
v Customize a report by parameterizing it
v Export the queries in order to preview a report using a query

Lesson 6: Creating and deploying an application to the WebSphere server

In this lesson, you will create an application using the reports and queries you
have created in the previous lessons. You will deploy your application to your
Alphablox instance that is running in the WebSphere environment.
A Blox Builder application has a navigation tree called the report catalog. The
navigation contains folders and report links. A report link specifies a name to
appear in the application navigation, the report it uses, and which report properties,
if any, to override. Therefore, when you add a report to an application's report
catalog, you can optionally override values for report property references used in
your report, enabling you to use your report as a template. You can add several
links to the same report in your application, where each report link sets different
values for the report property reference. In this case, you will override the query to
be run with each report link.
Reports are added to an application by adding report links to the report catalog.
To create and deploy your application:
1. Add the report links to the application in specified folders:
a. Create a folder in the navigation:
1) Click the application file inside the Price Analysis application. The
Report Catalog appears in the canvas view.
2) Right-click the Report Catalog folder and select New Folder.
3) Name the folder Stores By Time. Click OK.
b. Add a report link for Stores By Time:
1) Drag and drop the PresentAndData report to the Stores By Time folder.
Ensure that all report editors are closed before dragging them.
2) Double-click the PresentAndData report in the catalog. The report link
properties display.
3) Type Stores By Time for the display name.
4) In the properties table at the bottom of the dialog, for the QueryID row,
type Stores By Time under the New Value column. These query names
are case sensitive.
5) Click OK.
c. Add a report link for Stores By Quarter:
1) Drag and drop the PresentAndData report to the Stores By Time folder.
2) Double-click the PresentAndData report in the catalog. The report link
properties display.
3) Type Stores by Quarter for the Display name.
4) In the properties table at the bottom of the dialog, for the QueryID row,
type Stores By Quarter under the New Value column. These query
names are case sensitive.
5) Click OK.
2. Export the application. You must export an application before you can run it in
the WebSphere environment.
a. Right-click the Price Analysis application in the Blox Builder Project
Explorer and select Export. The Blox Builder Export wizard displays.

b. Select Alphablox Server Repository (live deployment). Click Next.


c. Assuming that you already saved a server profile to export to, click Next. If
not, see the following for more information: Lesson 3: Building Alphablox
queries with Blox Builder on page 61.
d. The application to export is selected by default. Select the Include
dependencies found in the current project check box. This will ensure that
you export all necessary files to the server.
e. Click Finish. The application has now been exported to your Alphablox
instance.
f. Verify that the application has been exported by opening a web browser and
going to: http://server:portnumber/AlphabloxTooling/bloxbuilderapps/
3. Preview the report link in the Price Analysis application:
a. Right-click any object under Report Catalog and select Preview
Application. Your preview displays the application with the navigation in a
separate browser window.
b. Open the Stores By Time folder and select the Stores By Time report.
4. View the application in the WebSphere environment by opening a Web browser
and going to: http://localhost:9080/BloxBuilder/priceAnalysis
Note: The application name is case sensitive. If WebSphere Application Server
is not running on the same computer as Blox Builder, change localhost to your
WebSphere host name or IP address. If your WebSphere server is running on a
different port, change the port number accordingly.

Lesson checkpoint
You learned how to:
v Export your application
v Create and preview report links
v Add your report to your application
v Display your application in a Web browser

Module 6: Creating a mining model


In this module, you create a mining model to analyze customer purchase trends for
JK Superstore.
JK Superstore wants to analyze its customers' purchasing trends to determine
which product combinations are bought by the same customer, specifically in the
Electronics department. By creating a mining model using the associations
function, you can find out which customers buy from the Electronics department
more frequently than other customers. You can also learn whether customers who buy
from the Electronics department also buy from another department or
sub-department.
Finding associations is also known as market-basket analysis. Based on the buying
patterns of its customers, JK Superstore can use market-basket analysis to pursue the
following marketing strategies:
v Potentially increasing sales by physically placing items that are purchased together
on the sales floor
v Advertising items that are purchased together on the Web

Data mining discovers business insights in your data. You can interactively create
and visualize a mining model in the Design Studio to gain valuable insights about
the data in your organization's warehouse. You can also use the Design Studio to
generate SQL code to compute a mining model or to deploy the model's related
scoring function. This SQL can be pasted into Alphablox pages or any BI
application to provide embedded analytics.
The mining flow in this module is similar to the data flows that are described in
Module 2: Designing applications to build a data warehouse. Mining flows use the
same editor and some of the same SQL operators as SQW flows. Preparing data for
mining by using one of the operators in the Preprocessing palette is a case of SQL
Warehousing. Conversely, certain mining operators can be used as part of a data
flow for warehouse building.

Learning objectives
After you complete the lessons in this module, you will be able to:
v Create a mining project
v Create a mining flow
v Define mining steps for a mining flow:
  – Add preprocessing operators to bring the data into a form suitable for data
    mining
  – Add a mining operator
  – Add a visualizer operator
  – Add an operator to extract the model information in tabular form
v View the mining model results

Time required
This module should take approximately 75 minutes to complete.

Optional: Start the tutorial here

If you already completed the previous modules, do not complete these steps; go to
Lesson 1. If you did not complete the earlier modules and want to start here, you
need to complete these steps.
To start the tutorial here:
1. Complete the initial setup instructions for your platform, as directed in the
Introduction to the DB2 Warehouse Tutorial on page 1. Create the
DWESAMP database but do not run the setupdwesamp script.
2. Create and load the tables in the DWESAMP database by running the following
script:
Windows

C:\Program Files\IBM\dwe\samples\data\setupolapandmining.bat
Linux

/opt/IBM/dwe/samples/data/setupolapandmining.sh
3. Connect to the locally cataloged DWESAMP database.

a. In the Database Explorer, expand the Connections folder to view the
existing databases.
b. Right-click the DWESAMP database and click Reconnect.
c. When prompted, type your DB2 user name and password and click OK.

Lesson checkpoint
You completed the prerequisite steps for starting the tutorial here. You can now
continue with Lesson 1.

Lesson 1: Creating a business intelligence project in the Design Studio for data mining

In this lesson, you create a data mining project in the Design Studio so that you
can analyze your data and create and use data mining models.
You can create projects in the Design Studio so that you can work on different
parts of your business intelligence solution. You can work with your projects in the
Data Project Explorer, which is your working space to build and test your solution.
To work specifically on your mining model, you need to create a data warehouse
project.
Although you can create mining flows in the SQL Warehousing project, you will
create a separate project for the purpose of this tutorial.
Prerequisite: Enable the database for data mining in one of the following ways:
v In the Database Explorer view of the Design Studio, connect to the database,
open the database container, and right-click the blue database icon. Then select
Enable the database for Data Mining.
v From the DB2 Command Window, run the following command on the system
where your database resides:
idmenabledb dwesamp fenced dbcfg
For this step, the DWESAMP database has to be cataloged on the client.
To create a new project:
1. From the Design Studio BI Perspective, click File > New > Project.
2. In the New Project wizard, select Data Warehousing > Data Warehouse Project
and click Next.
3. On the Data Warehouse Project page, type a project name, such as DWE
Tutorial Mining Project and click Next.
4. On the Referenced Projects page of the New Project wizard, do not select any
projects. Click Finish.
The new project appears in the Data Project Explorer.

Lesson checkpoint
In this lesson, you created a data warehouse project.
You learned how to create an empty business intelligence project, which is a
prerequisite for creating a mining flow and subsequently a data mining model.


Lesson 2: Creating a mining flow in the Design Studio


In this lesson, you create a mining flow in the Design Studio that you will use to
create your mining model. A mining flow is a sequence of data transformation
steps and data mining steps that perform mining analysis.
To create a new mining flow:
1. From the Design Studio, click File > New > Mining Flow.
2. In the New File wizard, select the project that you created in lesson 1 of this
module.
3. On the Data Warehouse Project page, type a mining flow name, such as DWE
Tutorial Mining Flow and click Next.
4. On the Select Connection page, connect to a database. This connection allows
you to interact with the live database and see a sample of actual data in the
tables.
a. Select Use an existing connection.
b. Select DWESAMP from the list of databases.
c. Click Finish. You might be prompted to enter your user ID and password
for the database.
5. Expand the Mining Flow folder under your project in the Data Project Explorer
view. You can see the new empty mining flow.

Lesson checkpoint
In this lesson, you created an empty mining flow that will be used to define the
mining model steps.
You learned how to:
v Create a mining flow
v Connect to an existing database

Lesson 3: Defining mining steps for mining flows


In this lesson, you define mining steps for your mining flow. These steps might be,
for example, selecting a table, a preprocessing function, a mining operator, and a
target.
You define each mining flow by placing operators on the canvas of the Design
Studio, defining the operators' properties, and joining all the steps into a
meaningful mining flow. The output of the flow is a data mining model that is
stored as a DB2 object.
When you define the mining steps for mining flows, you define the following
things:
v The data source and the names of the tables that contain your source data, and
optionally the columns that the tables contain
v The preprocessing operations that comprise the stages of identifying, collecting,
filtering, and aggregating raw data into a format that is required by the data
models
v The mining operator and its settings
This tutorial introduces the Associations data mining function. Associations mining
is the process that enables you to discover which combinations of products your
customers purchase and the relationships that exist at all levels in your product
hierarchy. The relationships that are discovered by the data mining are expressed

as association rules. With the associations function, you can perform market basket
analysis to explore product affinities to understand which products tend to be
bought by the same customers. In the context of market basket analysis, an
example association rule can have the following form:
If product A is purchased, then product B is likely to be also purchased by
the same customer.
In addition to the rule, the associations mining also calculates some statistics about
the rule. In market basket analysis, the following three statistical measures are
usually used to define the rule:
Support
The support of an association rule measures the fraction of baskets for
which the rule is true. For example, if product A and product B are found
in 10% of the baskets, then the support value is 10% (as a percentage
value) or 0.1 (as an absolute value). The percentage value is calculated
from among all the groups that were considered.
Confidence
The confidence in an association rule is a percentage value that shows how
frequently the rule head occurs among all the groups that contain the rule
body. The higher the value, the more often this set of items is associated
together. For example, if product B is present in 50% of the baskets that
contain product A, then the confidence value is 50% (as a percentage
value) or 0.5 (as an absolute value). Expressed another way, if product A is
in a particular basket, then product B will be found in the same basket on
50% of occasions.
Lift

The lift value for the association is the ratio of the rule confidence to the
expected confidence of finding the rule in any basket. For example, if
product B is found in only 5% of all baskets, then the Lift for the rule
would have a value of 10.0. The lift value of 10 means that the association
of A and B is occurring ten times more often than if B were selected by
chance. Lift is therefore a measure of how the rule improves the ability to
predict the rule body.
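You can also check these three measures by hand with plain SQL. The following sketch
computes support, confidence, and lift for one rule against the DWH.ITM_TXN table
that this module uses; the PD_ID values 1001 and 2002 are placeholders, not products
from the sample data:

-- Support, confidence, and lift for the rule "product 1001 => product 2002" (placeholder IDs)
WITH
  a  AS (SELECT DISTINCT MKT_BSKT_TXN_ID FROM DWH.ITM_TXN WHERE PD_ID = 1001),
  b  AS (SELECT DISTINCT MKT_BSKT_TXN_ID FROM DWH.ITM_TXN WHERE PD_ID = 2002),
  ab AS (SELECT a.MKT_BSKT_TXN_ID
         FROM a JOIN b ON a.MKT_BSKT_TXN_ID = b.MKT_BSKT_TXN_ID),
  n  AS (SELECT COUNT(DISTINCT MKT_BSKT_TXN_ID) AS total FROM DWH.ITM_TXN)
SELECT DECIMAL((SELECT COUNT(*) FROM ab), 15, 6) / n.total                    AS support,
       DECIMAL((SELECT COUNT(*) FROM ab), 15, 6)
           / (SELECT COUNT(*) FROM a)                                         AS confidence,
       (DECIMAL((SELECT COUNT(*) FROM ab), 15, 6) / (SELECT COUNT(*) FROM a))
           / (DECIMAL((SELECT COUNT(*) FROM b), 15, 6) / n.total)             AS lift
FROM n

The Associations mining operator computes these values for every discovered rule in a
single pass, so you never write this SQL yourself; the sketch is only meant to make the
three statistics concrete.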

You can also use taxonomies with the Associations mining function. You can make
the associations that are found among items more meaningful if you group the
items into subcategories, and then group these subcategories into categories. The
result is a hierarchy of categories with the items on the lowest level. This hierarchy
is called a taxonomy.
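A taxonomy is typically stored as a simple parent-child mapping, one row per child and
the category that contains it. The following sketch is purely illustrative (the table name,
columns, and rows are assumptions); the tutorial's sample database already provides its
own product hierarchy data:

-- Hypothetical taxonomy table: each row maps a child to its parent category
CREATE TABLE DWH.PD_TAXONOMY (
    CHILD_NAME  VARCHAR(40) NOT NULL,   -- a product or a subcategory
    PARENT_NAME VARCHAR(40) NOT NULL    -- the subcategory or category that contains it
);

INSERT INTO DWH.PD_TAXONOMY VALUES
    ('MP3 Player 4GB', 'Portable Audio'),
    ('Portable Audio',  'Electronics'),
    ('HDMI Cable 2m',   'Accessories'),
    ('Accessories',     'Electronics');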
In this tutorial, the sample data contains a retailer's products, organized by
departments, in addition to purchases that are made by customers. The output is a
table that contains the rules in the associations model, which can also be viewed
by an Alphablox report. To perform the associations mining, you must use both the
transaction level data and the product hierarchy data to calculate the required
association rules. In addition, the product hierarchy data is used by the mining
tool to automatically determine associations between individual products, product
subgroups, product subgroups and products, product groups and subgroups, and
so on. Associations at all levels in the product taxonomy are derived.
Tip: The mining steps for the mining flow in this tutorial use the same mining
preprocessing operators as those in SQW.
The following figure shows the completed mining flow.

Figure 17. A completed mining flow

To define your mining flow:


1. Ensure that your mining flow is open. If it is not already open, expand the
mining flow folder under your project in Data Project Explorer and
double-click the DWE Tutorial Mining Flow mining flow.
2. Add two table source operators to the mining editor canvas:
Tip: To maximize the canvas area, double-click the mining flow name above
the canvas. Double-click the mining flow name again to return the canvas area
to the normal size.
a. From the Sources and Targets palette, drag a table source operator to the
canvas. In the Select Database Table window, expand the DWH schema,
select the ITM_TXN table, and click Finish.
b. Drag a second table source operator to the canvas. In the Select Database
Table window, expand the DWH schema, select MKT_BSKT_TXN and click
Finish.
3. Add a table join operator to join the table source operators. Add the operator
to the canvas, connect the operator to the table source operators, and define
the table join properties:
a. From the Preprocessing palette, drag a table join operator to the canvas.
b. Connect the following operator ports:
v Connect the ITM_TXN table source operator output port to the table join
operator In input port.
v Connect the MKT_BSKT_TXN table source operator output port to the
table join operator In1 input port.
c. Right-click the table join operator and click Show Properties View. From
the view, you can optionally change the operator name and add a
description.
d. From the Properties view, select the Condition tab and click the ellipsis (...)
push button.
e. In the SQL Condition Builder window, define the following conditions by
double-clicking your selections:
v MKT_BSKT_TXN_ID is equal for the two inputs, so that the SQL code
reads: (IN_nn.MKT_BSKT_TXN_ID = IN1_nn.MKT_BSKT_TXN_ID)
where nn is system generated.
v Add an AND condition keyword.
v NBR_ITM from table IN_nn is greater than (>) 0.
Click OK.
The following figure shows the SQL Condition Builder window.

Figure 18. Example of the SQL Condition Builder window and SQL text.
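For orientation, the following SQL is a minimal sketch of the join that the
operator generates from these settings. The IN_nn and IN1_nn aliases are
system generated, and the assignment of PD_ID and CNTPR_ID to particular
source tables is an assumption here:

SELECT IN_01.PD_ID, IN1_01.CNTPR_ID
FROM DWH.ITM_TXN IN_01
INNER JOIN DWH.MKT_BSKT_TXN IN1_01
  ON (IN_01.MKT_BSKT_TXN_ID = IN1_01.MKT_BSKT_TXN_ID)
  AND (IN_01.NBR_ITM > 0)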

f. In the Properties view, click the Select List tab. The columns that are
required for the tutorial are:
v PD_ID
v CNTPR_ID
Remove all the unnecessary columns by selecting each column and clicking
the Delete button. Do not delete the columns listed above that are
required for the tutorial.
g. Verify your partial flow:
1) Right-click the table join operator and select Run to this step.
2) In the Partial Execution of the Mining Flow window, select Execute
generated code and Show sample results. Samples are displayed in
the Execution Status view.
3) Click Finish.
h. Before proceeding to the next step, save your work.
4. Add a table source operator to the mining editor canvas.
a. From the Sources and Targets palette, drag a third table source operator to
the canvas to prepare the taxonomy and name mapping information.
b. In the Select Database Table window, select the PRODUCT table from the
MARTS schema, and click Finish.
5. Add a select list operator to transform columns in a selection list. Add the
operator to the mining editor canvas, connect to a table source operator, and
define the select list parameters.
a. From the Preprocessing palette, drag a select list operator to the canvas.
b. Connect the output port of the PRODUCT table source operator to the
select list operator input port.
c. Right-click the select list operator, and click Show Properties View.
d. In the Properties view, do the following actions:
1) Click the Select List tab. The columns required for the tutorial are:
v PD_ID
v NM
v PD_DEPT_NM
v PD_SUB_DEPT_NM
2) Remove all the unnecessary columns by selecting each column and
clicking the Delete button. Do not delete the columns listed
above that are required for the tutorial.
3) For each of the following columns, select the column and click the
ellipsis (...) push button to modify the expressions.
Type the following expressions (in this example, INPUT_012 is the
internal input table name). Include the spaces between the single quotation
marks:
v For NM, type rtrim( INPUT_012.NM )
v For PD_DEPT_NM, type 'Dept: ' || rtrim(INPUT_012.PD_DEPT_NM)
v For PD_SUB_DEPT_NM, type 'Subdept: ' ||
rtrim(INPUT_012.PD_SUB_DEPT_NM) || ' in ' ||
rtrim(INPUT_012.PD_DEPT_NM)
6. Add a distinct operator to remove duplicate rows. This operation is needed to
create unique subdepartment-department pairs for the taxonomy. Place the
operator on the mining editor canvas and connect to the select list operator.
a. From the Preprocessing palette, drag a distinct operator to the canvas.
b. Connect the output port of the select list operator to the distinct operator
input port.
c. Right-click the distinct operator, and click Show Properties View.
d. Click the Column Select tab, and then specify the PD_DEPT_NM and
PD_SUB_DEPT_NM columns as selected. To specify these columns, select
NM and PD_ID in the Selected columns list and click the left arrow
button to move the selections to the Available columns list.
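The following SQL is a minimal sketch of what the select list and distinct
operators produce together for the taxonomy input; the operator-generated
aliases are omitted:

SELECT DISTINCT
  'Subdept: ' || rtrim(PD_SUB_DEPT_NM) || ' in ' || rtrim(PD_DEPT_NM) AS PD_SUB_DEPT_NM,
  'Dept: ' || rtrim(PD_DEPT_NM) AS PD_DEPT_NM
FROM MARTS.PRODUCT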
You completed the preprocessing steps for your mining model. The following
figure shows the preprocessing steps.
Figure 19. An example of the partial mining flow that includes the preprocessing steps.

7. Add an associations operator to find relationships in your data. Add the
operator to the mining editor canvas, connect its input ports, and define its
properties.
a. From the Mining Operators palette, drag an associations operator to the
right of the table join and distinct operators.
b. Click the add port icon at the bottom left to add an additional
category port.
c. Connect the following ports:
v The inner output of the join operator to the associations operator input
port
v The output port of the select list operator to the names port of the
associations operator
v The output port of the select list operator to the category port of the
associations operator (You can make multiple connections between
ports.)
v The result output of the distinct operator to the category1 port that you
added to the associations operator.
d. Right-click the associations operator, and click Show Properties View.
e. On the Model Name page, type MBA in the Prefix field and RULES in the
Model name field. The operator will create a model named MBA.RULES.
f. On the Mining Settings page, specify the settings:
1) Select CNTPR_ID in the Group Column field.
2) Type 4 in the Maximum rule length field. This value specifies the
maximum number of items that occur in an association rule. By
specifying 4, you receive association rules with no more than three
items in the rule body and one item in the rule head.
3) Type 0 in the Maximum number of rules field, which indicates that the
number of rules is unlimited.
4) Type 10 as the percent value for the Minimum Confidence field to
specify that the confidence of each generated rule is greater than or equal
to ten percent.
5) Type 0.1 as the percent value for the Minimum Support field to
specify that the support of each generated rule is at least 0.1 percent.
g. On the Name Maps page, select PD_ID in the Item ID Column field and
NM in the Item Name Column field.
h. On the Taxonomy page, select PD_ID in the Child Column field and
PD_SUB_DEPT_NM in the Parent Column field for the category map. For
the Category1 map, select PD_SUB_DEPT_NM in the Child Column field
and PD_DEPT_NM in the Parent Column field.
i. On the Column Properties page, select Names for the Name Mapping field
and Yes for the Taxonomy field.
j. On the Item Format page, select Default.
8. Add a visualizer and an associations extractor operator to the canvas and
connect the operator ports.
a. From the Sources and Targets palette, drag a visualizer operator to the
canvas to the right of the associations operator.
b. Connect the associations operator model port to the visualizer operator
model port.
c. From the Mining Operators palette, drag an associations extractor operator
(which extracts information from an associations rule model) to the canvas.
d. Connect the associations operator model output port to the associations
extractor operator model input port.
9. Create a target table that is suitable for the rules output of the associations
extractor operator port.
a. Right-click the rule output port of the associations extractor operator and
select Create Suitable Table from the menu. The Required Table
Information window opens.
b. Type RULES in the Table name field.
c. In the Table schema field, select MARTS as the schema in which the table is
created.
d. Select the tablespace in which the table is created.
e. Click Next. The Table Details page opens. Use the default settings on this
page.
f. Click Finish.
g. Save the mining flow.
Tip: You can also create the table manually in a DB2 command window by
connecting to the DWESAMP database and by copying and running the
CREATE TABLE MARTS.RULES statement from the following script:
Windows

C:\Program Files\IBM\dwe\samples\data\mining\mba.sql
Linux

/opt/IBM/dwe/samples/data/mining/mba.sql
If you create the table manually, you also need to drag a target table operator
onto the canvas and connect the rule port of the associations extractor
operator to the input port of the target table operator by selecting Connect by
name from the Select Column Connections dialog.
10. Validate the flow that you created. Click the Validate this mining flow
icon on the toolbar. If anything is wrong with an operator, a red circle with a
white cross is displayed in its top left corner. Double-click that symbol to open
a diagnostic window.
The mining flow steps are displayed in the mining editor.
The completed mining flow will look like this:

Figure 20. A completed mining flow

Lesson checkpoint
You created the mining steps for the model that will analyze product associations.
You learned how to:
v Define mining steps for a mining flow
v Add and connect preprocessing operators
v Add and connect mining operators
v Add an extractor operator
v Add a visualizer operator

Lesson 4: Running and viewing the mining model


In this lesson, you run the mining flow that you created in the previous lesson.
The sample data contains a retailer's products. The associations function shows
you which product purchase combinations occur.
To run and view your mining flow:
1. Run the mining flow in the database.
a. Click the Execute this mining flow in the database icon on the
toolbar.
b. In the Execution of Flow window, accept the default values and click
Execute. The Mining Flow Execution status window opens. When the run
completes, the Association Visualizer window opens with several views to
display the association rules. The Rules view is a tabular view that shows
one rule in each row with measures about the relevance and quality of the
rule.
2. View the mining model.
a. Click Item Sets to show a list of frequently sold products and product sets.
Click the entry [Dept: ELECTRONICS]. You can see in the Support column
that 5.4303% of all customers bought from this department.
b. If you want to find out which customers buy from the electronics
department more frequently than others, click the Fan In icon on the
toolbar, and then select the Rules tab. You can see rules of this type: If a
customer buys A, then the customer is also likely to buy from the
electronics subdepartment, where A can be a product or another department
or subdepartment. The Lift column indicates how much more frequently
this happens compared with all customers. The Absolute Support column
indicates the number of customers involved. A lift of 3.63 means that a
customer who buys from the photography department is nearly four times
more likely to purchase from the electronics department.
c. Select the Graph tab to see the rules in graphical form.
d. Close the associations model. When prompted to save the mining model
results, do not save the results.
In the Execution Status view, you see the status and action of the execution
process. In the Database Explorer, you see that the mining model that was
displayed is stored in the database.
The model that you ran used the associations function. Like the associations
function, the sequential patterns and clustering functions do not need a scoring
function against the model. That is, the model itself can be the end result. If you
want to use a model to make predictions, you need to test the model's quality
(with the Tester operator).

Lesson checkpoint
In this lesson, you ran and viewed the mining model that analyzes product and
customer purchase combination associations.
You learned how to:
v Run your mining flow
v View the results of the associations model

Module 7: Using Miningblox to create a mining Web application


In the previous module you learned how to build a mining flow. This task is
typically performed by a data mining expert. However, line-of-business users, who
are not familiar with data mining, also want to execute similar mining flows, for
example, to find out which products were sold together in certain stores over a
certain period of time. But they do not want to rebuild or modify such mining
flows themselves. They are looking for an easy-to-use application that allows them
to enter just a few parameters (such as store and time frame) to receive and see the
computed association rules. The data mining operations are hidden by the
application so that the user only needs to understand the results. You can build
such applications with Miningblox, which are automated Web applications based
on Alphablox technology.
In this module you create a Web application that has the following input variables:
v The type of stores
v The year
After you enter values for these variables, the association rules for the stores
of the specified store type and year are computed by using a modified version of
the mining flow presented in Module 6: Creating a mining model on page 71. As
a result, the visualization of the rules model is shown.
This module consists of the following lessons:
v Creating the Miningblox Sample project
v Creating the Miningblox application
v Creating the Alphablox data source
v Deploying the data warehouse application
v Deploying the Web application
v Customizing your application

Learning objectives
After completing the lessons in this module, you can perform the following tasks:
v Create the control flows that contain the operations executed by the Web
applications
v Create deployment packages for Miningblox applications (data warehouse and
Web applications)
v Deploy and run the different parts of this application
v Customize your application by modifying the JSP pages
v Use the DB2 Warehouse Administration Console

Time required
This module takes approximately 60 minutes to complete.

Prerequisites
You should have completed Module 6: Creating a mining model on page 71.

Lesson 1: Creating the Miningblox Sample project


In this lesson, you create a project named Miningblox Sample, which contains the
mining and control flows executed by this Web application.
For a Web application based on Miningblox you must define the following control
flows:
v The process control flow, which contains the operations that are performed after
a new request is started
v The cleanup control flow, which is executed when a request is deleted. It must
remove all results of the request.
These flows usually contain variables. These variables can:
v Correspond to the input parameters of the Web applications.
v Contain request specific information, such as the ID of the request.
These variables are instantiated when a request is started. You use them to retrieve
request specific information when you define a mining flow or a control flow.
For this Miningblox Sample application the process control flow contains the
mining flow that computes the association rule model. The cleanup control flow
contains a mining flow that deletes the rule model from the rule model table.
For this lesson, it is assumed that you are familiar with the design of control flows
and mining flows; otherwise, refer to the corresponding module of this tutorial.
To set up the project and define the flows with variables, complete the following
steps:
1. To create a new data warehouse project, select File > New > Data Warehouse
Project. Specify the new project name: Miningblox Sample.
2. To create variables and variable groups in the Design Studio, select Data
Warehousing > Manage Variables, choose the correct project, and click Next.
Note: It is recommended that you define two groups of variables: user
variables and runtime variables. In this example you create the following
groups of variables:
v The inputParams variable group for the variables to be entered by the user.
v The runtimeParams variable group, whose variables are automatically
instantiated by the application when it runs.
3. To create the variables groups:
a. Click New on the left part of the wizard.
b. Name the group inputParams and click OK.
c. Repeat steps 3a and 3b to create a second group that is called
runtimeParams.
4. To create the variables:
a. Select the inputParams group and click New on the right part of the
wizard.
b. Type the following information:
v Name: year
v Type: Integer
v Current value: 2002
v Final phase for value change: EXECUTION_INSTANCE
c. Repeat steps 4a and 4b to create a second variable. Type the following
information:
v Name: storeType
v Type: String
v Current value: all
v Final phase for value change: EXECUTION_INSTANCE
d. Select the runtimeParams group and create the following variable:
v Name: modelName
v Type: String
v Current value: MBA_RULES
v Final phase for value change: EXECUTION_INSTANCE
The variables that you have just created can now be used in mining flows.
5. To create the mining flow for the process control flow:
a. Copy the DB2 Warehouse Tutorial mining flow that you created in Lesson
3: Defining mining steps for mining flows on page 74 to the Miningblox
Sample project. Rename it to Tutorial Miningblox Flow.
b. Associate the flow with the DWESAMP database:
1) Right-click Tutorial Miningblox Flow in the Project Explorer.
2) Select Set Database from the pop-up menu.
3) In the Set Database wizard, accept the default values and click Next.
4) On the Select Connection page, select DWESAMP in the list of existing
connections and click Finish.
c. Erase the operators to the right of the Association operator. Miningblox can
interact directly with the model. A special database table is not required.
d. Erase the link between the MKT_BSKT_TXN table and the Join operator,
and insert a Where condition operator. When linking the Where and the
Join operators, the Select Column Connections window opens. Choose
Propagate Columns and click OK.
e. To set the condition, open the Properties view. On the Condition page,
type the following information under Filter condition:
year(TXN_STRT_TMS) = ${inputParams/year} AND
( 'all' IN ${inputParams/storeType} OR
(select STR_TP_NM from MARTS.STORE
where STR_IP_ID = OU_IP_ID) IN ${inputParams/storeType})

Note: To avoid cut-and-paste errors, make sure that the text that you copy
from the PDF format contains only straight single quotes (such as 'all')
instead of apostrophes.
f. Define the variable for the Associations operator. On the Model Name tab
of the Properties view, click the icon, choose Use Variable, and click the ...
button to select ${runtimeParams/modelName}.

Figure 21. DB2 Warehouse Tutorial mining flow modified for Miningblox

6. Run the flow. To check whether a model was successfully created, go to the
Database Explorer, expand the DWESAMP database, and then select
DWESAMP > Data Mining Model > Rules. Here you should find the model
named MBA_RULES.
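As an alternative check, you can query the model repository table that the
cleanup flow uses in the next step; this assumes an open DB2 connection to
the DWESAMP database:

SELECT MODELNAME FROM IDMMX.RULEMODELS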
7. To design the cleanup mining process:
a. Create a new mining flow that is called CleanUpFlow.
b. Drag a Custom SQL operator onto the canvas. Type the following
statement as SQL CODE:
delete from IDMMX.RULEMODELS where MODELNAME = '${runtimeParams/modelName}'
c. Drag a Source Table operator onto the canvas and link it to the Custom
SQL operator. The Custom SQL operator must be linked to a source table,
but the table is not important in this case because the custom SQL code
does not refer to it.
8. Run the flow and check that the model named MBA_RULES that you
viewed in step 6 on page 85 has been erased.
9. To define the control flow for the mining process:
a. Create a new control flow that is called Mining Process.
b. Drag a Mining Flow operator to the canvas and link it to the Start
operator.
c. In the Properties view, choose the Tutorial Miningblox Flow mining flow.
d. Drag one End operator and one Fail operator onto the canvas,
then link the End operator to the success output port and the Fail operator
to the fail output port of the mining operator.
10. Repeat step 9 to construct a control flow that is called CleanUp process for the
cleanup mining flow.
11. Execute the two control flows and check whether a rule model is created or
deleted. You must refresh the Rules folder in the Database Explorer by
right-clicking the Rules folder and selecting Refresh to browse the latest
changes after the execution of a control flow.

Lesson checkpoint
In this lesson, you created the project and flows, on which your application is
based. You also created variables and applied them in the flows so that they can be
initialized and used by the application.

Lesson 2: Creating the Miningblox application


In this lesson, you create the Miningblox application to be deployed.
Now that the flows are defined, you can create the Miningblox application. A
Miningblox application comprises the following components:
Data warehouse application
This application is deployed on the DB2 Warehouse Administration
Console from where you can run the flows.
Web application
This application controls the data warehouse application and retrieves the
mining data. It displays this data on a Web browser.
In this lesson, you use the new Miningblox Application wizard to create a
Miningblox application. The wizard creates the:
v Data warehouse application .zip file for deployment to the DB2 Warehouse
Administration Console.
v Web application EAR file for deployment to the WebSphere Application Server.
This Web application can create new mining tasks, manage old tasks, and
display the association model with your Miningblox tags.
To create the application, complete the following steps:
1. Select File > New > Miningblox application to open the wizard.
2. On the Project Selection page, choose the Miningblox Sample project, and click
Next.
3. Type MBA_RULES as the profile name and click Next.
4. On the Control Flow Selection page, click >> to import all of your flows into
the application, and click Next.
5. On the Resource Profile Management, Variable Management, DDL File
Selection, and Saving Application Profile pages you can leave the default
settings, and click Next.
6. Click Next on the Code Generation page, which does not need any
special input. The Package Generation page is displayed, where you can
specify where to save the data warehouse application .zip file.
Type C:\temp on Windows or /tmp on Linux, and click Next. An
MBA_RULES.zip file is created at the selected location. The pages for
configuring the Miningblox Web application are displayed.
7. On the Miningblox application details page, add DWESAMP as the Alphablox
data source, select Mining Process as the Work Control Flow, and make sure
that ${runtimeParams/modelName} is the Variable for model name. The
remaining fields are filled out correctly because the wizard searches for
keywords, such as model, in the flow and variable names.
8. On the Result Page Template Selection page, select a Miningblox tag to specify
the way in which the mining model is displayed. Select the
AssociationModelVizualiser, and click Next.

Figure 22. Choosing the appropriate visualizer tag

9. On the Result Page Editor page, you can see the code of the resulting JSP
page. This page allows you to make changes to the tag attributes, but you do
not need to do anything here.
10. On the Web Application Generation page, again specify C:\temp on Windows
or /tmp on Linux for the EAR file directory, and click Finish.
You now have a data warehouse application .zip file and a Miningblox Web
application EAR file.
Before you can start the application, you must first complete the following tasks,
which are also described in this tutorial.
1. Create the Alphablox data source if it does not yet exist.
2. Deploy the data warehouse application to the DB2 Warehouse Administration
Console.
3. Deploy the Web application to the WebSphere Application Server and start it.

Lesson checkpoint
In this lesson, you used the wizard to prepare your application for deployment.
You learned how to:
v Use the Miningblox Application Deployment Preparation wizard.
v Create a data warehouse application .zip file.
v Create a Web application EAR file.

Lesson 3: Creating the Alphablox data source


In this lesson, you create an Alphablox data source to enable the blox components
to access the database.
Alphablox data sources are an abstraction layer for different data sources that are
used by Alphablox tags to access data. Miningblox tags need an Alphablox data
source to communicate with the database, for example, to read the model. In this
lesson you create a data source called DWESAMP. This data source allows your
Miningblox application to access the DWESAMP database to retrieve the
association model created by the data warehouse application. Before you continue,
make sure that the WebSphere Application Server is started.
To create an Alphablox data source, complete the following steps:
1. Log into the Alphablox Administrative page as a user with administrator rights
at the following URL: http://myappsvr:9080/AlphabloxAdmin/home, where
myappsvr is the host name or IP address of the computer where
WebSphere Application Server is installed.
2. On the Administration tab, click Data Sources, then click Create.
3. Type DWESAMP in the Data Source Name field.
4. Choose IBM DB2 JDBC Type 4 Driver from the Adapter list.
5. Specify the appropriate values for Server Name, Port Number, and Database
Name fields (the port number should be 50000 and the database DWESAMP).
6. Specify the Username and Password for a user with DB2 rights.
7. Save the data source and then click Test the connection to check if the
connection is successful. If the test connection fails:
v Check if the DB2 database is running.
v Check if you entered a user name with DB2 access rights.
v Ask your DB2 administrator for the server name and port values.
You created an Alphablox data source that gives your Miningblox application
access to the DWESAMP database.

Lesson checkpoint
In this lesson, you learned how to create an Alphablox data source.
Lesson 4: Deploying the data warehouse application


In this lesson, you use the DB2 Warehouse Administration console to create a new
data source and to deploy the data warehouse application.
While Miningblox uses the Alphablox data source to access the database, the DB2
Warehouse Administration Console needs to have its own data source to run the
process instances.
Note: If you already created the DWESAMP database resource in Module 3, skip to
step 5 of the following procedure.
To deploy the data warehouse application:
1. Open a browser and start the Administration Console by going to:
http://myappsvr:portnum/ibm/console/, where myappsvr is the host name or IP
address of the computer where WebSphere Application Server is installed and
portnum is the port number that is assigned to the WebSphere profile. The
default port number for a clean installation of WebSphere Application Server is
9060. If this port number does not work, check the value of the WC_adminhost
entry in the following file: %WAS_ROOT%\profiles\dwe\properties\
portdef.props
2. In the View list, select DB2 Warehouse and expand the navigation tree. The
tree now shows only the DB2 Warehouse-related functions.
3. Select Common Resources > Manage Data Sources and click the Create
button.
4. Define the DWESAMP database resource.
a. In the Database Display Name field, enter dwesamp.
b. Clear the Managed by WAS check box and click Next.
c. In the JNDI Name field, enter jdbc/dwesamp.
d. In the Database Name and DB Alias fields, enter dwesamp.
e. In the Host Name field, enter the IP address or full name of the computer
where the data source is located. If the computer is the same as the server
where WebSphere Application Server is running, enter localhost.
f. In the Port Number field, accept the default value of 50000.
g. Click Next.
h. Enter a valid user ID and password.
i. In the Access Type list, select Public.
j. Click the Test Connection button. Assuming that the connection is
successful, click Finish. The Manage Database Resources page is displayed
and shows the new data source in the list.
5. Deploy the application.
a. Select DB2 Warehouse > SQL Warehousing > Warehouse Applications >
Deploy Warehouse Applications.
b. Choose the Web Client location option, and click Browse to select the
appropriate .zip file.
c. Click Next, and check that the Review Page contains the mining process
and the cleanup process. Click Next.
d. In the General step, enter the directories where the application, the logs,
and the working files are saved. Enter:
v C:\temp\application on Windows or /tmp/application on Linux in the
Application Home Directory field
v C:\temp\logs on Windows or /tmp/logs on Linux in the Log Directory
field
v C:\temp\work on Windows or /tmp/work on Linux in the Working
Directory field
Click Next.
e. In the Data Sources step, you should see the DWESAMP data source that
you created.
f. Click Next for the System Resources step.
g. Click Finish. You can now see the MBA_RULES application in the list with
an Enabled status.
6. Launch a test run of your deployed application:
a. Go to DB2 Warehouse > SQL Warehousing > Processes > Run Processes.
b. You can see a list of the available processes. Select the check box for the
MiningProcess and click Start.
c. Name the instance test and click Next.
d. Click Finish. You can see the MiningProcess with a Scheduled state.
e. Click Refresh to see whether the flows ran successfully.
f. Do the same with the CleanUpProcess, using the same name for the
instance.
g. The MiningProcess test run is shown in the Miningblox Web application
task list without parameters, such as results or task name. To remove the
defined test runs, go to History and Statistics > View Statistics for Process
Instances, select the instances that you have just created, and delete them.

Your data warehouse application is now deployed and enabled.

Lesson checkpoint
In this lesson, you created a data source for your data warehouse application and
deployed it using the DB2 Warehouse Administration Console.
You learned how to:
v Create a data source for a data warehouse application.
v Use the DB2 Warehouse Administration Console to deploy an application.

Lesson 5: Deploying the Web application


In this lesson you deploy and start the MBA_RULES Web application that you
created within the Design Studio. Then, the user can access this application using a
Web browser.
To deploy your application, complete the following steps:
1. Open the DB2 Warehouse Administration Console.
2. In the View list, select All tasks.
3. Click Applications > Install New Application.
4. Select the location of your MBA_RULES.ear file and click Next.
5. On the Select installation options page, type MBA_RULES as the name of your
application, and click Next.
6. In the Map modules to servers page, select the checkbox next to your
application and click Next.
7. On the Summary page, click Finish.
8. On the Installing... page, wait until the Application MBA_RULES installed
successfully message is displayed, and click Manage Application. You
obtain the list of the applications that are installed on the server. Click the
MBA_RULES link to enter the configuration menu.
9. Click Security role to user/group mapping, select the All
authenticated check box, and click OK. By default, the Miningblox templates
are configured with the WebSphere security role: MiningbloxUser. People
assigned to this role have access to all of the pages. You can specify which
users have this role in the Map security roles to users/groups step. In this
example, the All authenticated setting is used to force users to log in before
they can view the application. If users are authenticated, the originators of the
mining tasks can be seen in the task list of the application.
Note: A Miningblox application requires authentication to start and manage a
task. Only authenticated users with access rights to the DB2 Warehouse
Administration Console can run and manage tasks in the Miningblox
application.
10. Click Save directly to Master Configuration.
11. Start your application.
a. Select Applications > Enterprise Applications in the left menu.
b. Select MBA_RULES and click Start.
You can now access the MBA_RULES application in your Web browser under the
following URL: http://serverLocation:9080/MBA_RULES. Because you selected
authentication, you must log in to access the application. The Welcome page gives an
overview of the JSP pages used by the application.

Figure 23. The Welcome page for a Miningblox application

Your application is now completely deployed and is ready to be used. To test it
and see how it works, click New Task. You obtain the following window:

Figure 24. The inputForm.jsp page

You can accept the default values and click Run. While the task is running, you
see the following window:

Figure 25. The Processing task window

When the task has completed successfully, your association model is displayed in
an association visualizer, which is similar to the visualizer in the Design Studio. If
you are only interested in business goals, you can now easily obtain an association
rule model for the year and the store you selected. You do not need any mining
skills for this task.

Figure 26. The Visualizer window
In the first column, you can see the rules that the model created. The next three
columns (Support, Confidence, Lift) define the quality of each rule.
You have created a mining application that can be used from any Web browser. If
you want to change the input page, you can customize the application's JSP pages.

Lesson checkpoint
In this lesson, you deployed the MBA_RULES Web application on the WebSphere
Application Server and used it in your Web browser.
You learned how to:
v Choose the correct parameters for deploying a Web application.
v Use your mining application.

Lesson 6: Customizing your application


In this lesson, you learn how to change JSP tags to make your input form more
usable.
The Web application that you deployed and tested in the previous lesson
comprises only the minimum functionality that you expect from an automated
Web application for mining. However, in general, the usability has to be improved,
other result pages have to be added, and the design has to conform to certain
standards.
In the Web application of the last lesson, the input page for entering the
parameters consists of simple text fields only. Therefore you might enter wrong
values. In this lesson, you learn how to modify the inputForm.jsp and the
processTask.jsp pages so that these text fields are replaced by list boxes that
contain the valid values for the store type and year parameters.
The files that are used in this lesson are in the following directory on the server:
WAS_INST_dir\profiles\dwe\installedApps\YourNodeName\MBA_RULES.ear\MBA_RULES.war\

Note: It is recommended that you use a development tool such as Rational Application
Developer to customize and redeploy the application. To simplify the task in this
lesson, you change the JSP files directly in the directory of the application server.
Before you can use the modified application, you must first create the
MARTS.STORETYPES table. This table is also used by the Miningblox MBADemo
sample. To create the table:
1. On the server, enter the directory DWE_INST_dir\Miningblox\Samples\
MarketBasketAnalysis.
2. To execute the script createRulesTable.db2, connect to the DWESAMP database
and run db2 -tvf createRulesTable.db2 from that directory.
The MARTS.STORETYPES table is added to the DWESAMP database.
Here is a summary of the changes that need to be done in this lesson:
inputForm.jsp page
If you click New Task on the Welcome page of the application, you go to
the inputForm page where you can enter the values for the input
parameters that are defined in the flows. All input fields contain default
values. However, this page design is not easy-to-use, because the user
needs to know which years to apply and the type of stores. To improve the
usability of this page, you add list fields, which automatically contain the
values that are available.

Figure 27. The customized inputForm.jsp page

processTask.jsp page
The processTask page is displayed when you click Run on the
inputForm.jsp page. You only have to match the changes made in the
inputForm page.
To customize and use the application, complete the following steps:
1. Open the inputForm.jsp page with a text editor and make the following
changes.
a. Between <DIV class="mbbody"> and <FORM>, add the following code snippet:
<blox:data id="storeTypeDataBlox"
query="SELECT STORETYPE FROM MARTS.STORETYPES"
dataSourceName="DWESAMP"/>
<iminer:memberSelectRDB
id="inputParams_storeType"
valueColumn="STORETYPE"
minimumWidth="200" visible="false"
multiple="true" size="4"
dataBloxRef="storeTypeDataBlox"/>
<iminer:select
id="inputParams_year" size="1"
minimumWidth="200" visible="false">
<bloxform:option label="2002" value="2002" selected="true"/>
<bloxform:option label="2003" value="2003"/>
<bloxform:option label="2004" value="2004"/>
</iminer:select>

This code snippet adds two selection blox: one for the store type, which
uses an SQL query to find the store types; and one for the year, for which
values are manually entered.
b. Replace the input form area (from <FORM> to </FORM>) with the following
lines:
<FORM action="processTask.jsp" name="ParameterForm" method="post">
<table width="615" border="0" cellpadding="6" cellspacing="2">
<tr>
<td valign="top"><strong>Storetype:</strong></td>
<td><blox:display bloxRef="inputParams_storeType"/></td>
</tr>
<tr>
<td valign="top"><strong>Year:</strong></td>
<td><blox:display bloxRef="inputParams_year"/></td>
</tr>
<tr>
<td valign="top"><strong>Task Name:</strong></td>
<td><INPUT name="taskName" value="Sample Task" type="text" size="30"
maxlength="50"></INPUT></td>
</tr>
<tr>
<td valign="top"></td>
<td><input type="submit" name="run" value="Run"></input></td>
</tr>
</table>
</FORM>

This code snippet adds two display blox that refer to the selection blox.
Save your changes.
2. Open the processTask.jsp page in a text editor:
a. Under // Get parameters from the input form, replace the following lines:
String taskVis = "application";
String inputParams_storeType = request.getParameter("inputParams_storeType");

with the following lines:


String taskVis = "application";
String[] inputParams_storeTypes = request.getParameterValues("inputParams_storeType");
String inputParams_storeType = "";
if (inputParams_storeTypes != null){
inputParams_storeType = "(";
for (int i = 0; i < inputParams_storeTypes.length; i++)
{
if (i != 0)
{
inputParams_storeType += ",";
}
inputParams_storeType += "" + inputParams_storeTypes[i] + "";
}
inputParams_storeType += ")";
}

Note: To avoid cut-and-paste errors, make sure that the text that you copy from
the PDF format contains only straight single quotes instead of
apostrophes.
3. Save your changes and open your application's main page again to see the
changes.

Lesson checkpoint
You have customized the application's JSP pages to create a user interface that is
easy to use.
You learned how to:
v Use JSP tags to customize the user interface of the application.

Module 8: Combining text analysis and OLAP


The IT manager of JK Superstore wants to determine if the skills of his employees
are still up-to-date. To find out the current trends in the IT job market, the
manager decides to analyze the job offerings posted on the Internet by the
following eight major IT companies: Sigratech, Mics System, JTS World, Netrum,
Tersa, Vernacle, Quantech, and Coratech. The manager wants to know which skills
(such as programming languages) are requested by these companies and which
skills have an increasing trend.
However, none of the existing reporting tools can handle this unstructured
text-form data. Therefore, the company's aim is to obtain a structured form of the
data, which can be analyzed. To achieve this goal, you can use the Dictionary
Lookup operator. This operator uses the user-defined dictionary that contains a list
of IT skills to create annotations. It finds each occurrence of terms that have
previously been defined in the dictionary and marks their positions. You obtain a
table, which shows the terms (skills) found in the dictionary for each job offer.
From this table you first design a star schema, and then build an OLAP cube.
Finally you use this cube in an Alphablox-based report to show the results of the
analysis.
This module consists of the following lessons:
v Understanding the data used in this module
v Using the Text Analysis tools
v Building a star schema
v Defining an OLAP model for the star schema and deploying it to Cubing
Services
v Creating an Alphablox report

Learning objectives
After completing the lessons in this module you understand the concepts and
know how to:
v Use the text analysis tools for analyzing unstructured data
v Use and create a star schema
v Define and deploy a cube model
v Use the Blox Builder function to build an Alphablox report

Time required
This module takes approximately 90 minutes to complete.

Optional: Start the tutorial here

If you have already completed the previous modules, skip this section and
continue with Lesson 1. If you have not completed the previous modules, but want
to start here, you must complete the following steps.
To start the tutorial here:
Open the DB2 Command window and run the following script to create the
appropriate version of the DWESAMP database:
Windows

C:\Program Files\IBM\dwe\samples\data\setupolapandmining.bat
Linux

/opt/IBM/dwe/samples/data/setupolapandmining.sh
For more information on this script refer to the Readme.txt or readme_linux.txt file
in the data directory. For general information on the tutorial setup, refer to the
following sections:
v Running the tutorial in a Windows client-server environment on page 7
v Running the tutorial in a Linux client-server environment on page 8


v Running the tutorial in a mixed client-server environment on page 8
You are ready to begin.

Lesson 1: Understanding the data used in this module


In this lesson, you explore the data used in this module and create your data
warehouse project.
For this tutorial, you must use the DWESAMP database. To connect to the
DWESAMP database:
1. In the Database Explorer, expand the Connections folder to view the existing
databases.
2. Right-click the DWESAMP database and click Reconnect.
3. When prompted, type your DB2 user name and password and click OK.
In the DWESAMP database, you find the JOBS table in the schema TXTANL. This
table is your starting point. It contains the following columns:
COMPANY_NAME
The name of the company.
TIME The date when the offer was published.
ID
A number that identifies the offer.
JOB_DESC
The text-form of the job description.
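If you prefer SQL, the following query returns a few sample rows from the table;
run it against the DWESAMP database:

SELECT COMPANY_NAME, TIME, ID, JOB_DESC
FROM TXTANL.JOBS
FETCH FIRST 5 ROWS ONLY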
To explore the JOBS table and create your project:
1. To see the sample contents of this table:
a. In the Design Studio, expand the tree in the Database Explorer and
navigate to database DWESAMP, schema TXTANL, table JOBS.
b. Right-click the JOBS table and select Distribution and statistics > Data
Exploration.
On the Sample Content tab, you can view the sample data. At the bottom of this
view, you can see the full content of the text column JOB_DESC.
2. To create a new project and the mining flow, select File > New > Data
Warehouse Project. In the creation wizard, name your project Unstructured
Data Tutorial and click Finish.
Note: The Design Studio provides a sample project that already contains most
of the resources (dictionaries, taxonomies, flows) created in this module. If you
want to save some time, create the Text Analysis Sample project as follows:
Select File > New > Examples > Data Warehousing Examples > Text Analysis
Sample.
You can now start to analyze the unstructured data in the JOB_DESC column of
the JOBS table.

Lesson checkpoint
You are more familiar with the data that is processed in this module. You have
viewed the JOBS table and created your data warehouse project.
After completing this lesson, you learned how to:
v Obtain sample contents of a table and browse its text columns


v Create a new data warehouse project

Lesson 2: Using the Text Analysis tools


In this lesson you design a dictionary that describes a range of IT skills and you
define a mining flow that uses the Dictionary Lookup operator to extract the skill
keywords from the job offerings.
The Dictionary Lookup operator uses a dictionary to process text. This dictionary
is a list of words or multi-word expressions that you define. You can specify
several variants for each term, which are then recognized as the same reference,
such as Java and its variant J2EE. You learn how to design a dictionary that
describes a range of IT skills.
You also have to build a taxonomy, which is a hierarchical classification that
consists of multiple levels. To build a taxonomy you use the Taxonomy editor to
construct a classification for the skills and export it to a dimension table. This table
is used in the next lesson to build a star schema. The taxonomy to be built contains
the terms from the it_skills dictionary. It has the following structure:

Figure 28. The skill taxonomy built from the dictionary

If you are using the Text Analysis Sample project, it already contains the it_skills
dictionary, the Skill taxonomy, and the Jobs Dictionary Analysis mining flow
that are described in this lesson.
To define the Text Analysis tools and construct the flow:
1. Create the it_skills dictionary:
a. In the Data Project Explorer view, expand the project tree: Unstructured
Data Tutorial > Text Analysis > Dictionaries.
b. Right-click Dictionaries and choose New Dictionary.
Figure 29. Text Analysis Tree in the Data Project Explorer

The Dictionary Name wizard opens.


c. In the Dictionary Name field, type it_skills and click Finish. The
Dictionary editor opens:
v The left side of the Dictionary editor displays the dictionary entries. It
shows each term (also called base form) and a sample of its variants.
You can import, add, and remove entries here.

Figure 30. Dictionary Entries for the base forms and their variants (left part of the Dictionary
editor)

v In the Entry details section on the right side of the Dictionary editor,
you can edit a dictionary entry or create new entries. Here you can add
or change the variants of the entry.
Figure 31. Entry details of the dictionary (right part of the Dictionary editor)

v The Inflections section in the lower part on the right side of the
Dictionary editor shows the inflections of an entry, which are
automatically detected during lookup. For example, if you enter the
singular form of database in the dictionary, the plural form databases
is also found automatically. The inflections are shown for the chosen
language.
v Select English because the texts in the JOBS table are in English.
Note: For the selected language you need not enter case-sensitive
variants for common words, which are also referred to as in-vocabulary
words. Terms are marked as in-vocabulary or out-of-vocabulary in the
inflections table. For example, if you enter database into your
dictionary, the operator automatically detects Database or DataBase.
However, for acronyms or special terms, which are marked as
out-of-vocabulary, the detection depends on the case in which the word
is entered into the dictionary. It is recommended that you enter the
words in lowercase. For example if you enter j2ee, the dictionary finds
J2EE; if you enter the term in uppercase, the lowercase occurrence is
not detected.
2. Define the dictionary. Enter the following list of terms and their variants. The
first word of a list item is the base form of the entry and the following words
are the variants of this entry.
v C# , c#, C #, c #
v C/C++, C, C++, c++, c ++, C ++
v Database skills, Database, RDBMS, DB
v DB2, DB/2, db/2, IBM DB2, IBM db2, db2
v Java, J2EE, j2ee, JSP, Java Server Pages
v JavaScript, Javascript, javascript
v Mac OS, MAC OS, MAC Os, Mac Os
v MS SQL Server, Microsoft SQL Server
v MySQL, MYSQL, MySql
v Network, TCP/IP, TCP, IP, DNS
v Oracle, oracle
v Others OS, Solaris
v Perl
v PL/SQL, PL, SQL, Sql, sql
v Python
v Script, scripting languages, scripting, bash, ch, Ch, csh, sh, shell, tcsh
v Unix/Linux, Unix, Linux, Debian, FreeBSD, GNU, gnu, GNU/Linux,
Madriva, RedHat, AIX
v Visual Basic, VB, VisualBasic
v Web Services, SOA, WSDL, CORBA, SOAP
v Web skills, Ajax, ajax, ASP, asp, html, HTML, php, PHP, XML, XSLT
v Windows
3. Create a taxonomy:
a. Create a new taxonomy with the Taxonomy editor. Select Unstructured
Data Tutorial > Text Analysis > New Taxonomy.
b. In the Create Taxonomy wizard, name your taxonomy Skill taxonomy and
click Next.
c. Select Create taxonomy from import source, check Dictionary, and click
Next.
d. Browse in your project tree, select the it_skills dictionary and click Finish.
The Taxonomy editor opens. On the left part of the window you see the
Unassigned terms section. It contains the terms that have been imported
from the dictionary. On the right part of the window you find the
Taxonomy tree section, where you build the taxonomy.
4. Define a taxonomy:
a. Create the first level. In the Taxonomy tree, right-click Root and select
Add Level to add a new member for the first level. Add five levels in this
way and rename them by clicking each level and typing the new name:
Database, Development/Programming languages, Operating Systems, Web
oriented skills, and Others. To change the name of a level in a later step,
you can right-click the level and select Edit.
b. Assign the unassigned terms to the categories that you have created. Place
your cursor on the first category and use the darts buttons in the middle
of the window to assign terms to this category. For example, place your
cursor on the Web oriented skills category, and add unassigned terms
Web skills and Web Services. Do the same for the other unassigned
terms shown in Figure 28 on page 98.
c. Save the taxonomy.
5. Export the taxonomy into a table:
a. In the Data Project Explorer, expand the Text Analysis > Taxonomies
folder of your data warehouse project and select the Skill taxonomy.
b. Right-click and select Export....
c. Press Next on the Available taxonomies page.
d. On the Select connection page, select the DWESAMP database.
e. On the Required table information page, specify:
v Table Name: SKILL_TAXONOMY
v Table Schema: TXTANL
and press Next.
f. Press Next on the Table Details page.
g. The Show Table Format page summarizes the taxonomy table that is
created. This table is used in the next lesson to build a dimension table for
the skill dimension.
h. Press Finish to store the table.
6. Create a mining flow:
a. In the Data Project Explorer, right-click your project and select New >
Mining Flow.
b. In the wizard, name your flow Jobs Dictionary Analysis, and click Next.
c. Choose Use an existing connection, select the DWESAMP database in the
list below and click Finish.
7. Define the source table of the mining flow:
a. In the mining flow editor, select the Table Source operator and place it on
the canvas.
b. Select the TXTANL.JOBS table of the DWESAMP database.
8. Define the Dictionary Lookup operator:
a. Select the Dictionary Lookup operator from the operator palette and place
it on the canvas.
b. Connect the output of the Table Source JOBS to the input of the Dictionary
Lookup operator.
c. On the Properties view, on the Dictionary Settings tab, select JOB_DESC as
the input text column and English as the language.
d. On the Analysis Results tab, choose it_skills as the dictionary and as the
annotation type. For the results columns, select baseform, coveredText,
and id. Rename these columns SKILL_CAT, SKILL_DETAILS, and
SKILL_ID, respectively.
e. On the Output Column tab, choose COMPANY_NAME, TIME, and ID.
9. Define a table target for the result table:
a. Drag a Table Target operator on the canvas.
b. In the Select Database Table dialog, expand the schema TXTANL, select the
IT_SKILLS_ASKED table, and click Finish.
c. On the Properties view, check the Delete previous content box.
10. Connect the Dictionary Lookup operator with the Table Target operator:
a. Connect the output port of the Dictionary Lookup operator with the input
port of the Table Target operator.
b. In the Select Column Connections dialog, select the Connect by name
radio button and click OK.
The final flow should look like this:

Figure 32. Dictionary Analysis Flow

11. Run the flow:
a. Click the Jobs Dictionary Analysis flow tab and select Mining Flow >
Execute from the menu.
b. Accept the default option on the wizard and click Execute.
c. Check the Data Output view to see if there are any problems.
You can now view the result table. Right-click the IT_SKILLS_ASKED target table
in the flow editor and select Sample Contents of Database Table. The Data
Output view displays some sample content of the result table.

Figure 33. Sample contents of the Dictionary Analysis result table

The obtained table links informative data about the offers (which company, when,
which offer) with the required analysis data (the IT skills requested in these offers).
You can also explore relationships among TIME, COMPANY, and SKILLS using the
Multivariate Analysis:
1. Right-click the IT_SKILLS_ASKED target table in the flow editor and select
Distribution and statistics > Multivariate.
2. In the Input Data Selection dialog, accept the default values and click OK.
3. In the Multivariate Distribution viewer, hold the Ctrl key and select the
columns TIME, COMPANY_NAME, and SKILL_CAT in the table.
You can now use this table for reporting, for example, to:
v Find out which skills are required on the market, and get an idea of the
development activities of the other companies.
v Study the trend for a defined period of time.
v Identify which companies share the same interests.
In the next lesson you are going to build an example with a cube model.

Lesson checkpoint
You obtained a table with the skills that are requested by each offer. This table is
the starting point for an OLAP analysis.
You learned how to:
v Define a dictionary
v Define a taxonomy and export it as a table
v Define the parameters of the Dictionary Lookup operator

Lesson 3: Building a star schema


In this lesson you create a data mart from the text analysis results.
The data mart is implemented as a relational star schema that represents a
multidimensional cube. You use data flows in the Design Studio to populate the
star schema from the text analysis results table IT_SKILLS_ASKED:

Figure 34. Star schema for the jobs dictionary analysis

A star schema is a simple database schema, which is optimized for OLAP queries.
It consists of a fact table and different dimension tables:
v The fact table represents measured values. It is at the center of the schema.
v The dimension tables represent the data description. These tables form the star
branches.
The fact table (the center) contains one row for each skill mentioned for each job
offering. Column NB_SKILLS is a weight assigned to the fact row. Each job
offering should count only once; that is why the NB_SKILLS value is computed as
1 / (number of skills in the offering). If a job offering mentions three skills, each
skill fact row gets a weight of 1/3. This allocation scheme is called uniform
allocation. The other columns in the fact table are foreign keys, which refer to the
three dimension tables. The ID column lets you link the fact row back to the job
offering that contained this skill reference.
The SKILL and the TIME dimension tables contain several columns that define
hierarchies in the cube. You can use these hierarchies to drill down from summary
to detail level.
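The following DDL is a minimal sketch of how such a star schema could be defined.
The table names follow the dimension names used later in this lesson (STAR_TIME,
STAR_SKILL, STAR_COMPANY, STAR_FACT_TABLE); the column names and data types are
assumptions for illustration and are not necessarily the definitions shipped with
the sample project:
CREATE TABLE OLAPANL.STAR_TIME (
  TIME_ID   INTEGER NOT NULL PRIMARY KEY,   -- surrogate key from a sequence
  YEAR      INTEGER,
  QUARTER   INTEGER,
  MONTH     INTEGER,
  DAY_DATE  DATE
);
CREATE TABLE OLAPANL.STAR_SKILL (
  SKILL_ID      INTEGER NOT NULL PRIMARY KEY,
  SKILL_GROUP   VARCHAR(64),
  SKILL_CAT     VARCHAR(64),
  SKILL_DETAILS VARCHAR(128)
);
CREATE TABLE OLAPANL.STAR_COMPANY (
  COMPANY_ID    INTEGER NOT NULL PRIMARY KEY,
  COMPANY_NAME  VARCHAR(128)
);
CREATE TABLE OLAPANL.STAR_FACT_TABLE (
  ID          INTEGER NOT NULL,                          -- job offering ID
  TIME_ID     INTEGER REFERENCES OLAPANL.STAR_TIME,
  SKILL_ID    INTEGER REFERENCES OLAPANL.STAR_SKILL,
  COMPANY_ID  INTEGER REFERENCES OLAPANL.STAR_COMPANY,
  NB_SKILLS   DOUBLE                                     -- uniform-allocation weight
);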
Create the text analysis sample project and explore the star schema diagram. The
Design Studio provides a sample project that contains most of the resources
(physical database model, data flows, control flows, and cube model) used in this
module:
1. In the Design Studio, select File > New > Example > Data Warehousing
Examples > Text Analysis Sample.
2. Click Next, accept the default values, then click Finish.
3. In the Data Project Explorer you can now expand the TextAnalysisSample
project.
4. In the Data Models folder, double-click the StarSchemaModel database model.
5. Expand the OLAPANL schema and open the OLAPANL diagram to explore the
physical tables defined for the star schema.
Explore and execute the transformation flows to populate the star schema tables.
The sample project contains several data flows referenced from the control flow
PopulateStarSchemaForJobs.
CleanupStarSchema
Delete all rows from the target tables.
FillCompanyDimensionTable
Create a company dimension entry for each distinct company in
IT_SKILLS_ASKED.
FillTimeDimensionTable
Create a time dimension entry with levels for year, quarter, month, and
day for each distinct time value in IT_SKILLS_ASKED.
FillSkillDimensionTable
Fill the skill dimension using the SKILL_TAXONOMY table prepared in
the previous lesson.
PopulateFactTable
Create a fact row for each row in IT_SKILLS_ASKED, check the
dimension-table foreign keys in the dimension tables, and compute the
NB_SKILLS weight for each fact row.
The dimension table flows use a DB2 sequence object to generate unique IDs,
which are used as primary keys for the dimension tables.
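As a rough sketch of what such a dimension flow does, the company dimension could
be filled as follows. The sequence name COMPANY_SEQ and the table names reuse the
illustrative DDL sketch shown earlier and are assumptions, not the objects defined
in the sample project:
-- Sequence that generates the surrogate keys for the dimension table
CREATE SEQUENCE OLAPANL.COMPANY_SEQ AS INTEGER START WITH 1 INCREMENT BY 1;

-- One dimension row for each distinct company in the text analysis results
INSERT INTO OLAPANL.STAR_COMPANY (COMPANY_ID, COMPANY_NAME)
SELECT NEXT VALUE FOR OLAPANL.COMPANY_SEQ, T.COMPANY_NAME
FROM (SELECT DISTINCT COMPANY_NAME
      FROM TXTANL.IT_SKILLS_ASKED) AS T;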
To explore a data flow:
1. Expand the Data Flows subfolder of your TextAnalysisSample project and
double-click a data flow.
2. In the flow editor, select an operator to explore the operator's properties in
the Properties view.
In the FillSkillDimensionTable data flow, the SKILL_TAXONOMY table created in
the preceding lesson is joined with the distinct values for column SKILL_DETAILS
from the IT_SKILLS_ASKED table. This enables you to drill down in the skill
dimension of the cube, not only to the SKILL_CAT/TERM_NAME level, but also
to the individual occurrences/variants of the skills mentioned in the text. If you
are just interested in SKILL_CAT level analysis, you can directly use the
SKILL_TAXONOMY table as a dimension table.

Figure 35. The FillSkillDimensionTable data flow
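A hedged SQL sketch of the join that this flow performs follows. The SKILL_TAXONOMY
column names (TERM_GROUP for the skill group, TERM_NAME for the skill category) and
the sequence SKILL_SEQ are assumptions chosen for illustration; only the join of the
taxonomy with the distinct SKILL_DETAILS values is taken from the lesson:
INSERT INTO OLAPANL.STAR_SKILL (SKILL_ID, SKILL_GROUP, SKILL_CAT, SKILL_DETAILS)
SELECT NEXT VALUE FOR OLAPANL.SKILL_SEQ,
       T.TERM_GROUP,                    -- skill group level
       T.TERM_NAME,                     -- skill category level
       D.SKILL_DETAILS                  -- individual occurrence or variant
FROM TXTANL.SKILL_TAXONOMY AS T
JOIN (SELECT DISTINCT SKILL_CAT, SKILL_DETAILS
      FROM TXTANL.IT_SKILLS_ASKED) AS D
  ON D.SKILL_CAT = T.TERM_NAME;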

The PopulateFactTable data flow creates a fact row for each row in
IT_SKILLS_ASKED, looks up the dimension table foreign keys in the dimension
tables (using a join) and computes the NB_SKILLS weight for each fact row as
follows: The group by operator counts the number of rows in IT_SKILLS_ASKED
for each ID value (for each job offering). The select list property of the Table Join
operator computes the NB_SKILLS weight using the following expression:
DOUBLE(1) / DOUBLE(IN4_07.SKILLS_PER_OFFER)

IN4_07.SKILLS_PER_OFFER refers to the count computed by the group by operator.
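Put together, a rough SQL equivalent of what the PopulateFactTable flow generates
might look like the following. The dimension table and column names reuse the
illustrative sketches shown earlier in this lesson, and the join conditions used for
the key lookups are assumptions; only the group-by count and the NB_SKILLS
expression come directly from the flow:
INSERT INTO OLAPANL.STAR_FACT_TABLE (ID, TIME_ID, SKILL_ID, COMPANY_ID, NB_SKILLS)
SELECT S.ID, T.TIME_ID, K.SKILL_ID, C.COMPANY_ID,
       DOUBLE(1) / DOUBLE(N.SKILLS_PER_OFFER)         -- uniform-allocation weight
FROM TXTANL.IT_SKILLS_ASKED AS S
JOIN (SELECT ID, COUNT(*) AS SKILLS_PER_OFFER         -- skills mentioned per offering
      FROM TXTANL.IT_SKILLS_ASKED
      GROUP BY ID) AS N
  ON N.ID = S.ID
JOIN OLAPANL.STAR_COMPANY AS C ON C.COMPANY_NAME  = S.COMPANY_NAME
JOIN OLAPANL.STAR_SKILL   AS K ON K.SKILL_DETAILS = S.SKILL_DETAILS
JOIN OLAPANL.STAR_TIME    AS T ON T.DAY_DATE      = S.TIME;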

Figure 36. PopulateFactTable data flow

The control flow PopulateStarSchemaForJobs in the sample data warehouse project
simply executes the five data flows in the order shown above. To execute the
control flow and the data flows:
1. Open the control flow PopulateStarSchemaForJobs in the sample data
warehouse project.
2. From the menu choose Control Flow > Execute.
3. In the Flow Execution wizard click Execute.
4. When the execution is finished, validate that this run was successful.

Lesson checkpoint
You explored and populated the star schema for your text analysis results.
You learned:
v What a star schema is
v How to create data flows that transform your text analysis results into a
multidimensional data mart

Lesson 4: Defining an OLAP model for the star schema and deploying it to
Cubing Services
In this lesson you define and deploy a cube model that describes your relational
star schema to the DWESAMP database and start an OLAP cube on the Cubing
Services server.
DB2 Warehouse Cubing Services can accept multidimensional OLAP queries from
Alphablox and other analytical tools. OLAP metadata objects define the role of the
relational tables in a multidimensional cube model: for example, which table
contains the fact rows and which tables contain the cube dimensions and
hierarchies.
The TextAnalysisSample project contains a data model where the OLAP metadata
objects associated with your relational star schema are already defined.
The cube model STAR_FACT_TABLE in schema OLAPANL defines the following
OLAP objects:
v Facts STAR_FACT_TABLE with measure NB_SKILLS
v Dimension STAR_SKILL with levels for skill group, skill category, and skill
details
v Dimension STAR_COMPANY
v Dimension STAR_TIME with levels year, quarter, month, and day
v Cube Jobs Analysis cube using all dimensions from the cube model
To explore these OLAP objects:
1. In the Data Project Explorer, expand the TextAnalysisSample > Data Models >
StarSchemaModel > DWESAMP > OLAPANL > OLAP Objects folder.
2. Select the object you want to explore and see the properties of the object in the
Properties view.
To deploy the cube model from your design environment to the database, you need
a Cubing Services repository. You can verify the repository settings under Window
> Preferences > Data Warehousing > Repository. Refer to Module 4: Designing
OLAP metadata on page 43 for details.
To validate and deploy the cube model and the Jobs Analysis cube to the
DWESAMP database and the cubing services repository:
1. In the Data Project Explorer, right-click the OLAPANL schema and select
Analyze Model.
2. Accept the default parameters and click Finish. In the Console view, check
your errors and correct them.
3. When there are no errors left, you can deploy your metadata to the database:
right-click the STAR_FACT_TABLE cube model in the Data Project Explorer,
and select Deploy to database. The Deploy OLAP Objects dialog opens.
4. Select DWESAMP as the target database and click Finish.

Figure 37. The OLAP metadata objects in model StarSchemaModel

The OLAP metadata objects are now available in the metadata repository. To
invoke multidimensional queries you must assign the cube to a running cube
server and start the cube. You have learned how to define and start a cube server
in Module 4: Designing OLAP metadata on page 43. In this lesson it is assumed
that the cube server is up and running.
To define and start the Jobs Analysis cube on a cube server using the
Administration Console:
1. From the Administration Console, select Cubing Services > Manage Cube
Servers. A list of cube servers is displayed.
2. Verify that a cube server, for example DWEREPOS, is started; if not, start the
server.
3. Select the started cube server's link from the list of cube server names. The
cube server Properties page opens.
4. Define the database mapping for the Jobs Analysis cube:
v Select Cubing Services > Manage OLAP Metadata.
v Click STAR_FACT_TABLE.
v Select Database mapping.
v Click Save.
5. Add the Jobs Analysis cube to the cube server.
a. Click the Cubes tab.
b. Click Add.
c. Select the Jobs Analysis cube from the list and click Add.
6. Select the check box to the left of the Jobs Analysis cube and click the Start
button.
The Jobs Analysis cube that you created is ready for testing and developing
analytic applications.

Lesson checkpoint
You explored the cube model for the star schema and deployed the cube model
and cube to the database and started the cube on a cube server.
You learned how to:
v Define and modify OLAP metadata
v Deploy OLAP metadata to a database
v Assign and start a cube on a cube server

Lesson 5: Creating an Alphablox report
In this lesson you create an Alphablox cross-tab report by using the Design Studio
Blox Builder tool.
This report allows you to access your text analysis results, such as the number of
skills requested in job offerings, from a Web browser and drill down from
summary view to details along the defined hierarchies in the dimensions time and
skill.

Figure 38. The Alphablox report for the Jobs Analysis cube

Alphablox needs data source definitions to access the data in the cubes you
defined. In Module 5: Creating Alphablox reports based on IBM cubes on page
58, you created Alphablox data sources to access your multidimensional cubes
from Alphablox reports. In this lesson it is assumed that Alphablox has already
been configured to access your defined cubes.
If you cannot access the defined cubes, define a Cubing Services Adapter data
source on the Alphablox Administrative pages:
1. Log into the DB2 Alphablox Administrative Pages as a user with
administrator rights. From a Web browser, go to
http://serverLocation:9080/AlphabloxAdmin/home.
2. Click the Administration tab, click the Data Sources link, and then click
Create.
3. Enter the following information:
v Data Source Name: CSADAPTER
v Adapter: IBM Cubing Services Adapter
v Cubing Services Server Name: host name of the machine where you started
the cube server
v Port: port number of your cube server as defined in the Administration
Console
v Userid and password
Note: To test your reports, Blox Builder needs a valid Alphablox server
configuration. You can verify the Blox Builder settings in the Design Studio under
Window > Preferences > Blox Builder > Alphablox Server Configuration. For
details, refer to Module 5: Creating Alphablox reports based on IBM cubes on
page 58.
To create a new Blox Builder project:
1. Select File > New > Project > Blox Builder > Blox Builder Project.
2. Specify the project name: Unstructured tutorial reports.
3. Click Finish. When asked to switch to the Blox Builder perspective, click Yes.
To create a new report in the project:
1. Expand the Unstructured tutorial reports project in the Data Project
Explorer.
2. Right-click the Reports folder and select New Report.
3. In the New Report Wizard specify
v Name: JobsAnalysis
v Display Name: Jobs Analysis
v Description: Cross-tab report for number of jobs by skill over time
4. Click Finish.
To define the report model:
1. Double-click the JobsAnalysis report to open it in the report editor.
2. In the report editor select the Model tab. Here you define the logical elements
that contribute to your report. Drag the following components from the palette
to the model canvas:
a. From the Visual Components category, a Text component
b. From the Blox category, a DataBlox component and a PresentBlox component
3. Connect the DataBlox component to the PresentBlox component through their
DataBlox ports.
4. Specify the properties of the Text component.
a. Select the Text component
b. In the Properties view, specify the following properties:
Text: Show skills requested in job offerings over time
Height: 50
Width: 500
5. Specify the properties of the DataBlox component. This component defines the
data source to be used in the report and the query to be executed.
a. Select the DataBlox component.
b. In the Properties view specify:
v On the Data Source tab, specify Data Datasource: CSADAPTER (use the
same cubing services adapter name that you specified in the data sources
section in the Alphablox administrative pages)
v On the Data Query tab, specify the following MDX Query:
SELECT
DISTINCT( {{[Jobs Analysis cube].[STAR_TIME].[All years],
AddCalculatedMembers([Jobs Analysis cube].[STAR_TIME].[All years].Children)}} )
ON AXIS(0)
, DISTINCT( {{[Jobs Analysis cube].[STAR_SKILL].[All Skills],
AddCalculatedMembers([Jobs Analysis cube].[STAR_SKILL].[All Skills].Children)}} )
ON AXIS(1)
FROM [Jobs Analysis cube]
WHERE
(
[Jobs Analysis cube].[Measures].[NB_SKILLS (STAR_FACT_TABLE)],
[Jobs Analysis cube].[STAR_COMPANY].[All companies]
)

Note: If you are not familiar with the MDX query language, you can
easily create your own MDX expressions in the Alphablox query builder.
6. Specify the properties of the PresentBlox component. This component defines
the visual appearance of the data and contains a tabular grid showing the data
cells and a chart.
a. Select the PresentBlox component.
b. In the Properties view specify:
v On the Grid Data Display tab, the DefaultCellFormat: ###,##0
v On the Present Property tab, the DataLayoutAvailable,
DataLayoutVisible, PageAvailable, and PageVisible: false
v On the Chart Property tab, the O1AxisTitle: Year
v On the Chart Property tab, the Y1AxisTitle: Number Job Offerings
Figure 39. The report model in Blox Builder report editor

To define the report layout:
1. In the report editor, select the Layout tab. Here you define the visual layout of
the following elements that contribute to your report:
The Text component
Position the Text component in the upper part of your canvas, then
resize the component.
The PresentBlox component
Position the PresentBlox component below the Text component, then
resize the component.
2. Press Ctrl-S to save your report.
To preview your report and run it on the Alphablox server:
1. In the report editor, select the Overview tab.
2. Select the Preview this report on the default server link. Your report is now
deployed in preview mode and displayed in the Preview Report viewer. You
can work with your report in the viewer, for example, double-click a skill
group (Database) to drill down and expand the skill categories in this group.
If you started the tutorial from this module, right-click the report in the Blox
Builder Project Explorer and select Preview Report. Then configure the server
by following the instructions in Module 5: Creating Alphablox reports based
on IBM cubes on page 58.
3. Modify your report model and layout until you are satisfied, and repeat steps 1
and 2.
You can create additional reports on your cube and deploy them on the Alphablox
server as a new Alphablox application. Then the text analysis results can be
accessed by all authorized users from a Web browser.
For details on the deployment of Alphablox applications and reports, refer to
Module 5: Creating Alphablox reports based on IBM cubes on page 58.

Lesson checkpoint
You created a new Alphablox report to visualize your text analysis results.
You learned how to:
v Define a Cubing Services Adapter in Alphablox
v Create a report using the Design Studio Blox Builder tool
v Define and test the report model and layout

Summary
You have completed the design and deployment of a business intelligence solution
that expands the capabilities of a DB2 data warehouse for the JK Superstore retail
stores.

Lessons learned
By completing this tutorial, you learned how to:
v Design and update a physical data model for a warehouse database
v Design applications for building warehouse and mart tables by using SQL-based
data flows and control flows
v Deploy and run warehouse building applications in the Administration Console
v Design a complete cube model and deploy performance-enhancing MQTs
v Design a Blox Builder report that is based on OLAP metadata
v Design a mining model that analyzes purchasing trends
v Create a mining web application with Miningblox
v Analyze unstructured data by using text analysis tools, an OLAP cube, and an
Alphablox report

Glossary
General terms
Administration Console
DB2 Warehouse Web client, which provides a browser-based
administration environment for data warehouse applications. The console
is hosted by WebSphere Application Server and recognizes data sources
and system resources that are already defined in the WebSphere
environment. You can use the console to deploy, schedule, and monitor
jobs that load data and run mining analyses. You can also start, stop, and
maintain cube servers and cubes.
BI Perspective
Default perspective in the Design Studio. A perspective defines the initial
set and layout of views and provides a set of functions for accomplishing a
specific type of task, or working with a specific type of resource. The BI
perspective includes functions that are tailored for building information
warehouses and enabling warehouse-based analytics such as OLAP and
data mining.
Data Project Explorer
A tree view that shows the files and metadata associated with your data
warehouse projects. You can create and manage data warehouse projects
and objects within a project such as mining flows, data flows, control
flows, and warehouse applications.
Database Explorer
A tree view where you create new database connections and connect to
existing databases; explore database schemas; invoke data exploration
functions such as sample content, value distributions (univariate, bivariate,
multivariate); and explore data mining models.
Design Studio
An integrated graphical development environment for designing various
components of a data warehouse application, including physical data
models, SQL-based data flows and mining flows, control flows, and OLAP
cube models.
editor
A visual component that you typically use to edit or browse a resource.
Modifications that you make in an editor are usually not automatically
saved. You must explicitly save your changes. Multiple instances of an
editor can exist within a project.
project
Design Studio containers where you build the objects in your data
warehouse application. Depending on the type of project that you are
working in, you can create different types of objects, such as physical data
models and OLAP objects in a data design project or data flows, control
flows, and mining flows in a data warehouse project.
Properties views
Tabbed pages that allow you to specify the detailed behavior of each
operator in a data flow. Within these pages, for example, you can define
which tables or files your source and target operators represent, and how
each operator will change the data set.
Text analytics
The automatic extraction of structured information from unstructured
textual documents. This structured information can be stored in relational
database tables and used for further analysis by reporting or advanced
analytical tools. Typical subtasks of information extraction are named entity
recognition, terminology extraction, entity resolution, and relationship
extraction. Named entity recognition finds names for people or
organizations in a text, and entity resolution finds out which names,
nouns, or pronouns refer to the same entity.
view (in Design Studio)
A visual component that you typically use to navigate a hierarchy of
information, open an editor, or display properties for the active editor.
Modifications that you make in a view are saved immediately. Normally,
only one instance of a particular type of view can exist within the Design
Studio.
WebSphere Application Server
DB2 Warehouse runtime environment that hosts the Administration
Console and provides the infrastructure for deploying, scheduling, and
monitoring data warehouse applications.
WebSphere DataStage
An enterprise ETL system that is part of the IBM WebSphere Data
Integration Suite.
SQL Warehousing Tool
The SQL Warehousing Tool (SQW) is the DB2 Warehouse component that you use
to design, deploy, and run data warehouse applications. These applications
perform warehouse building operations, using a DB2 database as the execution
engine for extracting, loading, and transforming data.
control flow
Graphical model that sequences data flows and mining flows, integrates
external commands, programs, and stored procedures, and provides
conditional processing logic for a data warehouse application.
data flow
Graphical model that defines activities that extract data from flat files or
relational tables, transform the data, and load it into a data warehouse,
data mart, or staging table.
data station
Data flow operator that represents a staging point in a flow where you can
inspect the intermediate result set. The data station can use a persistent
table, a temporary table, a file, or a view for data storage.
deployment preparation
The process of preparing a data warehouse application for deployment to
the WebSphere environment where the application can be scheduled and
managed. Deployment preparation takes place in the Design Studio and
results in a zip file that you can deploy in the Administration Console.
ETL and ELT
Extract, transform, and load (ETL) systems, such as WebSphere DataStage,
do the work of extracting data from various operational systems,
transforming it to meet the needs of business applications, and loading it
into target databases, such as data warehouses and data marts.
Extract, load, and transform (ELT) systems, such as SQW, extract data from
source tables and files, load it directly into a relational database, then use
the database engine to run data transformations.
key lookup
Data flow operator that matches key values from intermediate result sets
(such as incoming fact table rows) with valid keys that already exist in
referenced tables, such as warehouse dimension tables. Key lookups are
similar to joins, and their properties consist of conditions that define the
matching criteria between the incoming data and one or more lookup
tables.
operator
A logical step in a data flow, mining flow, or control flow. For example, a
file import operator defines the extraction of data from a flat file at the
beginning of a data flow or mining flow.
physical data model
A metadata model, stored as a .dbm file, that represents the tables and
other objects in a database. You select objects from physical data models
when you are designing flows.
SQL Condition Builder or Expression Builder
Graphical interface for defining SQL conditions and expressions that are
required as properties for various operators. For example, you can define
the conditions for a table join or aggregate expressions for a group by
operator.
Cubing Services
calculated member
Data members that include dynamically generated data derived from
calculations performed against members that exist in your result set.
cube
An organized data structure with more than two dimensions that is
defined by a set of dimensions and measures.
cube server
A high performance, scalable cubing engine that is designed to support
queries from many users against many different OLAP cubes. The Cubing
Services Cube Server is designed to enable fast multidimensional access to
relational data that is referenced by the OLAP cubes defined in the Cubing
Services metadata database.
dimension
A data category, such as time, accounts, products, or markets. In a
multidimensional database outline, the dimensions represent the highest
consolidation level.
hierarchy
A defined relationship among a set of attributes that are grouped by levels
in the dimension of a cube model. These relationships between levels are
usually defined with a functional dependency to improve optimization.
Multiple hierarchies can be defined for a dimension of a cube model.
level
A set of attributes that work together as one logical step in a hierarchy's
ordering. A level contains one or more attributes that are related and can
function in one or more roles in the level. The relationship between the
attributes in a level is usually defined with a functional dependency.


Pre-aggregated data that improves performance and response time for
complex queries, particularly those associated with analyzing large
amounts of data.
measure
A measurement entity used in facts objects. You can calculate measures
with an SQL expression or map them directly to a numerical value in a
column. An aggregation function summarizes the value of the measure for
dimensional analysis.
member
A discrete component within a dimension that represents a data
occurrence. For example, January 2007 or 1Qtr2007 are typical members of
a time dimension.
online analytical processing (OLAP)
A multidimensional, multi-user, client-server computing environment for
users who need to analyze and model consolidated enterprise data in real
time.
Optimization Advisor
Wizard that guides you through the steps of optimizing a cube model in
the Database Explorer. By optimizing for queries that are performed on a
cube model, you can improve the performance of products that issue
OLAP-style SQL queries.
Mining
associations function
A mining algorithm that finds associations, patterns, or rules hidden in
data.
confidence
Anticipated range of an output variable given a set of input variable
values.
lift
Accuracy of predictive data models.
market basket analysis
Mining procedure that determines what products customers purchase
together. A retail organization can use this information to organize
products.
mining model
Output of a mining application, built from historical data, that is used to
predict new data or to denote existing patterns.
taxonomy
The science of classification according to an algorithm.

Alphablox
Application
A Blox Builder application contains sets of reports that you view in the
browser. The analytic application that you create in Blox Builder is
different from the Alphablox and J2EE applications.
Blox Builder
The Eclipse-based Alphablox tool that can be used by report developers
and Java developers to create reports based on data retrieved from
relational or multidimensional databases.
Blox component
An Alphablox software component, using Web and Java technologies, to
build analytic applications. A Blox component contains a Blox object, such
as a DataBlox or a PresentBlox.
Components
Components are the parts that make up a report. Components define the
data and visual aspects of a report. For example, a report can contain a
PresentBlox component, which displays a grid and a chart, and a checkbox
component, which displays a checkbox.
Layout
A report's layout defines how the report's components appear in the
browser. The layout defines each component's size and position when the
report is displayed in the browser. Like the report display name and
description, a report can have a different layout for each locale.
Page member
The member of a dimension that is being displayed; a filter that restricts
the data to the specified member.
Properties
You can create custom report properties, which you use to share
information between the report's components. For example, your report
contains a PresentBlox component and a checkbox component. You want
the PresentBlox to display a toolbar when you check the checkbox. You can
design the report so that when you click the checkbox, the checkbox
component sets a report property to true. The PresentBlox accesses the
value of the report property to determine whether to display a toolbar.
Property Expression wizard
Adds a property reference expression to the query string by using the
Property Expression wizard.
Property Reference wizard
Adds a property reference to the query string by using the Property
Reference wizard.
Query
A query contains the query string, data, and connection information such
as the name of the data source and the query. A query definition defines
the query's unique ID, display name, and description. A query can then be
accessed from any report, and you can override the properties in a query
in the report.
Query Designer
Generates a query from a visual layout. Query Designer is a version of
Query Builder. The Query Designer displays a PresentBlox with the data.
You can modify what data is displayed in the PresentBlox, and Query
Designer will replace the current query text with the generated query from
the displayed data.
Report
You can create, preview, and deploy reports. After you create your report,
you can create an analytic application that contains the report. A report
may contain multiple visual and non-visual components. A DataBlox is a
non-visual component, whereas a PresentBlox, text, and buttons are visual
components. You can create a simple report containing only a DataBlox
and PresentBlox.
Report catalog
A Blox Builder application displays a navigation tree of available reports
that you can choose to display in the report viewer. The report catalog
defines which reports appear in the navigation tree. In a report catalog,
you can override a report's display name, description, and the values of
the report properties. For example, you create a report that displays the
sales data for a certain time period. A custom report property contains the
value that determines the time period. In an application, you can set the
report property to different values to display reports for different time
periods.
Report link
Each time a report is added to an application's navigation, you can
override these properties to create different reports (called a report link)
from a single report template. You can add links to reports to your Blox
Builder application and specify the report's name and description that will
appear in the application. Your application must be open in the
Application Editor.
Test Query
Displays the query on the default server.
Traffic lighting
Highlighting of data cells based on specified criteria, typically a range of
values. Named after the frequent use of red, yellow, and green colors to
highlight the status of a displayed value.
Notices
IBM may not offer the products, services, or features discussed in this document in
all countries. Consult your local IBM representative for information on the
products and services currently available in your area. Any reference to an IBM
product, program, or service is not intended to state or imply that only that IBM
product, program, or service may be used. Any functionally equivalent product,
program, or service that does not infringe any IBM intellectual property right may
be used instead. However, it is the user's responsibility to evaluate and verify the
operation of any non-IBM product, program, or service.
IBM may have patents or pending patent applications covering subject matter
described in this document. The furnishing of this document does not give you
any license to these patents. You can send license inquiries, in writing, to:
IBM Director of Licensing
IBM Corporation
North Castle Drive
Armonk, NY 10504-1785
U.S.A.
For license inquiries regarding double-byte (DBCS) information, contact the IBM
Intellectual Property Department in your country/region or send inquiries, in
writing, to:
IBM World Trade Asia Corporation
Licensing
2-31 Roppongi 3-chome, Minato-ku
Tokyo 106, Japan
The following paragraph does not apply to the United Kingdom or any other
country/region where such provisions are inconsistent with local law:
INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS
PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER
EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS
FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or
implied warranties in certain transactions; therefore, this statement may not apply
to you.
This information could include technical inaccuracies or typographical errors.
Changes are periodically made to the information herein; these changes will be
incorporated in new editions of the publication. IBM may make improvements
and/or changes in the product(s) and/or the program(s) described in this
publication at any time without notice.
Any references in this information to non-IBM Web sites are provided for
convenience only and do not in any manner serve as an endorsement of those Web
sites. The materials at those Web sites are not part of the materials for this IBM
product, and use of those Web sites is at your own risk.
IBM may use or distribute any of the information you supply in any way it
believes appropriate without incurring any obligation to you.
Licensees of this program who wish to have information about it for the purpose
of enabling: (i) the exchange of information between independently created
programs and other programs (including this one) and (ii) the mutual use of the
information that has been exchanged, should contact:
Such information may be available, subject to appropriate terms and conditions,
including in some cases payment of a fee.
The licensed program described in this document and all licensed material
available for it are provided by IBM under terms of the IBM Customer Agreement,
IBM International Program License Agreement, or any equivalent agreement
between us.
Any performance data contained herein was determined in a controlled
environment. Therefore, the results obtained in other operating environments may
vary significantly. Some measurements may have been made on development-level
systems, and there is no guarantee that these measurements will be the same on
generally available systems. Furthermore, some measurements may have been
estimated through extrapolation. Actual results may vary. Users of this document
should verify the applicable data for their specific environment.
Information concerning non-IBM products was obtained from the suppliers of
those products, their published announcements, or other publicly available sources.
IBM has not tested those products and cannot confirm the accuracy of
performance, compatibility, or any other claims related to non-IBM products.
Questions on the capabilities of non-IBM products should be addressed to the
suppliers of those products.
All statements regarding IBM's future direction or intent are subject to change or
withdrawal without notice, and represent goals and objectives only.
This information may contain examples of data and reports used in daily business
operations. To illustrate them as completely as possible, the examples include the
names of individuals, companies, brands, and products. All of these names are
fictitious, and any similarity to the names and addresses used by an actual
business enterprise is entirely coincidental.
COPYRIGHT LICENSE:
This information may contain sample application programs, in source language,
which illustrate programming techniques on various operating platforms. You may
copy, modify, and distribute these sample programs in any form without payment
to IBM for the purposes of developing, using, marketing, or distributing
application programs conforming to the application programming interface for the
operating platform for which the sample programs are written. These examples
have not been thoroughly tested under all conditions. IBM, therefore, cannot
guarantee or imply reliability, serviceability, or function of these programs.
Each copy or any portion of these sample programs or any derivative work must
include a copyright notice as follows:
Copyright IBM Corp. 2004, 2005. All rights reserved.
Trademarks
The following terms are trademarks of International Business Machines
Corporation in the United States, other countries, or both, and have been used in
at least one of the documents in the DB2 UDB documentation library.
The following terms are trademarks of International Business Machines
Corporation in the United States, other countries, or both:
AIX
DB2
DB2 Connect
DB2 Universal Database
IBM
Office Connect
Redbooks

The following terms are trademarks or registered trademarks of other companies
and have been used in at least one of the documents in the DB2 UDB
documentation library:
Microsoft, Windows, Windows NT, and the Windows logo are trademarks of
Microsoft Corporation in the United States, other countries, or both.
Intel and Pentium are trademarks of Intel Corporation in the United States, other
countries, or both.
Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the
United States, other countries, or both.
UNIX is a registered trademark of The Open Group in the United States and other
countries.
Linux is a registered trademark of Linus Torvalds. Red Hat and all Red Hat-based
trademarks and logos are trademarks or registered trademarks of Red Hat, Inc. in
the United States and other countries.
Other company, product, or service names may be trademarks or service marks of
others.
Contacting IBM
If you have a technical problem, please review and carry out the actions suggested
by the product documentation before contacting DB2 Data Warehouse Edition
Customer Support. This guide suggests information that you can gather to help
DB2 Data Warehouse Edition Customer Support to serve you better.
For information or to order any of the DB2 Data Warehouse Edition products,
contact an IBM representative at a local branch office or contact any authorized
IBM software remarketer.
If you live in the U.S.A., you can call one of the following numbers:
v 1-800-IBM-SERV (1-800-426-7378) for customer service
v 1-888-426-4343 to learn about available service options
v 1-800-IBM-4YOU (426-4968) for DB2 marketing and sales
Note: In some countries, IBM-authorized dealers should contact their dealer
support structure instead of the IBM Support Center.

Product Information
Information regarding DB2 Data Warehouse Edition is available by telephone or by
the World Wide Web at http://www.ibm.com/software/data/db2/dwe.
This site contains the latest information on the technical library, ordering books,
product downloads, newsgroups, FixPaks, news, and links to Web resources.
If you live in the U.S.A., then you can call one of the following numbers:
v 1-800-IBM-CALL (1-800-426-2255) to order products or to obtain general
information.
v 1-800-879-2755 to order publications.
http://www.ibm.com/software/data/db2/udb/dwe/
Provides links to information about DB2 Data Warehouse Edition.
http://www.ibm.com/software/data/db2/9
The DB2 Web pages provide current information about news, product
descriptions, education schedules, and more.
http://www.elink.ibmlink.ibm.com/
Click Publications to open the International Publications ordering Web site
that provides information about how to order books.
http://www.ibm.com/education/certify/
The Professional Certification Program from the IBM Web site provides
certification test information for a variety of IBM products.

Accessible documentation
Documentation is provided in XHTML format, which is viewable in most Web
browsers.
XHTML allows you to view documentation according to the display preferences
that you set in your browser. It also allows you to use screen readers and other
assistive technologies.
Syntax diagrams are provided in dotted decimal format. This format is available
only if you are accessing the online documentation using a screen reader.

Comments on the documentation
Your feedback helps IBM to provide quality information. Please send any
comments that you have about this book or other DB2 Data Warehouse Edition
documentation. You can use any of the following methods to provide comments:
v Send your comments using the online readers' comment form at
www.ibm.com/software/data/rcf.
v Send your comments by electronic mail (e-mail) to comments@us.ibm.com. Be
sure to include the name of the product, the version number of the product, and
the name and part number of the book (if applicable). If you are commenting on
specific text, please include the location of the text (for example, a title, a table
number, or a page number).
v Send your comments by mail to:
International Business Machines Corporation
Reader Comments DTX/E269
555 Bailey Avenue
San Jose, CA 95141-9989
U.S.A.
For information on how to contact IBM outside of the United States, go to the IBM
Worldwide page at www.ibm.com/planetwide.
Program Number: 5724-E34

Printed in USA

SC18-9801-04
