Data Warehousing and Business Intelligence

DATA WAREHOUSING AND ORACLE - A BEGINNER'S GUIDE
Andrew Flower, Dataspace Incorporated

WHERE TO BEGIN
This paper is a beginner's guide to the decisions that need to be made in the lifecycle of a
data warehouse project and how Oracle products can be part of each technology
component of the warehouse. Business Intelligence and Data Warehousing, to me, are
synonymous. Others may separate these two terms into back end processing (data
warehousing) and front end deployment (business intelligence). At a minimum they are
complementary. Business intelligence can only be achieved with well-founded data stores
(read: data warehouses) and appropriately chosen and crafted user access tools.
So where do you begin your data warehouse, or business intelligence, project? This is
the least technical question you will face. You must start with a Vision of what your data
warehouse solution is to achieve and have a plan, at least at a high level, for rolling out
iterative solutions. The rest of this paper will take you through the lifecycle of a data
warehouse project, identifying where the critical technology choices need to be made
and demonstrating how Oracle products can fill each technical need.
VISION
If you are starting at square one, the first thing you must come to is an understanding of
the business processes the data warehouse enables. Some examples are managerial
decision-making, product planning, market expansion, etc. Yes, decision-making is a
business process and a warehouse can be used to enable decision-making. There are
three main deliverables from a vision phase: an entity level enterprise data model, a map
of which functional areas need which entities, and a long term, incremental plan for
delivering subject areas in the enterprise data model.
Spend some time with each functional group within your business and ask them what
their top three to five decisions are. Typically this is enough to get them to start talking
about their business intelligence needs. The people in your business generally have a very
good idea of what they need to make decisions and many times can tell you exactly
where the information lives in your organization's systems. You want them to talk about
source systems and subject matter, but try to resist the urge to get into the details that
should come out in a detailed requirements session. Collect as much information as you
can, even if it is not about data warehousing. Remember, these people have important
needs that are not getting addressed. If they identify a project that is outside the scope
of the data warehouse, document it in your final report as an important project that is
not part of your long term work plan but should not be ignored.
Once you have identified the high level subject areas in the enterprise data model, the
mapping of functional groups to entities, and the work plan that breaks the model into
subject area releases, you are ready to begin the nitty gritty of data warehousing. All you
need now is approval from the business to start. Once that is obtained, you take the first
project on the list and move into Discovery to collect the detailed requirements from the
relevant functional groups.
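As a minimal sketch of the second and third Vision deliverables, the map of functional groups to entities can be kept as plain data and used to draft a release order. The group names, entity names, and the rule "deliver the most widely needed entities first" are all assumptions made for illustration; your own plan will weigh business priority as well.

```python
from collections import defaultdict

# Hypothetical interview results: functional group -> entities it needs.
needs = {
    "Sales": {"Customer", "Product", "Order"},
    "Finance": {"Customer", "Invoice"},
    "Marketing": {"Customer", "Product", "Campaign"},
}

def groups_by_entity(needs):
    """Invert the map so each entity lists the groups that depend on it."""
    inverted = defaultdict(set)
    for group, entities in needs.items():
        for entity in entities:
            inverted[entity].add(group)
    return dict(inverted)

def release_order(needs):
    """Order entities by how many groups need them (ties broken by name).

    This ranking rule is an assumption, not a prescribed method.
    """
    inverted = groups_by_entity(needs)
    return sorted(inverted, key=lambda e: (-len(inverted[e]), e))

plan = release_order(needs)
```

Here `plan` starts with the entity every group asked for, which is one reasonable candidate for the first subject area release.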

Paper #100

Technology Decision: The only technology decision you need to make in the Vision
phase of your warehouse project is which data-modeling tool to use. Many of you
already have Oracle Designer for designing your database applications. For data
warehousing you may want to look to Oracle Warehouse Builder (OWB). Whilst Designer
is a more comprehensive design tool, OWB will give you the ability to focus on data
warehouse design and development. If you already have Designer in-house, it may be the
perfect tool for doing the Vision phase and capturing the information identified. If you
do not have Designer and your focus is solely data warehouse development, OWB may be
the product for you.
If you are not starting at square one, that is to say, you already have a Vision for the data
warehouse and may even have some subject area solutions in place, then you are
probably looking for ways to be more productive in your development or for better
performance for user queries. You may want to skip to the Architecture section to help
you map your architectural pain to the solutions provided by Oracle products.
Architecture is where we get physical and make decisions about development tools,
physical database design, tools for operations and maintenance, and user access to their
information.
DISCOVERY
Now that you have selected your first subject area, you can start digging deeper and
gathering the detailed requirements for that subject area. Since you are talking with one
or more functional groups, focus your design on the data descriptions they are giving you
and assimilate or resolve diverging definitions of data elements across groups. I don't
want to trivialize this, since getting all the functional groups to agree on definitions can
sometimes be difficult.
The design of the warehouse is based on the users' definitions and not the definitions in
the source systems. Basing the data warehouse design on the source systems binds you
to those source systems. This is most important with dimension data such as customers
or products. Your sales and billing systems, particularly if they are packages from
separate vendors, can describe customers in different ways with different descriptors,
such as the customer's status.
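To make the point concrete, here is a small sketch of conforming a customer status from two source systems to a single user-defined vocabulary. The system names, status codes, and target values are hypothetical; the real ones come out of your requirements sessions.

```python
# Hypothetical source codes. The target vocabulary ("Active", "Inactive", ...)
# stands for the users' agreed definitions, not either source system's.
STATUS_MAP = {
    "sales": {"A": "Active", "I": "Inactive", "P": "Prospect"},
    "billing": {"OPEN": "Active", "CLOSED": "Inactive", "HOLD": "Suspended"},
}

def conform_status(source_system, source_code):
    """Translate a source-specific status code into the warehouse status."""
    try:
        return STATUS_MAP[source_system][source_code]
    except KeyError:
        # Unmapped systems or codes are flagged rather than guessed.
        return "Unknown"
```

Because the warehouse stores only the conformed value, replacing the billing package later means changing one entry in the map rather than the warehouse design.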
Although the design of the warehouse should not be based on the source systems, make
sure you document where the users believe the data resides. Using the data elements
from the requirements and the list of possible sources as a starting point, interview the
source system owners to obtain a map of the files and data elements that are needed to
populate the warehouse.

Figure 1 - Time Dimension in OWB

Once you obtain the detailed data requirements from the respective functional groups, use
your chosen modeling tool to create your logical data model. If you are embarking on a
star schema data warehouse project, be sure to model your hierarchies in the logical
model. Figure 1 (taken from Oracle Warehouse Builder: A Technical Overview, an
Oracle Technical White Paper) is an example of designing the time dimension and its
relevant hierarchies.
Other items to be documented are the users' expectations of data availability and refresh
cycle. Also document the source systems' batch cycles and when the data will be
available for extraction. These two items may not always agree. Make sure any
differences are communicated to the users so that they have realistic expectations.
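The comparison between what users expect and what the source systems can deliver is simple arithmetic, sketched below. The times and the two-hour load estimate are invented for illustration, and the sketch assumes everything happens on the same calendar day.

```python
from datetime import datetime, timedelta

def expectation_gap(user_expects, source_ready, load_duration):
    """Return how late the data would be; zero if the expectation can be met.

    user_expects  - when users expect refreshed data to be queryable
    source_ready  - when the source batch cycle makes data extractable
    load_duration - estimated time to extract, transform, and load
    """
    available = source_ready + load_duration
    return max(available - user_expects, timedelta(0))

# Hypothetical case: users want data at 07:00, the source batch finishes
# at 05:30, and the load is estimated at two hours.
gap = expectation_gap(
    user_expects=datetime(2000, 1, 3, 7, 0),
    source_ready=datetime(2000, 1, 3, 5, 30),
    load_duration=timedelta(hours=2),
)
```

A non-zero `gap` (30 minutes here) is exactly the kind of difference that should be communicated before the warehouse goes live.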
ARCHITECTURE
This is where the rubber meets the road. Most of the technical decisions are made
during the architecture phase. The decisions cover the physical design of the database,
complete with proper size estimates; a complete ETL architecture; a complete user
interaction architecture; and the operational and maintenance design.

DATABASE ARCHITECTURE
The physical database design takes the logical model and makes it real. This step
usually includes decisions about indexes, partitions, security roles, dimensions,
aggregates to be built, and sizing. This paper is not intended to be a DBA-focused paper,
so many of the details of implementing the physical database are omitted. The concepts
are summarized and should give your DBA a good starting point.
INDEXES
In the modeling tool we want to make sure that we have primary and foreign key indexes
created for most tables. Some small dimension tables don't always need indexes.
Additional indexes should be built on those columns that are commonly used in queries.
Bitmap indexes can be employed on low-cardinality columns.
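To see why bitmap indexes suit columns with few distinct values, consider this toy sketch. It is not how Oracle stores bitmaps internally; it only illustrates the idea that each distinct value gets one bitmap and that combining predicates becomes a bitwise operation. The column data is made up.

```python
def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is set when row i holds it."""
    index = {}
    for row_id, value in enumerate(values):
        index[value] = index.get(value, 0) | (1 << row_id)
    return index

def matching_rows(bitmap):
    """Expand a bitmap back into a list of row ids."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

# Hypothetical low-cardinality columns of a five-row table.
region = ["East", "West", "East", "North", "West"]
status = ["Active", "Active", "Inactive", "Active", "Active"]

region_idx = build_bitmap_index(region)
status_idx = build_bitmap_index(status)

# region = 'West' AND status = 'Active' is a single bitwise AND.
hits = matching_rows(region_idx["West"] & status_idx["Active"])
```

With only a handful of distinct values there are only a handful of bitmaps to keep, which is why the approach fits low-cardinality columns and not, say, a customer number.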
PARTITIONS
Oracle introduced key range partitioning in 8.0. Range partitioning allows you to break
up your large tables on a key value. Base level fact tables are the most commonly
partitioned tables and they are most often partitioned by a date attribute such as day,
week, month or year.
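The routing rule behind range partitioning can be sketched in a few lines. The partition names and monthly bounds below are hypothetical; each bound acts as an exclusive upper limit, in the spirit of the VALUES LESS THAN clause used to define range partitions.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical monthly partitions of a sales fact table. A row belongs to
# the first partition whose upper bound is strictly greater than its date.
BOUNDS = [date(2000, 2, 1), date(2000, 3, 1), date(2000, 4, 1)]
NAMES = ["sales_jan", "sales_feb", "sales_mar"]

def partition_for(sale_date):
    """Return the name of the partition that would hold a row with this date."""
    pos = bisect_right(BOUNDS, sale_date)
    if pos == len(BOUNDS):
        raise ValueError(f"{sale_date} is beyond the highest partition bound")
    return NAMES[pos]
```

For example, `partition_for(date(2000, 2, 1))` lands in `sales_feb`, because 1 February is not less than the January partition's bound. A query restricted to February then only needs to touch that one partition.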
DIMENSIONS AND AGGREGATES
Dimensions and aggregates are necessary components of Summary Management.
Summary Management is a collection of features built into Oracle 8i
that help to maintain aggregate data automatically. Additionally, and perhaps more
powerfully, Summary Management will rewrite user queries to take advantage of
precomputed aggregates. This allows the users to focus on asking the right questions and
not on which aggregate holds the answer.
Aggregates are maintained in materialized views, which are effectively tables based on a
SQL SELECT statement. These materialized views can be refreshed after changes are
made to the base fact table in either COMPLETE or FAST mode. A COMPLETE refresh will
completely rebuild the materialized view. A FAST refresh will incrementally update
the aggregate.
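The difference between the two refresh modes can be illustrated with a toy aggregate. This sketch handles inserts only and does not model Oracle's actual refresh mechanism; the fact rows and month keys are invented.

```python
from collections import Counter

def complete_refresh(fact_rows):
    """Rebuild the month-level aggregate by rescanning every fact row."""
    summary = Counter()
    for month, amount in fact_rows:
        summary[month] += amount
    return summary

def fast_refresh(summary, new_rows):
    """Apply only the newly inserted fact rows to the existing aggregate."""
    for month, amount in new_rows:
        summary[month] += amount
    return summary

# Hypothetical fact rows: (month, sales amount).
facts = [("2000-01", 100), ("2000-01", 50), ("2000-02", 70)]
summary = complete_refresh(facts)

# A nightly load adds two rows; only these are applied to the aggregate.
delta = [("2000-02", 30), ("2000-03", 20)]
facts.extend(delta)
fast_refresh(summary, delta)
```

Both paths end at the same totals. The incremental path simply does work proportional to the size of the change rather than the size of the fact table.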
In order for query rewrite to work, dimensions must be defined based on the tables that
relate to the base fact table and its materialized views. Dimension tables are created in
the same way other tables are, with a CREATE TABLE statement. There is also a CREATE
DIMENSION statement that tells Oracle which tables are part of a dimension and how
each fits in the dimension hierarchy. These dimension definitions are critical to
determining how to rewrite the query, as the aggregates will be based on the dimension
tables and their hierarchies.
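The reason the hierarchy matters can be shown with a small sketch: if the database knows that months roll up to quarters, a quarter-level question can be answered from a month-level aggregate without touching the fact table. The hierarchy and figures below are hypothetical, and this is only an illustration of the idea, not of Oracle's rewrite logic.

```python
# Hypothetical time hierarchy declared once: month -> quarter.
MONTH_TO_QUARTER = {
    "2000-01": "2000-Q1", "2000-02": "2000-Q1", "2000-03": "2000-Q1",
    "2000-04": "2000-Q2", "2000-05": "2000-Q2", "2000-06": "2000-Q2",
}

# Hypothetical precomputed month-level aggregate (the "materialized view").
monthly_sales = {"2000-01": 150, "2000-02": 100, "2000-03": 20, "2000-04": 60}

def sales_by_quarter(monthly):
    """Answer a quarter-level query by rolling up the month-level aggregate."""
    totals = {}
    for month, amount in monthly.items():
        quarter = MONTH_TO_QUARTER[month]
        totals[quarter] = totals.get(quarter, 0) + amount
    return totals

quarterly = sales_by_quarter(monthly_sales)
```

Without the month-to-quarter relationship there would be no safe way to know that the monthly aggregate can serve the quarterly question, which is the role the dimension definition plays.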
BUILDING THE WAREHOUSE
Extraction, Transformation, and Load (ETL) processing is in many ways the engine that
drives the data warehouse. Building an infrastructure that supports the entire
warehouse and business intelligence solution is not easy. You are dependent on the
source systems' operations, structure, and accessibility. You are bound by the business
rules that determine how to convert that source data into your organization's
information. Fortunately for you, there is Oracle Warehouse Builder (OWB).
OWB, as discussed above, can be used as a data warehouse design tool. In addition, it
can be used as a source system analysis tool. Using the wizards you can import
metadata from Oracle sources, non-Oracle sources, and flat file sources. Figure 2 shows
the import wizard for an Oracle source.

Figure 2 - Source Table Wizard

The next step is to build the mappings between the source systems and our physical
data warehouse, which we defined in the Discovery and Database Architecture sections
above. The mappings are designed at a high level first, showing the flow of data from the
source system to the target warehouse as seen in Figure 3. OWB uses the term modules
to describe the source systems and the target warehouse. In the mappings, a table or set
of tables is selected in the source module and mapped to tables in the target
module.

Figure 3 - High Level Mapping

Figure 4 - Detail Mapping

At the detail mapping level each target column is mapped from a source column or from
a transformation function that could have one or more inputs from the source system.
Figure 4 shows a sample low-level mapping.
All this mapping effort translates into PL/SQL code that is generated based on the
mapping diagrams and the metadata. The PL/SQL for the mapping in Figures 3 and 4 is
in Figure 5.
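The idea of a detail mapping can be sketched independently of any tool. The example below is Python, not the PL/SQL that OWB generates, and every column name and rule in it is hypothetical; it only shows a target row being assembled from direct column copies and from transformation functions with one or more source inputs.

```python
# Hypothetical detail mapping: target column -> rule applied to a source row.
MAPPING = {
    # Direct column-to-column mapping.
    "customer_id": lambda src: src["CUST_NO"],
    # Transformation with two source inputs.
    "customer_name": lambda src: f'{src["FIRST_NM"].strip()} {src["LAST_NM"].strip()}',
    # Transformation that decodes a source code value.
    "status": lambda src: "Active" if src["STAT_CD"] == "A" else "Inactive",
}

def apply_mapping(source_row, mapping=MAPPING):
    """Build one target row by evaluating every target column's rule."""
    return {target: rule(source_row) for target, rule in mapping.items()}

source_row = {"CUST_NO": 42, "FIRST_NM": " Ada ", "LAST_NM": "Lovelace", "STAT_CD": "A"}
target_row = apply_mapping(source_row)
```

Keeping the mapping as data, separate from the code that executes it, is what makes it possible for a tool to generate the load program from diagrams and metadata.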

Figure 5 - Generated PL/SQL

OWB has the ability to build dependencies into the process model. As a general rule,
dimension tables are loaded before the fact tables. Using Oracle Workflow the mappings
can be executed in the order necessary as demonstrated in Figure 6.
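The ordering rule itself, dimensions before facts, is a dependency sort. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later) rather than Oracle Workflow, and the load names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical load steps: each key lists the loads that must finish first.
DEPENDS_ON = {
    "load_time_dim": set(),
    "load_customer_dim": set(),
    "load_product_dim": set(),
    "load_sales_fact": {"load_time_dim", "load_customer_dim", "load_product_dim"},
    "refresh_sales_summary": {"load_sales_fact"},
}

def load_order(depends_on):
    """Return one valid execution order that respects every dependency."""
    return list(TopologicalSorter(depends_on).static_order())

order = load_order(DEPENDS_ON)
```

The three dimension loads may come out in any order relative to each other, but the fact load always follows all of them and the summary refresh always comes last.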

Figure 6 - Oracle Workflow

DATA MINING
Many people confuse manual analysis, slice and dice, drill down, etc. with data mining.
That process is simply analysis and reporting. Data mining is more of a database
process that analyzes the data in the warehouse in search of patterns in the data that
may not be found in a manual search. Once patterns are known they can be applied in a
predictive manner to help encourage or prevent events from happening. A typical use
for the patterns is to cross-sell products to customers that fit a certain profile or pattern.
The patterns can also be used in early fraud detection, to stop suspicious actions before
they in fact become fraud.
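As a deliberately simple illustration of "finding a pattern and then applying it", the sketch below counts which products are held together and uses the counts to suggest cross-sell candidates. Pair co-occurrence counting is a stand-in chosen for brevity; it is not the modeling technique of any particular data-mining product, and the product names and threshold are invented.

```python
from collections import Counter
from itertools import combinations

def pair_counts(baskets):
    """Count how often each pair of products appears for the same customer."""
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts

def cross_sell_candidates(baskets, owned, min_count=2):
    """Suggest products held together with `owned` at least `min_count` times."""
    suggestions = Counter()
    for (a, b), n in pair_counts(baskets).items():
        if n < min_count:
            continue
        if a == owned:
            suggestions[b] = n
        elif b == owned:
            suggestions[a] = n
    return [product for product, _ in suggestions.most_common()]

# Hypothetical product holdings, one list per customer.
baskets = [
    ["checking", "savings"],
    ["checking", "savings", "credit card"],
    ["checking", "credit card"],
    ["savings", "mortgage"],
]
candidates = cross_sell_candidates(baskets, "checking")
```

A customer who holds only a checking account would be offered the products in `candidates`. A pairing seen just once, such as savings with mortgage, falls below the threshold and is ignored.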
Oracle Darwin is a data-mining product. Darwin is designed to plow through massive
amounts of data in your data warehouse to discover those hidden patterns using a
number of different modeling techniques. Figure 7 shows a screenshot of the resulting
profiles and segments from a Darwin model.

Figure 7 - Sample Profiles and Segments

ANALYSIS AND REPORTING


Oracle has three products for the end user to consume information from the data
warehouse. Oracle Reports lets developers build and format standard, professional
quality reports. Oracle Reports is best suited for the scheduled reports that are
published to the user community for direct consumption and not interactive analysis.
Oracle Discoverer is an ad hoc analysis tool for users to interact with their data to
discover more information than is readily available in a standard report. Discoverer
provides the end user the ability to ask questions and, when an answer is received, ask a
more detailed follow-up question. Analysis can be done in tabular or graphical reports
that are equally interactive. The one limitation of Discoverer may be its reliance on SQL
to do a lot of the analytical processing. On the up side, however, Discoverer can take full
advantage of the high performance query structures provided by summary management,
with its materialized views and query rewrite, without much work. Figure 8 shows a
sample Discoverer analysis.
Oracle Express in many ways is a competitor of Discoverer. Both products target the
analysts and their need to drill into information to find out more. On the one hand,
Discoverer is positioned to take advantage of all the new analysis features being added
to the Oracle database for high performance queries. On the other hand, Express is a
much more mature analysis environment with more powerful built-in analytical
capabilities. In time most organizations will not need both. In fact many organizations
could get by with just Discoverer. Yet some need both, since their analysts have varying
skill levels and needs. Whether you choose to use Express or Discoverer or both depends
on a number of factors.
Do you have the need for the type of analysis in Express that is beyond Discoverer?
Do you have the skill set in-house to support both products?
Do you have the budget to support the extra hardware needed to support the Express
application and data above and beyond what is allocated for the warehouse structure
in Oracle?
This paper does not have the space to get into the details of the analysis necessary to
choose Express or Discoverer. I recommend that you get a strong understanding of your
needs before signing any checks.

Figure 8 - Sample Discoverer Analysis

METADATA
Oracle Warehouse Builder is based on the Common Warehouse Model (CWM), which will
allow it to exchange metadata with Oracle Express, Oracle Discoverer, and any business
intelligence tool that complies with CWM. CWM defines a model and a software
development kit (SDK) that can be used to build exchanges with other vendors'
products. CWM uses Java as the programming language, XML for the interchange, and
UML as the modeling language.
No metadata repository will provide 100% of the metadata requirements you or your
customers will come up with. CWM does provide a solid basis to build from.
CONCLUSIONS
Not only does Oracle offer one of the best database platforms available for data
warehousing and business intelligence, Oracle also offers a full complement of products
to support your development and end user needs. Oracle Warehouse Builder provides a
platform that manages the back end of the warehouse from design to implementation,
from transformation processing to metadata exchange. In addition to OWB on the back
end, Oracle has Darwin to perform the heavy data mining activities to find patterns and
help predict future customer behavior and other trends.
Oracle 8i provides a number of built-in features that simplify the management of data
and enhance the performance of user queries. Partitioning the large tables, particularly
fact tables, makes the management of the data much simpler. Summary
management not only simplifies the maintenance of aggregates with materialized views,
it also improves query response time by providing query rewrite capabilities so that user
queries pull the data from the most appropriate aggregate.
For reporting and analysis, the three choices of Oracle Reports, Oracle Discoverer, and
Oracle Express provide the end user with a number of options for performing analysis.
Reports can be used to build static reports that can be executed on demand or on a
regular schedule. Both Discoverer and Express provide analytical capabilities, which
can be redundant. Express is more mature and more robust than Discoverer, but with
more features like Summary Management being added to Oracle, relational OLAP tools
like Discoverer are gaining ground.
Ultimately the data warehousing and business intelligence products you use should be
well thought out, purchased, and used within the context of your overall data warehouse
vision. For every choice you make Oracle has a product that might fit the bill.
