
OLTP & OLAP

OLTP vs. OLAP


OLTP: On Line Transaction Processing
Describes processing at operational sites
OLAP: On Line Analytical Processing
Describes processing at warehouse
OLTP vs. OLAP
OLTP:
Mostly updates
Many small transactions
MB to GB of data
Raw data
Clerical users
Up-to-date data
Consistency and recoverability are critical
OLAP:
Mostly reads
Long, complex queries
GB to TB of data
Summarized, consolidated data
Decision-makers and analysts as users
INTRODUCTION
Online transaction processing (OLTP): the gathering of input information, processing that information, and updating existing information to reflect the gathered and processed information
Databases support OLTP
Operational database: a database that supports OLTP
INTRODUCTION
Online analytical processing (OLAP): the manipulation of information to support decision making
Databases can support some OLAP
Data warehouses support only OLAP, not OLTP
Data warehouses are special forms of databases that support decision making
INTRODUCTION
OLTP vs. OLAP
On-line Transaction Processing (OLTP)
Day-to-day handling of transactions that result
from enterprise operation
Maintains correspondence between database
state and enterprise state
On-line Analytic Processing (OLAP)
Analysis of information in a database for the
purpose of making management decisions
OLAP
Analyzes historical data (terabytes) using complex
queries
Due to volume of data and complexity of queries,
OLAP often uses a data warehouse
Data Warehouse - (offline) repository of historical
data generated from OLTP or other sources
Data Mining - use of warehouse data to discover
relationships that might influence enterprise strategy
Examples - Supermarket
OLTP
Event is 3 cans of soup and 1 box of crackers
bought; update database to reflect that event
OLAP
Last winter in all stores in northeast, how many
customers bought soup and crackers together?
Data Mining
Are there any interesting combinations of
foods that customers frequently bought
together?
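The contrast between the OLTP event and the OLAP question can be sketched in a few lines. This is a minimal illustration only, assuming an in-memory SQLite database; the table name `sale`, its columns, the store names, and the helper functions are hypothetical, baskets stand in for customers, and the season and region filters from the slide are left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sale (store TEXT, basket_id INTEGER, product TEXT, qty INTEGER)"
)


def record_purchase(conn, store, basket_id, items):
    """OLTP: one small transaction that records a single purchase event."""
    with conn:  # commits on success, rolls back if an insert fails
        conn.executemany(
            "INSERT INTO sale VALUES (?, ?, ?, ?)",
            [(store, basket_id, product, qty) for product, qty in items],
        )


def baskets_with_both(conn, first, second):
    """OLAP: a read-only query that scans many rows and returns one summary."""
    row = conn.execute(
        """
        SELECT COUNT(*) FROM (
            SELECT store, basket_id
            FROM sale
            WHERE product IN (?, ?)
            GROUP BY store, basket_id
            HAVING COUNT(DISTINCT product) = 2
        )
        """,
        (first, second),
    ).fetchone()
    return row[0]


# Hypothetical events; the first one is the slide's "3 cans of soup, 1 box of crackers".
record_purchase(conn, "boston", 1, [("soup", 3), ("crackers", 1)])
record_purchase(conn, "boston", 2, [("soup", 1)])
record_purchase(conn, "albany", 3, [("crackers", 2), ("soup", 2)])

both = baskets_with_both(conn, "soup", "crackers")
print(both)
```

The write path touches a handful of rows per call, while the analytical query reads the whole table and returns one summary, which is the workload difference the slide describes.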
Warehouse is a Specialized DB
Standard DB (OLTP)                          Warehouse (OLAP)
Mostly updates                              Mostly reads
Many small transactions                     Queries are long and complex
MB to GB of data                            GB to TB of data
Current snapshot                            History
Index/hash on primary key                   Lots of scans
Raw data                                    Summarized, reconciled data
Thousands of users (e.g., clerical users)   Hundreds of users (e.g., decision-makers, analysts)
Decision Support
Information technology to help the
knowledge worker (executive, manager,
analyst) make faster & better decisions
What were the sales volumes by region and product category for the last year?
How did the share price of computer manufacturers correlate with quarterly profits over the past 10 years?
Which orders should we fill to maximize revenues?
On-line analytical processing (OLAP) is an
element of decision support systems (DSS)
Three-Tier Decision Support Systems
Warehouse database server
Almost always a relational DBMS, rarely flat files
OLAP servers
Relational OLAP (ROLAP): extended relational DBMS that maps
operations on multidimensional data to standard relational
operators
Multidimensional OLAP (MOLAP): special-purpose server that
directly implements multidimensional data and operations
Clients
Query and reporting tools
Analysis tools
Data mining tools
The Complete Decision Support System
[Figure: three-tier architecture. Information sources (operational DBs, semistructured sources) feed the data warehouse server (Tier 1), which holds the data warehouse and data marts, through extract, transform, load, refresh, etc. OLAP servers (Tier 2, e.g., MOLAP, ROLAP) sit between the warehouse and the clients (Tier 3: analysis, query/reporting, data mining). Arrows labeled "serve" connect the tiers.]
Data Warehouse vs. Data Marts
Enterprise warehouse: collects all information about subjects (customers, products, sales, assets, personnel) that span the entire organization
Requires extensive business modeling (may take years to design and build)
Data Marts: departmental subsets that focus on selected subjects
Marketing data mart: customer, product, sales
Faster roll out, but complex integration in the long run
Virtual warehouse: views over operational DBs
Materialize selected summary views for efficient query processing
Easy to build but requires excess capacity on operational DB servers
OLAP for Decision Support
OLAP = Online Analytical Processing
Support (almost) ad-hoc querying for business analyst
Think in terms of spreadsheets
View sales data by geography, time, or product
Extend spreadsheet analysis model to work with
warehouse data
Large data sets
Semantically enriched to understand business terms
Combine interactive queries with reporting functions
Multidimensional view of data is the foundation of
OLAP
Data model, operations, etc.
Approaches to OLAP Servers
Relational DBMS as Warehouse Servers
Two possibilities for OLAP servers
(1) Relational OLAP (ROLAP)
Relational and specialized relational DBMS to
store and manage warehouse data
OLAP middleware to support missing pieces
(2) Multidimensional OLAP (MOLAP)
Array-based storage structures
Direct access to array data structures

OLAP Server: Query Engine
Requirements
Aggregates (maintenance and querying)
Decide what to precompute and when
Query language to support multidimensional
operations
Standard SQL falls short
Scalable query processing
Data intensive and data selective queries
Data Warehousing and End-User
Access Tools
Accompanying the growth in data warehouses is an increasing demand for more powerful access tools that provide advanced analytical capabilities.

Key developments include:
Online analytical processing (OLAP).
SQL extensions for complex data analysis.
Data mining tools.

Introducing OLAP
The dynamic synthesis, analysis, and consolidation of large volumes of multi-dimensional data, Codd (1993).

Describes a technology that uses a multi-dimensional view of aggregate data to provide quick access to strategic information for purposes of advanced analysis.
Introducing OLAP
Enables users to gain a deeper
understanding and knowledge about
various aspects of their corporate data
through fast, consistent, interactive access
to a wide variety of possible views of the
data.

Allows users to view corporate data in
such a way that it is a better model of the
true dimensionality of the enterprise.
Introducing OLAP
Can easily answer "who?" and "what?" questions; however, the ability to answer "what if?" and "why?" questions distinguishes OLAP from general-purpose query tools.

Types of analysis range from basic navigation and browsing (slicing and dicing), to calculations, to more complex analyses such as time series and complex modeling.
OLAP Benchmarks
OLAP Council published an analytical
processing benchmark referred to as the
APB-1 (OLAP Council, 1998).

Aim is to measure a server's overall OLAP performance rather than the performance of individual tasks.
OLAP Benchmarks
APB-1 assesses most common business
operations including:
bulk loading of data from internal/external data
sources;
incremental loading of data from operational systems;
aggregation of input-level data along hierarchies;
calculation of new data based on business models;
time series analysis;
queries with a high degree of complexity;
drill-down through hierarchies;
ad hoc queries;
multiple online sessions.
OLAP Benchmarks
OLAP applications are judged on their
ability to provide just-in-time (JIT)
information, a core requirement of
supporting effective decision-making.

Assessing a server's ability to satisfy this requirement involves more than measuring processing performance; it also covers the server's ability to model complex business relationships and to respond to changing business requirements.
OLAP Benchmarks
APB-1 uses a standard benchmark metric
called AQM (Analytical Queries per Minute).

AQM represents the number of analytical queries processed per minute, including data loading and computation time. Thus, AQM incorporates data loading performance, calculation performance, and query performance into a single metric.
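As a rough illustration of how one figure can fold in all three phases, the sketch below divides the query count by the total elapsed time. It follows only the description on this slide; the function name, parameters, and timings are hypothetical, and the official APB-1 measurement rules are not reproduced here.

```python
def analytical_queries_per_minute(num_queries, load_s, calc_s, query_s):
    """Queries per minute over the combined load, calculation, and query time."""
    total_minutes = (load_s + calc_s + query_s) / 60.0
    if total_minutes <= 0:
        raise ValueError("total elapsed time must be positive")
    return num_queries / total_minutes


# Hypothetical run: 10 min loading, 15 min calculation, 5 min answering queries.
aqm = analytical_queries_per_minute(
    num_queries=3000, load_s=600, calc_s=900, query_s=300
)
print(aqm)
```

Because loading and calculation time sit in the denominator, a server that answers queries quickly but loads slowly still scores poorly under this reading.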
OLAP Benchmarks
Publication of APB-1 benchmark results
must include both the database schema
and all code required for executing the
benchmark.

An essential requirement of all OLAP applications is the ability to provide users with JIT information, which is needed to make effective decisions about an organization's strategic directions.
OLAP Applications
JIT information is computed data that
usually reflects complex relationships and
is often calculated on the fly.
Also, as data relationships may not be
known in advance, the data model must
be flexible.
Examples of OLAP Applications
in Various Functional Areas

OLAP Applications
Although OLAP applications are found in widely divergent functional areas, all have the following key features:
multi-dimensional views of data;
support for complex calculations;
time intelligence.
OLAP Applications - Multi-Dimensional Views of Data

Core requirement of building a realistic
business model.

Provides basis for analytical processing
through flexible access to corporate data.

The underlying database design that
provides the multi-dimensional view of
data should treat all dimensions equally.
OLAP Applications - Support for
Complex Calculations
Must provide a range of powerful computational methods such as those required by sales forecasting, which uses trend algorithms such as moving averages and percentage growth.

Mechanisms for implementing
computational methods should be clear
and non-procedural.
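The two trend calculations named above can be sketched as follows. This is a minimal illustration with hypothetical sales figures and function names; an OLAP tool would expose such calculations declaratively rather than as hand-written loops.

```python
def moving_average(values, window):
    """Trailing moving average; one output per complete window."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


def percentage_growth(values):
    """Growth of each period over the previous one, in percent.

    Assumes no period has a value of zero (division by the previous value).
    """
    return [
        (curr - prev) / prev * 100.0
        for prev, curr in zip(values, values[1:])
    ]


monthly_sales = [100.0, 110.0, 121.0, 133.0]  # hypothetical figures
ma3 = moving_average(monthly_sales, window=3)
growth = percentage_growth(monthly_sales)
print(ma3, growth)
```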

OLAP Applications - Time Intelligence
Key feature of almost any analytical
application as performance is almost
always judged over time.

Time hierarchy is not always used in
same manner as other hierarchies.

Concepts such as year-to-date and
period-over-period comparisons should
be easily defined.
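Year-to-date and period-over-period comparisons both depend on the ordering of time, which is one reason the time hierarchy is treated specially. The sketch below shows one possible way to compute them, assuming hypothetical quarterly revenue keyed by (year, quarter); the names are illustrative only.

```python
from itertools import accumulate

# Hypothetical quarterly revenue, keyed by (year, quarter).
revenue = {
    (1996, 1): 80, (1996, 2): 90, (1996, 3): 100, (1996, 4): 120,
    (1997, 1): 88, (1997, 2): 99, (1997, 3): 125, (1997, 4): 150,
}


def year_to_date(revenue, year):
    """Running total within one year, in quarter order."""
    quarters = sorted(q for (y, q) in revenue if y == year)
    running = accumulate(revenue[(year, q)] for q in quarters)
    return dict(zip(quarters, running))


def same_period_change(revenue, year, quarter):
    """Change versus the same quarter one year earlier, as a fraction."""
    previous = revenue[(year - 1, quarter)]
    return (revenue[(year, quarter)] - previous) / previous


ytd_1997 = year_to_date(revenue, 1997)
q3_change = same_period_change(revenue, 1997, 3)
print(ytd_1997, q3_change)
```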

OLAP Benefits
Increased productivity of end-users.
Reduced backlog of applications
development for IT staff.
Retention of organizational control over the
integrity of corporate data.
Reduced query drag and network traffic on
OLTP systems or on the data warehouse.
Improved potential revenue and
profitability.
Representing Multi-Dimensional
Data
Example of two-dimensional query.
What is the total revenue generated by property
sales in each city, in each quarter of 1997?

Choice of representation is based on types
of queries end-user may ask.

Compare representation - three-field
relational table versus two-dimensional
matrix.

Multi-Dimensional Data as Three-Field Table
versus Two-Dimensional Matrix
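The two representations hold the same information, as the following sketch suggests. The rows are hypothetical (the slide's figure is not part of the text), and `to_matrix` is an illustrative helper, not part of any OLAP product.

```python
# Hypothetical three-field rows: (city, quarter, revenue).
rows = [
    ("Glasgow", "Q1", 10),
    ("Glasgow", "Q2", 12),
    ("London", "Q1", 20),
    ("London", "Q2", 25),
]


def to_matrix(rows):
    """Pivot (row key, column key, value) triples into a 2-D matrix."""
    cities = sorted({city for city, _, _ in rows})
    quarters = sorted({quarter for _, quarter, _ in rows})
    cell = {(city, quarter): revenue for city, quarter, revenue in rows}
    # Missing combinations become None, i.e. an empty cell.
    matrix = [[cell.get((c, q)) for q in quarters] for c in cities]
    return cities, quarters, matrix


cities, quarters, matrix = to_matrix(rows)
print(cities, quarters, matrix)
```

In the table form, finding the revenue for one city and quarter means searching rows; in the matrix form it is a direct lookup by two positions, which is why the choice depends on the queries the end-user asks.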
Representing Multi-Dimensional
Data
Example of three-dimensional query.
What is the total revenue generated by property
sales for each type of property (Flat or House) in each
city, in each quarter of 1997?

Compare representation - four-field
relational table versus three-dimensional
cube.
Multi-Dimensional Data as Four-Field Table versus Three-Dimensional Cube
Representing Multi-Dimensional
Data
Cube represents data as cells in an array.

Relational table only represents multi-dimensional data in two dimensions.
Multi-Dimensional OLAP Servers
Use multi-dimensional structures to store
data and relationships between data.

Multi-dimensional structures are best
visualized as cubes of data, and cubes
within cubes of data. Each side of cube is a
dimension.

A cube can be expanded to include other
dimensions.
Multi-Dimensional OLAP
Servers
A cube supports matrix arithmetic.

Multi-dimensional query response time
depends on how many cells have to be
added on the fly.

As the number of dimensions increases, the number of the cube's cells increases exponentially.
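The growth is easy to see by multiplying dimension sizes. A minimal sketch, assuming every dimension has the same hypothetical number of members:

```python
from math import prod


def cell_count(cardinalities):
    """Number of cells in a cube: the product of its dimension sizes."""
    return prod(cardinalities)


# Hypothetical cubes with 10 members per dimension and 1 to 4 dimensions.
counts = [cell_count([10] * d) for d in range(1, 5)]
print(counts)
```

Each added dimension multiplies the cell count by that dimension's size, so a cube with d dimensions of n members each has n**d cells.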
Multi-Dimensional OLAP
Servers
However, majority of multi-dimensional
queries use summarized, high-level data.

Solution is to pre-aggregate (consolidate)
all logical subtotals and totals along all
dimensions.

Pre-aggregation is valuable, as typical
dimensions are hierarchical in nature.
(e.g. Time dimension hierarchy - years, quarters,
months, weeks, and days)
Multi-Dimensional OLAP Servers
Predefined hierarchy allows logical pre-
aggregation and, conversely, allows for a
logical drill-down.

Supports common analytical operations
Consolidation.
Drill-down.
Slicing and dicing.

Multi-Dimensional OLAP
Servers
Consolidation - aggregation of data such
as simple roll-ups or complex expressions
involving inter-related data.

Drill-Down - the reverse of consolidation; involves displaying the detailed data that makes up the consolidated data.

Slicing and Dicing - (also called pivoting)
refers to the ability to look at the data
from different viewpoints.
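One way to picture looking at the data from different viewpoints is as operations on a dictionary keyed by coordinates. The sketch below is illustrative only: the cube contents and function names are hypothetical, and it separates slicing, dicing, and pivoting into three helpers even though the slide groups them under one heading.

```python
# Hypothetical cube: (city, quarter, property_type) -> revenue.
cube = {
    ("Glasgow", "Q1", "Flat"): 5,
    ("Glasgow", "Q1", "House"): 7,
    ("Glasgow", "Q2", "Flat"): 6,
    ("London", "Q1", "Flat"): 9,
    ("London", "Q2", "House"): 11,
}


def slice_cube(cube, axis, value):
    """Fix one dimension to a single value and drop it from the key."""
    return {
        key[:axis] + key[axis + 1 :]: amount
        for key, amount in cube.items()
        if key[axis] == value
    }


def dice_cube(cube, allowed):
    """Keep cells whose coordinate on each listed axis is in the allowed set."""
    return {
        key: amount
        for key, amount in cube.items()
        if all(key[axis] in values for axis, values in allowed.items())
    }


def pivot(cube, order):
    """Reorder the dimensions to view the same cells along different axes."""
    return {tuple(key[i] for i in order): amount for key, amount in cube.items()}


q1 = slice_cube(cube, axis=1, value="Q1")
glasgow_flats = dice_cube(cube, {0: {"Glasgow"}, 2: {"Flat"}})
by_type = pivot(cube, (2, 0, 1))
print(q1, glasgow_flats, by_type)
```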

Multi-Dimensional OLAP
Servers
Can store data in a compressed form by
dynamically selecting physical storage
organizations and compression techniques
that maximize space utilization.

Dense data (i.e., data exists for a high percentage of cells) can be stored separately from sparse data (i.e., a significant percentage of cells are empty).

Multi-Dimensional OLAP
Servers
Ability to omit empty or repetitive cells
can greatly reduce the size of the cube and
the amount of processing.
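A simple way to omit empty cells is to store only the coordinates that hold a value. The class below is a sketch under that assumption; it is not how any particular MOLAP server lays out its storage, and the class name and cube shape are hypothetical.

```python
class SparseCube:
    """Stores only non-empty cells, keyed by their coordinate tuple."""

    def __init__(self, shape):
        self.shape = tuple(shape)
        self.cells = {}

    def set(self, coords, value):
        coords = tuple(coords)
        if len(coords) != len(self.shape) or any(
            not 0 <= c < n for c, n in zip(coords, self.shape)
        ):
            raise IndexError(f"coordinates {coords} outside cube {self.shape}")
        self.cells[coords] = value

    def get(self, coords):
        """Return the stored value, or None for an empty cell."""
        return self.cells.get(tuple(coords))

    def density(self):
        """Fraction of logical cells that actually hold a value."""
        total = 1
        for n in self.shape:
            total *= n
        return len(self.cells) / total


# Hypothetical cube: 100 products x 50 stores x 365 days, with three sales.
sparse = SparseCube((100, 50, 365))
sparse.set((0, 0, 0), 12)
sparse.set((1, 0, 0), 11)
sparse.set((0, 2, 0), 50)
print(len(sparse.cells), sparse.density())
```

Three stored entries stand in for a logical cube of 1,825,000 cells, which illustrates the space saving the slide refers to.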

Allows analysis of exceptionally large
amounts of data.

Multi-Dimensional OLAP
Servers
In summary, pre-aggregation, dimensional
hierarchy, and sparse data management
can significantly reduce the size of the
cube and the need to calculate values on-
the-fly.

Removes need for multi-table joins and
provides quick and direct access to arrays
of data, thus significantly speeding up
execution of multi-dimensional queries.

Codd's Rules for OLAP Systems
In 1993, E.F. Codd formulated twelve rules
as the basis for selecting OLAP tools.
Multi-dimensional conceptual view
Transparency
Accessibility
Consistent reporting performance
Client-server architecture
Generic dimensionality

Codd's Rules for OLAP
Dynamic sparse matrix handling
Multi-user support
Unrestricted cross-dimensional operations
Intuitive data manipulation
Flexible reporting
Unlimited dimensions and aggregation levels.
Codd's Rules for OLAP Systems
There are proposals to re-define or
extend the rules. For example, to also
include:
Comprehensive database management tools.
Ability to drill down to detail (source record) level.
Incremental database refresh.
SQL interface to the existing enterprise
environment.
Categories of OLAP Tools
OLAP tools are categorized according to
the architecture of the underlying
database.

Three main categories of OLAP tools
include
Multi-dimensional OLAP (MOLAP or MD-OLAP)
Relational OLAP (ROLAP), also called multi-relational
OLAP
Managed query environment (MQE)

Multi-Dimensional OLAP
(MOLAP)
Uses specialized data structures and multi-dimensional Database Management Systems (MDDBMSs) to organize, navigate, and analyze data.

Data is typically aggregated and stored
according to predicted usage to enhance
query performance.
Multi-Dimensional OLAP
(MOLAP)
Uses array technology and efficient storage techniques that minimize the disk space requirements through sparse data management.

Provides excellent performance when data
is used as designed, and the focus is on
data for a specific decision-support
application.
Multi-Dimensional OLAP
(MOLAP)
Traditionally, requires a tight coupling with the application layer and presentation layer.

Recent trends segregate the OLAP from
the data structures through the use of
published application programming
interfaces (APIs).

Typical Architecture for
MOLAP Tools

MOLAP Tools - Development
Issues
Underlying data structures are limited in
their ability to support multiple subject
areas and to provide access to detailed
data.

Navigation and analysis of data is limited
because the data is designed according to
previously determined requirements.

MOLAP Tools - Development
Issues
MOLAP products require a different set of
skills and tools to build and maintain the
database, thus increasing the cost and
complexity of support.

Relational OLAP (ROLAP)
Fastest growing style of OLAP technology.

Supports RDBMS products through a metadata layer, which avoids the need to create a static multi-dimensional data structure and facilitates the creation of multiple multi-dimensional views of the two-dimensional relation.
Relational OLAP (ROLAP)
To improve performance, some products
use SQL engines to support complexity of
multi-dimensional analysis, while others
recommend, or require, the use of highly
denormalized database designs such as
the star schema.
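A star schema places one fact table at the center with a foreign key to each dimension table, so a multi-dimensional question becomes a join plus a GROUP BY. The sketch below assumes an in-memory SQLite database; all table names, column names, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_store   (store_id INTEGER PRIMARY KEY, city TEXT, region TEXT);
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, brand TEXT);
    CREATE TABLE fact_sale (
        store_id   INTEGER REFERENCES dim_store(store_id),
        product_id INTEGER REFERENCES dim_product(product_id),
        day        INTEGER,
        amt        INTEGER
    );
    INSERT INTO dim_store VALUES
        (1, 'Boston', 'Northeast'), (2, 'Albany', 'Northeast'), (3, 'Austin', 'South');
    INSERT INTO dim_product VALUES (1, 'soup', 'BrandA'), (2, 'crackers', 'BrandB');
    INSERT INTO fact_sale VALUES
        (1, 1, 1, 12), (1, 2, 1, 11), (3, 1, 1, 50),
        (2, 2, 1, 8),  (1, 1, 2, 44), (2, 1, 2, 4);
    """
)

STORE_ATTRS = {"city", "region"}
PRODUCT_ATTRS = {"name", "brand"}


def sales_by(conn, store_attr, product_attr):
    """Total sales grouped by one store attribute and one product attribute."""
    # Column names cannot be bound as parameters, so they are whitelisted.
    if store_attr not in STORE_ATTRS or product_attr not in PRODUCT_ATTRS:
        raise ValueError("unknown dimension attribute")
    sql = f"""
        SELECT s.{store_attr}, p.{product_attr}, SUM(f.amt)
        FROM fact_sale AS f
        JOIN dim_store AS s ON s.store_id = f.store_id
        JOIN dim_product AS p ON p.product_id = f.product_id
        GROUP BY s.{store_attr}, p.{product_attr}
        ORDER BY s.{store_attr}, p.{product_attr}
    """
    return conn.execute(sql).fetchall()


by_region_brand = sales_by(conn, "region", "brand")
print(by_region_brand)
```

Changing the two attribute arguments yields a different view of the same fact table, which is roughly what a ROLAP metadata layer automates.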

Typical Architecture for ROLAP
Tools

ROLAP Tools - Development
Issues
Middleware to facilitate the development of multi-dimensional applications (software that converts the two-dimensional relation into a multi-dimensional structure).

Development of an option to create
persistent, multi-dimensional structures
with facilities to assist in the
administration of these structures.
Hybrid OLAP (HOLAP)
Can use data either directly from an RDBMS or from a multi-dimensional server.

Managed Query Environment
(MQE)
Relatively new development.

Provides limited analysis capability, either directly against RDBMS products, or by using an intermediate MOLAP server.
Managed Query Environment
(MQE)
Delivers selected data directly from the DBMS or via a MOLAP server to the desktop (or local server) in the form of a datacube, where it is stored, analyzed, and maintained locally.

Promoted as being relatively simple to
install and administer with reduced cost
and maintenance.

Typical Architecture for MQE
Tools

MQE Tools - Development
Issues
Architecture results in significant data
redundancy and may cause problems for
networks that support many users.

Ability of each user to build a custom
datacube may cause a lack of data
consistency among users.

Only a limited amount of data can be
efficiently maintained.
OLAP Extensions to SQL
SQL is promoted as easy to learn, non-procedural, free-format, DBMS-independent, and an international standard.

However, a major disadvantage has been its inability to represent many of the questions most commonly asked by business analysts.

IBM and Oracle jointly proposed OLAP
extensions to SQL early in 1999, adopted as
an amendment to SQL.
OLAP Extensions to SQL
Many database vendors including IBM,
Oracle, Informix, and Red Brick Systems
have already implemented portions of
specifications in their DBMSs.

Red Brick Systems was first to implement
many essential OLAP functions (as Red
Brick Intelligent SQL (RISQL)), albeit in
advance of the standard.
OLAP Extensions to SQL - RISQL
Designed for business analysts.

Set of extensions that augments SQL with
a variety of powerful operations
appropriate to data analysis and decision-
support applications such as ranking,
moving averages, comparisons, market
share, this year versus last year.
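RISQL syntax itself is not shown in these slides, so the sketch below does not attempt to reproduce it. Instead it illustrates the same kinds of operations (ranking and a moving average) using standard SQL window functions as implemented in SQLite 3.25 or later; the table name, columns, and figures are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monthly_sales (month INTEGER, amt INTEGER)")
conn.executemany(
    "INSERT INTO monthly_sales VALUES (?, ?)",
    [(1, 100), (2, 130), (3, 90), (4, 140)],  # hypothetical figures
)

# Window functions (not RISQL): rank months by sales and compute a
# three-month trailing average. Requires SQLite 3.25 or later.
rows = conn.execute(
    """
    SELECT month,
           amt,
           RANK() OVER (ORDER BY amt DESC) AS sales_rank,
           AVG(amt) OVER (
               ORDER BY month
               ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
           ) AS moving_avg
    FROM monthly_sales
    ORDER BY month
    """
).fetchall()

for row in rows:
    print(row)
```

Without such extensions, each of these results would need self-joins or procedural code, which is the limitation of plain SQL that the slides point to.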
Approaches to OLAP Servers
Three possibilities for OLAP servers
(1) Relational OLAP (ROLAP)
Relational and specialized relational DBMS to store and
manage warehouse data
OLAP middleware to support missing pieces
(2) Multidimensional OLAP (MOLAP)
Array-based storage structures
Direct access to array data structures
(3) Hybrid OLAP (HOLAP)
Storing detailed data in RDBMS
Storing aggregated data in MDBMS
User access via MOLAP tools


Points to be noticed about ROLAP
Defines complex, multi-dimensional data with a simple model
Reduces the number of joins a query has to process
Allows the data warehouse to evolve with relatively low maintenance
Can contain both detailed and summarized data
ROLAP is based on familiar, proven, and already selected technologies
BUT: SQL must be used for multi-dimensional manipulation and calculations
MOLAP: Dimensional Modeling Using
the Multi Dimensional Model
MDDB: a special-purpose data model
Facts stored in multi-dimensional arrays
Dimensions used to index array
Sometimes on top of relational DB
Products
Pilot, Arbor Essbase, Gentia
The MOLAP Cube
Fact table view (relation sale):
prodId  storeId  amt
p1      s1       12
p2      s1       11
p1      s3       50
p2      s2       8

Multi-dimensional cube (dimensions = 2):
      s1   s2   s3
p1    12        50
p2    11   8
3-D Cube
dimensions = 3

Fact table view (relation sale):
prodId  storeId  date  amt
p1      s1       1     12
p2      s1       1     11
p1      s3       1     50
p2      s2       1     8
p1      s1       2     44
p1      s2       2     4

Multi-dimensional cube:
day 1:
      s1   s2   s3
p1    12        50
p2    11   8

day 2:
      s1   s2   s3
p1    44   4
p2
Example
[Figure: cube with dimensions Product (Juice, Milk, Coke, Cream, Soap, Bread), Time (M, T, W, Th, F, S, S), and Store (NY, SF, LA). Sample cell values shown: 10, 34, 56, 32, 12, 56. Annotation: 56 units of bread sold in LA on M.]
Dimensions: Time, Product, Store
Attributes: Product (upc, price, ...), Store
Hierarchies:
Product -> Brand
Day -> Week -> Quarter
Store -> Region -> Country
Operations shown: roll-up to week, roll-up to brand, roll-up to region
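Cube Aggregation: Roll-up
Example: computing sums

day 1:
      s1   s2   s3
p1    12        50
p2    11   8

day 2:
      s1   s2   s3
p1    44   4
p2

Rolled up over days:
      s1   s2   s3
p1    56   4    50
p2    11   8

Rolled up over days and products:
      s1   s2   s3
sum   67   12   50

Rolled up over days and stores:
      sum
p1    110
p2    19

Rolled up over all dimensions: 129

Roll-up moves toward the totals; drill-down moves back toward the detail.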
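The sums on this slide can be reproduced directly from the fact table of the 3-D cube. The sketch below is one simple way to do so; the rows are copied from the slides, while the function name and the `DIMS` tuple are illustrative assumptions.

```python
from collections import defaultdict

# Fact rows (prodId, storeId, date, amt), copied from the 3-D cube slide.
FACTS = [
    ("p1", "s1", 1, 12), ("p2", "s1", 1, 11), ("p1", "s3", 1, 50),
    ("p2", "s2", 1, 8), ("p1", "s1", 2, 44), ("p1", "s2", 2, 4),
]
DIMS = ("prodId", "storeId", "date")


def roll_up(facts, keep):
    """Sum amt over every dimension that is not named in `keep`."""
    idx = [DIMS.index(d) for d in keep]
    totals = defaultdict(int)
    for row in facts:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)


by_prod_store = roll_up(FACTS, ["prodId", "storeId"])  # days summed away
by_store = roll_up(FACTS, ["storeId"])
by_prod = roll_up(FACTS, ["prodId"])
grand_total = roll_up(FACTS, [])
print(by_prod_store, by_store, by_prod, grand_total)
```

Drill-down is the reverse direction: passing more dimension names in `keep` returns the finer-grained cells again.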
Cube Operators for Roll-up
[Figure: the roll-up figure from the previous slide (day 1 and day 2 planes; totals by store 67, 12, 50; totals by product 110, 19; grand total 129), annotated with the cube operators below, where * aggregates over a dimension.]
sale(s1,*,*)
sale(*,*,*)
sale(s2,p2,*)
Extended Cube

day 1:
      s1   s2   s3   *
p1    12        50   62
p2    11   8         19
*     23   8    50   81

day 2:
      s1   s2   s3   *
p1    44   4         48
p2
*     44   4         48

* (all days):
      s1   s2   s3   *
p1    56   4    50   110
p2    11   8         19
*     67   12   50   129

sale(*,p2,*)
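The `*` notation can be read as "aggregate over this dimension". The sketch below computes such values on demand from the fact rows rather than storing the extended cube; the argument order (store, product, date) is an assumption inferred from the operator examples on the slides, and the function is illustrative only.

```python
# Fact rows (storeId, prodId, date, amt), taken from the slides' fact table
# and reordered to match the assumed argument order of sale(store, prod, date).
FACTS = [
    ("s1", "p1", 1, 12), ("s1", "p2", 1, 11), ("s3", "p1", 1, 50),
    ("s2", "p2", 1, 8), ("s1", "p1", 2, 44), ("s2", "p1", 2, 4),
]


def sale(store="*", prod="*", date="*"):
    """Sum amt over matching rows; "*" aggregates over that dimension."""
    wanted = (store, prod, date)
    return sum(
        row[3]
        for row in FACTS
        if all(w == "*" or w == v for w, v in zip(wanted, row))
    )


print(sale("s1", "*", "*"))   # total for store s1
print(sale("*", "*", "*"))    # grand total
print(sale("s2", "p2", "*"))  # store s2, product p2, all days
print(sale("*", "p2", "*"))   # product p2, all stores and days
```

An extended cube stores these `*` cells in advance, trading storage for lookup speed.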
Aggregation Using Hierarchies
Hierarchy: store -> region -> country
(store s1 in Region A; stores s2, s3 in Region B)

day 1:
      s1   s2   s3
p1    12        50
p2    11   8

day 2:
      s1   s2   s3
p1    44   4
p2

Rolled up over days and from store to region:
      region A   region B
p1    56         54
p2    11         8
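Rolling up along a hierarchy replaces each member with its parent before summing. A minimal sketch using the slide's fact rows and store-to-region assignment; the function and variable names are illustrative.

```python
# Fact rows (prodId, storeId, date, amt), copied from the slides.
FACTS = [
    ("p1", "s1", 1, 12), ("p2", "s1", 1, 11), ("p1", "s3", 1, 50),
    ("p2", "s2", 1, 8), ("p1", "s1", 2, 44), ("p1", "s2", 2, 4),
]

# Store-to-region assignment stated on the slide.
STORE_TO_REGION = {"s1": "A", "s2": "B", "s3": "B"}


def roll_up_to_region(facts, store_to_region):
    """Sum amt by (product, region), aggregating over days and stores."""
    totals = {}
    for prod, store, _date, amt in facts:
        key = (prod, store_to_region[store])
        totals[key] = totals.get(key, 0) + amt
    return totals


by_region = roll_up_to_region(FACTS, STORE_TO_REGION)
print(by_region)
```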
Points to be noticed about MOLAP
Pre-calculating or pre-consolidating transactional data improves
speed.
BUT
When incoming data is fully pre-consolidated, MDDs require an enormous amount of overhead in both processing time and storage. An input file of 200 MB can easily expand to 5 GB.

MDDs are great candidates for departmental data marts under 50 GB.

Rolling up and Drilling down through aggregate data.

With MDDs, application design is essentially the definition of
dimensions and calculation rules, while the RDBMS requires that
the database schema be a star or snowflake.
Hybrid OLAP (HOLAP)
HOLAP = Hybrid OLAP:

Best of both worlds

Storing detailed data in RDBMS

Storing aggregated data in MDBMS

User access via MOLAP tools
[Figure: Data Flow in HOLAP. Components: client (multidimensional viewer, relational viewer), MDBMS server (multi-dimensional data), RDBMS server (user data, meta data, derived data). Flow labels: multi-dimensional access, SQL-Read, SQL-Reach-Through.]
