Introduction
Operational vs. Warehouse
Multidimensional Data
Examples
MOLAP vs ROLAP
Dimensional Hierarchies
OLAP Queries
Demos
Comparison with SQL Queries
CUBE Operator
Multidimensional Design
Star/Snowflake Schemas
03/07/15
Online Aggregation
Implementation Issues
Bitmap Index
Constructing a Data Warehouse
Views
Materialized View Example
Materialized View is an Index
Issues in Materialized Views
Maintaining Materialized Views
Introduction
In the late 80s and early 90s, companies
began to use their DBMSs for complex,
interactive, exploratory analysis of historical
data.
This was called Decision Support (DS), or On-Line Analytic Processing (OLAP).
DS queries slowed down the company's day-to-day
operational workload, known as On-Line Transaction
Processing (OLTP).
This led to the creation of Data Warehouses,
separate from operational databases.
[Table: requirements of operational systems compared with a DSS / Warehouse / DataMart. Surviving cells: "Opposite" (three times), "Legacy applications, heterogeneous databases", "Often distributed", "Current data".]
Operational vs. Data Warehouse Requirements, ctd.
[Table with columns Operational and Warehouse.]
Data Warehousing
Integrated data spanning long time periods, often
augmented with summary information.
Several terabytes to petabytes common.
Interactive response times expected for
complex queries; ad-hoc updates uncommon.
[Figure: Operational Data passes through EXTRACT, TRANSFORM, LOAD, REFRESH into the DATA WAREHOUSE (with a Metadata Repository), which SUPPORTS DATA MINING and OLAP.]
Multidimensional Data
In order to support OLAP, warehouse data is
often structured multidimensionally, as
measures and dimensions.
Measure: Numeric attribute, e.g. sales amount
Dimension: attribute categorizing the
measure, e.g. product, store, date of sale.
The fact table has a foreign key for each
dimension, plus an attribute for each measure.
There will also be a dimension table for each
dimension.
On the next page, the fact tables are red, the
dimension tables are green.
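A minimal sketch of the fact-table / dimension-table layout described above, using Python's built-in sqlite3 module. The table names, column names, and sample rows (product, store, sale_date, sales) are assumptions made for illustration and do not come from the slides.

```python
import sqlite3

# Hypothetical schema: all table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
    CREATE TABLE product   (pid     INTEGER PRIMARY KEY, pname TEXT);
    CREATE TABLE store     (storeid INTEGER PRIMARY KEY, city  TEXT);
    CREATE TABLE sale_date (dateid  INTEGER PRIMARY KEY, day   TEXT);
    -- Fact table: one foreign key per dimension, one attribute per measure.
    CREATE TABLE sales (
        pid     INTEGER NOT NULL REFERENCES product(pid),
        storeid INTEGER NOT NULL REFERENCES store(storeid),
        dateid  INTEGER NOT NULL REFERENCES sale_date(dateid),
        amount  REAL    NOT NULL
    );
""")
# Made-up sample rows, one per dimension table.
conn.execute("INSERT INTO product VALUES (11, 'widget')")
conn.execute("INSERT INTO store VALUES (1, 'Portland')")
conn.execute("INSERT INTO sale_date VALUES (1, '2015-03-07')")


def insert_is_rejected(row):
    """Try to insert a fact row; return True if it violates a foreign key."""
    try:
        conn.execute("INSERT INTO sales VALUES (?, ?, ?, ?)", row)
        return False
    except sqlite3.IntegrityError:
        return True
```

A fact row is accepted only when each of its keys points at an existing dimension row, so `insert_is_rejected((99, 1, 1, 5.0))` reports a rejection because no product 99 exists.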
Examples of Multidimensional Data
MOLAP vs ROLAP
Multidimensional data can be stored physically
in a (disk-resident, persistent) array; such systems
are called MOLAP systems. Alternatively, the data can
be stored as a relation; such systems are called ROLAP systems.
The main relation, which relates dimensions to
a measure, is called the fact table. Each
dimension can have additional attributes and
an associated dimension table.
25.2 Multidimensional Data Model
Collection of numeric measures,
which depend on a set of dimensions.

Slice locid=1 is shown:

          timeid
  pid    1    2    3
  11    25    8   15
  12    30   20   50
  13     8   10   10

Fact table:

  pid  timeid  locid  amt
  11   1       1      25
  11   2       1      8
  11   3       1      15
  12   1       1      30
  12   2       1      20
  12   3       1      50
  13   1       1      8
  13   2       1      10
  13   3       1      10
  11   1       2      35
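The slice locid=1 can be sketched in a few lines of Python. The rows are the sample values from the slide's table; the pid of the first two rows is garbled in the source and is assumed here to be 11. The function name slice_by_location is hypothetical.

```python
# Fact rows keyed by (pid, timeid, locid), with amt as the value.
# Values follow the slide's sample table; pid=11 for the first two rows is an assumption.
sales = {
    (11, 1, 1): 25, (11, 2, 1): 8,  (11, 3, 1): 15,
    (12, 1, 1): 30, (12, 2, 1): 20, (12, 3, 1): 50,
    (13, 1, 1): 8,  (13, 2, 1): 10, (13, 3, 1): 10,
    (11, 1, 2): 35,
}


def slice_by_location(cube, locid):
    """Fix the locid dimension and return a 2-D view keyed by (pid, timeid)."""
    return {(p, t): amt for (p, t, loc), amt in cube.items() if loc == locid}


slice1 = slice_by_location(sales, 1)
```

Fixing one dimension leaves a two-dimensional table over the remaining dimensions, here pid and timeid.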
Dimension Hierarchies
[Figure: one hierarchy per dimension. Surviving level names: PRODUCT (category, pname, PID); TIME (year, quarter, week, date); LOCATION (state).]
OLAP Queries
[Figure: cross-tabulation of sales with row and column totals. Surviving fragments: Total 176; CA 81; Total 144; 107 145; 35 110; 223 339.]
Cognos Demo
Tableau Demo
http://www.tableausoftware.com/products/tour2
SELECT SUM(S.amt)
FROM Sales S
GROUP BY grouping-list
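The template above can be read as a family of queries, one per choice of grouping-list. A CUBE over k dimensions corresponds to running it once for each of the 2^k subsets of those dimensions. The sketch below emulates that with Python's sqlite3 module (SQLite has no CUBE operator); the four sample rows and the helper name cube are assumptions for illustration.

```python
import sqlite3
from itertools import combinations

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (pid INTEGER, timeid INTEGER, locid INTEGER, amt REAL)")
# Small made-up sample of fact rows.
conn.executemany("INSERT INTO Sales VALUES (?, ?, ?, ?)", [
    (11, 1, 1, 25), (11, 2, 1, 8), (12, 1, 1, 30), (11, 1, 2, 35),
])

# Fixed whitelist of column names, so building SQL text from it is safe.
DIMENSIONS = ("pid", "timeid", "locid")


def cube(conn, dims=DIMENSIONS):
    """Run SUM(amt) once per subset of dims; return {grouping-list: result rows}."""
    results = {}
    for k in range(len(dims) + 1):
        for group in combinations(dims, k):
            if group:
                cols = ", ".join(group)
                sql = (f"SELECT {cols}, SUM(S.amt) FROM Sales S "
                       f"GROUP BY {cols} ORDER BY {cols}")
            else:
                # Empty grouping-list: a single grand total.
                sql = "SELECT SUM(S.amt) FROM Sales S"
            results[group] = conn.execute(sql).fetchall()
    return results


totals = cube(conn)
```

With three dimensions this yields eight result sets, from the grand total under the empty grouping-list up to the full three-way grouping.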
Example Multidimensional Design
SALES (fact table): pid, timeid, locid, amt
TIMES: timeid, date, week, month
PRODUCTS (attributes not shown)
LOCATIONS: locid, city, state, country
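As a rough illustration of how this design is queried, the sketch below joins the SALES fact table to the LOCATIONS dimension table and totals amt per state, using Python's sqlite3 module. The table and column names follow the slide; the sample rows, including the city names, are made up, and the helper sales_by_state is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE LOCATIONS (locid INTEGER PRIMARY KEY, city TEXT, state TEXT, country TEXT);
    CREATE TABLE SALES (
        pid    INTEGER,
        timeid INTEGER,
        locid  INTEGER REFERENCES LOCATIONS(locid),
        amt    REAL
    );
    -- Made-up sample data.
    INSERT INTO LOCATIONS VALUES (1, 'Madison', 'WI', 'USA'), (2, 'Fresno', 'CA', 'USA');
    INSERT INTO SALES VALUES (11, 1, 1, 25), (12, 1, 1, 30), (11, 1, 2, 35);
""")


def sales_by_state(conn):
    """Total the measure per state by joining the fact table to its dimension table."""
    return conn.execute("""
        SELECT L.state, SUM(S.amt)
        FROM SALES S JOIN LOCATIONS L ON S.locid = L.locid
        GROUP BY L.state
        ORDER BY L.state
    """).fetchall()


by_state = sales_by_state(conn)
```

The fact table stores only the key locid; descriptive attributes such as state live in the dimension table and are reached through the join.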
Star/Snowflake Schemas
Why normalize?
Space
Redundancy, anomalies
Why unnormalize?
Performance
Online Aggregation
Bitmap Indexes
Work well when an attribute has few distinct values,
e.g. gender or rating.
Advantage: small enough to fit in
memory.
Many queries can be answered by bit-vector operations, e.g. females with rating = 3.
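A minimal sketch of the idea, using Python integers as bit vectors (bit i is set when row i has the value). The sample records and the helper name build_bitmap_index are assumptions for illustration; a real system would use compressed bitmaps rather than plain integers.

```python
# Hypothetical records: (name, gender, rating).
records = [
    ("Ann", "F", 3),
    ("Bob", "M", 3),
    ("Cho", "F", 5),
    ("Dee", "F", 3),
    ("Eli", "M", 1),
]


def build_bitmap_index(rows, column):
    """Map each distinct value of a column to an int whose bit i is set iff row i has it."""
    index = {}
    for i, row in enumerate(rows):
        index[row[column]] = index.get(row[column], 0) | (1 << i)
    return index


gender_idx = build_bitmap_index(records, 1)
rating_idx = build_bitmap_index(records, 2)

# "Females with rating = 3" is one bitwise AND of two bit vectors.
hits = gender_idx["F"] & rating_idx[3]
matches = [records[i][0] for i in range(len(records)) if (hits >> i) & 1]
```

Each attribute needs one bit vector per distinct value, which is why the approach suits attributes with few values.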
25.7 Constructing a Data Warehouse
Extract
Is the data in native format?
Clean
How many ways can you spell Mr.?
Errors, missing information
Transform
Fix semantic mismatches.
E.g. Last+first vs. Name
Load
Do it in parallel or else.
Refresh
Both data and indexes
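The Clean and Transform steps can be sketched on the slide's own examples: several spellings of "Mr." are mapped to one form, and a single Name field is split into last and first name. The list of title variants, the two assumed input layouts ("Last, First" and "First Last"), and the function name standardize are illustrative assumptions.

```python
import re

# Assumed spellings of the same title; a real list would come from profiling the data.
TITLE = re.compile(r"(mr|mister)\.?\s+", re.IGNORECASE)


def standardize(raw):
    """Clean the title, then transform one Name field into separate last/first fields."""
    text = raw.strip()
    match = TITLE.match(text)
    title = "Mr." if match else None          # clean: one canonical spelling
    name = text[match.end():] if match else text
    if "," in name:                           # assumed layout 'Last, First'
        last, first = (part.strip() for part in name.split(",", 1))
    else:                                     # assumed layout 'First Last'
        first, _, last = name.rpartition(" ")
    return {"title": title, "last": last, "first": first}
```

For example, `standardize("  MR John Smith ")` and `standardize("Mister Smith, John")` both yield the same record, which is the point of fixing such mismatches before the Load step.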
A Materialized View is an Index
Refreshing Materialized Views
How often should we refresh the
materialized view?
Many enterprises refresh warehouse data
only weekly or nightly, so they can afford to
completely rebuild their materialized
views.
Others want their warehouses to be
current, so materialized views must be
updated incrementally if possible.
Let's look at some simple examples.
Consider V = R ⋈ S
How is V modified if r is inserted to R?
How is V modified if r is deleted from R?
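One possible answer, sketched in Python under two assumptions: the operator between R and S (lost in the source) is a natural join, and relations are sets, so a tuple occurs at most once. The relations R(a, b) and S(b, c) and all function names are hypothetical.

```python
# Hypothetical relations R(a, b) and S(b, c), joined on the shared attribute b.
R = {(1, "x"), (2, "y")}
S = {("x", 10), ("x", 20), ("y", 30)}


def join(r_rows, s_rows):
    """Natural join on b; returns tuples (a, b, c)."""
    return {(a, b, c) for (a, b) in r_rows for (b2, c) in s_rows if b == b2}


V = join(R, S)  # initial full computation of the view


def insert_into_R(r):
    """Inserting r adds exactly {r} joined with S to V; the rest of V is untouched."""
    if r not in R:
        R.add(r)
        V.update(join({r}, S))


def delete_from_R(r):
    """Deleting r removes exactly {r} joined with S from V (set semantics assumed)."""
    if r in R:
        R.remove(r)
        V.difference_update(join({r}, S))


insert_into_R((3, "x"))
delete_from_R((2, "y"))
```

In both cases only the single changed tuple is joined with S, so the cost depends on the size of the change rather than on the size of R.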