Outline
Business Intelligence
Evolution
60s: Batch reports
hard to find and analyze information
inflexible and expensive: every new request requires reprogramming
                    OLTP                               OLAP
User                Clerk, IT professional             Knowledge worker
Function            Day-to-day operations              Decision support
DB design           Application-oriented (E-R based)   Subject-oriented (star, snowflake)
Data                Current, isolated                  Historical, consolidated
View                Detailed, flat relational          Summarized, multidimensional
Usage               Structured, repetitive             Ad hoc
Unit of work        Short, simple transaction          Complex query
Access              Read/write                         Read mostly
Operations          Index/hash on primary key          Lots of scans
# Records accessed  Tens                               Millions
# Users             Thousands                          Hundreds
DB size             100 MB-GB                          100 GB-TB
Metric              Transaction throughput             Query throughput, response
Data Warehouse
A decision support database that is maintained separately from the organization's operational databases.
A data warehouse is a
subject-oriented,
integrated,
time-varying,
non-volatile
collection of data that is used primarily in organizational decision making.
Three-Tier Architecture
Warehouse database server
Almost always a relational DBMS; rarely flat files
OLAP servers
Relational OLAP (ROLAP): extended relational DBMS that
maps operations on multidimensional data to standard relational
operations.
Multidimensional OLAP (MOLAP): special purpose server that
directly implements multidimensional data and operations.
Clients
Query and reporting tools
Analysis tools
Data mining tools (e.g., trend analysis, prediction)
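The ROLAP idea above, mapping an operation on multidimensional data to standard relational operations, can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the table name sales, its columns, the helper aggregate_by, and all rows are hypothetical and do not come from any particular OLAP product.

```python
import sqlite3

# Hypothetical fact rows: (date, city, product, sales). All values are made up.
ROWS = [
    ("2024-01-01", "Berlin", "Juice", 10),
    ("2024-01-01", "Berlin", "Cola", 47),
    ("2024-01-02", "Zurich", "Milk", 30),
    ("2024-01-02", "Zurich", "Cream", 12),
    ("2024-01-02", "Berlin", "Juice", 5),
]

ALLOWED_DIMS = {"date", "city", "product"}


def aggregate_by(conn, dims):
    """Translate 'total sales over the given dimensions' into a SQL GROUP BY."""
    if not dims or not set(dims) <= ALLOWED_DIMS:
        raise ValueError(f"unknown dimensions: {dims!r}")
    cols = ", ".join(dims)  # safe: every name was checked against ALLOWED_DIMS
    sql = f"SELECT {cols}, SUM(sales) FROM sales GROUP BY {cols} ORDER BY {cols}"
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (date TEXT, city TEXT, product TEXT, sales INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", ROWS)

# The multidimensional request "sales by city" becomes one relational query.
by_city = aggregate_by(conn, ["city"])
print(by_city)
```

A MOLAP server would instead keep the cube in a dedicated multidimensional structure and answer the same request without generating SQL.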
Multidimensional Data
[Figure: data cube of sales volume as a function of date, city and product. Product axis: Juice, Cola, Milk, Cream; sample cell values: 10, 47, 30, 12.]
[Figure: data cube with axes Country (Germany, Switzerland, U.S.A.), Diploma and Term (1st, 2nd, 3rd, 4th). Example cell: German students in the 4th term pursuing a diploma.]
Operations in Multidimensional Data Model
Aggregation (roll-up)
dimension reduction: e.g., total sales by city
summarization over aggregate hierarchy: e.g., total sales by city
and year -> total sales by region and by year
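The two flavors of roll-up listed above can be sketched in plain Python. The cube contents, the city-to-region mapping, and the helper names reduce_dimensions and climb_hierarchy are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical cube cells: (city, year, product) -> sales. All values are made up.
CUBE = {
    ("Berlin", 2023, "Juice"): 10,
    ("Berlin", 2023, "Cola"): 47,
    ("Munich", 2023, "Milk"): 30,
    ("Zurich", 2023, "Cream"): 12,
    ("Zurich", 2024, "Juice"): 8,
}

# Assumed aggregate hierarchy on the city dimension: city -> region.
CITY_TO_REGION = {"Berlin": "Germany", "Munich": "Germany", "Zurich": "Switzerland"}


def reduce_dimensions(cube, keep):
    """Dimension reduction: sum out every axis whose index is not in `keep`."""
    out = defaultdict(int)
    for key, value in cube.items():
        out[tuple(key[i] for i in keep)] += value
    return dict(out)


def climb_hierarchy(cube, axis, mapping):
    """Summarization over a hierarchy: replace one axis by its parent level."""
    out = defaultdict(int)
    for key, value in cube.items():
        parent_key = key[:axis] + (mapping[key[axis]],) + key[axis + 1:]
        out[parent_key] += value
    return dict(out)


# Dimension reduction: total sales by city (year and product are summed out).
sales_by_city = reduce_dimensions(CUBE, keep=(0,))

# Hierarchy roll-up: sales by city and year -> sales by region and year.
sales_by_city_year = reduce_dimensions(CUBE, keep=(0, 1))
sales_by_region_year = climb_hierarchy(sales_by_city_year, axis=0, mapping=CITY_TO_REGION)

print(sales_by_city)
print(sales_by_region_year)
```

Both helpers only add up cell values, so they apply to additive measures such as sales volume; a measure like an average would need its sum and count carried separately.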
[Figure: sales cube, Product axis (Cola, Milk, Cream); sample cell values: 10, 47, 30, 12.]
Domain-specific enrichment
Schema design
Specialized scan, indexing and join techniques
Handling of aggregate views (querying and
materialization)
Supporting query language extensions beyond
SQL
Complex query processing and optimization
Data partitioning and parallelism
[Figure: example schema diagram with a central fact table and one table per dimension; tables and attributes as labeled]
Fact Table: OrderNO, SalespersonID, CustomerNO, ProdNo, DateKey, CityName, Quantity, Total Price
Order: Order No, Order Date
Customer: Customer No, Customer Name, Customer Address, City
Salesperson: SalespersonID, SalespersonName, City, Quota
Product: ProductNO, ProdName, ProdDescr, Category, CategoryDescription, UnitPrice
Date: DateKey, Date
City: CityName, State, Country
Star Schema
A single fact table and a single table for each dimension
Every fact points to one tuple in each of the dimensions
and has additional attributes
Does not capture hierarchies directly
Generated keys are used for performance and maintenance
reasons
Fact constellation: multiple fact tables that share many
dimension tables
Example: projected expenses and actual expenses may share
dimension tables
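A minimal sketch of a star schema and a typical query over it, using Python's built-in sqlite3 module. The table and column names loosely follow the attribute labels of the example schema (only a subset is used); the table name Fact and all rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: one flat table per dimension.
CREATE TABLE Product (ProductNO INTEGER PRIMARY KEY, ProdName TEXT,
                      Category TEXT, UnitPrice REAL);
CREATE TABLE City (CityName TEXT PRIMARY KEY, State TEXT, Country TEXT);

-- Fact table: every fact points to one tuple in each dimension
-- and carries additional attributes (the measures).
CREATE TABLE Fact (
    OrderNO    INTEGER,
    ProdNo     INTEGER REFERENCES Product(ProductNO),
    CityName   TEXT REFERENCES City(CityName),
    Quantity   INTEGER,
    TotalPrice REAL
);
""")

# Made-up sample rows.
conn.executemany("INSERT INTO Product VALUES (?, ?, ?, ?)", [
    (1, "Juice", "Beverage", 2.0),
    (2, "Milk", "Dairy", 1.5),
])
conn.executemany("INSERT INTO City VALUES (?, ?, ?)", [
    ("Berlin", "Berlin", "Germany"),
    ("Zurich", "Zurich", "Switzerland"),
])
conn.executemany("INSERT INTO Fact VALUES (?, ?, ?, ?, ?)", [
    (100, 1, "Berlin", 5, 10.0),
    (101, 2, "Berlin", 4, 6.0),
    (102, 1, "Zurich", 3, 6.0),
])

# Star join: the fact table is joined to each dimension it needs,
# then the measure is aggregated by dimension attributes.
rows = conn.execute("""
    SELECT c.Country, p.Category, SUM(f.TotalPrice)
    FROM Fact f
    JOIN Product p ON p.ProductNO = f.ProdNo
    JOIN City c ON c.CityName = f.CityName
    GROUP BY c.Country, p.Category
    ORDER BY c.Country, p.Category
""").fetchall()
print(rows)
```

Country sits in the City table as a plain column here, which is one way to see that a star schema does not capture the city, state, country hierarchy directly.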
[Figure: example schema diagram with normalized dimension tables; tables and attributes as labeled]
Fact Table: OrderNO, SalespersonID, CustomerNO, ProdNo, DateKey, CityName, Quantity, Total Price
Order: Order No, Order Date
Customer: Customer No, Customer Name, Customer Address, City
Salesperson: SalespersonID, SalespersonName, City, Quota
Product: ProductNO, ProdName, ProdDescr, Category, UnitPrice
Category: CategoryName, CategoryDescr
Date: DateKey, Date, Month
Month: Month, Year
Year: Year
City: CityName, State, Country
State: StateName, Country
Snowflake Schema
Load
Sort, summarize, consolidate, compute views, check
integrity, build indexes, partition
Refresh
Propagate updates from sources to the warehouse
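The difference between an initial load and a later refresh can be sketched as follows. This assumes an append-only source and a single summary keyed by (city, product); the function names and all rows are hypothetical, and source updates or deletions are not handled.

```python
from collections import defaultdict


def check_integrity(row):
    """Reject rows with missing keys or negative quantities."""
    city, product, quantity = row
    if not city or not product or quantity < 0:
        raise ValueError(f"bad source row: {row!r}")


def load(source_rows):
    """Initial load: sort, check integrity, and summarize the full source."""
    summary = defaultdict(int)
    for row in sorted(source_rows):
        check_integrity(row)
        city, product, quantity = row
        summary[(city, product)] += quantity
    return summary


def refresh(summary, delta_rows):
    """Refresh: propagate only the new source rows into the existing summary."""
    for row in delta_rows:
        check_integrity(row)
        city, product, quantity = row
        summary[(city, product)] += quantity
    return summary


# Made-up source data: a full extract first, then a later batch of new rows.
warehouse = load([("Berlin", "Juice", 10), ("Berlin", "Juice", 5), ("Zurich", "Milk", 30)])
refresh(warehouse, [("Zurich", "Milk", 2), ("Zurich", "Cream", 12)])
print(dict(warehouse))
```

The refresh touches only the delta rows, so its cost depends on the size of the change rather than on the size of the warehouse.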
Metadata Repository
Administrative metadata
Business data
business terms and definitions
ownership of data
charging policies
Operational metadata
data lineage: history of migrated data and sequence of
transformations applied
currency of data: active, archived, purged
monitoring information: warehouse usage statistics, error
reports, audit trails.
OLAP Tools
Main Problems:
Cost per license, poor semantics of aggregations across
tables, performance for multiple dimension cubes
http://www.tableausoftware.com/ptour.htm