LESSON MAP
[Figure: lesson map for Data Warehouse. Nodes: Learning outcome, Introduction, Why Data Warehouse?, Definition, Characteristics (subject oriented, integrated, time-variant, nonvolatile), Architecture, OLAP, Data mining, Summary, Next lesson]
Learning outcome
• Describe the purpose of a data warehouse.
• Describe the characteristics of a data warehouse.
• Explain the relationship between a data warehouse
and an operational database.
• Explain the architecture of a data warehouse.
• Explain the related technologies of a data warehouse.
… subject-oriented ...
• The data in the warehouse is defined and
organized in business terms, and is grouped
under business-oriented subject headings,
such as
– customers
– products
– sales
rather than application-oriented data, such as data
held separately in a
– Savings Account System
– Investment Account System
– Checking Account System
CHARACTERISTICS
… integrated ...
• The data warehouse contents are defined such that
they are valid across the enterprise and its operational
and external data sources
[Figure: operational systems feeding the data warehouse]
• The data in the warehouse should be
– clean
– validated
– properly integrated
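A minimal sketch of what "clean, validated, properly integrated" can mean in practice. The record layout (`customer_id`, `name`), the two source systems, and the helper names `clean`, `is_valid` and `integrate` are all hypothetical and chosen only for illustration; real warehouse loading involves far more rules.

```python
def clean(record):
    """Normalize formatting so the same customer looks the same in every source."""
    return {
        "customer_id": str(record["customer_id"]).strip(),
        "name": " ".join(record["name"].split()).title(),
    }


def is_valid(record):
    """Reject records with a missing key or an empty name."""
    return bool(record["customer_id"]) and bool(record["name"])


def integrate(*sources):
    """Merge cleaned, valid records from several systems into one customer subject."""
    merged = {}
    for source in sources:
        for raw in source:
            rec = clean(raw)
            if is_valid(rec):
                # Keep the first clean copy of each customer.
                merged.setdefault(rec["customer_id"], rec)
    return merged


# Hypothetical extracts from two operational systems.
savings = [{"customer_id": " 17 ", "name": "ada  LOVELACE"}]
checking = [
    {"customer_id": 17, "name": "Ada Lovelace"},
    {"customer_id": 18, "name": "  "},  # fails validation: empty name
]

customers = integrate(savings, checking)
print(customers)
```

Both systems describe customer 17 differently, yet the warehouse ends up with one consistent record, and the invalid record for customer 18 is dropped.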
CHARACTERISTICS
… time-variant ...
• All data in the data warehouse is time-
stamped at time of entry into the
warehouse or when it is summarized
within the warehouse.
• This chronological recording of data
provides historical and trend analysis
possibilities.
• By contrast, operational data is
overwritten, since past values are not of
interest.
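The contrast between overwriting and time-stamped recording can be sketched as follows. The class names `OperationalTable` and `WarehouseTable`, the account key and the balances are hypothetical; this is an illustration of the idea, not a description of any particular product.

```python
from datetime import datetime, timezone


class OperationalTable:
    """Keeps only the current value per key; past values are overwritten."""

    def __init__(self):
        self.rows = {}

    def write(self, key, value):
        self.rows[key] = value


class WarehouseTable:
    """Appends every value with a load timestamp, so history is preserved."""

    def __init__(self):
        self.rows = []

    def load(self, key, value, loaded_at):
        self.rows.append((key, value, loaded_at))

    def history(self, key):
        return [(value, ts) for k, value, ts in self.rows if k == key]


op = OperationalTable()
wh = WarehouseTable()

# Hypothetical balances recorded on three consecutive days.
for balance, day in [(100, 1), (250, 2), (180, 3)]:
    op.write("acct-1", balance)
    wh.load("acct-1", balance, datetime(2024, 1, day, tzinfo=timezone.utc))

current = op.rows["acct-1"]                   # only the latest value survives
trend = [v for v, _ in wh.history("acct-1")]  # full history for trend analysis
print(current, trend)
```

The operational table can only answer "what is the balance now?", while the warehouse table can also answer "how did the balance change over time?".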
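[Figure: data warehouse architecture. Operational databases and external sources pass through extraction, cleaning, validation and summarization into the data warehouse and data marts, which are used by users through queries, OLAP and data mining.]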
On-Line Analytical Processing
(OLAP)
• Term introduced by E.F. Codd (1993) in
contrast to On-Line Transaction
Processing (OLTP)
• The OLAP Council’s definition:
“A category of software technology that
enables analysts, managers and executives
to gain insight into data through fast,
consistent, interactive access to a wide
variety of possible views of information that
have been transformed from raw data to
reflect the real dimensionality of the
enterprise as understood by the user”
On-Line Analytical Processing
(OLAP)
• Basic idea: users should be able to
manipulate enterprise data models
across many dimensions to understand
changes that are occurring.
• Data used in OLAP should be in the
form of a multi-dimensional cube.
[Figure: multi-dimensional cube; surviving dimension labels: Market, Product]
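As a rough illustration of viewing the same data across several dimensions, the sketch below stores facts keyed by three dimensions and aggregates along any subset of them. The dimension names, the `aggregate` helper and all sales figures are invented for the example; dedicated OLAP engines precompute and index such views rather than scanning a list.

```python
from collections import defaultdict

DIMENSIONS = ("product", "market", "year")

# Hypothetical fact rows: (product, market, year, units sold).
facts = [
    ("beer", "north", 2023, 120),
    ("beer", "south", 2023, 80),
    ("peanuts", "north", 2023, 50),
    ("beer", "north", 2024, 150),
    ("peanuts", "south", 2024, 70),
]


def aggregate(facts, keep):
    """Sum the measure over every dimension not listed in `keep`."""
    idx = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[i] for i in idx)
        totals[key] += row[-1]
    return dict(totals)


by_product = aggregate(facts, ["product"])
by_market_year = aggregate(facts, ["market", "year"])
grand_total = aggregate(facts, [])

print(by_product)       # {('beer',): 350, ('peanuts',): 120}
print(by_market_year)
print(grand_total)      # {(): 470}
```

Each call corresponds to a different view of the cube: the analyst chooses which dimensions to keep and the remaining ones are summed away.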
Dimensional Hierarchies
• Each dimension can be hierarchically
structured
[Figure: dimension hierarchies; surviving level labels: Year, Country]
http://www.cs.brown.edu/courses/cs227/Papers/Visualization/Choong.pdf
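A hierarchy lets the same facts be summarized at a coarser level. The sketch below rolls data up from city to country and from month to year; the city-to-country mapping, the `roll_up` helper and the amounts are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical geography hierarchy: city -> country.
CITY_TO_COUNTRY = {"Berlin": "Germany", "Munich": "Germany", "Paris": "France"}

# Hypothetical facts at the finest level: (city, "YYYY-MM", amount).
sales = [
    ("Berlin", "2024-01", 10),
    ("Munich", "2024-02", 20),
    ("Paris", "2024-01", 5),
    ("Berlin", "2023-12", 7),
]


def roll_up(sales):
    """Aggregate city/month facts to the country/year level."""
    totals = defaultdict(int)
    for city, month, amount in sales:
        country = CITY_TO_COUNTRY[city]  # move one level up in geography
        year = month[:4]                 # move one level up in time
        totals[(country, year)] += amount
    return dict(totals)


by_country_year = roll_up(sales)
print(by_country_year)
```

Drilling down is the reverse operation: returning from the country/year totals to the underlying city/month rows.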
DBMS for Warehouse
• Multidimensional DBMS: Essbase,
UniVerse
• Relational DBMS: Oracle, SQL
Server, DB2, MySQL, PostgreSQL,
Firebird
DATA MINING
Definition: The process of extracting valid, previously unknown, comprehensible
and actionable information from large databases and using it to
make crucial business decisions.
Association discovery
Associations are occurrences that are linked to a single event.
Example: in a supermarket, customers who purchase beer also buy
peanuts 55% of the time.
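A figure such as "55% of the time" is usually called the confidence of a rule: among the transactions that contain the first item, the fraction that also contain the second. The sketch below computes it on a handful of made-up baskets; the `confidence` helper and the data are illustrative assumptions and do not reproduce the 55% from the example.

```python
def confidence(transactions, antecedent, consequent):
    """Fraction of transactions containing `antecedent` that also contain `consequent`."""
    with_antecedent = [t for t in transactions if antecedent in t]
    if not with_antecedent:
        return 0.0
    both = sum(1 for t in with_antecedent if consequent in t)
    return both / len(with_antecedent)


# Hypothetical market baskets.
baskets = [
    {"beer", "peanuts"},
    {"beer"},
    {"beer", "peanuts", "milk"},
    {"milk"},
    {"beer", "bread"},
]

rule_confidence = confidence(baskets, "beer", "peanuts")
print(rule_confidence)  # 0.5: two of the four beer baskets also contain peanuts
```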
Characteristics
• Huge volumes of continuous data, possibly infinite
• Fast changing; requires fast, real-time response
• The data stream model fits many of today's data processing needs
• Random access is expensive
• Single-scan algorithms: each item can be looked at only once
• Only a summary of the data seen so far is stored
• Most stream data is low-level or multi-dimensional in nature and
needs multi-level, multi-dimensional processing
Goal: mine patterns, process queries and compute statistics on data
streams in real time
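One way to honor the single-scan and summary-only constraints is an online update that sees each item once and keeps a constant-size state. The sketch below uses Welford's method for a running mean and variance; it is offered as one possible technique, not as the method the slides have in mind, and the `RunningStats` class and the sample readings are hypothetical.

```python
class RunningStats:
    """Constant-size summary of a stream: count, mean and variance (Welford's method)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        """Fold one item into the summary; the item itself is not stored."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Population variance of the items seen so far."""
        return self._m2 / self.count if self.count else 0.0


def readings():
    """Stand-in for an unbounded stream; each value is yielded exactly once."""
    for x in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
        yield x


stats = RunningStats()
for x in readings():
    stats.update(x)

print(stats.count, stats.mean, stats.variance)
```

The memory used does not grow with the length of the stream, which is what makes this style of algorithm suitable when the data is too large or too fast to store.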
Applications of Stream Data Mining