
Data Warehousing and OLAP

• Topics
– Introduction
– Data modelling in data warehouses
– Building data warehouses
– View Maintenance
– OLAP and data mining
• Reading
– Lecture Notes
– Elmasri and Navathe, Chapter 26
– Ozsu and Valduriez, Chapter 16
– S. Chaudhuri and U. Dayal. An Overview of Data Warehousing and OLAP Technology. SIGMOD Record 26, 1, (1997) pp 65 - 74.

Terminology and Definitions

• DW provides access to data for complex analysis, knowledge discovery and decision support
• High performance demands
• OLAP - on-line analytical processing
• DSS - decision support systems, also known as EIS (executive information systems)
• Traditional databases support on-line transaction processing (OLTP) - updates, deletions and insertions

CS544 Module 2
© Shazia Sadiq ITEE/UQ

Data Warehouse characteristics

• Multidimensional conceptual view
• Unlimited dimensions and aggregation levels
• Unrestricted cross-dimensional operations
• Client-server architecture
• Multi user support
• Accessibility
• Transparency
• Intuitive data manipulation
• Flexible reporting
• Scalability

Data Warehouse Types

• Enterprise-wide data warehouses - large projects with massive investment of time and resources
• Virtual data warehouses - provide views of operational databases that are materialized for efficiency
• Data Marts - target a subset of the organization, such as departments


Typical DW Implementation

[Figure: external data sources and operational data pass through data extraction/transformation/scrubbing into the data warehouse and its metadata, which supports data access & analysis. Supporting layers: business modelling, backbone networking. Activities: warehouse design, data extraction, application integration, warehouse management, query design & management, performance tuning.]

Data Warehousing Environments

– In a DW, the data that is subject to analysis is decoupled from the data produced at the source.
– Information in the DW can be organised in a form that makes it easy to use for applications. Views range from simple replication to arbitrarily complex processing.
– Information is available independently of the availability of the source. The views are materialised.
– Information is structured and stored so as to optimise processing of queries against the DW.
– A small amount of cooperation is required from the source to keep the warehouse in sync when the sources change.


Lecture Notes INFS7907


©School of Information Technology and Electrical Engineering
Data Modeling for DWs

• Multidimensional models
– hyper-cubes if more than 3 dimensions,
– query performance much better than in the relational model,
– there are typically three dimensions in a corporate DW: fiscal periods, products and regions.

Two-dimensional matrix

[Figure: two-dimensional matrix with axes product and region]


A data cube

[Figure: data cube with axes fiscal period, region and product]

Multidimensional displays

• Roll-up Display - moves up the hierarchy, grouping into larger units along a dimension, for example from days to weeks, from weeks to months, from months to quarters, from quarters to years, etc.
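
As a minimal sketch of roll-up along the time dimension, the Python below groups day-level facts into months and then into quarters. The dates, amounts and the helper names `roll_up`, `month_of` and `quarter_of` are invented for illustration and are not part of the lecture material.

```python
from collections import defaultdict
from datetime import date

# Hypothetical day-level facts: (day, sales revenue). Values are invented.
daily_sales = [
    (date(2024, 1, 15), 100.0),
    (date(2024, 1, 20), 50.0),
    (date(2024, 2, 3), 70.0),
    (date(2024, 4, 9), 30.0),
]

def roll_up(facts, key):
    """Aggregate (day, amount) facts into the coarser unit chosen by `key`."""
    totals = defaultdict(float)
    for day, amount in facts:
        totals[key(day)] += amount
    return dict(totals)

def month_of(day):
    return (day.year, day.month)

def quarter_of(day):
    return (day.year, (day.month - 1) // 3 + 1)

by_month = roll_up(daily_sales, month_of)      # days -> months
by_quarter = roll_up(daily_sales, quarter_of)  # days -> quarters
```

Each step up the hierarchy only changes the grouping key; the measure (revenue) and the aggregation (sum) stay the same.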


The roll-up operation

[Figure: products rolled up into product groups (Products 1xx, Products 2xx, ..., Products nxx) against region]

Multidimensional displays

• Drill-down Display - provides a finer grain view by dis-aggregating some dimensions, for example from months to weeks, from weeks to days, etc.

The drill-down operation

[Figure: products drilled down into styles (P123 styles, P124 styles) against sub-regions A, B, C]

Tables in multi-dimensional model

• Two types of tables in multi-dimensional model:
– Dimension table - tuples of attributes of the dimension,
– Fact table - contains measured variable(s) and identifies it (them) with pointers to dimension tables.


Multi-dimensional schemas

Two common multi-dimensional schemas:
– Star schema - consists of a fact table with a single table for each dimension,
– Snowflake schema - a variation on the star schema - dimensional tables are organized into a hierarchy by normalizing them.

A Star Schema

[Figure: star schema. Fact table "business results" (Product, Quarter, Region, Sales revenue) linked to dimension tables for fiscal quarters (Qtr), product (Prod No) and regions (Region).]
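
The star schema described above can be sketched with Python's built-in `sqlite3` module. This is an illustration under stated assumptions, not the schema from the slide: the table and column names (`business_results`, `prod_name`, `year`, `country`) and all sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: a single table per dimension (hypothetical columns).
    CREATE TABLE product (prod_no INTEGER PRIMARY KEY, prod_name TEXT);
    CREATE TABLE quarter (qtr TEXT PRIMARY KEY, year INTEGER);
    CREATE TABLE region  (region TEXT PRIMARY KEY, country TEXT);

    -- Fact table: the measure plus one pointer into each dimension table.
    CREATE TABLE business_results (
        prod_no       INTEGER REFERENCES product(prod_no),
        qtr           TEXT    REFERENCES quarter(qtr),
        region        TEXT    REFERENCES region(region),
        sales_revenue REAL
    );

    INSERT INTO product VALUES (1, 'Widget'), (2, 'Gadget');
    INSERT INTO quarter VALUES ('2024Q1', 2024), ('2024Q2', 2024);
    INSERT INTO region  VALUES ('North', 'AU'), ('South', 'AU');
    INSERT INTO business_results VALUES
        (1, '2024Q1', 'North', 100.0),
        (1, '2024Q2', 'North', 120.0),
        (2, '2024Q1', 'South', 80.0);
""")

# A typical star join: total revenue per product name.
rows = conn.execute("""
    SELECT p.prod_name, SUM(f.sales_revenue)
    FROM business_results AS f
    JOIN product AS p ON p.prod_no = f.prod_no
    GROUP BY p.prod_name
    ORDER BY p.prod_name
""").fetchall()
```

A snowflake variant would split a dimension further, for example moving product-line attributes out of `product` into their own table referenced by a key.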


A Snowflake Schema

[Figure: snowflake schema. Labels: fact table (Business results, Sales revenue); dimension tables (Fiscal qtrs, FQ dates, Product, Prod. name, P. line).]

A fact constellation

[Figure: fact constellation. Labels: Fact table 1 (Business results), Fact table 2 (Business forecast), shared dimension table (Product).]

Building a Data Warehouse

• The data must be extracted from multiple, heterogeneous sources
• The data must be formatted for consistency
• The data must be cleaned to ensure the validity
• The data must be fitted to the DW data model
• The data must be loaded into the DW

Typical Data Warehouse

[Figure: operational databases/outside sources (heterogeneous pre-existing databases DB1, DB2, DB3, ..., DBi) feed the DW, which supports reports, analysis and strategic planning.]
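
The five building steps listed above (extract, format, clean, fit, load) can be sketched as a small pipeline. Everything here is an assumption made for illustration: the two in-memory sources, their field names and units, and the dictionary standing in for the warehouse.

```python
# Two hypothetical, heterogeneous sources with different field names and units.
source_a = [{"prod": "P1", "region": "north", "revenue_usd": "100.50"},
            {"prod": "P2", "region": "SOUTH", "revenue_usd": "n/a"}]
source_b = [{"product_no": "P1", "area": "North ", "revenue_cents": 2500}]

def extract():
    """Step 1: pull raw records from every source, tagged by origin."""
    return [("a", r) for r in source_a] + [("b", r) for r in source_b]

def format_record(origin, rec):
    """Step 2: map source-specific fields and units onto one common shape."""
    if origin == "a":
        return {"product": rec["prod"], "region": rec["region"],
                "revenue": rec["revenue_usd"]}
    return {"product": rec["product_no"], "region": rec["area"],
            "revenue": rec["revenue_cents"] / 100}

def clean(rec):
    """Step 3: normalise values; return None for records that fail validation."""
    try:
        revenue = float(rec["revenue"])
    except ValueError:
        return None
    return {"product": rec["product"],
            "region": rec["region"].strip().title(),
            "revenue": revenue}

def fit_and_load(records, warehouse):
    """Steps 4 and 5: key each record by its dimensions and add it to the DW."""
    for rec in records:
        key = (rec["product"], rec["region"])
        warehouse[key] = warehouse.get(key, 0.0) + rec["revenue"]

warehouse = {}
formatted = [format_record(origin, rec) for origin, rec in extract()]
cleaned = [c for c in map(clean, formatted) if c is not None]
fit_and_load(cleaned, warehouse)
```

In this sketch the record with revenue "n/a" is rejected during cleaning, and the two remaining P1 records from different sources end up in the same warehouse cell.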

Important Questions

• How up-to-date must the data be?
• Can the DW go off-line, and for how long?
• What are the data interdependencies?
• What is the storage availability?
• What are the distribution requirements - replication and partitioning?
• What is the loading time - including cleaning, formatting, copying, transmitting, index re-building?

Data Storage Processes

• Storing the data according to the DW data model
• Creating and maintaining required data structures
• Creating and maintaining appropriate access paths
• Supporting the updating of VIEWS
• Refreshing the data

Design Considerations

• Usage projection
• The fit of the data model
• Characteristics of available data sources
• Design of the meta data components
• Modular component design
• Design for change
• Considerations of distributed and parallel architectures

Difficulties with implementation

• Resolution of schema conflicts
• The administration of a DW is an intensive enterprise
• The quality control of the data
• Correct estimations of usage - optimization issues
• Selection of a highly specialized team

Query Mediation

• Traditional, virtual tables
• A user query is decomposed into sub-queries that are executed by the data sources
• Answers are always based on current data
• Query performance (at several sites) for large sets of data is a problem

A Query Mediation Architecture

[Figure: a user query is posed against virtual views; mediated sub-queries are sent to the data sources.]
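
A minimal sketch of query mediation, assuming two hypothetical in-memory data sources: the user query is split into one sub-query per source and the partial answers are combined. The names `SOURCES`, `run_sub_query` and `mediated_total` are invented for this example.

```python
# Two hypothetical data sources, each holding sales rows for its own site.
SOURCES = {
    "site_1": [{"product": "P1", "revenue": 100.0},
               {"product": "P2", "revenue": 40.0}],
    "site_2": [{"product": "P1", "revenue": 60.0}],
}

def run_sub_query(rows, product):
    """Sub-query executed at one source: local revenue total for a product."""
    return sum(r["revenue"] for r in rows if r["product"] == product)

def mediated_total(product):
    """Decompose the user query into one sub-query per source, then combine."""
    partials = [run_sub_query(rows, product) for rows in SOURCES.values()]
    return sum(partials)

before = mediated_total("P1")
# The mediator keeps no copy, so a source update is visible to the next query.
SOURCES["site_2"].append({"product": "P1", "revenue": 5.0})
after = mediated_total("P1")
```

The sketch shows both properties named above: answers reflect current data, and every query pays the cost of visiting every source.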


Monitors

• Views are materialised
• Need for view maintenance
• The question is: WHERE is the view maintenance performed?
• In the monitor architecture, the responsibility rests with the sources. Monitors are installed.
– A few prototypes/products adopted monitors
– IBM Starburst
– ConceptBase system
• Disadvantage is that additional work load is imposed on the data sources

A Monitor Architecture

[Figure: user queries are answered from materialised views; a local update at a data source triggers a view update.]
Sources are responsible for view refresh.

Data Warehouse Architecture

[Figure: user queries are answered from materialised views; local updates to source data reach the warehouse as base updates.]
The warehouse is responsible for view refresh.

View Maintenance in DW

• Full re-computation
• DW taken down periodically for scheduled maintenance
• During this period all the views are re-derived from scratch from the data sources
• It is frequently accepted as the simplest and safest policy
• However, it is often wasteful to re-derive yesterday's data
• Time limitation for re-computation is often a big problem
Incremental Maintenance

Only the parts of the DW that change are recomputed.
1. DW scheduled for maintenance as before, but views are updated incrementally.
• All changes made to the data sources during operation hours are to be logged.
2. Maintenance can be dynamic, so views always reflect fresh data.
• Problem is efficiency of dynamic maintenance and data quality.

Approaches to Incremental View Maintenance

• Unrestricted base access
• Self-maintainable DW
• Run-time DW self-maintainable
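
To contrast full re-computation with log-based incremental maintenance, here is a minimal sketch for a single materialised view (revenue per region). The source rows, the log format `(op, region, amount)` and the function names are assumptions made for this example only.

```python
from collections import defaultdict

# Hypothetical source rows behind a materialised view: revenue per region.
source = [("North", 100.0), ("South", 40.0), ("South", 15.0)]

def recompute(rows):
    """Full re-computation: derive the view from scratch."""
    view = defaultdict(float)
    for region, amount in rows:
        view[region] += amount
    return dict(view)

def apply_log(view, log):
    """Incremental maintenance: touch only the groups named in the change log."""
    for op, region, amount in log:
        delta = amount if op == "insert" else -amount
        view[region] = view.get(region, 0.0) + delta
    return view

view = recompute(source)

# Changes logged at the source during operating hours.
log = [("insert", "North", 25.0), ("delete", "South", 40.0),
       ("insert", "West", 10.0)]

# Apply the same changes to the source so both approaches can be compared.
for op, region, amount in log:
    if op == "insert":
        source.append((region, amount))
    else:
        source.remove((region, amount))

# Note: a SUM alone cannot tell when a group becomes empty; a fuller sketch
# would also keep a row count per group.
view = apply_log(view, log)
```

The incremental path reads three log entries instead of rescanning the whole source, yet ends in the same view as a full re-computation.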


Web Access to DW

[Figure: web browsers connect over the Internet to a web server platform (applications, middleware), which accesses the data warehouse (management, metadata).]

Web-Based DW Access

• How do you hone the environment to provide good performance to large numbers of users?
• How can you serve the needs of both web and non-web clients?
• How can you provide Internet access to internal data warehousing applications without opening a security hole?
• How can you help users design queries for efficient processing?
• How do you limit unreasonable queries that would jam the system?


OLAP and Data Mining

Provide a strong decision support environment
• Data Mining:
– Characterization of patterns inherent in the data,
– Development of a hypothesis from these patterns,
– Prediction of future behaviour.
• OLAP
– The first approach consists of the use of relational technology, suitably adapted and extended. The data is stored using tables, but the analysis operations are carried out efficiently using special data structures. This type of system is normally called ROLAP (Relational OLAP).
– The second, more radical, approach consists of storing data directly in a multi-dimensional form, using vector data structures. This type of system is called MOLAP (Multi-dimensional OLAP).

Goals of Data Mining

• Prediction - to show a future potential behavior
• Identification - data patterns indicate the existence of an item, an event, or an activity
• Classification - partition the data based on identified combinations of parameters
• Optimization - optimize the use of limited resources such as time, money, space, material
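
The difference between the ROLAP and MOLAP storage styles described above can be illustrated with a toy example. This is only a sketch: the dimension members, the fact rows and the nested-list cube are hypothetical, and real systems use indexes and far more compact array layouts.

```python
# Hypothetical dimension members; positions in these lists are array indices.
products = ["P1", "P2"]
regions = ["North", "South"]
quarters = ["Q1", "Q2"]

# ROLAP-style: facts kept as rows of a table (product, region, quarter, revenue).
fact_rows = [
    ("P1", "North", "Q1", 100.0),
    ("P1", "North", "Q2", 120.0),
    ("P2", "South", "Q1", 80.0),
]

def rolap_lookup(product, region, quarter):
    """Answer a cell query by scanning the rows (a real system would index)."""
    return sum(v for p, r, q, v in fact_rows
               if (p, r, q) == (product, region, quarter))

# MOLAP-style: the same facts in a dense array addressed by position.
cube = [[[0.0 for _ in quarters] for _ in regions] for _ in products]
for p, r, q, v in fact_rows:
    cube[products.index(p)][regions.index(r)][quarters.index(q)] += v

def molap_lookup(product, region, quarter):
    """Answer a cell query by direct array addressing."""
    i, j, k = (products.index(product), regions.index(region),
               quarters.index(quarter))
    return cube[i][j][k]

def molap_total_for_product(product):
    """Aggregate over the region and quarter axes of one product slice."""
    return sum(sum(row) for row in cube[products.index(product)])
```

Both lookups return the same values; the trade-off is that the dense cube also stores every empty cell, while the row table stores only the facts that exist.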

Types of knowledge discovered during Data Mining

• Association rules - correlate the presence of a set of items with a set of values for another set of variables
• Classification hierarchies - to create a hierarchy of classes
• Sequential patterns - a sequence of actions or events is sought
• Patterns within time series - similarities detected within positions of the time series

Next Lecture

Introduction to Business Process Technology