
Why Data Warehousing?

Data explosion in database management systems (DBMS)
Inefficient retrieval of required information
Need for Decision Support Systems (DSS) to facilitate decision making
Need to extract, clean, transform, and filter data from DBMSs and to provide efficient
access to the required information
The data warehouse comes to the rescue

Who needs a data warehouse?

Decision makers who rely on massive amounts of data
Those who use customized, complex processes to obtain information from various data
sources
Those who want to use simple technology to access data
Those who require a systematic approach to decision making

Two major functions of data warehousing

Extracting the information needed for decision making from heterogeneous data sources
and storing it in the data warehouse
Providing query and decision-analysis facilities to users
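
The two functions can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the two sources, their field names, the region codes, and the figures are hypothetical, and a plain Python list stands in for the warehouse. Source A reports revenue in pounds with untidy region names; source B uses region codes and pence. Extraction converts both into one consistent form before loading, and a simple query then runs against the unified data.

```python
import csv
import io

# Source A (hypothetical): CSV export, revenue in pounds, untidy region names.
SOURCE_A = "region,revenue_gbp\nscotland,1200.50\n England ,300\n"
# Source B (hypothetical): records with region codes, revenue in pence.
SOURCE_B = [
    {"region_code": "SCT", "revenue_pence": 25000},
    {"region_code": "WLS", "revenue_pence": 9999},
]
REGION_CODES = {"SCT": "Scotland", "ENG": "England", "WLS": "Wales"}


def extract_a(text):
    """Yield rows from source A in the unified form (region name, pence)."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {
            "region": row["region"].strip().title(),
            "revenue_pence": round(float(row["revenue_gbp"]) * 100),
        }


def extract_b(records):
    """Yield rows from source B in the unified form (region name, pence)."""
    for rec in records:
        yield {
            "region": REGION_CODES[rec["region_code"]],
            "revenue_pence": rec["revenue_pence"],
        }


def revenue_by_region(warehouse):
    """Query function: total revenue in pence per region."""
    totals = {}
    for row in warehouse:
        totals[row["region"]] = totals.get(row["region"], 0) + row["revenue_pence"]
    return totals


# Function 1: extract from heterogeneous sources and store in the warehouse.
warehouse = []
warehouse.extend(extract_a(SOURCE_A))
warehouse.extend(extract_b(SOURCE_B))

# Function 2: answer a query over the integrated data.
print(revenue_by_region(warehouse))
```

Storing money as integer pence is a design choice of this sketch, made to avoid floating-point drift when summing.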

Typical DW Queries

What was the total revenue for Scotland in the third quarter of 2012?
What was the total revenue for property sales for each type of property in Great Britain
in 2012?
What are the three most popular areas in each city for the renting of property in 2012
and how does this compare with the figures for the previous two years?
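
As a rough illustration, the first query above could be expressed as follows. This is a sketch under stated assumptions: the `sales` table, its columns, and the sample figures are hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical fact table; names and figures are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, sale_date TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("Scotland", "2012-07-15", 1000.0),
        ("Scotland", "2012-09-30", 2500.0),
        ("Scotland", "2012-10-01", 400.0),  # fourth quarter, excluded
        ("England", "2012-08-01", 9000.0),  # other region, excluded
    ],
)


def total_revenue(conn, region, start, end):
    """Sum revenue for one region over the half-open date range [start, end)."""
    row = conn.execute(
        "SELECT COALESCE(SUM(revenue), 0) FROM sales "
        "WHERE region = ? AND sale_date >= ? AND sale_date < ?",
        (region, start, end),
    ).fetchone()
    return row[0]


# Third quarter of 2012: 1 July up to, but not including, 1 October.
q3_scotland = total_revenue(conn, "Scotland", "2012-07-01", "2012-10-01")
print(q3_scotland)  # 3500.0
```

ISO-formatted date strings compare correctly as text, which is why the range filter works without a date type.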

What is Data Warehousing?


A data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of
data in support of management's decision-making process.
1. Subject-oriented
The warehouse is organized around the major subjects of the enterprise (e.g. customers,
products, and sales) rather than the major application areas (e.g. customer invoicing, stock
control, and product sales). This reflects the need to store decision-support data rather
than application-oriented data.
2. Integrated
The data warehouse integrates corporate application-oriented data from different source
systems, which often contain inconsistent data.
The integrated data must be made consistent to present a unified view of the data to
the users.

3. Time-variant
Data in the warehouse is only accurate and valid at some point in time or over some time
interval.
Time-variance is also shown in the extended time that the data is held, the implicit or explicit
association of time with all data, and the fact that the data represents a series of snapshots.
4. Non-volatile
Data in the warehouse is not updated in real time but is refreshed from operational systems on
a regular basis.
New data is always added as a supplement to the database, rather than a replacement.
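
The time-variant and non-volatile properties can be sketched together as an append-only load of dated snapshots. The table name, columns, and figures below are hypothetical, and SQLite is used only as a stand-in. Each refresh adds new rows stamped with a snapshot date and never updates or deletes earlier rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical warehouse table: every row carries the date of its snapshot.
conn.execute(
    "CREATE TABLE property_snapshot ("
    " property_id INTEGER, monthly_rent REAL, snapshot_date TEXT,"
    " PRIMARY KEY (property_id, snapshot_date))"
)


def refresh(conn, snapshot_date, operational_rows):
    """Append the current operational state as a new dated snapshot.

    Existing rows are never updated or deleted.
    """
    conn.executemany(
        "INSERT INTO property_snapshot VALUES (?, ?, ?)",
        [(pid, rent, snapshot_date) for pid, rent in operational_rows],
    )
    conn.commit()


refresh(conn, "2012-01-31", [(1, 500.0), (2, 750.0)])
refresh(conn, "2012-02-29", [(1, 520.0), (2, 750.0)])  # rent of property 1 changed

# Both the old and the new rent remain available as a time series.
history = conn.execute(
    "SELECT snapshot_date, monthly_rent FROM property_snapshot "
    "WHERE property_id = 1 ORDER BY snapshot_date"
).fetchall()
print(history)  # [('2012-01-31', 500.0), ('2012-02-29', 520.0)]
```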

COMPARISON OF DATA WAREHOUSE AND OPERATIONAL DATA


HOW IS THE WAREHOUSE DIFFERENT?
The data warehouse is distinctly different from the operational data used and maintained by
day-to-day operational systems.
Data warehousing is not simply an access wrapper for operational data, where data is simply
dumped into tables for direct access. Among the differences:

OLTP vs. OLAP


We can divide IT systems into transactional (OLTP) and analytical (OLAP). In general, we can
assume that OLTP systems provide source data to data warehouses, whereas OLAP systems
help to analyze it.

OLTP (On-line Transaction Processing) is characterized by a large number of short on-line
transactions (INSERT, UPDATE, and DELETE).
The main emphasis for OLTP systems is on very fast query processing and on maintaining data
integrity in multi-access environments; effectiveness is measured by the number of
transactions per second.
An OLTP database holds detailed, current data, and the schema used to store transactional
data is the entity model (usually 3NF).
OLAP (On-line Analytical Processing) is characterized by a relatively low volume of transactions.
Queries are often very complex and involve aggregations.
For OLAP systems, response time is the measure of effectiveness.
OLAP applications are widely used with data mining techniques.
An OLAP database holds aggregated, historical data, stored in multi-dimensional schemas
(usually a star schema).
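
A star schema and a typical aggregating OLAP query can be sketched as follows. The table names, columns, and figures are hypothetical, and SQLite is used only as a stand-in for a warehouse engine. One central fact table references small dimension tables, and the query joins them and aggregates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical star schema: one fact table surrounded by two dimension tables.
conn.executescript(
    """
    CREATE TABLE dim_property_type (type_id INTEGER PRIMARY KEY, type_name TEXT);
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
    CREATE TABLE fact_sales (
        type_id INTEGER REFERENCES dim_property_type(type_id),
        date_id INTEGER REFERENCES dim_date(date_id),
        revenue REAL
    );
    INSERT INTO dim_property_type VALUES (1, 'Flat'), (2, 'House');
    INSERT INTO dim_date VALUES (1, 2012, 3), (2, 2012, 4);
    INSERT INTO fact_sales VALUES (1, 1, 100.0), (1, 2, 150.0), (2, 1, 400.0);
    """
)

# OLAP-style query: join the fact table to its dimensions and aggregate.
revenue_by_type = conn.execute(
    """
    SELECT t.type_name, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_property_type AS t ON t.type_id = f.type_id
    JOIN dim_date AS d ON d.date_id = f.date_id
    WHERE d.year = 2012
    GROUP BY t.type_name
    ORDER BY t.type_name
    """
).fetchall()
print(revenue_by_type)  # [('Flat', 250.0), ('House', 400.0)]
```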
| | OLTP System: Online Transaction Processing (Operational System) | OLAP System: Online Analytical Processing (Data Warehouse) |
|---|---|---|
| Source of data | Operational data; OLTPs are the original source of the data | Consolidation data; OLAP data comes from the various OLTP databases |
| Purpose of data | To control and run fundamental business tasks | To help with planning, problem solving, and decision support |
| What the data reveals | A snapshot of ongoing business processes | Multi-dimensional views of various kinds of business activities |
| Inserts and updates | Short and fast inserts and updates initiated by end users | Periodic long-running batch jobs refresh the data |
| Queries | Relatively standardized and simple queries returning relatively few records | Often complex queries involving aggregations |
| Processing speed | Typically very fast | Depends on the amount of data involved; batch data refreshes and complex queries may take many hours; query speed can be improved by creating indexes |
| Space requirements | Can be relatively small if historical data is archived | Larger due to the existence of aggregation structures and history data; requires more indexes than OLTP |
| Database design | Highly normalized with many tables | Typically de-normalized with fewer tables; use of star and/or snowflake schemas |
| Backup and recovery | Back up religiously; operational data is critical to run the business, and data loss is likely to entail significant monetary loss and legal liability | Instead of regular backups, some environments may consider simply reloading the OLTP data as a recovery method |
