SOFT COPY OF THE SEMINAR TOPIC ON

DATA WAREHOUSE

SUBMITTED BY:
IQxplorer
What is a Data Warehouse?
A data warehouse is a repository of information
gathered from multiple sources and stored under a
unified schema at a single site.
The data warehouse is a relational database
organised to hold information in a structure that best
supports reporting and analysis.
Characteristics of a Data Warehouse :

The concept of a data warehouse given by Bill Inmon,
the father of data warehousing, is depicted in the
figure below:

Subject-oriented.
Time-variant.
Non-volatile.
Integrated.
Architecture :
A Data Warehouse Architecture (DWA) is a
way of representing the overall structure of data,
communication, processing and presentation that
exists for end-user computing within the
enterprise.

The architecture of a data warehouse is as follows:
Load Manager :

Data flows into the data warehouse through the load
manager. The data is extracted from the operational
databases and supplemented by data imported from
external sources.
Query manager :

It provides an interface between the warehouse and
its users. It performs tasks such as directing queries
to the appropriate tables, monitoring the
effectiveness of the indexes and summary data, and
scheduling queries.
The load manager primarily performs an extract,
transform, load (ETL) operation :

Data extraction.
Data transformation.
Data loading.
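
The three ETL steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the CSV text, the column names, the sales table and the function names are all hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical operational export; columns and values are illustrative only.
SOURCE_CSV = """customer,amount,order_date
 alice ,10.50,2024-01-05
BOB,7,2024-01-06
carol,,2024-01-07
"""


def extract(text):
    """Data extraction: read raw rows from the source export."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Data transformation: clean and standardize, dropping rows with no amount."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip incomplete records
        cleaned.append(
            (row["customer"].strip().title(), float(amount), row["order_date"].strip())
        )
    return cleaned


def load(conn, rows):
    """Data loading: write the cleaned rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL, order_date TEXT)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(conn, text):
    """Run the three steps in order: extract, then transform, then load."""
    load(conn, transform(extract(text)))


warehouse = sqlite3.connect(":memory:")  # stand-in for the real warehouse
run_etl(warehouse, SOURCE_CSV)
loaded = warehouse.execute(
    "SELECT customer, amount FROM sales ORDER BY customer"
).fetchall()
```

Keeping each step in its own function means a new source or a new cleaning rule can be swapped in without touching the other two steps.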
Components of data warehouse :

The primary components of a data warehouse are :
Data Sources
Data Transformation
Reporting
Metadata
Operations
Optional Components
Data Sources:
A data source is any electronic repository of
information. Data is passed from these systems to the
data warehouse either on a transaction-by-transaction
basis (for real-time data warehouses) or on a regular
cycle.

Data Transformation:
The Data Transformation layer receives data
from the data sources, cleans and standardizes it, and
loads it into the data repository.

Data Warehouse:
The data warehouse is a relational database
organized to hold information in a structure that best
supports reporting and analysis.
Reporting:
The data in the data warehouse must be available
to all the users if the data warehouse is to be useful.

Metadata:
Metadata, or "data about data", is used to inform
users of the data warehouse about its status and the
information held within the data warehouse.

Operations:
Data warehouse operations comprise the processes of
loading, manipulating and extracting data from the
data warehouse. Operations also cover user
management, security, capacity management and related
functions.
Optional Components:

In addition, the following components also exist in
some data warehouses:
1. Dependent Data Marts: A dependent data mart
is a physical database (either on the same
hardware as the data warehouse or on a separate
hardware platform) that receives all its information
from the data warehouse.
2. Logical Data Marts: A logical data mart is a
filtered view of the main data warehouse but does
not physically exist as a separate data copy.
3. Operational Data Store: An ODS is an
integrated database of operational data. Its
sources include legacy systems, and it contains
current or near-term data.
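
The difference between a logical and a dependent data mart can be illustrated with a small sketch. It assumes an in-memory SQLite database as the warehouse; the sales table, the north_mart view and the north_mart_copy table are hypothetical names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the main data warehouse
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("north", "widget", 100.0),
        ("south", "widget", 40.0),
        ("north", "gadget", 60.0),
    ],
)

# Logical data mart: a filtered view of the warehouse; no rows are copied.
conn.execute(
    "CREATE VIEW north_mart AS "
    "SELECT product, amount FROM sales WHERE region = 'north'"
)

# Dependent data mart: a physical copy filled only from the warehouse.
conn.execute(
    "CREATE TABLE north_mart_copy AS "
    "SELECT product, amount FROM sales WHERE region = 'north'"
)

# A new warehouse row shows up in the view at once,
# while the physical copy stays unchanged until it is refreshed.
conn.execute("INSERT INTO sales VALUES ('north', 'gizmo', 25.0)")
view_rows = conn.execute("SELECT COUNT(*) FROM north_mart").fetchone()[0]
copy_rows = conn.execute("SELECT COUNT(*) FROM north_mart_copy").fetchone()[0]
```

The view always reflects the current warehouse contents, whereas the copy trades freshness for the option of running on separate hardware.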
Design of data warehouse :

The key considerations involved in the design of a
data warehouse are:

Time span.
Granularity.
Dimensionality.
Aggregations.
Partitioning.
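
A few of these considerations can be made concrete with a small sketch. The interpretation here is an assumption, since the slide only names the terms: granularity is taken as the level of detail of each row (daily), aggregation as rolling rows up to monthly totals, partitioning as splitting rows by year, and time span as the window of years kept. Dimensionality is not covered. The data and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical fact rows at daily granularity: (date, store, amount).
daily_sales = [
    ("2023-12-30", "S1", 20.0),
    ("2023-12-31", "S1", 30.0),
    ("2024-01-01", "S1", 10.0),
    ("2024-01-02", "S2", 5.0),
]


def aggregate_by_month(rows):
    """Aggregation: roll daily rows up to one total per (month, store)."""
    totals = defaultdict(float)
    for date, store, amount in rows:
        totals[(date[:7], store)] += amount
    return dict(totals)


def partition_by_year(rows):
    """Partitioning: split rows into per-year groups managed separately."""
    parts = defaultdict(list)
    for row in rows:
        parts[row[0][:4]].append(row)
    return dict(parts)


def within_time_span(rows, first_year, last_year):
    """Time span: keep only rows whose year falls in the retained window."""
    return [r for r in rows if first_year <= int(r[0][:4]) <= last_year]


monthly = aggregate_by_month(daily_sales)
partitions = partition_by_year(daily_sales)
recent = within_time_span(daily_sales, 2024, 2024)
```

Storing the daily rows and deriving the monthly totals keeps the finest grain available while still giving reports a smaller summary to read.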
Methods of storing data in a data warehouse :
The general principle used in the majority of
data warehouses is that data is stored at its most
elemental level for use in reporting and information
analysis.
There are two primary approaches to organising
the data in a data warehouse:

Dimensional approach : Here, information is stored as
"facts", which are numeric or text data that capture
specific data about a single transaction or event, and
"dimensions", which contain reference information that
allows each transaction or event to be classified in
various ways.
Database normalization : In this style, the data in
the data warehouse is stored in third normal form.
The main advantage of this approach is that it is
quite straightforward to add new information into the
database, while the primary disadvantage is that it
can be quite slow to produce information and reports.
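
The two storage approaches can be compared side by side with a small sketch. It is an illustration under assumptions, not a recommended schema: SQLite stands in for the warehouse, and every table and column name (fact_sales, dim_product, order_line, product, category) is hypothetical. The same report, total sales per category, is run against both layouts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimensional layout: one fact table plus a denormalized dimension.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE fact_sales (product_id INTEGER, amount REAL);

-- Normalized layout: category is factored out into its own table.
CREATE TABLE category (category_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT, category_id INTEGER);
CREATE TABLE order_line (product_id INTEGER, amount REAL);

INSERT INTO dim_product VALUES (1, 'widget', 'tools'), (2, 'gadget', 'toys');
INSERT INTO fact_sales VALUES (1, 100.0), (1, 50.0), (2, 30.0);

INSERT INTO category VALUES (10, 'tools'), (20, 'toys');
INSERT INTO product VALUES (1, 'widget', 10), (2, 'gadget', 20);
INSERT INTO order_line VALUES (1, 100.0), (1, 50.0), (2, 30.0);
""")

# Dimensional layout: the report needs a single join, fact to dimension.
star_report = conn.execute("""
    SELECT d.category, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product d ON d.product_id = f.product_id
    GROUP BY d.category
    ORDER BY d.category
""").fetchall()

# Normalized layout: the same report has to walk through two joins.
normalized_report = conn.execute("""
    SELECT c.name, SUM(o.amount)
    FROM order_line o
    JOIN product p ON p.product_id = o.product_id
    JOIN category c ON c.category_id = p.category_id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
```

Both queries return the same totals. The normalized layout stores each category name once, which keeps additions simple, but the report needs one more join, and that gap grows as more tables are factored out.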
Advantages of using a data warehouse:
Enhances end-user access to a wide variety of data.
Increases data consistency.
Increases productivity and decreases computing costs.
Is able to combine data from different sources, in one place.
Provides an infrastructure that could support changes to data and replication of the changed data back into the operational systems.
Concerns in using a data warehouse:
Extracting, cleaning and loading data could be time consuming.
Compatibility problems with systems already in place, e.g. a transaction processing system.
Providing training to end-users, who may end up not using the data warehouse.
Security could develop into a serious issue, especially if the data warehouse is web accessible.
Future Developments:

Data Warehousing is such a new field that it is
difficult to estimate what new developments are
likely to most affect it. Clearly, the development
of parallel DB servers with improved query engines
is likely to be one of the most important. Parallel
servers will make it possible to access huge
databases in much less time.
Conclusion:
Data Warehousing is not a new
phenomenon. All large organizations already have
data warehouses, but they are just not managing
them. Over the next few years, the growth of
data warehousing is going to be enormous with
new products and technologies coming out
frequently. In order to get the most out of this
period, it is going to be important that data
warehouse planners and developers have a clear
idea of what they are looking for and then
choose strategies and methods that will provide
them with performance today and flexibility for
tomorrow.