
What is a Data Warehouse?

Definition: A Data Warehouse (DW) is a subject-oriented, integrated,
time-variant, non-volatile collection of data in support of
management's decision-making process.
OR
A single, complete and consistent store of data obtained from a
variety of different sources and made available to end users in a way
they can understand and use in a business context.
A decision support database maintained separately from the
organization's operational database.

Why did the need for data warehouses arise?
The processing load of reporting reduced the response time of the
operational systems.
The database designs of operational systems were not optimized for
information analysis and reporting.
Most organizations had more than one operational system, so
company-wide reporting could not be supported from a single
system.
Development of reports in operational systems often required writing
specific computer programs, which was slow and expensive.

We Need Data Warehouses FOR:
Consolidation of information resources
Improved query performance
Separate research and decision support functions from the
operational systems
An OLTP (online transaction processing) or operational system is used
to deal with the everyday running of one aspect of an enterprise.
OLTP systems are usually designed independently of each other, and it
is difficult for them to share information.
Foundation for data mining, data visualization, advanced reporting
and OLAP tools


Characteristics of Data Warehouse
Subject oriented. Data are organized based on how the users refer to
them, in such a way that relevant data is clustered together for easy
access.
Integrated. All inconsistencies regarding naming conventions and
value representations are removed.
A common unit of measure is established for all synonymous data
elements from dissimilar databases.
The database contains data from most or all of an organization's
operational applications, and this data is made consistent.
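
The integration step described above can be illustrated with a minimal sketch. It assumes two hypothetical source systems (billing and clinic) that store the same facts under different field names, gender codes, and weight units; none of these names come from the original material.

```python
# Hypothetical records from two operational systems that encode the
# same facts differently (field names, codes, and units are assumptions).
billing_rows = [{"cust_id": 1, "sex": "M", "weight_lb": 176.0}]
clinic_rows = [{"customer": 2, "gender": "female", "weight_kg": 61.5}]

# One agreed value representation and one agreed unit of measure.
GENDER_CODES = {"m": "M", "male": "M", "f": "F", "female": "F"}
LB_PER_KG = 2.20462


def from_billing(row):
    """Map a billing record onto the shared warehouse layout."""
    return {
        "customer_id": row["cust_id"],
        "gender": GENDER_CODES[row["sex"].lower()],
        "weight_kg": round(row["weight_lb"] / LB_PER_KG, 1),
    }


def from_clinic(row):
    """Map a clinic record onto the shared warehouse layout."""
    return {
        "customer_id": row["customer"],
        "gender": GENDER_CODES[row["gender"].lower()],
        "weight_kg": round(row["weight_kg"], 1),
    }


integrated = [from_billing(r) for r in billing_rows] + [
    from_clinic(r) for r in clinic_rows
]
```

After this step every row uses the same field names, the same gender codes, and kilograms, regardless of which system it came from.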

Nonvolatile. Data are stored in read-only format and do not change
over time.
Typical activities such as deletes, inserts, and changes that are
performed in an operational application environment are completely
nonexistent in a DW environment.
Only two data operations are ever performed in the DW: data loading
and data access.
Time variant. Data are not limited to current values but are normally kept as a time series.
The changes to the data in the database are tracked and recorded so
that reports can be produced showing changes over time.
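
As a rough illustration of the nonvolatile and time-variant properties, the sketch below models a toy store that supports only loading and access, and stamps every loaded row with the date it describes. The class name, method names, and sample data are assumptions made for this example.

```python
from datetime import date


class AppendOnlyStore:
    """Toy warehouse table: rows can be loaded and read, never changed."""

    def __init__(self):
        self._rows = []

    def load(self, snapshot_date, rows):
        # Data loading: each row is stamped with the date it describes.
        for row in rows:
            self._rows.append({"snapshot_date": snapshot_date, **row})

    def history(self, key, value):
        # Data access: return every dated version of the matching rows.
        matches = [r for r in self._rows if r[key] == value]
        return sorted(matches, key=lambda r: r["snapshot_date"])


store = AppendOnlyStore()
store.load(date(2024, 1, 31), [{"product": "widget", "price": 10.0}])
store.load(date(2024, 2, 29), [{"product": "widget", "price": 12.0}])

# The January price is still there after the February load.
prices = [r["price"] for r in store.history("product", "widget")]
```

Because a new load never overwrites an earlier one, a report can show how the price changed over time.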
Data warehouse environment
Data Warehouse: The queryable source of data in the enterprise. It
is comprised of the union of all of its constituent data marts.
Data Mart: A logical subset of the complete data warehouse.
Often viewed as a restriction of the data warehouse to a single
business process or to a group of related business processes targeted
toward a particular business group.
Operational Data Store (ODS): A point of integration for operational
systems that were developed independently of each other. Since an ODS
supports day-to-day operations, it needs to be continually updated.

Generic data warehouse environment
The environment for data warehouses and marts includes the
following:
Source systems that provide data to the warehouse or mart;
Data integration technology and processes that are needed to
prepare the data for use;
Different architectures for storing data in an organization's data
warehouse or data marts;
Different tools and applications for the variety of users;
Metadata, data quality, and governance processes that must be in place to
ensure that the warehouse or mart meets its purposes.

Data Warehouse Architectures

Data warehouses and their architectures vary depending upon the
specifics of an organization's situation. Three common architectures
are:
Data Warehouse Architecture (Basic)
Data Warehouse Architecture (with a Staging Area)
Data Warehouse Architecture (with a Staging Area and Data Marts)
Data Warehouse Architecture (Basic)

End users directly access data derived from several source systems
through the data warehouse.
In the figure:
The metadata and raw data of a traditional OLTP system are present, as
is an additional type of data, summary data.
Summaries are very valuable in data warehouses because they pre-
compute long operations in advance. For example, a typical data
warehouse query is to retrieve something like August sales.
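
The idea of a precomputed summary can be sketched with Python's built-in sqlite3 module. The table names (sales, monthly_sales) and the sample rows are assumptions for illustration only; a real warehouse would typically use its own summary or materialized-view mechanism.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [
        ("2024-07-30", 40.0),
        ("2024-08-02", 100.0),
        ("2024-08-15", 50.0),
        ("2024-09-01", 75.0),
    ],
)

# Precompute the monthly totals once, at load time.
conn.execute(
    """
    CREATE TABLE monthly_sales AS
    SELECT substr(sale_date, 1, 7) AS month, SUM(amount) AS total
    FROM sales
    GROUP BY substr(sale_date, 1, 7)
    """
)

# "August sales" is now a one-row lookup instead of a scan of the detail rows.
august_total = conn.execute(
    "SELECT total FROM monthly_sales WHERE month = ?", ("2024-08",)
).fetchone()[0]
```

The trade-off is that the summary must be refreshed whenever new detail rows are loaded.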

Data Warehouse Architecture (with a Staging Area)

Operational data needs to be cleaned and processed before it is
put into the warehouse.
This can be done programmatically, although most data warehouses
use a staging area instead.
A staging area simplifies building summaries and general warehouse
management.
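
A staging area can be pictured as a table that accepts raw data as-is, so that cleaning happens before anything reaches the warehouse proper. The following is a minimal sketch using sqlite3; the table names (staging_orders, dw_orders), the cleaning rules, and the sample rows are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The staging table takes everything as text, exactly as extracted.
conn.execute(
    "CREATE TABLE staging_orders (order_id TEXT, customer TEXT, amount TEXT)"
)
conn.execute(
    "CREATE TABLE dw_orders "
    "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)

raw = [
    ("1", "  Acme ", "19.99"),
    ("2", "Globex", "n/a"),      # unusable amount
    ("1", "  Acme ", "19.99"),   # duplicate of the first row
    ("3", "initech", "5"),
]
conn.executemany("INSERT INTO staging_orders VALUES (?, ?, ?)", raw)


def is_number(text):
    try:
        float(text)
        return True
    except ValueError:
        return False


# Clean in staging: drop bad amounts, remove duplicates, tidy names.
cleaned = {}
for order_id, customer, amount in conn.execute("SELECT * FROM staging_orders"):
    if not is_number(amount):
        continue
    cleaned[int(order_id)] = (int(order_id), customer.strip().title(), float(amount))

conn.executemany("INSERT INTO dw_orders VALUES (?, ?, ?)", list(cleaned.values()))
loaded = conn.execute("SELECT * FROM dw_orders ORDER BY order_id").fetchall()
```

Only the two valid, distinct orders reach dw_orders; the raw rows remain in staging for inspection.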

Data Warehouse Architecture (with a Staging Area and Data Marts)

Although the staging-area architecture is quite common, a warehouse's
architecture can be customized for different groups within the
organization. This can be done by adding data marts, which are
systems designed for a particular line of business.
The figure illustrates an example where purchasing, sales, and
inventories are separated. In this example, a financial analyst might
want to analyse historical data for purchases and sales.
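
One simplified way to picture data marts is as per-business-line views over a shared warehouse table. Real data marts are often separate systems, so the sketch below is only an approximation; the table name dw_transactions, the view names, and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dw_transactions (business_line TEXT, year INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO dw_transactions VALUES (?, ?, ?)",
    [
        ("sales", 2023, 500.0),
        ("sales", 2024, 650.0),
        ("purchasing", 2023, 300.0),
        ("inventory", 2024, 120.0),
    ],
)

# One view per line of business. The names are fixed constants, so
# building the statement with an f-string is safe here.
for line in ("sales", "purchasing", "inventory"):
    conn.execute(
        f"CREATE VIEW {line}_mart AS "
        f"SELECT year, amount FROM dw_transactions "
        f"WHERE business_line = '{line}'"
    )

# A financial analyst looking at historical sales queries only the sales mart.
sales_history = conn.execute(
    "SELECT year, amount FROM sales_mart ORDER BY year"
).fetchall()
```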
ARCHITECTURE
ETL Overview
Extraction, Transformation, Loading (ETL)
ETL gets data out of the source and loads it into the data warehouse;
at its simplest, it is a process of copying data from one database to another
Data is extracted from an OLTP database, transformed to match the
data warehouse schema and loaded into the data warehouse
database
Many data warehouses also incorporate data from non-OLTP systems
such as text files, legacy systems, and spreadsheets; such data also
requires extraction, transformation, and loading
When defining ETL for a data warehouse, it is important to think of
ETL as a process, not a physical implementation
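
The three ETL steps can be sketched as three small functions. This is a minimal illustration, assuming a hypothetical CSV source with day/month/year dates and a hypothetical warehouse table named dw_customer; it is not a description of any particular ETL tool.

```python
import csv
import io
import sqlite3

# Hypothetical source extract: a CSV export with DD/MM/YYYY dates.
SOURCE_CSV = "id,name,signup\n1,ada,03/01/2024\n2,grace,15/02/2024\n"


def extract(text):
    """Extract: read the source rows as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: match the warehouse schema (typed id, ISO dates)."""
    out = []
    for row in rows:
        day, month, year = row["signup"].split("/")
        out.append((int(row["id"]), row["name"].title(), f"{year}-{month}-{day}"))
    return out


def load(conn, rows):
    """Load: write the transformed rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dw_customer "
        "(customer_id INTEGER PRIMARY KEY, name TEXT, signup_date TEXT)"
    )
    conn.executemany("INSERT INTO dw_customer VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))
customers = conn.execute(
    "SELECT * FROM dw_customer ORDER BY customer_id"
).fetchall()
```

Keeping the steps as separate functions reflects the point above: ETL is a process whose stages can be reimplemented independently of one another.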

ETL is often a complex combination of process and technology that consumes a
significant portion of the data warehouse development efforts and requires the
skills of business analysts, database designers, and application developers
It is not a one-time event, as new data is added to the Data Warehouse
periodically: monthly, daily, or hourly
Because ETL is an integral, ongoing, and recurring part of a data warehouse,
it should be:
Automated
Well documented
Easily changeable
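
Because loads recur, each run should pick up only what is new. One common way to do this, shown here as a sketch rather than a prescribed method, is to remember a watermark (the highest id already loaded). The function name, the id field, and the sample rows are assumptions.

```python
def incremental_load(source_rows, warehouse_rows, last_loaded_id):
    """Append only rows newer than the watermark; return the new watermark."""
    new_rows = [r for r in source_rows if r["id"] > last_loaded_id]
    warehouse_rows.extend(new_rows)
    return max((r["id"] for r in new_rows), default=last_loaded_id)


source = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}]
warehouse = []

# First scheduled run loads everything.
watermark = incremental_load(source, warehouse, 0)

# New data arrives in the source before the next run.
source.append({"id": 3, "amount": 30})

# The next run loads only the new row.
watermark = incremental_load(source, warehouse, watermark)
```

Running the function again with no new source rows leaves the warehouse unchanged, which makes it safe to schedule.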


The typical extract-transform-load (ETL)-based data warehouse
uses staging, data integration, and access layers to house its key functions.
The staging layer or staging database stores raw data extracted from each
of the disparate source data systems.
The integration layer integrates the disparate data sets by transforming the
data from the staging layer often storing this transformed data in
an operational data store (ODS) database.
The integrated data are then moved to yet another database, often called
the data warehouse database, where the data is arranged into hierarchical
groups often called dimensions and into facts and aggregate facts.
The combination of facts and dimensions is sometimes called a star
schema. The access layer helps users retrieve data. [4]
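
A star schema can be sketched as one fact table whose rows point at surrounding dimension tables. The example below uses sqlite3; the table names (fact_sales, dim_product, dim_date) and the sample data are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimensions: descriptive attributes used to group and filter.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);

    -- Facts: numeric measures keyed by the dimensions.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        date_id INTEGER REFERENCES dim_date(date_id),
        amount REAL
    );

    INSERT INTO dim_product VALUES (1, 'books'), (2, 'games');
    INSERT INTO dim_date VALUES (10, 2024, 7), (11, 2024, 8);
    INSERT INTO fact_sales VALUES
        (1, 10, 20.0), (1, 11, 30.0), (2, 11, 45.0), (2, 11, 5.0);
    """
)

# An aggregate fact: total sales per category and month.
rows = conn.execute(
    """
    SELECT p.category, d.month, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_date AS d ON d.date_id = f.date_id
    GROUP BY p.category, d.month
    ORDER BY p.category, d.month
    """
).fetchall()
```

Each query joins the central fact table to whichever dimensions it needs, which is what gives the schema its star shape.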


This definition of the data warehouse focuses on data storage.
The main source of the data is cleaned, transformed, cataloged and
made available for use by managers and other business professionals
for data mining, online analytical processing, market
research and decision support.

However, the means to retrieve and analyze data, to extract,
transform and load data, and to manage the data dictionary are also
considered essential components of a data warehousing system.
Thus, an expanded definition for data warehousing includes business
intelligence tools, tools to extract, transform and load data into the
repository, and tools to manage and retrieve metadata.


DATA WAREHOUSE SYSTEM IN DU

The main problem addressed by a data warehouse is that end-users have a
difficult time producing ad-hoc or other specialized queries and reports.
This is due to several factors:
Most of the data is stored in ADABAS, which is difficult for end-users to access.
The data stores were designed for transaction processing not ad-hoc reporting.
Obtaining the data or a report usually requires waiting for a programmer to either
develop the report or provide a customized download program.
All of the data may not be consistent as of the same point in time.
There may not be enough copies of the data kept for historical reporting in the
operational systems.
End-users do not know what is kept in the existing data stores.
Advantages

The data warehouse addresses these factors and provides many
advantages to the end-users of the University including:
Improved end-user access to a wide variety of University data
Increased data consistency
Additional documentation of the data
Potentially lower computing costs and increased productivity
Providing a place to combine related data from separate sources
Creation of a computing infrastructure that can support changes in
computer systems and business structures
Empowering end-users to perform any level of ad-hoc queries or reports
without impacting the performance of the operational systems

Student Data

The Student Data Warehouse was the first data warehouse to be developed at
WSU. It consists of demographic information about students as well as the
courses in which they are enrolled. The warehouse also contains enrollment
statistics for each course offering (section). In addition, the data necessary to
support this information is also stored in the warehouse.
Below is a list of the major classes of data currently in the student data
warehouse:
Academic Course
Academic Degree Conferment
Academic Section
Address
Course Section Snapshot
Email Address
Snapshot Generation
Student
Student Center Snapshot
Student Certificate
Student Course Snapshot
Student Course Transcript
Student NCATE Endorsement
Student Snapshot
Student Transcript
Supporting Data
Data Use

Access to data for departmental and college use only.

Must be for official use only
Should not be shared with third parties (even directory information)
Personally identifiable information cannot be shared outside the university
Reports cannot be shared outside the university until the figures have been checked
by Institutional Research to make certain they are consistent with official university
figures
Published reports must not include personally identifiable information.
Access to the Data Warehouse must be protected from unauthorized use.


Departments and colleges should generally access only their students' data.

Mailings should be restricted to those students within the college or department
Reports should be restricted to student data within the college or department
Institutional Research should be consulted for university-wide studies
Enrollment reporting is the domain of Institutional Research and the Registrar's
Office.

Departments and colleges should not report enrollment using the Data Warehouse
Questions regarding perceived enrollment discrepancies should be directed to Institutional
Research or the Registrar's Office
Enrollment and FTE reports should be for internal use and should be considered estimates.