Sie sind auf Seite 1von 25

Data warehouse –

Concept and Fundamentals

Sridevi
BOAWMII2
Agenda
 Why Needed
 The Value Pyramid
 What is Data warehouse?
 Features, Goals and Applications
 Stages
 A Typical Data Warehouse
 ETL Process & Tools
 OLTP vs. Data Warehouse
 Modeling Constructs
 Entity Relationship Diagram

 Star schema

 Snow flake schema

 Benefits of Data warehousing


 Products and Vendors
 Conclusion
 Terminology and References
Why Needed
Earlier
 Small Organizations.
 Less number of records.

Current Business Environment


 Large Organizations.
 Number of records in millions.
 Heterogeneous source of data.
 Need to make quick and effective strategic decisions
 Ever Changing
 Chaotic
 Global
 Competitive advantages stems from well informed decisions
 Based on an understanding of:
 Products

 Customers Preferences
 Competition
 Company strengths
The Value Pyramid

Each layer provides


Value en route to a
targeted business Business Outcomes
Outcome
Actions

Decisions

Knowledge
Increased revenue
Increased productivity
Information
Reduced costs
Competitive advantage Data
What is Data Warehouse?

•A Data Warehouse is a repository of integrated


information, available for queries and analysis for
business users.

•Data and information are extracted from heterogeneous


sources (RDBMS, Flat files etc.) as they are generated.
This makes it much easier and more efficient to run
queries over data that originally came from different
sources.
Some more definitions
 A collection of integrated, subject oriented databases
designed to support the DSS function where each unit of
data is relevant at some moment of time (Inmon 1991)
 A copy of transaction data specifically structured to
Query and Analysis (Kimball 1996)
 Data Warehouse is NOT a specific technology
 It is a series of processes, procedures and tools that
help the enterprise understand more about itself, its
products, its customers and the market it services.
 It is NOT possible to purchase a Data Warehouse

 But, it is possible to build one.


Features
Non Volatile - Used mainly for reporting purpose and it is independent of
transactional data.

Subject Orientation- All relevant data is stored together. Ex: Sales, Finance,
Marketing, Customer data etc.

Historical data- Can contain data of several years depending on company


requirements.

Data warehouse is a concept. Not a product

Set of Hardware and Software components used to analyze massive amounts of


data.

Intelligent way of managing Data.

Data->Information->Knowledge->Decision
Goals and Applications

 Goals of a Data Warehouse


 Provide reliable, High performance access
 Consistent view of Data: Same query, same data. All users should be
warned if data load has not come in.
 Slice and dice capability
 Quality of data is a driver for business re-engineering.

 Data Warehousing Applications:


 Customer Profitability Analysis
 Customer satisfaction and retention
 Buyer behavior.
 Pricing, Promotion Analysis
 Market research
 Inventory optimization
Stages of Data warehousing

Heterogeneous Extraction, Business


Source Systems Transformation & Intelligence
Loading (ETL)
Query &
Reporting
Operational

Staging
Legacy Area Data OLAP
Warehouse

External
Data Mining
A Typical Data Warehouse

Data Warehouse
Summarized Data Facilitates in firing
Meta Data queries on detailed
data.

Detailed Data

Data Data Data Data marts contain


Mart Mart Mart data specific to a
subject.
ETL Process
 The process of extracting data from source systems and
bringing it into the data warehouse is commonly called
ETL, which stands for extraction, transformation, and loading.

 The acronym ETL is perhaps too simplistic, because it omits the


transportation phase and implies that each of the other phases
of the process is distinct.
Refer to it as entire process,including data loading, as ETL. You
should understand that ETL refers to a broad process, and not
three well-defined steps.
What happens during the ETL process?
During extraction, the desired data is identified and
extracted from many different sources, including database
systems and applications
Very often more data, is extracted so the identification of the
relevant data will be done at a later point in time.
Depending on the source system's capabilities (for
example, operating system resources), some transformations
may take place during this extraction process.
After extracting data, it has to be physically transported to
the target system or an intermediate system for further
processing.
Depending on the chosen way of transportation, some
transformations can be done during this process, too.
ETL Tools
Major ETL Tools are:

•Informatica Power Mart


•Informatica Power Center
•DP Warehouse
•Oracle Express
•Data Mirror
OLTP vs. Data Warehouse
OLTP system runs the business, Data Warehouses tell you how to run the business

Characteristic OLTP Data Warehouse


Orientation Transaction Analysis
Data Access Record at a time Set at a time
Updates Frequent & Periodic & Scheduled
Unscheduled
Response time Seconds required Minutes acceptable
Concurrent users Many Few
Availability Guaranteed As needed
Data structures Highly normalized Often de-normalized
Data nature Current historical
If most of your business needs are
 To report on data in a single transaction processing system

 All the historical data you need are in the system

 Data in the system is clean

 Your hardware can support reporting against the live system data

 The structure of the system data is relatively simple

 Your firm does not have much interest in end user ad hoc
query/report tools

Data warehousing may not be for your business!!


Modeling Constructs

 Entity Relationship Diagram


 Star schema
 Snow flake schema

Within the implementation of a warehouse, several of these


constructs may be integrated to form an optimal design
Entity Relationship Diagram
 Based on set theory and SQL
 Highly normalized
 Optimized for update and fast transaction turnaround
 Not suited for querying in a data warehouse environment
 Diagrams like these are very difficult for users to visualize and
memorize.
Star Schema
 A central fact table surrounded by a number of dimension
tables.
 Dimensions are business entities on which calculations are
done. They can be numeric or alphanumeric.
Example: Product table comprising brand name,
category, packaging type, size.
 Facts are numerical measurements of business with
respect to dimensions. They are numeric and additive
(sum able across any combination)
e.g. A sales fact table could contain time, product and store
key along with dollars sold, units sold, dollars cost.
Snow Flake Schema
 Normalized version of the star schema with the addition of
normalized dimension tables.

 Normalization helps to reduce redundancy in the dimension


tables, but affects performance and user comprehension.
Benefits of Data warehousing

 Helps in efficiently converting huge stacks of data into


information and further into better business decision
making.
 Develop applications quickly on changing needs ensuring
highest returns on your investments.
 Analyzing daily sales information.
 Competitive edge to Company.
Products and Vendors
 Data Warehouses
 Oracle
 Sybase
 DB2
 OLAP tools
 Oracle Express
 Hyperion Essbase
 Data Mining
 Oracle Darwin
 IBM Intelligent Data Miner
 Querying & Reporting
 Oracle Discoverer
 Business Objects
 Brio Enterprise
Conclusion

 Not just storage of data


 Also an architecture and tool for query and analysis.
Terminology
Data Mart - Contains Data about a specific subject. Eg. Official
data, Customer data, Campaign data etc.

Data Cubes - Central object of data containing information in a


multidimensional structure.

Data Cleansing - Regular cleaning of data.

ETL - Extraction, Transformation and Loading of Data.

Data Mining - A mechanism which uses intelligent algorithms to


discover patterns, clusters and models from data.
Terminology
Metadata - Data about data. Describes the data stored in Data
warehouse.
There are two views of Metadata
 Business – are warehouse attributes and properties for use by business
users
 Technical – describe data flow from Operational systems into the data
warehouse
OLAP
 Online Analytical processing
 Tool(s) for Analytical Reporting including Graphical capabilities.
References

 www.datawarehouse.com
 www.techguide.com
 http://192.168.118.118/ebiz/
 www.siebal.com

Das könnte Ihnen auch gefallen