2004 Infosys Technologies Ltd. Private and confidential. All rights reserved.
Agenda
Datawarehouse Introduction
Datawarehouse Architecture
Common DW Terminology
OLTP v/s OLAP
ETL
OLAP & Reporting Tools
Introduction
Datawarehousing
It is all about information.
It is a journey, not a destination.
It uses serious hardware and software components to extract, transform, cleanse, store and analyze massive amounts of data.
It is an intelligent way of managing data.
[Figure labels: DATA, HISTORY, INFORMATION, KNOWLEDGE, WISDOM]
A warehouse is a place where goods are physically stocked to keep business flowing smoothly, without production downtime or crisis.
In simple words: a data warehouse is a read-only database that copies and stores data from the transactional database.
Datawarehouse is
A corporate informational repository created from the wealth of operational data.
A source for reporting by the various business functions: Sales & Marketing, Finance, Operations.
A provider of tools for users to access consolidated corporate information.
A collection of data and tools to assist decision support.
Datawarehouse Definitions
Datawarehousing is the process of extracting, integrating, filtering, standardizing, transforming, cleaning and quality checking the data of the organization's applications and storing it in a consolidated database.
Bill Inmon defines a DW as:
Subject oriented: all relevant information specific to a subject, e.g. Sales.
Integrated: integration is closely related to subject orientation. Data warehouses must put data from disparate sources into a consistent format. To be integrated, they must resolve naming conflicts and inconsistencies among units of measure.
Non volatile: read only. Once data has entered the data warehouse, it cannot be changed.
Time variant: time is a key dimension.
A DW is a system used by management for making important business decisions.
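The "integrated" property can be illustrated with a minimal sketch. The two source systems, their field names (cust_no, customer_id) and the weight units below are hypothetical and chosen only to show a naming conflict and a unit-of-measure conflict being resolved into one consistent format.

```python
# Two hypothetical source systems that describe the same kind of record
# with different field names and different units of measure.
billing_rows = [{"cust_no": "C1", "weight_lb": 2.0}]
shipping_rows = [{"customer_id": "C2", "weight_kg": 1.5}]

LB_TO_KG = 0.45359237  # pounds to kilograms


def integrate(billing, shipping):
    """Return all rows under one naming convention and one unit (kg)."""
    unified = []
    for row in billing:
        unified.append({
            "customer_id": row["cust_no"],                      # resolve naming conflict
            "weight_kg": round(row["weight_lb"] * LB_TO_KG, 3),  # resolve unit conflict
        })
    for row in shipping:
        unified.append({
            "customer_id": row["customer_id"],
            "weight_kg": round(row["weight_kg"], 3),
        })
    return unified


integrated = integrate(billing_rows, shipping_rows)
```

After this step every row uses the same key name and the same unit, which is what lets later queries treat the two sources as one.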
Need for DW
Difficulty in obtaining integrated information.
Information structure not able to provide full and dynamic analysis of the information available.
Inconsistent results from queries and reports, arising from heterogeneous data sources.
Increased difficulty in delivering consistent, comprehensive information in a timely fashion.
DW - Goals
Understand business trends and make better forecasting decisions.
Analyze daily, weekly, monthly and yearly sales information.
Bring better products to market in a more timely manner.
Give the company the ammunition to differentiate itself from its peers.
Well-known early DW successes: Wal-Mart and FedEx.
Features of DW
Datawarehouse provides
Ability to have consistent data.
Ability to access enterprise data from a single source.
Ability to perform quick and easy analysis for the various business user communities.
Information about the data in the DW (metadata).
End-user terminology to define and refer to the data.
Ease of access to information.
What DW can do
Track the most profitable customers and segments.
Show regional trends in sales, profits and transactions.
Show product profitability.
Support on-the-spot decisions.
DW Benefits
Helps convert huge stacks of data efficiently into information, and further into better business decisions.
Lets applications be developed quickly as needs change, ensuring the highest return on investment.
Supports analysis of daily sales information.
Gives the company a competitive edge.
DW Terminology
OLTP: Online Transaction Processing (data capture screens).
OLAP: Online Analytical Processing (reporting). Variants: ROLAP (Relational), MOLAP (Multidimensional).
Transformation: the process of changing OLTP data into OLAP information.
Data mart: a data structure that is optimized for access. It supports a single analytic application.
Metadata: the information about the data that is stored as part of the DW. In other words, the results of the data modeling activity, when stored in a tool or repository, are called metadata.
Cube: the central object of your data, containing information in a multidimensional structure. Each cube is defined by a set of dimensions and measures.
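As a rough illustration of a cube defined by dimensions and measures, the sketch below builds a tiny cube in plain Python. The dimension names (product, region, month), the measure (units) and the sample rows are all invented for illustration; a real OLAP server stores and indexes cells very differently.

```python
from collections import defaultdict

# Hypothetical sales records: three dimensions and one measure.
rows = [
    {"product": "Soap", "region": "North", "month": "Jan", "units": 10},
    {"product": "Soap", "region": "South", "month": "Jan", "units": 5},
    {"product": "Soap", "region": "North", "month": "Feb", "units": 7},
    {"product": "Tea", "region": "North", "month": "Jan", "units": 3},
]

DIMENSIONS = ("product", "region", "month")
MEASURE = "units"


def build_cube(records):
    """Map each (product, region, month) cell to its summed measure."""
    cube = defaultdict(int)
    for rec in records:
        cell = tuple(rec[d] for d in DIMENSIONS)
        cube[cell] += rec[MEASURE]
    return dict(cube)


def slice_total(cube, **fixed):
    """Sum the measure over all cells that match the fixed dimension values."""
    total = 0
    for cell, value in cube.items():
        coords = dict(zip(DIMENSIONS, cell))
        if all(coords[d] == v for d, v in fixed.items()):
            total += value
    return total


cube = build_cube(rows)
soap_total = slice_total(cube, product="Soap")
north_jan_total = slice_total(cube, region="North", month="Jan")
```

Fixing one or more dimensions and summing over the rest is the basic operation behind most cube queries.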
OLTP v/s OLAP
OLTP systems run the business; data warehouses tell you how to run the business.
Characteristic    OLTP                      OLAP
Orientation       Transaction               Analysis
Data access       Record at a time          Set at a time
Updates           Frequent & unscheduled    Periodic & scheduled
Response time     Seconds required          Minutes acceptable
Data nature       Current                   Historical
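The "record at a time" versus "set at a time" contrast can be sketched with Python's built-in sqlite3 module. The orders table, its columns and the sample values are hypothetical, and a single in-memory database stands in for what would normally be two separate systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, customer TEXT, order_month TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "Asha", "2004-01", 120.0),
        (2, "Ravi", "2004-01", 80.0),
        (3, "Asha", "2004-02", 50.0),
    ],
)

# OLTP style: touch one record at a time, as the transaction happens.
conn.execute("UPDATE orders SET amount = ? WHERE order_id = ?", (90.0, 2))
one_order = conn.execute(
    "SELECT customer, amount FROM orders WHERE order_id = ?", (2,)
).fetchone()

# OLAP style: read a whole set of rows and summarize it.
monthly_totals = conn.execute(
    "SELECT order_month, SUM(amount) FROM orders "
    "GROUP BY order_month ORDER BY order_month"
).fetchall()
conn.close()
```

The first query identifies a row by its key; the second scans and aggregates every row, which is why longer response times are acceptable for analysis.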
Data Scrubbing & Cleansing
The process of filtering, merging, standardizing, initializing and translating operational data to create informational data that can be stored in the data warehouse. Its purpose is to ensure data quality and accuracy.
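A minimal sketch of such a cleansing pass is shown below. The record layout (cust_id, name, gender) and the code table are assumptions made for the example, and the merge step is reduced to keeping the first row seen for each key.

```python
def cleanse(records):
    """Filter, merge, standardize, initialize and translate raw rows."""
    gender_codes = {"m": "Male", "male": "Male", "f": "Female", "female": "Female"}
    seen = set()
    clean = []
    for rec in records:
        cust_id = rec.get("cust_id")
        if cust_id is None:      # filter: drop rows without a key
            continue
        if cust_id in seen:      # merge: keep the first row per key
            continue
        seen.add(cust_id)
        # standardize: collapse whitespace and use title case
        name = " ".join(rec.get("name", "").split()).title()
        # translate source codes; initialize missing values to "Unknown"
        gender_raw = (rec.get("gender") or "").strip().lower()
        gender = gender_codes.get(gender_raw, "Unknown")
        clean.append({"cust_id": cust_id, "name": name, "gender": gender})
    return clean


raw = [
    {"cust_id": 1, "name": "  asha   rao ", "gender": "F"},
    {"cust_id": 1, "name": "Asha Rao", "gender": "female"},
    {"cust_id": None, "name": "ghost", "gender": "m"},
    {"cust_id": 2, "name": "RAVI KUMAR", "gender": None},
]
cleaned = cleanse(raw)
```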
Steps DW Building
Identify key business drivers, sponsorship, risks and ROI.
Survey information needs, identify desired functionality and define functional requirements for the initial subject area.
Architect the long-term data warehousing architecture.
Evaluate and finalize the DW tools and technology.
Conduct a proof of concept.
Design the target database schema.
Build the data mapping, extract, transformation, cleansing and aggregation/summarization rules.
Build the initial data mart using an exact subset of the enterprise data warehousing architecture, and expand to the enterprise architecture over subsequent phases.
Maintain and administer the data warehouse.
DW Process Overview
[Figure: DW process overview. Recoverable labels: OLTP, Legacy, Billing, Customer, Customer Service, Marketing, Product, Staging Area, ODS, ETL, Data Warehouse (DW), OLAP, Query & Reporting, Business Intelligence, Metadata]
Star Schema
A star schema is a central fact table surrounded by a number of dimension tables.
Dimensions are business entities on which calculations are done. They can be numeric or alphanumeric. Example: a Product table comprising brand name, category, packaging type and size.
Facts are numerical measurements of the business with respect to dimensions. They are numeric and additive (summable across any combination of dimensions). Example: a sales fact table could contain time, product and store keys along with dollars sold, units sold and dollars cost.
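The star layout described above can be sketched with Python's built-in sqlite3 module. The table names (dim_product, dim_store, dim_time, fact_sales) and the sample rows are assumptions; the columns follow the Product and sales examples in the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY,
    brand_name TEXT, category TEXT, packaging_type TEXT, size TEXT);
CREATE TABLE dim_store (store_key INTEGER PRIMARY KEY, store_name TEXT);
CREATE TABLE dim_time (time_key INTEGER PRIMARY KEY, month TEXT);
-- The fact table holds one key per dimension plus the additive measures.
CREATE TABLE fact_sales (
    time_key INTEGER REFERENCES dim_time(time_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    store_key INTEGER REFERENCES dim_store(store_key),
    dollars_sold REAL, units_sold INTEGER, dollars_cost REAL);
""")
conn.executemany(
    "INSERT INTO dim_product VALUES (?, ?, ?, ?, ?)",
    [(1, "Acme", "Soap", "Box", "100g"), (2, "Zen", "Tea", "Tin", "250g")],
)
conn.executemany("INSERT INTO dim_store VALUES (?, ?)", [(1, "Downtown"), (2, "Airport")])
conn.executemany("INSERT INTO dim_time VALUES (?, ?)", [(1, "2004-01"), (2, "2004-02")])
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?, ?)",
    [(1, 1, 1, 100.0, 10, 60.0), (1, 2, 1, 40.0, 4, 25.0), (2, 1, 2, 70.0, 7, 42.0)],
)

# One join from the fact table to a dimension, then sum the additive facts.
sales_by_category = conn.execute("""
    SELECT p.category, SUM(f.dollars_sold), SUM(f.units_sold)
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
conn.close()
```

Because every dimension is a single table, any report needs at most one join per dimension.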
Snowflake Schema
A snowflake schema is a normalized version of the star schema, in which the dimension tables are themselves normalized.
Normalization helps to reduce redundancy in the dimension tables, but it hurts query performance and user comprehension.
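The effect of normalizing a dimension can be sketched as follows, again with sqlite3. Here a hypothetical category attribute is moved out of the product dimension into its own dim_category table; all names and rows are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The category name is stored once instead of being repeated per product.
CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY,
    brand_name TEXT,
    category_key INTEGER REFERENCES dim_category(category_key));
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    units_sold INTEGER);
INSERT INTO dim_category VALUES (1, 'Soap'), (2, 'Tea');
INSERT INTO dim_product VALUES (1, 'Acme', 1), (2, 'Bright', 1), (3, 'Zen', 2);
INSERT INTO fact_sales VALUES (1, 10), (2, 5), (3, 4), (1, 2);
""")

# The same report now needs two joins instead of one.
units_by_category = conn.execute("""
    SELECT c.category_name, SUM(f.units_sold)
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    JOIN dim_category c ON c.category_key = p.category_key
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()
conn.close()
```

The extra join is the cost referred to above: less redundancy in the dimension, but a longer and slower query path that is harder for end users to follow.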
Multidimensional Schemas
Star schema
Snowflake schema
ETL Process
The process of extracting data from source systems and bringing it into the data warehouse is commonly called ETL, which stands for extraction, transformation, and loading.
The acronym ETL is perhaps too simplistic, because it omits the transportation phase and implies that each of the other phases of the process is distinct.
Here the entire process, including data loading, is referred to as ETL. ETL should be understood as a broad process, not as three well-defined steps.
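A minimal sketch of the extract, transform and load phases is given below. The CSV layout, the dw_orders table and the rule of dropping rows without an amount are assumptions for illustration; production ETL tools also handle transport, scheduling and error handling, which this sketch leaves out.

```python
import csv
import io
import sqlite3

# Hypothetical export from a source system.
SOURCE_CSV = """order_id,customer,amount,currency
1,asha,120.50,USD
2,ravi,,USD
3,meera,75.00,usd
"""


def extract(text):
    """Read the raw source rows as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Drop rows without an amount and standardize types and casing."""
    out = []
    for row in rows:
        if not row["amount"]:
            continue
        out.append((
            int(row["order_id"]),
            row["customer"].title(),
            float(row["amount"]),
            row["currency"].upper(),
        ))
    return out


def load(conn, rows):
    """Insert the transformed rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dw_orders ("
        "order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL, currency TEXT)"
    )
    conn.executemany("INSERT INTO dw_orders VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
loaded = load(conn, transform(extract(SOURCE_CSV)))
warehouse_rows = conn.execute(
    "SELECT order_id, customer, amount, currency FROM dw_orders ORDER BY order_id"
).fetchall()
```

Even in this small example the phases overlap: filtering happens during transformation, and table creation happens during loading, which is why the steps are better seen as one process.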
ETL Tools
Major ETL tools are:
Informatica PowerMart
Informatica PowerCenter
DataStage
Ab Initio
DP Warehouse
Oracle Express
Data Mirror
OLAP Terminology
OLAP tools available for exploring the information built in a DW:
Multidimensional On-line Analytical Processing (MOLAP): data from the data warehouse is queried and dumped periodically onto a server on the local network, into a data store called a Multidimensional Database (MDDB) provided by the OLAP tool. This MDDB forms a data mart, which is then used for querying and reporting.
Relational On-line Analytical Processing (ROLAP): the ability to conduct OLAP analysis directly against a relational warehouse, without constraints on the number of dimensions, database size, analytical complexity, or number and type of users.
Hybrid On-line Analytical Processing (HOLAP): an environment that combines MOLAP and ROLAP data storage. Summarized information is typically stored in an MDDB, and detailed data is stored in a relational environment.
MOLAP
A high-performance, multidimensional data storage format.
Data is stored on the OLAP server.
Gives the best query performance for small to medium-sized data sets.
ROLAP
Data remains in the original relational tables.
A separate set of relational tables is used to store and reference aggregation data.
ROLAP is ideal for large databases or legacy data that is infrequently queried.
HOLAP
HOLAP combines elements from MOLAP and ROLAP. HOLAP keeps the original data in relational tables but stores aggregations in a multidimensional format.
[Figure: cube with axes Location, Month, Product]
Drill-down: an analytical technique whereby the user navigates from the most summarized to the most detailed level.
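Navigating from the most summarized to the most detailed level can be sketched in a few lines of Python. The hierarchy order (location, then month, then product) and the sample figures are assumptions; the only point is that each step groups by one more level.

```python
from collections import defaultdict

# Hypothetical rows: (location, month, product, units).
sales = [
    ("North", "2004-01", "Soap", 10),
    ("North", "2004-01", "Tea", 3),
    ("North", "2004-02", "Soap", 7),
    ("South", "2004-01", "Soap", 5),
]
LEVELS = ("location", "month", "product")


def summarize(rows, depth):
    """Total units grouped by the first `depth` levels of the hierarchy."""
    totals = defaultdict(int)
    for *keys, units in rows:
        totals[tuple(keys[:depth])] += units
    return dict(totals)


grand_total = summarize(sales, 0)        # most summarized level
by_location = summarize(sales, 1)        # drill down once
by_location_month = summarize(sales, 2)  # drill down again
```

Each deeper call splits the totals of the level above it, so the finer figures always add back up to the coarser ones.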
[Figure: cube views with axes Month, Product, Region]
OLAP Tools
Querying & Reporting
Oracle Discoverer
Business Objects
Brio Enterprise
Oracle Express
Hyperion Essbase
Cognos
THANK YOU