DW Mantra - Concept

Data Warehouse
A data warehouse is an architecture for organizing information systems. It is a process for building decision support systems and a knowledge
management environment that supports both day-to-day tactical decision making and long-term business strategies. Bill Inmon defines it as a "subject-oriented, integrated, time-variant, non-volatile collection of data in support of management's decision making process."

Data Mart
A data mart is a collection of subject areas organized for decision support based on the needs of a given department. Typically, the
database design for a data mart is built as a star schema that is optimal for the needs of the users in that department. There are two
kinds of data marts: dependent and independent. A dependent data mart is one whose source is a data warehouse. An independent data mart is
one whose source is legacy applications or an OLTP environment.
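As a rough illustration of the star schema mentioned above, the following sketch builds one fact table and two dimension tables in an in-memory SQLite database and runs a typical data mart query. The table names, column names, and sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3


def build_star_schema():
    """Create a tiny, hypothetical star schema in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
        CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
        CREATE TABLE fact_sales (
            product_key INTEGER REFERENCES dim_product(product_key),
            date_key INTEGER REFERENCES dim_date(date_key),
            amount REAL
        );
    """)
    # Dimension rows describe the "who/what/when"; fact rows hold the measures.
    conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                     [(1, "Widget", "Hardware"), (2, "Gadget", "Hardware"), (3, "Manual", "Books")])
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                     [(20070101, 2007, 1), (20070201, 2007, 2)])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                     [(1, 20070101, 100.0), (2, 20070101, 50.0),
                      (3, 20070201, 20.0), (1, 20070201, 30.0)])
    conn.commit()
    return conn


def sales_by_category(conn):
    """Join the fact table to a dimension and aggregate the measure."""
    rows = conn.execute("""
        SELECT p.category, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_key = f.product_key
        GROUP BY p.category
        ORDER BY p.category
    """).fetchall()
    return dict(rows)


if __name__ == "__main__":
    print(sales_by_category(build_star_schema()))
```

The fact table sits at the center and references each dimension by key, which is what gives the schema its star shape.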
Operational Data Store (ODS)
An operational data store is an integrated, subject-oriented, volatile (including update/deletion), current-valued structure designed to serve
operational users as they do high-performance integrated processing.
OLTP (Online Transaction Processing)
OLTP is a class of programs that facilitate and manage transaction-oriented applications, typically for data entry and retrieval transaction
processing. OLTP systems are optimized for data entry operations, e.g. order entry, banking, CRM, and ERP applications.
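To make "transaction-oriented" concrete, here is a minimal sketch of an order entry transaction using Python's built-in sqlite3 module. The `stock` and `orders` tables and the `place_order` helper are hypothetical; the point is only that the insert and the stock update succeed or fail together.

```python
import sqlite3


def make_db():
    """Set up a hypothetical order entry database with one product in stock."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stock (product TEXT PRIMARY KEY, on_hand INTEGER)")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, product TEXT, quantity INTEGER)")
    conn.execute("INSERT INTO stock VALUES ('Widget', 5)")
    conn.commit()
    return conn


def place_order(conn, product, quantity):
    """Record an order and decrement stock atomically; roll back if stock is short."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("INSERT INTO orders (product, quantity) VALUES (?, ?)",
                         (product, quantity))
            cur = conn.execute(
                "UPDATE stock SET on_hand = on_hand - ? WHERE product = ? AND on_hand >= ?",
                (quantity, product, quantity))
            if cur.rowcount != 1:
                raise ValueError("insufficient stock")
        return True
    except ValueError:
        return False


if __name__ == "__main__":
    db = make_db()
    print(place_order(db, "Widget", 3), place_order(db, "Widget", 3))
```

The second order is rejected and leaves no trace in `orders`, because the whole transaction is rolled back.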
Data Warehouse vs Operational
Data Warehouse                    Operational
Subject oriented                  Application oriented
Summarized, refined & detailed    Detailed
Represents value over time        Accurate as of moment
Supports managerial needs         Supports day-to-day needs
Read only data                    Can be updated
Batch processing                  Real time transactions
Completely different life cycle   Software Development Life Cycle
Analysis driven                   Transaction driven
Dimensional model                 Entity Relational Diagram
Large amount of data              Small amount of data
Relaxed availability              High availability
Flexible structure                Static structure
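Two of the contrasts above ("represents value over time" vs "accurate as of moment", and "read only data" vs "can be updated") can be sketched in a few lines. The class names and the account example below are hypothetical and are not taken from any particular product.

```python
from datetime import date


class OperationalStore:
    """Keeps only the current value per key; updates overwrite in place."""

    def __init__(self):
        self.balances = {}

    def update(self, account, balance):
        self.balances[account] = balance


class WarehouseHistory:
    """Appends dated snapshots; rows already loaded are never changed."""

    def __init__(self):
        self.rows = []

    def load(self, as_of, account, balance):
        self.rows.append((as_of, account, balance))

    def balance_as_of(self, account, as_of):
        """Return the latest snapshot at or before the given date, or None."""
        matches = [(d, b) for d, a, b in self.rows if a == account and d <= as_of]
        return max(matches, key=lambda m: m[0])[1] if matches else None


def demo():
    """Apply the same two month-end balances to both stores."""
    ods, dw = OperationalStore(), WarehouseHistory()
    for as_of, balance in [(date(2007, 1, 31), 100.0), (date(2007, 2, 28), 250.0)]:
        ods.update("A-1", balance)
        dw.load(as_of, "A-1", balance)
    # The operational store knows only "now"; the warehouse can answer "as of".
    return ods.balances["A-1"], dw.balance_as_of("A-1", date(2007, 2, 1))
```

After both loads, the operational store holds only the latest balance, while the warehouse can still report the January value.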

DW Methodologies
Practitioner: Bill Inmon
Emphasis: Data warehouse
Design: Enterprise-based normalized model; marts use a subject-oriented dimensional model
Architecture: Multi-tier, comprised of a staging area and dependent data marts
Data set: DW holds atomic-level data; marts hold summary data

Practitioner: Ralph Kimball
Emphasis: Data marts
Design: Dimensional model of the data mart, consisting of a star schema
Architecture: Staging area and data marts
Data set: Contains both atomic and summary data

Practitioner: Many practitioners
Emphasis: DW and data marts
Design: Start with enterprise and local models; one or more star schemas
Architecture: High-level normalized enterprise model; initial marts
Data set: Populates marts with atomic and summary data via a non-persistent staging area

Practitioner: Doug Hackney
Emphasis: Integrate heterogeneous BI
Design: An architecture of architectures; share dimensions, facts, rules, and definitions across organizations
Architecture: Reality of change in organizations and systems
Data set: Use of whatever means possible to integrate business needs

Agile Development
Agile methodology emphasizes close collaboration between the technical team and business experts; face-to-face communication; self-organizing teams; and frequent delivery of business value releases.
A project's overall scope, objectives, constraints, clients, risks, etc. should be briefly documented.
Lean, iterative, feature-driven, time-boxed development cycles.
Constant feedback. Exploratory processes require constant feedback to stay on track.
Customer involvement. Focusing on business value requires constant interaction between customers and developers.
Technical excellence. Creating, refactoring and maintaining a technically excellent product.
3D Lifecycle
Dimensional Data Warehouse Development Lifecycle - Our approach is Agile data warehouse development that integrates iterative and
data-driven components. The enterprise data warehouse data model is suggested to be dimensional, with conformed subject areas. The goal of
the 3D methodology is to define strategies that enable data warehouse practitioners to work effectively on development and deliverables.
This is not a "one size fits all" methodology. Instead, consider the 3D life cycle as a collection of philosophies that enable technical and
business experts to work together effectively to maximize ROI. 3DLC is an adaptable process framework, intended to be tailored by project
teams that select the elements of the process that are appropriate for their needs.
i. Collaboration across technical and subject matter expertise teams.
ii. Iterative and incremental approach.
iii. Monthly releases, fully functional, set of building blocks.
iv. Small team size max up to 10 people.
v. Phase(Project) plan 4-6 months.
vi. Commitment to the team, Active participation.
vii. Build consensus and ownership, create win/win solution.
viii. Focus on quality, testing & communication.
Business Intelligence (BI)
Business Intelligence is a set of business processes for collecting and analyzing business information. BI functions include trend analysis,
aggregation of data, drilling down to complex levels of detail, slice-and-dice, and data rotation for comparative viewing.
OLAP (On-Line Analytical Processing) - Querying and presenting data from a data warehouse as multiple dimensions.
ROLAP (Relational OLAP) - Applications and a set of user interfaces that retrieve data from an RDBMS and present it as a dimensional model.
MOLAP (Multidimensional OLAP) - Applications, a set of user interfaces, and database technologies that have a dimensional model.
DOLAP (Desktop OLAP) - Designed for low-end, single users. Data is stored/downloaded on the desktop.
HOLAP (Hybrid OLAP) - A combination of the above OLAP methodologies, typically ROLAP and MOLAP.
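The BI functions listed above (aggregation, drill-down, slice-and-dice, rotation) can be sketched over a small in-memory fact set. The fact rows, dimension names, and helper functions here are hypothetical; a real OLAP tool would run these operations against a cube or an RDBMS.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product, quarter, sales)
FACTS = [
    ("East", "Widget", "Q1", 10),
    ("East", "Widget", "Q2", 15),
    ("East", "Gadget", "Q1", 7),
    ("West", "Widget", "Q1", 4),
    ("West", "Gadget", "Q2", 9),
]
DIMS = {"region": 0, "product": 1, "quarter": 2}


def roll_up(facts, *dims):
    """Aggregate sales over the named dimensions; naming more dimensions drills down."""
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[DIMS[d]] for d in dims)
        totals[key] += row[3]
    return dict(totals)


def slice_(facts, dim, value):
    """Slice: fix one dimension to a single member."""
    return [row for row in facts if row[DIMS[dim]] == value]


def dice(facts, **allowed):
    """Dice: keep rows whose members fall in the allowed set for each named dimension."""
    return [row for row in facts
            if all(row[DIMS[d]] in members for d, members in allowed.items())]


def pivot(facts, row_dim, col_dim):
    """Rotate: put one dimension on rows and another on columns."""
    table = defaultdict(dict)
    for (r, c), total in roll_up(facts, row_dim, col_dim).items():
        table[r][c] = total
    return dict(table)


if __name__ == "__main__":
    print(roll_up(FACTS, "region"))
    print(pivot(FACTS, "product", "quarter"))
```

Calling `roll_up(FACTS, "region")` and then `roll_up(FACTS, "region", "product")` is a drill-down from region totals to region-by-product totals.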
12 rules for OLAP by E. F. Codd (father of the relational database)


1. Multidimensional conceptual view. This supports EIS "slice-and-dice" operations and is usually required in financial modeling.
2. Transparency. OLAP systems should be part of an open system that supports heterogeneous data sources. Furthermore, the end user
should not have to be concerned about the details of data access or conversions.
3. Accessibility. The OLAP should present the user with a single logical schema of the data.
4. Consistent reporting performance. Performance should not degrade as the number of dimensions in the model increases.
5. Client/server architecture. Requirement for open, modular systems.
6. Generic dimensionality. Not limited to 3-D and not biased toward any particular dimension. A function applied to one dimension should
also be able to be applied to another.
7. Dynamic sparse-matrix handling. Related both to the idea of nulls in relational databases and to the notion of compressing large files, a
sparse matrix is one in which not every cell contains data. OLAP systems should accommodate varying storage and data-handling options.
8. Multiuser support. OLAP systems, like EISes, need to support multiple concurrent users, including their individual views or slices of a
common database.
9. Unrestricted cross-dimensional operations. Similar to rule 6; all dimensions are created equal, and operations across data dimensions do
not restrict relationships between cells.
10. Intuitive data manipulation. Ideally, users shouldn't have to use menus or perform complex multiple-step operations when an intuitive
drag-and-drop action will do.
11. Flexible reporting. Save a tree. Users should be able to print just what they need, and any changes to the underlying financial model
should be automatically reflected in reports.
12. Unlimited dimensional and aggregation levels. A serious tool should support at least 15, and preferably 20, dimensions.
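Rule 7 (dynamic sparse-matrix handling) can be illustrated with a cube that stores only the cells that actually contain data. This is one simple storage choice, sketched under the assumption that cells are keyed by a tuple of dimension members; the `SparseCube` class and its method names are hypothetical.

```python
class SparseCube:
    """Stores only populated cells, keyed by a tuple of dimension members."""

    def __init__(self, dimensions):
        # Any number of dimensions is accepted; none is treated specially.
        self.dimensions = tuple(dimensions)
        self.cells = {}

    def _key(self, coords):
        if set(coords) != set(self.dimensions):
            raise KeyError(f"expected coordinates for {self.dimensions}")
        return tuple(coords[d] for d in self.dimensions)

    def put(self, value, **coords):
        """Store a value at the given coordinates."""
        self.cells[self._key(coords)] = value

    def lookup(self, default=None, **coords):
        """Return the stored value, or the default for an empty cell."""
        return self.cells.get(self._key(coords), default)

    def density(self, sizes):
        """Fraction of populated cells, given the member count per dimension."""
        total = 1
        for d in self.dimensions:
            total *= sizes[d]
        return len(self.cells) / total


if __name__ == "__main__":
    cube = SparseCube(["region", "product", "quarter"])
    cube.put(10, region="East", product="Widget", quarter="Q1")
    cube.put(9, region="West", product="Gadget", quarter="Q2")
    print(cube.density({"region": 2, "product": 2, "quarter": 4}))
```

Empty cells cost no storage here, which matters when most combinations of dimension members never occur.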
Extract, Clean, Transform, Load (ETL)
Data loading is a major process in a data warehouse. It comprises 50% to 75% of any data warehousing effort. An effective ETL process largely determines the success of a data warehouse project.

Extract - The process by which data is pulled or pushed from disparate source systems.
Clean - After data is extracted from the source, the next step is to clean the bad data.
Transform - Based on the business rules data is transformed.
Load - Insert/move the transformed data into the dimension and fact tables in a data warehouse.
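The four steps above can be sketched as a small pipeline. This is a minimal illustration using only the Python standard library; the CSV sample, the `fact_orders` table, and the business rule (names normalized, amounts stored in cents) are assumptions made for the example.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; in practice this would come from a source system.
SOURCE_CSV = """order_id,customer,amount
1, alice ,10.50
2,BOB,not-a-number
3,carol,7.25
3,carol,7.25
"""


def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Clean: drop rows with unparseable amounts and repeated order ids."""
    seen, good = set(), []
    for row in rows:
        try:
            float(row["amount"])
        except ValueError:
            continue
        if row["order_id"] in seen:
            continue
        seen.add(row["order_id"])
        good.append(row)
    return good


def transform(rows):
    """Transform: apply the assumed business rule to each cleaned row."""
    return [(int(r["order_id"]), r["customer"].strip().title(),
             round(float(r["amount"]) * 100))
            for r in rows]


def load(conn, rows):
    """Load: insert the transformed rows into a fact table."""
    conn.execute("CREATE TABLE IF NOT EXISTS fact_orders "
                 "(order_id INTEGER PRIMARY KEY, customer TEXT, amount_cents INTEGER)")
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(text=SOURCE_CSV):
    """Run all four steps against an in-memory database and return the loaded rows."""
    conn = sqlite3.connect(":memory:")
    load(conn, transform(clean(extract(text))))
    return conn.execute("SELECT * FROM fact_orders ORDER BY order_id").fetchall()


if __name__ == "__main__":
    print(run_etl())
```

Keeping each step as its own function makes it easy to test the cleaning rules separately from the load.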

Data Mining
Generally speaking, data mining is the knowledge discovery process of analyzing data from different perspectives and categorizing it into useful
information. Technically, data mining is the process of finding correlations or patterns across fields in large databases.
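As one simple example of "finding correlations across fields", the sketch below computes the Pearson correlation coefficient between every pair of numeric fields and reports the strongly correlated pairs. Pearson correlation is only one of many data mining techniques; the field names, sample values, and the 0.8 threshold are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long numeric fields."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two fields of equal length with at least 2 values")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant field")
    return cov / sqrt(var_x * var_y)


def correlated_fields(table, threshold=0.8):
    """Return field pairs whose absolute correlation meets the threshold."""
    names = sorted(table)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(table[a], table[b])
            if abs(r) >= threshold:
                pairs.append((a, b, round(r, 3)))
    return pairs


if __name__ == "__main__":
    sample = {
        "ad_spend": [1, 2, 3, 4, 5],
        "sales": [2, 4, 6, 8, 10],
        "returns": [3, 1, 4, 1, 5],
    }
    print(correlated_fields(sample))
```

A high coefficient only flags a candidate pattern; whether it is useful information is still a business judgment.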
Meta Data Management
Data about data. Metadata describes the information stored in the data warehouse. It can be divided into three categories:
Business meta data - Descriptive information about the reports, transformations, tables, columns, etc. for the business users.
Technical meta data - Physical characteristics of the data such as table/column names, data types, source-to-target mappings, etc. for the
Operational meta data - Allows administrators to monitor operations and usage of the data warehouse, such as load times, history, data
quality information, and usage of tables.
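The three categories can be pictured as fields on a small metadata record. The dataclasses below are a hypothetical sketch, not the layout of any actual metadata repository: descriptions stand in for business metadata, types and source mappings for technical metadata, and load runs for operational metadata.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class ColumnMetadata:
    name: str
    description: str  # business metadata: meaning for business users
    data_type: str    # technical metadata: physical type
    source: str       # technical metadata: source-to-target mapping


@dataclass
class LoadRun:
    """Operational metadata for one load of a table."""
    started: datetime
    finished: datetime
    rows_loaded: int
    rows_rejected: int

    @property
    def duration_seconds(self):
        return (self.finished - self.started).total_seconds()


@dataclass
class TableMetadata:
    name: str
    columns: list = field(default_factory=list)
    load_history: list = field(default_factory=list)

    def rejection_rate(self):
        """Share of rows rejected across all recorded loads (a data quality signal)."""
        loaded = sum(run.rows_loaded for run in self.load_history)
        rejected = sum(run.rows_rejected for run in self.load_history)
        total = loaded + rejected
        return rejected / total if total else 0.0


if __name__ == "__main__":
    table = TableMetadata("fact_sales")
    table.columns.append(
        ColumnMetadata("amount", "Sale amount in USD", "REAL", "orders.amount"))
    table.load_history.append(
        LoadRun(datetime(2007, 1, 1, 2, 0), datetime(2007, 1, 1, 2, 30), 90, 10))
    print(table.load_history[0].duration_seconds, table.rejection_rate())
```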
Copyright 2007 DW Mantra Inc. All rights reserved.