
Data Warehouse Concepts

Topics
- Business Intelligence
- Components of Business Intelligence
- How Does BI Help Companies?
- Data Warehouse Concepts

Business Intelligence
What is Business Intelligence?
BI refers to the technologies and applications used for the collection, integration, analysis, and presentation of business information.

How is it different from OLTP systems?
OLTP systems are designed for day-to-day operations, while BI applications support strategic decision making.

Components of Business Intelligence


- Data Warehouse
- Data Marts
- OLAP (Reports and Dashboards)
- Operational Data Store
- Data Mining

How Does BI Help Companies?


- Historical analysis of data (trend analysis)
- Predictive analysis and planning
- Churn management
- Strategic decision making
- Single view of the customer
- Cross-sell and up-sell
- Finding relationships between products, and fraud management (using data mining)

Data Warehouse
A data warehouse is a collection of integrated, subject-oriented databases designed to support the decision support system (DSS) function, where each unit of data is relevant to some moment in time.

Goals of data warehousing:
- Make information easily accessible
- Present the organization's information consistently
- Be adaptive and resilient to change
- Be a secure bastion that protects the organization's information assets
- Serve as a foundation for improved decision making

Characteristics of DW
- Subject oriented
- Integrated
- Non-volatile
- Time variant (explicit dependence on time)

OLTP and OLAP


OLTP: Online Transaction Processing
- OLTP systems were built to automate business transactions.
- The focus was on bookkeeping functions.
- Applications were built along functional lines.
- Historical data was typically not needed or retained.



OLAP: Online Analytical Processing
- Standard reporting
- Ad hoc query and reporting
- Multidimensional analytical reporting
- Predictive analysis and planning

Dimensional Modeling
Dimensional modeling is a logical design technique for data warehousing, aimed at easier access to data and a business-oriented representation of data.

Features of a dimensional model:
- Denormalized structures
- Database structured for faster and easier querying
- Stress is on easier interpretation of data
- Dimensional databases occupy extra space compared to the equivalent ER model because of redundancy
- Consists of fact and dimension tables

Dimension tables
Dimension tables contain the details about business entities such as customer and product. This enables business users to better understand the data and their reports. Since the data in a dimension table is denormalized, it typically has a large number of columns. The attributes in a dimension table are typically used as row and column headings in a report or query results display. Dimension members can be arranged into hierarchies or levels.


Fact table
A fact table consists of the measurements, metrics, or facts of a business process. It is often located at the centre of a star schema or a snowflake schema, surrounded by dimension tables. A typical fact table contains numeric facts and foreign keys that reference dimension tables.

Star Schema
A star schema is a combination of a fact table and several dimension tables. Each dimension has a foreign key relationship with the fact table. Such an arrangement in the dimensional model looks like a star formation, with the fact table at the core of the star and the dimension tables along the spikes of the star.
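The arrangement described above can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumed names: the tables (dim_product, dim_store, fact_sales), their columns, and the sample rows are hypothetical and do not come from the slides.

```python
import sqlite3

# Hypothetical retail star schema: one fact table in the middle,
# two dimension tables referenced through foreign keys.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY,
    product_name TEXT,
    category TEXT
);
CREATE TABLE dim_store (
    store_key INTEGER PRIMARY KEY,
    store_name TEXT,
    country TEXT
);
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    store_key INTEGER REFERENCES dim_store(store_key),
    quantity INTEGER,
    sales_amount REAL
);
INSERT INTO dim_product VALUES (1, 'Pen', 'Stationery'), (2, 'Lamp', 'Home');
INSERT INTO dim_store VALUES (1, 'Downtown', 'India'), (2, 'Airport', 'Germany');
INSERT INTO fact_sales VALUES (1, 1, 10, 50.0), (2, 1, 1, 30.0), (1, 2, 4, 20.0);
""")


def sales_by_category(connection):
    """Join the fact table to one dimension and aggregate a numeric fact."""
    return connection.execute("""
        SELECT p.category, SUM(f.sales_amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_key = f.product_key
        GROUP BY p.category
        ORDER BY p.category
    """).fetchall()


print(sales_by_category(conn))
```

Each report needs only one join per dimension it uses, which is the main practical appeal of the star layout.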


Snowflake Schema Design


- Dimension table hierarchies are broken into simpler tables.
- Some organizations normalize the dimension tables to save space.
- Both fact and dimension tables are normalized.
- The number of joins increases, which degrades query performance when retrieving data.
- The schema may become large and unmanageable.
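To make the extra joins concrete, here is a small sqlite3 sketch in which a product hierarchy is split into its own table. The table and column names (dim_category, dim_product, fact_sales) and the data are assumptions chosen for illustration.

```python
import sqlite3

# Hypothetical snowflake layout: the category level of the product
# hierarchy lives in its own table instead of inside dim_product.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_category (
    category_key INTEGER PRIMARY KEY,
    category_name TEXT
);
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY,
    product_name TEXT,
    category_key INTEGER REFERENCES dim_category(category_key)
);
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    sales_amount REAL
);
INSERT INTO dim_category VALUES (1, 'Stationery'), (2, 'Home');
INSERT INTO dim_product VALUES (1, 'Pen', 1), (2, 'Pencil', 1), (3, 'Lamp', 2);
INSERT INTO fact_sales VALUES (1, 50.0), (2, 20.0), (3, 30.0);
""")


def sales_by_category(connection):
    """Two joins are needed: fact -> product -> category."""
    return connection.execute("""
        SELECT c.category_name, SUM(f.sales_amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_key = f.product_key
        JOIN dim_category AS c ON c.category_key = p.category_key
        GROUP BY c.category_name
        ORDER BY c.category_name
    """).fetchall()


print(sales_by_category(conn))
```

The category name is stored once per category rather than once per product, which saves space at the cost of the second join.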


DA
It is the process of extracting the relevant business information from the different source systems, transforming the data from one format into another, integrating the data into a homogeneous format, and loading the data into a warehouse database.
- Data Extraction (E)
- Data Transformation (T)
- Data Loading (L)

Sample ETL Process Flow



ETL Process
The ETL process has the following basic steps:
- Mapping the data between the source systems and the target database
- Cleansing the source data in the staging area
- Transforming the cleansed source data and loading it into the target system

Source System: A database, application, file, or other storage facility from which the data in a data warehouse is derived.
Mapping: The definition of the relationship and data flow between source and target objects.
Staging Area: A place where data is processed before entering the warehouse.
Cleansing: The process of resolving inconsistencies and fixing anomalies in source data, typically as part of the ETL process.

Transformation: The process of manipulating data. Any manipulation beyond copying is a transformation. Examples include cleansing, aggregating, and integrating data from multiple sources.
Transportation: The process of moving copied or transformed data from a source to a data warehouse.
Target System: A database, application, file, or other storage facility into which the transformed source data is loaded in a data warehouse.
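The terms above can be tied together in a minimal sketch of an ETL run. Everything here is assumed for illustration: the in-memory SOURCE_ROWS list stands in for a source system, FIELD_MAPPING is a hypothetical source-to-target mapping, and an in-memory SQLite database plays the target system.

```python
import sqlite3

# Stand-in for a source system (hypothetical records with typical defects).
SOURCE_ROWS = [
    {"cust": " alice ", "country": "in", "amount": "120.50"},
    {"cust": "Bob", "country": "DE", "amount": "80"},
    {"cust": "carol", "country": "de", "amount": ""},  # missing amount
]

# Mapping: source field name -> target column name.
FIELD_MAPPING = {
    "cust": "customer_name",
    "country": "country_code",
    "amount": "sales_amount",
}


def extract():
    """Extraction: copy rows out of the source unchanged."""
    return list(SOURCE_ROWS)


def cleanse(rows):
    """Cleansing in the staging area: trim text, fix case, drop unusable rows."""
    cleaned = []
    for row in rows:
        if not row["amount"].strip():
            continue  # no amount to load
        cleaned.append({
            "cust": row["cust"].strip().title(),
            "country": row["country"].strip().upper(),
            "amount": row["amount"].strip(),
        })
    return cleaned


def transform(rows):
    """Transformation: apply the mapping and convert types."""
    transformed = []
    for row in rows:
        target = {FIELD_MAPPING[name]: value for name, value in row.items()}
        target["sales_amount"] = float(target["sales_amount"])
        transformed.append(target)
    return transformed


def load(connection, rows):
    """Loading: write the transformed rows into the target table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(customer_name TEXT, country_code TEXT, sales_amount REAL)"
    )
    connection.executemany(
        "INSERT INTO sales VALUES (:customer_name, :country_code, :sales_amount)",
        rows,
    )
    connection.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(cleanse(extract())))
loaded = conn.execute(
    "SELECT customer_name, country_code, sales_amount "
    "FROM sales ORDER BY customer_name"
).fetchall()
print(loaded)
```

A dedicated ETL tool adds scheduling, logging, and error handling around these same stages; the sketch only shows the data flow.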

Important Aspects of Star Schema and Snowflake Schema


- In a star schema, every dimension has a primary key.
- In a star schema, a dimension table has no parent table, whereas in a snowflake schema a dimension table has one or more parent tables.
- In a star schema, the hierarchies for a dimension are stored in the dimension table itself, whereas in a snowflake schema the hierarchies are broken into separate tables.
- These hierarchies help to drill down the data from the topmost level to the lowermost level.

Designing a Dimension Model


Step 1: Select the business process.
Step 2: Declare the grain.
Step 3: Identify the facts.
Step 4: Choose the dimensions.

Slowly-changing Dimensions
When the DW receives notification that some record in a dimension has changed, there are three basic responses:
- Type 1 slowly changing dimension (correction of errors)
- Type 2 slowly changing dimension (preservation of history)
- Type 3 slowly changing dimension (alternate realities)

Type 1 Slowly Changing Dimension (Overwrite)
Overwrite one or more values of the dimension with the new value.
Use when:
- the data are being corrected
- there is no interest in keeping history
- there is no need to rerun previous reports, or the changed value is immaterial to the report
A Type 1 overwrite results in an UPDATE SQL statement when the value changes.
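A Type 1 overwrite can be sketched as a single UPDATE, shown here with sqlite3. The dim_customer table, its columns, and the misspelled city are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dim_customer "
    "(customer_key INTEGER PRIMARY KEY, customer_id TEXT, city TEXT)"
)
# The city was loaded with a spelling error that should simply be corrected.
conn.execute("INSERT INTO dim_customer VALUES (1, 'C001', 'Bangalor')")


def apply_type1(connection, customer_id, new_city):
    """Type 1: overwrite the attribute in place; no history is kept."""
    connection.execute(
        "UPDATE dim_customer SET city = ? WHERE customer_id = ?",
        (new_city, customer_id),
    )
    connection.commit()


apply_type1(conn, "C001", "Bangalore")
rows_after = conn.execute(
    "SELECT customer_key, customer_id, city FROM dim_customer"
).fetchall()
print(rows_after)
```

The surrogate key stays the same, so existing fact rows automatically pick up the corrected value.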

Type 2 Slowly Changing Dimension (Preservation of History)
This is the standard technique. When a record changes, instead of overwriting:
- create a new dimension record with a new surrogate key
- add the new record to the dimension table
- use this record going forward in all fact tables
No existing fact tables need to change, and no aggregates need to be recomputed.

Types of Type 2:
- Flag based (Active/Inactive)
- Version based (1, 2, 3, ...)
- Start date and end date based
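The following sqlite3 sketch combines two of the variants listed above, an active flag plus start and end dates, in one hypothetical dim_customer table. The column names, dates, and the choice to close the old row on the change date are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dim_customer (
        customer_key INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        customer_id TEXT,                                -- business key
        city TEXT,
        start_date TEXT,
        end_date TEXT,
        is_active INTEGER
    )
""")
conn.execute(
    "INSERT INTO dim_customer (customer_id, city, start_date, end_date, is_active) "
    "VALUES ('C001', 'Pune', '2020-01-01', NULL, 1)"
)


def apply_type2(connection, customer_id, new_city, change_date):
    """Type 2: close the current row, then add a new row with a new surrogate key."""
    connection.execute(
        "UPDATE dim_customer SET end_date = ?, is_active = 0 "
        "WHERE customer_id = ? AND is_active = 1",
        (change_date, customer_id),
    )
    cursor = connection.execute(
        "INSERT INTO dim_customer (customer_id, city, start_date, end_date, is_active) "
        "VALUES (?, ?, ?, NULL, 1)",
        (customer_id, new_city, change_date),
    )
    connection.commit()
    return cursor.lastrowid  # surrogate key to use in new fact rows


new_key = apply_type2(conn, "C001", "Mumbai", "2023-06-01")
history = conn.execute(
    "SELECT customer_key, city, start_date, end_date, is_active "
    "FROM dim_customer ORDER BY customer_key"
).fetchall()
print(new_key, history)
```

Old fact rows keep pointing at the old surrogate key, so earlier reports still show the customer in the original city.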

Type 3 Slowly Changing Dimension (Alternate Realities/Soft Changes)
Applicable when a change happens to a dimension record but the old value remains valid as a second choice, for example:
- Product category designations
- Sales-territory assignments
Instead of creating a new row, a new column is added (if it does not already exist). The old value is copied to the secondary column before the new value overwrites the primary column.
Example: old category, new category
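A Type 3 change can be sketched as follows with sqlite3. The dim_product table, the old_category column name, and the sample product are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dim_product "
    "(product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT)"
)
conn.execute("INSERT INTO dim_product VALUES (1, 'Desk Lamp', 'Home')")


def apply_type3(connection, product_key, new_category):
    """Type 3: keep the previous value in a secondary column on the same row."""
    existing = [row[1] for row in connection.execute("PRAGMA table_info(dim_product)")]
    if "old_category" not in existing:
        # Add the secondary column only if it does not already exist.
        connection.execute("ALTER TABLE dim_product ADD COLUMN old_category TEXT")
    # The right-hand sides are evaluated against the row as it was before the
    # update, so old_category receives the previous category value.
    connection.execute(
        "UPDATE dim_product SET old_category = category, category = ? "
        "WHERE product_key = ?",
        (new_category, product_key),
    )
    connection.commit()


apply_type3(conn, 1, "Lighting")
product_rows = conn.execute(
    "SELECT product_key, product_name, category, old_category FROM dim_product"
).fetchall()
print(product_rows)
```

Reports can then group by either column, which is what allows both "realities" to be viewed side by side. Only one prior value is retained per secondary column.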

DW Tools
Vendor       ETL                      OLAP
SAP          SAP BW/BODI              Business Objects
Oracle       Oracle BW                Hyperion/Siebel Analytics
Microsoft    SQL Server SSIS          SQL Server SSRS
Informatica  Informatica PowerCenter  Power Analyzer
IBM          DataStage                Cognos 8

DW Project Lifecycle
- Project Planning
- Business Requirement Definition
- Technical Architecture Design
- Dimensional Modeling
- Physical Design
- Data Staging Design and Development (ETL)
- Analytic Application Specification, Design, and Development (OLAP)
- Testing and Production Deployment
- Maintenance

Dimensions
Types of Dimensions:
- Junk Dimension
- Conformed Dimension
- Degenerate Dimension
- Slowly Changing Dimensions

Junk Dimension
A junk dimension is a convenient grouping of typically low-cardinality flags and indicators. By creating an abstract dimension, these flags and indicators are removed from the fact table and placed into a useful dimensional framework. A junk dimension is a dimension table consisting of attributes that do not belong in the fact table or in any of the existing dimension tables. These attributes are usually text or various flags, e.g. non-generic comments or simple yes/no or true/false indicators.
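As a rough sketch of the idea, the flags can be moved into one small table that holds every combination, so that the fact table carries a single key instead of several flag columns. The table name dim_order_flags and the three flags are hypothetical.

```python
import itertools
import sqlite3

# Hypothetical low-cardinality indicators that would otherwise sit in the fact table.
PAYMENT_TYPES = ["cash", "card"]
YES_NO = ["Y", "N"]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dim_order_flags ("
    "flags_key INTEGER PRIMARY KEY, payment_type TEXT, is_gift TEXT, is_online TEXT)"
)

# Pre-populate the junk dimension with every combination (2 * 2 * 2 rows).
combinations = list(itertools.product(PAYMENT_TYPES, YES_NO, YES_NO))
conn.executemany(
    "INSERT INTO dim_order_flags (payment_type, is_gift, is_online) VALUES (?, ?, ?)",
    combinations,
)
conn.commit()


def lookup_flags_key(connection, payment_type, is_gift, is_online):
    """Return the single key a fact row stores in place of three flag columns."""
    row = connection.execute(
        "SELECT flags_key FROM dim_order_flags "
        "WHERE payment_type = ? AND is_gift = ? AND is_online = ?",
        (payment_type, is_gift, is_online),
    ).fetchone()
    return row[0]


flag_row_count = conn.execute("SELECT COUNT(*) FROM dim_order_flags").fetchone()[0]
print(flag_row_count, lookup_flags_key(conn, "card", "N", "Y"))
```

Pre-building all combinations works while the flags stay low in cardinality; with many flags, rows could instead be added only for combinations that actually occur.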

Degenerate Dimensions
A degenerate dimension is a dimension key, such as a transaction number, invoice number, ticket number, or bill-of-lading number, that has no attributes and hence does not join to an actual dimension table. Degenerate dimensions are very common when the grain of a fact table represents a single transaction item or line item, because the degenerate dimension represents the unique identifier of the parent. Degenerate dimensions often play an integral role in the fact table's primary key.

"A degenerate dimension is data that is dimensional in nature but stored in a fact table."
"Any values in the fact table that don't join to dimensions are either considered degenerate dimensions or measures."
"A degenerate dimension is when the dimension attribute is stored as part of the fact table, and not in a separate dimension table."
"A degenerate dimension acts as a dimension key in the fact table but does not join a corresponding dimension table because all its interesting attributes have already been placed in other analytic dimensions."

Types of Fact Tables and Facts


Facts:
- Additive facts
- Semi-additive facts
- Non-additive facts
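The slides list the three kinds of facts without defining them. Under the usual reading, an additive fact can be summed across every dimension, a semi-additive fact across only some of them (a balance can be summed across accounts but not across days), and a non-additive fact such as a ratio cannot be summed at all. The small sketch below illustrates that reading with invented sample data.

```python
# Hypothetical sales lines: quantity and amount are additive facts.
sales = [
    {"quantity": 2, "amount": 20.0},
    {"quantity": 1, "amount": 30.0},
]

# Hypothetical daily balance snapshots: balance is a semi-additive fact.
snapshots = [
    {"day": "2024-01-01", "account": "A", "balance": 100.0},
    {"day": "2024-01-01", "account": "B", "balance": 50.0},
    {"day": "2024-01-02", "account": "A", "balance": 120.0},
    {"day": "2024-01-02", "account": "B", "balance": 40.0},
]

# Additive: summing across all rows gives a meaningful total.
total_amount = sum(row["amount"] for row in sales)
total_quantity = sum(row["quantity"] for row in sales)

# Semi-additive: summing across accounts for one day is meaningful,
# but summing the same account across days would double count money.
balance_on_day2 = sum(
    row["balance"] for row in snapshots if row["day"] == "2024-01-02"
)

# Non-additive: a unit price is a ratio, so it is recomputed from the
# additive components instead of being summed or averaged row by row.
average_unit_price = total_amount / total_quantity

print(total_amount, balance_on_day2, round(average_unit_price, 2))
```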


Fact tables
- Transaction fact table
- Factless fact table
- Snapshot fact table

E/R Modeling
E/R modeling is a design technique in which data is stored in a highly normalized form inside a relational database.

Features of the ER model:
- The ER model is highly normalized.
- The stress is on optimization of OLTP transactions.

ER Model for Retail Sales Promotion

Answer the following Queries


- Total sales by product
- Total sales by store
- Total sales by country
- Total sales by year
- Total sales by quarter
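The retail model diagram is not reproduced in this text, so the sketch below assumes a small hypothetical dimensional layout (dim_product, dim_store, dim_date, fact_sales) with invented rows, purely to show how each of the listed questions could be answered with one join and one GROUP BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE dim_store (store_key INTEGER PRIMARY KEY, store_name TEXT, country TEXT);
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
CREATE TABLE fact_sales (
    product_key INTEGER, store_key INTEGER, date_key INTEGER, sales_amount REAL
);
INSERT INTO dim_product VALUES (1, 'Pen'), (2, 'Lamp');
INSERT INTO dim_store VALUES (1, 'Downtown', 'India'), (2, 'Airport', 'Germany');
INSERT INTO dim_date VALUES
    (20230115, 2023, 'Q1'), (20230620, 2023, 'Q2'), (20240110, 2024, 'Q1');
INSERT INTO fact_sales VALUES
    (1, 1, 20230115, 50.0), (2, 1, 20230620, 30.0), (1, 2, 20240110, 20.0);
""")

# Fixed whitelist: report name -> (dimension table, join key, grouping columns).
REPORTS = {
    "product": ("dim_product", "product_key", "d.product_name"),
    "store": ("dim_store", "store_key", "d.store_name"),
    "country": ("dim_store", "store_key", "d.country"),
    "year": ("dim_date", "date_key", "d.year"),
    "quarter": ("dim_date", "date_key", "d.year, d.quarter"),
}


def total_sales_by(connection, report):
    """Aggregate sales_amount grouped by the attributes of one dimension."""
    table, key, group_columns = REPORTS[report]
    sql = (
        f"SELECT {group_columns}, SUM(f.sales_amount) "
        f"FROM fact_sales AS f JOIN {table} AS d ON d.{key} = f.{key} "
        f"GROUP BY {group_columns} ORDER BY {group_columns}"
    )
    return connection.execute(sql).fetchall()


for name in REPORTS:
    print(name, total_sales_by(conn, name))
```

The SQL text is assembled only from the fixed REPORTS entries, never from user input, which keeps the string formatting safe in this sketch.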
