
M. Sc. (CS/IT) Part I
Paper IV
Data Warehousing and Mining

Text Books:
1. Paulraj Ponniah, Data Warehousing Fundamentals, John Wiley
2. W.H. Inmon, Building the Data Warehouse, Wiley Dreamtech
3. Ralph Kimball, The Data Warehouse Toolkit, John Wiley
4. Ralph Kimball, The Data Warehouse Lifecycle Toolkit, John Wiley
Girish Tere, Lecturer (CS), TCSC 1

3/16/2014

The need for DW

Understand the desperate need for strategic information
Recognize the information crisis at every enterprise
Distinguish between operational and informational systems
Past attempts to provide strategic information
The solution: Data Warehousing

Introduction

What is your role in IT?
Your IT experience
Applications to run the business
What do they do?
What do they provide?
What do executives require?
Where is the strategic information required?

Organizations' use of DW

Retail: Customer Loyalty, Market Planning
Manufacturing: Cost Reduction, Logistics Management
Utilities: Asset Management, Resource Management
Government: Manpower Planning, Cost Control
Financial: Risk Management, Fraud Detection
Airlines: Route Profitability, Yield Management

Understand the desperate need for strategic information

Who needs strategic information in an enterprise?
What is strategic information?
Examples of Business Objectives:
Retain the present customer base
Increase the customer base by 15% over the next 5 years
Gain market share by 10% in the next 3 years

Examples of Business Objectives (cont)

Improve product quality levels in the top five product groups
Enhance customer service level in shipments
Bring three new products to market in 2 years
Increase sales by 15% in the North East Division

Strategic Information (SI)

Is it for running the day-to-day operation of the business?
What is SI?
Characteristics of SI

Characteristics of SI
Integrated: Must have a single, enterprise-wide view
Data Integrity: Information must be accurate and must conform to business rules
Accessible: Easily accessible with intuitive access paths, and responsive for analysis
Credible: Every business factor must have a unique value
Timely: Information must be available within the stipulated time period

The Information Crisis


How much data is stored and available?
Where is all this data? On which platforms? On one PC or across the network?
The facts are:
Organizations have lots of data
IT resources and systems are not effective at using this data as SI

Real Problem
Most companies face an information crisis not because they lack sufficient data, but because the available data is not readily usable for strategic decision making.
Why is this so?
We need information integrated from all systems.
Operational data is event driven.
Operational data is not directly suitable for review from different viewpoints.

Technology Trends

Name of the computer department in a company: DP, MIS, IS, IT
Phenomenal growth of IT in areas like:
Computing Technology
Human/Machine Interface
Processing Options

What technology does SI need?

Technology Trends (cont)

The user will ask a question and get the results
This interactive process continues
Why is providing SI feasible now?

Opportunities and Risks

What are the opportunities available to companies resulting from the possible use of SI?
What are the threats and risks resulting from the lack of SI available in companies?

Some Opportunities

SI required for Reliance (telecommunication industry)
SI required for ICICI Bank
SI required for Mediclaim companies
SI required for Apna Bazar
A community-based pharmacy company

Some Risks

A car rental company (fleet management)
A multinational company, supplier of systems and components to the automobile industry (inconsistent data)

Failures of past DSS

Example A Chennai Branch is not
You have to gather the data from multiple applications and start from scratch.
In order to understand the reasons for the failures of IT to provide SI in the past, we need to consider how IT was attempting to do this all these years.

Past DSSs

Ad hoc reports
Special extract programs
Small applications
Information centers
DSS
EIS (only programmed screens and reports were available)

Inability to provide information

Figure 1.4
IT receives too many ad hoc requests, resulting in a large overload.
Requests keep changing.
Users ask for more and more reports.
Users have to depend on IT to provide the information.
You need a very flexible and conducive environment for providing information for making strategic decisions. IT has been unable to provide such an environment.

Operational vs DSS

What is the basic reason for the failure of all the previous attempts by IT to provide SI?
Do we need different types of systems?

Making the wheels of Business Turn


OLTP systems: used to run the day-to-day core business of the company

Get the data in


Making the wheels of business turn


Take an order
Process a claim
Make a shipment
Generate an invoice
Receive cash
Reserve an airline ticket

Get the information out


Watching the wheels of business turn

Show me the top-selling products
Show me the problem regions
Tell me why (drill down)
Let me see other data (drill across)
Show me highest margins
Alert me when a district sells below target
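The "get the information out" requests above can be sketched as read-only queries. This is only an illustration using Python's built-in sqlite3 module: the tables `sales` and `targets`, their columns, and all figures are invented for the example and do not come from the slides.

```python
import sqlite3

# Hypothetical data: table names, columns, and numbers are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (district TEXT, product TEXT, units INTEGER)")
conn.execute("CREATE TABLE targets (district TEXT, target_units INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("North", "Soap", 120), ("North", "Tea", 80),
        ("South", "Soap", 60), ("South", "Tea", 30),
        ("East", "Soap", 90), ("East", "Rice", 150),
    ],
)
conn.executemany(
    "INSERT INTO targets VALUES (?, ?)",
    [("North", 150), ("South", 100), ("East", 200)],
)


def top_selling_products(db, limit):
    """'Show me the top-selling products': total units per product."""
    query = (
        "SELECT product, SUM(units) AS total FROM sales "
        "GROUP BY product ORDER BY total DESC LIMIT ?"
    )
    return db.execute(query, (limit,)).fetchall()


def districts_below_target(db):
    """'Alert me when a district sells below target'."""
    query = (
        "SELECT s.district, SUM(s.units) AS total, t.target_units "
        "FROM sales s JOIN targets t ON t.district = s.district "
        "GROUP BY s.district, t.target_units "
        "HAVING SUM(s.units) < t.target_units"
    )
    return db.execute(query).fetchall()


print(top_selling_products(conn, 2))
print(districts_below_target(conn))
```

Both functions only read and summarize many rows at once, in contrast to the "get the data in" operations, which record one business event at a time.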

We need to design and build informational systems


That serve different purposes
Whose scopes are different
Whose data content is different
Where the data usage patterns are different
Where the data access types are different

Operational and Informational Systems


                    Operational                   Informational
Data Content        Current values                Archived, derived, summarized
Data Structure      Optimized for transactions    Optimized for complex queries
Access Frequency    High                          Medium to low
Access Type         Read, update, delete          Read
Usage               Predictable, repetitive       Ad hoc, random, heuristic
Response Time       msecs                         Many seconds
Users               Large numbers                 Relatively small numbers

DW: The correct solution

We need a different type of DSS to provide SI
Information required for strategic decision making is not available in operational systems
A new environment is required for analysis, identifying trends and monitoring performance

Features of new environment :

Database designed for analytical tasks
Data from multiple applications
Easy to use and conducive to long interactive sessions by users
Read-intensive data usage
Direct interaction with the system by the users without help from IT staff
Content stable and updated periodically
Content to include current and historical data
Ability for users to run queries and get results online
Ability for users to make reports

Processing requirements in the new environment (analytical processing requirements)

Running of simple queries and reports against current and historical data
Ability to perform "what if" analysis
Ability to query, analyze, and query again, continuing this process as many times as required
Recognize historical trends and mistakes, and apply or correct them for future results

BI at DW

The needed environment is DW
It is kept separate from the system environment supporting the day-to-day operations
DW contains BI

Figure: Operational systems (basic business processes) -> Data transformation (extraction, cleansing, aggregation) -> Data Warehouse (key measurements, business dimensions)

E.g. of BI at DW

DW containing units of sales stored along business dimensions
Important: Data staging area
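A small sketch of what "units of sales stored along business dimensions" can look like in practice. The dimension names (`product`, `region`, `month`), the helper `units_by`, and all values are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical fact rows: one measurement (units sold) stored together
# with the business dimensions that describe it.
SALES_FACTS = [
    {"product": "Soap", "region": "West", "month": "2014-01", "units": 40},
    {"product": "Soap", "region": "West", "month": "2014-02", "units": 55},
    {"product": "Soap", "region": "South", "month": "2014-01", "units": 30},
    {"product": "Tea", "region": "West", "month": "2014-01", "units": 25},
    {"product": "Tea", "region": "South", "month": "2014-02", "units": 35},
]


def units_by(facts, *dimensions):
    """Total the 'units' measurement along the chosen dimensions."""
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[name] for name in dimensions)
        totals[key] += row["units"]
    return dict(totals)


# Summary view: units per product.
print(units_by(SALES_FACTS, "product"))
# More detailed view of the same measurement: units per product and region.
print(units_by(SALES_FACTS, "product", "region"))
```

Adding a dimension to the call gives a finer breakdown of the same measurement, which is the idea behind drilling down.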

Definition of DW

DW is an informational environment that:
Provides an integrated and total view of the enterprise
Makes the enterprise's current and historical information easily available for decision making
Makes decision-support transactions possible without burdening operational systems
Renders the organization's information consistently
Presents a flexible and interactive source of strategic information

DW concept

Is not to generate fresh data
Is to make use of large existing data and to transform it into forms suitable for providing SI
Take all the data you already have in the organization, clean and transform it, and then use it to provide SI

DW: An Environment, Not a Product

It is a user-centric and user-driven environment
An ideal environment for data analysis and decision support
Constantly changing, flexible and interactive
Useful for the ask-answer-ask-again pattern
Provides the ability to discover answers to complex, unpredictable questions

The basic concept of DW is:


Take all the data from the operational systems
Where necessary, include relevant data from outside, such as industry benchmark indicators
Integrate all the data from the various sources
Remove inconsistencies and transform the data
Store the data in formats suitable for easy access for decision making
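The steps above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a real ETL tool: the two source systems, their field names, the `CITY_SYNONYMS` mapping, and the helper functions are all hypothetical.

```python
# Two hypothetical operational sources describing the same customers
# with different field names and inconsistent formatting.
BILLING_ROWS = [
    {"cust_id": "C001", "name": "ASHA RAO ", "city": "Mumbai", "amount": "1200.50"},
    {"cust_id": "C002", "name": "ravi shah", "city": "Bombay", "amount": "300"},
]
ORDER_ROWS = [
    {"customer": "c001", "customer_name": "Asha Rao", "town": "MUMBAI", "total": 450.0},
]

# Assumed cleansing rule: map old city names to one standard spelling.
CITY_SYNONYMS = {"bombay": "Mumbai"}


def transform(row, id_key, name_key, city_key, amount_key):
    """Remove inconsistencies and convert one source row to a common format."""
    city = row[city_key].strip().lower()
    return {
        "customer_id": row[id_key].strip().upper(),
        "name": row[name_key].strip().title(),
        "city": CITY_SYNONYMS.get(city, city.title()),
        "amount": float(row[amount_key]),
    }


def build_warehouse(billing_rows, order_rows):
    """Extract from both sources, integrate, and store one record per customer."""
    staged = [transform(r, "cust_id", "name", "city", "amount") for r in billing_rows]
    staged += [transform(r, "customer", "customer_name", "town", "total") for r in order_rows]
    warehouse = {}
    for rec in staged:
        entry = warehouse.setdefault(
            rec["customer_id"],
            {"name": rec["name"], "city": rec["city"], "total_amount": 0.0},
        )
        entry["total_amount"] += rec["amount"]
    return warehouse


warehouse = build_warehouse(BILLING_ROWS, ORDER_ROWS)
for customer_id, record in sorted(warehouse.items()):
    print(customer_id, record)
```

The intermediate `staged` list plays the role of a data staging area: rows are cleaned and brought to a common format there before being stored for analysis.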

DW involves the following functions

Data extraction
Loading the data
Transforming the data
Storing the data
Providing UI

Technologies used in DW

Data Quality
Data Modeling
Data Acquisition
Data Management
Metadata Management
Analysis
Applications
Development Tools
Storage Management
Administration

Match the columns


1. information crisis        A. OLTP application
2. SI                        B. produce ad hoc reports
3. operational systems       C. explosive growth
4. information center        D. despite lots of data
5. DW                        E. data cleaned and transformed
6. order processing          F. users go to get information
7. EIS                       G. used for decision making
8. data staging area         H. environment, not product
9. extract programs          I. for day-to-day operations
10. IT                       J. simple, easy to use

Class Test
1. What do you mean by SI? For a commercial bank, name five types of strategic objectives.
2. Do you agree that a typical retail store collects huge volumes of data through its operational systems? Name three types of transaction data likely to be collected by a retail store in large volumes during its daily operations.
3. Why were all the past attempts by IT to provide SI failures? List three concrete reasons and explain.
4. Differentiate between operational systems and informational systems.
5. List characteristics of the computing environment needed to provide SI.
6. What types of processing take place in a DW?
7. A DW is an environment, not a product. Discuss.

Class Test (cont)


8. You are the IT Director of a nationwide insurance company. Write a memo to the VP explaining the types of opportunities that can be realized with ...
9. For an airline company, how can SI increase the number of frequent flyers? Discuss giving specific details.
