Data Warehousing/Mining

Introduction

Outline of Lecture

Brief History of Data Warehousing
What is a Data Warehouse?
Need for Strategic Information
Information Crisis
Operational and Decision Support Systems
Difference Between a Standard DB and a Data Warehouse

Data Warehouse Evolution


[Figure: data warehouse evolution timeline. Axis: TIME, marked 1960, 1975, 1980, 1985, 1990, 1995, 2000. Eras: Prehistoric Times, Middle Ages, Data Revolution. Labels shown: Relational Databases, Information-Based Management, PCs and Spreadsheets, End-user Interfaces, Company DWs, 1st DW Article, Building the DW (Inmon, 1992), DW Confs., Data Replication Tools, Vendor DW Frameworks.]

Escalating Need For Strategic Information


Organizations need information to formulate business strategies, establish goals, and set objectives, e.g.:

Increase the customer base by 10% over the next 5 years
Gain market share by 15% in the next 2 years
Increase product quality levels in the top five product groups

The Information Crisis


The volume of information is said to double every 18 months
Organizations have tons of data available
Then why is there an information crisis?
Why can't organizations convert the data into useful information for strategic decision making?

Problem: Heterogeneous Information Sources


Heterogeneities are everywhere

Sources:
  Personal Databases
  Scientific Databases
  Digital Libraries
  World Wide Web

Problems:
  Different interfaces
  Different data representations
  Diverse structure of databases
  Duplicate and inconsistent information
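
To make the integration problem concrete, here is a minimal Python sketch. The two sources, their field names, and their date formats are invented for illustration; they are not from the lecture. Each source represents the same kind of customer record differently, and one customer appears in both. The sketch converts both into one common representation and drops the duplicate.

```python
from datetime import date

# Hypothetical source A: names as "Last, First", dates as ISO strings.
source_a = [
    {"name": "Khan, Ali", "joined": "2001-03-15"},
    {"name": "Smith, Jane", "joined": "1999-11-02"},
]

# Hypothetical source B: separate name fields, dates as day/month/year.
# Jane Smith also appears in source A (duplicate information).
source_b = [
    {"first": "Jane", "last": "Smith", "joined": "02/11/1999"},
    {"first": "Omar", "last": "Raza", "joined": "20/07/2000"},
]

def from_a(rec):
    """Convert a source-A record to the common (first, last, date) form."""
    last, first = [part.strip() for part in rec["name"].split(",")]
    return (first, last, date.fromisoformat(rec["joined"]))

def from_b(rec):
    """Convert a source-B record to the common (first, last, date) form."""
    day, month, year = (int(x) for x in rec["joined"].split("/"))
    return (rec["first"], rec["last"], date(year, month, day))

def integrate(a_records, b_records):
    """Merge both sources; the set union removes exact duplicates."""
    unified = {from_a(r) for r in a_records} | {from_b(r) for r in b_records}
    return sorted(unified, key=lambda row: (row[1], row[0]))

customers = integrate(source_a, source_b)
for first, last, joined in customers:
    print(first, last, joined.isoformat())
```

Real sources rarely match this exactly; reconciling near-duplicates (misspellings, conflicting values) needs matching rules that this sketch leaves out.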

About Some Definitions

What is data?
What is information?
What is a warehouse?

What is a Data Warehouse?


A Practitioner's Viewpoint

"A data warehouse is simply a single, complete, and consistent store of data obtained from a variety of sources and made available to end users in a way they can understand and use it in a business context."
-- Barry Devlin, IBM Consultant

A Data Warehouse is...

Stored collection of diverse data
  A solution to the data integration problem
  Single repository of information

Subject-oriented
  Organized by subject, not by application
  Used for analysis, data mining, etc.

Large volume of data (GB, TB)

Non-volatile
  Historical
  Time attributes are important

A Data Warehouse is... (continued)


Updates infrequent

Examples
  All transactions EVER at WalMart
  Complete client histories at an insurance firm
  Stockbroker financial information and portfolios


Summary
Business Information Interface

Data Warehouse
Data Warehouse Population

Operational Systems

What Are Operational and Decision Support Systems?

Operational Systems: making the wheels of business turn
  Take an order
  Process a claim
  Make a shipment
  Generate an invoice
  Receive cash
  Reserve an airline seat


What Are Operational and Decision Support Systems? (Contd)

Decision Support Systems: watching the wheels of business turn
  Show the top-selling products
  Show the problem regions
  Tell me why (drill down)
  Let me see other data (drill across)
  Alert me when a district sells below target
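
The decision-support requests above map naturally onto aggregate queries. The following is a small sketch using Python's built-in sqlite3 module; the `sales` table, its columns, the sample rows, and the `TARGET` value are all assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sales table; schema and rows are illustrative only.
conn.execute(
    "CREATE TABLE sales (product TEXT, region TEXT, district TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("Widget", "North", "N1", 500.0),
        ("Widget", "South", "S1", 300.0),
        ("Gadget", "North", "N1", 200.0),
        ("Gadget", "North", "N2", 50.0),
        ("Gizmo", "South", "S1", 100.0),
    ],
)

# "Show the top-selling products": total sales per product, highest first.
top_products = conn.execute(
    "SELECT product, SUM(amount) AS total FROM sales "
    "GROUP BY product ORDER BY total DESC"
).fetchall()

# "Tell me why (drill down)": break one region down into its districts.
north_by_district = conn.execute(
    "SELECT district, SUM(amount) FROM sales WHERE region = ? "
    "GROUP BY district ORDER BY district",
    ("North",),
).fetchall()

# "Alert me when a district sells below target": compare totals to a target.
TARGET = 100.0  # assumed target per district
district_totals = conn.execute(
    "SELECT district, SUM(amount) FROM sales GROUP BY district ORDER BY district"
).fetchall()
below_target = [district for district, total in district_totals if total < TARGET]

print(top_products)
print(north_by_district)
print(below_target)
```

Note that every query only reads and summarizes; none of them changes a row, which is the "watching" side of the contrast with operational systems.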


Difference

                   Operational                  Informational
Data Content       Current values               Archived, derived, optimized
Data Structure     Optimized for transactions   Optimized for complex queries
Access Frequency   High                         Medium to low
Access Type        Read, update, delete         Read
Usage              Predictable, repetitive      Ad hoc, random, heuristic
Response Time      Sub-seconds                  Several seconds to minutes
Users              Large number                 Relatively small number

Warehouse is a Specialized DB
Standard DB                                   Warehouse
Mostly updates                                Mostly reads
Many small transactions                       Queries are long and complex
MB - GB of data                               GB - TB of data
Current snapshot                              History
Index/hash on p.k.                            Lots of scans
Raw data                                      Summarized, reconciled data
Thousands of users (e.g., clerical users)     Hundreds of users (e.g., decision-makers, analysts)
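
The "current snapshot vs. history" contrast can be sketched in a few lines of Python with sqlite3. The `account` and `balance_history` tables, the helper functions, and the dates are hypothetical, chosen only to illustrate the idea: the standard DB overwrites one row through its primary key, while the warehouse side only ever appends time-stamped rows and is queried by scanning them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Standard DB (assumed schema): one row per account, current balance only.
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL)")
# Warehouse (assumed schema): append-only history with a time attribute.
conn.execute(
    "CREATE TABLE balance_history (account_id INTEGER, load_date TEXT, balance REAL)"
)
conn.execute("INSERT INTO account VALUES (1, 100.0)")

def apply_transaction(account_id, delta):
    """Small operational update: overwrite the current value via the p.k."""
    conn.execute(
        "UPDATE account SET balance = balance + ? WHERE id = ?",
        (delta, account_id),
    )

def load_snapshot(load_date):
    """Periodic warehouse load: copy current values, stamped with a date."""
    conn.execute(
        "INSERT INTO balance_history SELECT id, ?, balance FROM account",
        (load_date,),
    )

load_snapshot("2000-01-31")
apply_transaction(1, -40.0)
load_snapshot("2000-02-29")
apply_transaction(1, 15.0)
load_snapshot("2000-03-31")

# Standard DB answers "what is the balance now?" with a key lookup.
current = conn.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]

# Warehouse answers "how did it change over time?" by scanning the history.
history = conn.execute(
    "SELECT load_date, balance FROM balance_history "
    "WHERE account_id = 1 ORDER BY load_date"
).fetchall()
average = conn.execute("SELECT AVG(balance) FROM balance_history").fetchone()[0]

print(current)
print(history)
print(average)
```

After the three loads, the operational table still holds a single row, while the history table holds one row per load; earlier balances can be recovered only from the latter.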


Warehousing and Industry

Warehousing is big business
  $2 billion in 1995
  $3.5 billion in early 1997
  About $8 billion in 1998 [Metagroup]

WalMart has the largest warehouse
  900-CPU, 2,700-disk, 23 TB Teradata system
  ~7 TB in warehouse
  40-50 GB per day


Data Warehousing: Two Distinct Issues


(1) How to get information into the warehouse: "Data warehousing"
(2) What to do with data once it's in the warehouse: "Warehouse DBMS"

Both are rich research areas
Industry has focused on (2)


Thank You Very Much
