
Data Management

Data
Data: A necessity for almost any enterprise to carry out its business. Data consists of raw facts which, when organized, may be transformed into information.
Database: A collection of data organized to meet users' needs.
Database Management System (DBMS): A group of programs that manipulate the database and provide an interface between the database and its users or other application programs.

Brief History of Data Management


Early DBMSs (late 1960s) evolved from file-based processing systems.
Users had to visualize the data much as it was stored:
Tree-based (hierarchical model)
Graph-based (network model)

[Figure labels: DEPTS, EMPS, NAME, SS#, ITEMS, MGR]

Advent of Modern DBMS


Early 1970s: Ted Codd invented a new data model (the relational data model) and the concept of data abstraction.
Soon thereafter, a team of IBMers invented SQL (Structured Query Language).
SQL became the de facto standard for query languages based on the relational data model.

Commercial DBMSs based on the relational model are now widely accepted in industry,
e.g., Microsoft Access, Oracle 9i, Sybase Adaptive Server.
A >20 billion dollar industry!

Data Hierarchy in a Computer System

File Management Systems: a physical interface

The Traditional Approach: Separate data files are created and stored for each application program.

[Figure labels: data files (Student Data, Course Data, Lecturer Data); application programs (Student Admin, Scheduler, Payroll); outputs (Year List, Timetable, Cheques)]

File Management Systems: Sharing

[Figure labels: data files (Student Data, Course Data, Lecturer Data); application programs (Student Admin, Scheduler, Payroll); outputs (Year List, Timetable, Cheques)]

Problems with the Traditional File Environment

Data redundancy
Program-data dependence
Lack of flexibility
Poor security
Lack of data sharing and availability
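
The redundancy problem can be illustrated with a small sketch. The two dictionaries below stand in for the private files of two application programs; the record layout, the student ID, and the addresses are hypothetical and chosen only for illustration.

```python
# Hypothetical per-application "files": each program keeps its own copy
# of the same student record (data redundancy).
student_admin_file = {"S001": {"name": "A. Khan", "address": "12 Hill Road"}}
scheduler_file = {"S001": {"name": "A. Khan", "address": "12 Hill Road"}}


def update_address(file, student_id, new_address):
    """Update the address in one application's private file only."""
    file[student_id]["address"] = new_address


# The student moves, but only the Student Admin program is told.
update_address(student_admin_file, "S001", "7 Lake View")

# The two copies now disagree: redundancy has led to inconsistency.
inconsistent = (
    student_admin_file["S001"]["address"] != scheduler_file["S001"]["address"]
)
print(inconsistent)
```

Nothing in the file environment forces the second copy to follow the first, which is why redundancy tends to end in inconsistent data.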

The Contemporary Database Environment

The Database Approach: A pool of related data is shared by multiple application programs. Rather than having separate data files, each application uses a collection of data that is either joined or related in the database.

Database Management System (DBMS)

Creates and maintains databases
Eliminates the requirement for data definition statements
Acts as an interface between application programs and physical data files
Separates logical and physical views of data
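
As a rough sketch of the DBMS acting as an interface, the example below uses Python's built-in `sqlite3` module as a stand-in DBMS. The `student` table and the two functions (standing in for two application programs) are assumptions made for illustration. Both programs work on the same shared data and never touch the physical file layout.

```python
import sqlite3

# One shared database, managed by the DBMS (here: SQLite, in memory).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id TEXT PRIMARY KEY, name TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [("S001", "A. Khan", 1), ("S002", "B. Lee", 2)],
)


def year_list(conn, year):
    """Hypothetical 'Student Admin' program: names of students in one year."""
    rows = conn.execute(
        "SELECT name FROM student WHERE year = ? ORDER BY name", (year,)
    )
    return [name for (name,) in rows]


def enrolment_count(conn):
    """Hypothetical second program reading the same shared data."""
    return conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]


print(year_list(conn, 1), enrolment_count(conn))
```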

Data Base Structures


Hierarchical and Network DBMS
Relational DBMS
Object-Oriented Databases
Object-

Hierarchical Database Model


Hierarchical Database Model: A data model in which the data is organized in a top-down, or inverted tree, structure.
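
A minimal sketch of the inverted tree, using nested Python dictionaries. The department, employee, and item names are hypothetical. Every record has exactly one parent, so each record is reached by a single top-down path from the root.

```python
# A hypothetical hierarchy: one root, and every child has exactly one parent.
tree = {
    "name": "Department: Sales",
    "children": [
        {
            "name": "Employee: Ann",
            "children": [{"name": "Item: Laptop", "children": []}],
        },
        {"name": "Employee: Bob", "children": []},
    ],
}


def paths(node, prefix=()):
    """Yield the top-down path from the root to every node in the tree."""
    current = prefix + (node["name"],)
    yield current
    for child in node["children"]:
        yield from paths(child, current)


all_paths = list(paths(tree))
for path in all_paths:
    print(" > ".join(path))
```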

A Network Data Model


Network Data Model: An expansion of the hierarchical database model with an owner-member relationship in which a member may have many owners.
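
The owner-member relationship can be sketched as a graph in which a member record is linked to more than one owner. The course and student names below are hypothetical.

```python
from collections import defaultdict

owners_of = defaultdict(set)   # member -> set of its owners
members_of = defaultdict(set)  # owner -> set of its members


def link(owner, member):
    """Record an owner-member relationship in both directions."""
    owners_of[member].add(owner)
    members_of[owner].add(member)


link("Course: Databases", "Student: Ann")
link("Course: Networks", "Student: Ann")   # same member, second owner
link("Course: Databases", "Student: Bob")

# Unlike in a tree, one member can have many owners.
ann_owners = sorted(owners_of["Student: Ann"])
print(ann_owners)
```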

A Relational Data Model


Relational Data Model: All data elements are placed in two-dimensional tables, called relations, that are the logical equivalent of files.
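
A small sketch of two relations and a query that relates them, using Python's built-in `sqlite3` module. The `dept` and `emp` tables and their contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dept (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
    CREATE TABLE emp (
        emp_id   INTEGER PRIMARY KEY,
        emp_name TEXT,
        dept_id  INTEGER REFERENCES dept(dept_id)
    );
    INSERT INTO dept VALUES (1, 'Sales'), (2, 'Finance');
    INSERT INTO emp VALUES (10, 'Ann', 1), (11, 'Bob', 1), (12, 'Cy', 2);
    """
)

# Rows from the two tables are related through the shared dept_id column.
rows = conn.execute(
    """
    SELECT emp_name, dept_name
    FROM emp JOIN dept ON emp.dept_id = dept.dept_id
    ORDER BY emp_name
    """
).fetchall()
print(rows)
```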

Types of Databases
Centralized database: Used by a single central processor or by multiple processors in a client/server network.
Distributed database: Stored in more than one physical location.
Partitioned database
Duplicated database

(Analytical Database)
Multidimensional data analysis: Supports manipulation and analysis of large volumes of data from multiple dimensions/perspectives.
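
As an illustration of viewing the same data from several dimensions, the sketch below totals a handful of sales facts by any combination of dimensions. The facts, the dimension names, and the `total_by` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sales facts with three dimensions and one measure.
sales = [
    {"product": "Pen", "region": "North", "quarter": "Q1", "amount": 100},
    {"product": "Pen", "region": "South", "quarter": "Q1", "amount": 150},
    {"product": "Ink", "region": "North", "quarter": "Q2", "amount": 200},
    {"product": "Pen", "region": "North", "quarter": "Q2", "amount": 50},
]


def total_by(facts, *dimensions):
    """Sum the 'amount' measure for each combination of the given dimensions."""
    totals = defaultdict(int)
    for fact in facts:
        key = tuple(fact[d] for d in dimensions)
        totals[key] += fact["amount"]
    return dict(totals)


by_product = total_by(sales, "product")
by_region_quarter = total_by(sales, "region", "quarter")
print(by_product)
print(by_region_quarter)
```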

Operational Databases
Operational databases store the detailed data needed to support the operations of the entire organization. They are also called subject area databases (SADB), transaction databases, production databases, or personal databases.

The Web and Hypermedia database


Organizes data as a network of nodes
Links nodes in a pattern specified by the user
Supports text, graphics, sound, video, and executable programs

Characteristics of a Database
Structure: data types, data behavior
Persistence: store data on secondary storage
Retrieval: a declarative query language, a procedural database programming language
Performance: retrieve and store data quickly
Correctness
Sharing: concurrency
Reliability and resilience
Large volumes
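
The two retrieval styles can be contrasted in a short sketch. It uses `sqlite3` with a hypothetical `emp` table; the declarative query states what is wanted, while the procedural version spells out how to find it row by row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?)",
    [("Ann", 50000), ("Bob", 42000), ("Cy", 61000)],
)

# Declarative: describe the result; the DBMS decides how to compute it.
declarative = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM emp WHERE salary > 45000 ORDER BY name"
    )
]

# Procedural: walk the rows and apply the test step by step.
procedural = []
for name, salary in conn.execute("SELECT name, salary FROM emp"):
    if salary > 45000:
        procedural.append(name)
procedural.sort()

print(declarative, procedural)
```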

Designing Databases

Conceptual design: Abstract model of the database from a business perspective

Physical design: Detailed description of business information needs

Data Modeling and Database Models


Data Model: A map or diagram of entities and their relationships.
Enterprise data modeling: Data modeling done at the level of the entire organization.
Entity-Relationship (ER) diagrams: A data model that uses basic graphical symbols to show the organization of and relationships between data.

Data Entities, Attributes, and Keys


Entity: A generalized class of people, places, or things (objects) for which data is collected, stored, and maintained. Examples: Customer, Employee.
Attribute: A characteristic of an entity; something the entity is identified by. Examples: Customer name, Employee name.
Key: A field or set of fields in a record that is used to identify the record.
Primary key: A field or set of fields that uniquely identifies the record.
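
A brief sketch of what "uniquely identifies" means in practice, using `sqlite3`. The `customer` table and its columns are hypothetical. A second record with the same primary key value is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Entity: customer. Attributes: customer_id, customer_name.
conn.execute(
    "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, customer_name TEXT)"
)
conn.execute("INSERT INTO customer VALUES (1, 'Ann')")

try:
    # Same primary key value as the existing record.
    conn.execute("INSERT INTO customer VALUES (1, 'Bob')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)
```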

An Entity-Relationship Diagram

The Use of Schemas and Subschemas


Schema: A description of the entire database.
Subschema: A file that contains a description of a subset of the database and identifies which users can perform modifications on the data items in that subset.
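
One loose way to picture a subschema is as a view that exposes only part of the schema. The sketch below uses `sqlite3` with a hypothetical `employee` table and `employee_directory` view. This is an analogy only: SQLite has no user accounts, so the "which users can modify" part of a subschema is not modeled here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Schema: the full description of the (tiny) database.
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary INTEGER
    );
    INSERT INTO employee VALUES (1, 'Ann', 'Sales', 50000),
                                (2, 'Bob', 'Finance', 42000);

    -- Subschema-like view: a subset of the data, without the salary column.
    CREATE VIEW employee_directory AS
        SELECT emp_id, name, dept FROM employee;
    """
)

cursor = conn.execute("SELECT * FROM employee_directory ORDER BY emp_id")
columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()
print(columns, rows)
```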


Management Requirements for Database Systems

Key elements in a database environment:


Data Administration
Data Planning and Modeling Methodology
Database Technology and Management
Users


Advanced Databases: Data Warehousing

Today's Problem: Heterogeneous Information Sources

Heterogeneities are everywhere


[Figure: information sources: Personal Databases, Scientific Databases, World Wide Web, Digital Libraries]

Different interfaces
Different data representations
Duplicate and inconsistent information

Our Goal: Unified Access to Data

[Figure: an Integration System over the World Wide Web, Digital Libraries, Scientific Databases, and Personal Databases]

Collects and combines information
Provides integrated view, uniform user interface
Supports sharing

Data Warehouse Evolution

[Figure: timeline with ticks at 1960, 1975, 1980, 1985, 1990, 1995, and 2000. Eras: "Prehistoric Times", "Middle Ages", "Data Revolution", "Information-Based Management". Milestones: Relational Databases; PCs and Spreadsheets; End-user Interfaces; Company DWs; 1st DW Article; "Building the DW", Inmon (1992); DW Confs.; Data Replication Tools; Vendor DW Frameworks]

What is a Data Warehouse?


A Data Warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data used in support of management decision-making processes.

Subject-Oriented:
The data warehouse is organized around the key subjects (or high-level entities) of the enterprise. Major subjects include Customers, Patients, Students, Products, etc.

Integrated
The data housed in the data warehouse are defined using consistent:
Naming conventions
Formats
Encoding structures
Related characteristics

Time-variant
The data in the warehouse contain a time dimension so that they may be used as a historical record of the business

Non-volatile
Data in the data warehouse are loaded and refreshed from operational systems, but cannot be updated by end-users
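
The "integrated" and "time-variant" properties can be sketched together: records from different operational sources are mapped to one consistent encoding and stamped with a load date as they enter the warehouse. The source names, the gender codes, and the `integrate` helper below are hypothetical.

```python
# Hypothetical mapping from several source encodings to one warehouse encoding.
GENDER_MAP = {"m": "M", "male": "M", "1": "M", "f": "F", "female": "F", "0": "F"}


def integrate(record, source, load_date):
    """Convert one operational record to the warehouse's consistent format."""
    return {
        "customer": record["name"].strip().title(),           # consistent naming
        "gender": GENDER_MAP[str(record["gender"]).lower()],  # consistent encoding
        "source": source,
        "load_date": load_date,                               # time dimension
    }


warehouse = [
    integrate({"name": "ann lee", "gender": "f"}, "crm", "2000-01-31"),
    integrate({"name": "BOB RAY ", "gender": 1}, "billing", "2000-01-31"),
]
for row in warehouse:
    print(row)
```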

The Data Warehouse Continued


Characteristics of data warehousing are:
Time variant. The data are kept for many years so they can be used for trends, forecasting, and comparisons over time.
Nonvolatile. Once entered into the warehouse, data are not updated.
Relational. Typically the data warehouse uses a relational structure.
Client/server. The data warehouse uses the client/server architecture mainly to give the end user easy access to its data.
Web-based. Data warehouses are designed to provide an efficient computing environment for Web-based applications.

Data Warehouse: A Practitioner's Viewpoint
"A data warehouse is simply a single, complete, and consistent store of data obtained from a variety of sources and made available to end users in a way they can understand and use it in a business context."
-- Barry Devlin, IBM Consultant

Warehousing and Industry


Warehousing is big business
$2 billion in 1995
$3.5 billion in early 1997
Predicted: $8 billion in 1998 [Metagroup]

WalMart has the largest warehouse
900-CPU, 2,700 disk, 23 TB Teradata system
~7 TB in warehouse
40-50 GB per day

Warehouse is a Specialized DB
Standard DB | Data Warehouse
Mostly updates | Mostly reads
Many small transactions | Queries are long and complex
MB to GB of data | GB to TB of data
Current snapshot | History
Index/hash on primary key | Lots of scans
Raw data | Summarized, reconciled data
Thousands of users (e.g., clerical users) | Hundreds of users (e.g., decision-makers, analysts)
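
The difference in workload can be sketched with two queries against `sqlite3`. The `account` and `sales_history` tables are hypothetical, and real systems differ greatly in scale. The first statement is a small update located by primary key, typical of a standard DB; the second scans and summarizes history, typical of a warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE account (account_id INTEGER PRIMARY KEY, balance INTEGER);
    INSERT INTO account VALUES (1, 100), (2, 250);

    CREATE TABLE sales_history (year INTEGER, region TEXT, amount INTEGER);
    INSERT INTO sales_history VALUES
        (1998, 'North', 500), (1998, 'South', 300),
        (1999, 'North', 700), (1999, 'South', 400);
    """
)

# Standard DB style: one small transaction, found through the primary key.
conn.execute("UPDATE account SET balance = balance - 30 WHERE account_id = 1")
balance = conn.execute(
    "SELECT balance FROM account WHERE account_id = 1"
).fetchone()[0]

# Warehouse style: read-only query that scans history and summarizes it.
yearly = conn.execute(
    "SELECT year, SUM(amount) FROM sales_history GROUP BY year ORDER BY year"
).fetchall()

print(balance, yearly)
```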

Position of the Data Warehouse Within the Organization

The Data Warehouse Architecture

The Data Mart


A data mart is a small, scaled-down version of a data warehouse designed for a strategic business unit (SBU) or a department. Because data marts contain less information than the data warehouse, they provide more rapid response and are more easily navigated than enterprise-wide data warehouses.

There are two major types of data marts:


Replicated (dependent) data marts are small subsets of the data warehouse. In such cases one replicates some subset of the data warehouse into smaller data marts, each of which is dedicated to a certain functional area.
Stand-alone data marts. A company can have one or more independent data marts without having a data warehouse. Typical data marts are for marketing, finance, and engineering applications.
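
A replicated (dependent) data mart can be sketched as a copy of one functional area's rows taken from the warehouse. The `warehouse_facts` table, the `marketing_mart` table, and their contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical enterprise-wide warehouse table.
    CREATE TABLE warehouse_facts (
        area TEXT, period TEXT, measure TEXT, value INTEGER
    );
    INSERT INTO warehouse_facts VALUES
        ('marketing', '1999-Q1', 'leads',   120),
        ('finance',   '1999-Q1', 'revenue', 900),
        ('marketing', '1999-Q2', 'leads',   150);

    -- Dependent data mart: a replicated subset for one functional area.
    CREATE TABLE marketing_mart AS
        SELECT period, measure, value
        FROM warehouse_facts
        WHERE area = 'marketing';
    """
)

mart_rows = conn.execute(
    "SELECT period, value FROM marketing_mart ORDER BY period"
).fetchall()
print(mart_rows)
```

The mart holds fewer rows than the warehouse, which is the source of the faster response the slide mentions.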

Position of the Data Mart Within the Organization


[Figure labels: Data Delivery; Data Mart (three instances); Decision Support Information (three instances)]

Data Warehousing: Two Distinct Issues

(1) How to get information into the warehouse: "data warehousing"
(2) What to do with data once it's in the warehouse: "warehouse DBMS"

Both are rich research areas.
Industry has focused on (2).

What Can a Data Warehouse Do?


Some of the benefits of a DW are:
Immediate information delivery
Data integration from across and even outside the organization
Future vision from historical trends
Tools for looking at data in new ways
Freedom from IS department resource limitations (you don't need programmers to use a data warehouse)

Examples of Common DW Applications


Sales Analysis
Determine real-time product sales to make vital pricing and distribution decisions.
Analyze historical product sales to determine success or failure attributes.
Evaluate successful products and determine key success factors.
Quickly isolate past preferred customers who no longer buy.
Identify daily what product is in the manufacturing and distribution pipeline.
Instantly determine which salespeople are performing, on both a revenue and margin basis, and which are behind.

Financial Analysis
Compare actual to budgets on an annual, monthly, and month-to-date basis.
Review past cash flow trends and forecast future needs.
Instantly generate a current set of key financial ratios and indicators.
Receive near-real-time, interactive financial statements.

Human Resource Analysis
Evaluate trends in benefit program use.
Identify the wage and benefits costs to determine company-wide variation.

Other Areas
Warehouses have also been applied to areas such as logistics, inventory, purchasing, detailed transaction analysis, and load balancing.

The Future of Data Warehousing


Typical Nonintegrated Information Architecture
[Figure labels: sources (i2 Supply Chain, Oracle Financials, Siebel CRM, 3rd Party Data); Supply Chain Data Mart; Oracle Financial DW; Marketing DW; Subset Non-Architected Data Marts]

Federated Integrated Information Architecture


[Figure labels: sources (i2 Supply Chain, Oracle Financials, Siebel CRM, 3rd Party Data); Common Data Staging Area; Federated Supply Chain Data Mart; Federated Financial DW; Federated Marketing DW; Subset Non-Architected Data Marts]
