
Business Analytics

Data Mart & Data Warehousing


(Bonus Material)

Pristine www.edupristine.com
Data Mart & Data Warehousing

Data Warehousing

Data Acquisition (ETL)

ETL Concepts

Database Design

Why should we consider Data Warehousing solutions?

Traditional approaches to computer system design in the 1980s:

Not optimized for analysis and reporting

Company-wide reporting couldn't be supported from a single system

Developing reports often required writing custom computer programs, which
was slow and expensive

When users request access to a large amount of historical information for
reporting purposes, you should strongly consider a warehouse or mart. Users
benefit when the information is organized efficiently for this type of access.

Data Warehousing

A data warehouse (DWH) is a type of relational database system designed for query
and analysis processing rather than transaction processing.

DWH systems are also called historical databases, read-only databases, integrated
databases, decision support systems, executive information systems, or business
information systems.

Differences..

DWH database (OLAP)                      OLTP database

Designed for analysis of business        Designed for real-time business
measures by category and attributes.     operations.

Optimized for bulk loads and large,      Optimized for a common set of
complex, unpredictable queries that      transactions, usually adding or
access many rows per table.              retrieving a single row at a time
                                         per table.

Loaded with consistent, valid data;      Optimized for validation of incoming
requires no real-time validation.        data during transactions; uses
                                         validation data tables.

Supports few concurrent users relative   Supports thousands of concurrent
to OLTP.                                 users.
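
The two access patterns can be contrasted with a small sketch. This is an illustration only, using Python's built-in sqlite3 module; the sales table, its columns, and its rows are hypothetical and do not come from the slides.

```python
import sqlite3

# Hypothetical table and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, region TEXT, product TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        (1, "East", "Widget", 100.0),
        (2, "East", "Gadget", 250.0),
        (3, "West", "Widget", 75.0),
        (4, "West", "Widget", 125.0),
    ],
)

# OLTP-style access: retrieve a single row by its key.
single_row = conn.execute(
    "SELECT region, product, amount FROM sales WHERE sale_id = ?", (3,)
).fetchone()

# OLAP-style access: read many rows and summarize a measure by an attribute.
summary = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

print(single_row)
print(summary)
```

The first query touches one row through its key; the second reads every row to produce a summary, which is the workload a warehouse is tuned for.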

DWH Architecture

Three common architectures are:

DWH Architecture (Basic)

DWH Architecture (With a staging area)

DWH Architecture (With a staging area and data marts)

DWH Architecture (Basic)

[Figure: Data Sources (operational systems, flat files) feed the Warehouse (metadata,
summary data, raw data), which serves Users (analysis, reporting, mining).]

DWH Architecture (with a staging area)

[Figure: Data Sources (operational systems, flat files) feed a Staging Area, which
loads the Warehouse (metadata, summary data, raw data), which serves Users (analysis,
reporting, mining).]

DWH Architecture (with a staging area and data marts)

[Figure: Data Sources (operational systems, flat files) feed a Staging Area, which
loads the Warehouse (metadata, summary data, raw data). The Warehouse feeds Data Marts
(Purchasing, Sales, Inventory), which serve Users (analysis, reporting, mining).]

Dimensional Data Modeling

To develop a star schema design, a data modeler follows the dimensional modeling
approach.

Dimensional modeling is a three-stage process:

Conceptual modeling

Logical Modeling

Physical Modeling

Before implementing the schema design, a data modeler should work through the
following steps:

Understand the client's business requirements

Understand the grain of fact

Designing of the Dimension tables

Designing of the Fact tables

Example of Dimensional Data Model (Star Schema Design)

Sales Fact
Time Dimension Identifier (FK)
Location Dimension Identifier (FK)
Product Dimension Identifier (FK)
Organization Dimension Identifier (FK)
Sales Dollar
Date Time Stamp

Product Dimension
Product Dimension Identifier (PK)
Product Category Name
Product Sub-Category Name
Product Name
Product Feature Description
Date Time Stamp

Organization Dimension
Organization Dimension Identifier (PK)
Corporate Office Name
Region Name
Branch Name
Employee Name
Date Time Stamp

Time Dimension
Time Dimension Identifier (PK)
Year Number
Day of Year
Quarter Number
Month Number
Month Name
Month Day Number
Week Number
Day of Week
Calendar Date
Date Time Stamp

Location Dimension
Location Dimension Identifier (PK)
Country Name
State Name
County Name
City Name
Date Time Stamp
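
A cut-down version of this star schema can be sketched in SQL. The example below is a sketch under assumptions: it uses Python's sqlite3 module, keeps only three of the dimensions and a few of their columns, and the snake_case names and sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive attributes, one surrogate key each.
CREATE TABLE time_dim (time_id INTEGER PRIMARY KEY, year_number INTEGER, month_number INTEGER);
CREATE TABLE location_dim (location_id INTEGER PRIMARY KEY, country_name TEXT, state_name TEXT);
CREATE TABLE product_dim (product_id INTEGER PRIMARY KEY, product_category_name TEXT, product_name TEXT);

-- Fact table: foreign keys to the dimensions plus the numeric measure.
CREATE TABLE sales_fact (
    time_id INTEGER REFERENCES time_dim(time_id),
    location_id INTEGER REFERENCES location_dim(location_id),
    product_id INTEGER REFERENCES product_dim(product_id),
    sales_dollar REAL
);

-- Hypothetical sample rows.
INSERT INTO time_dim VALUES (1, 2005, 1), (2, 2005, 2);
INSERT INTO location_dim VALUES (1, 'USA', 'New York'), (2, 'USA', 'Florida');
INSERT INTO product_dim VALUES (1, 'Hardware', 'Widget'), (2, 'Hardware', 'Gadget');
INSERT INTO sales_fact VALUES (1, 1, 1, 100.0), (1, 2, 1, 50.0), (2, 1, 2, 200.0), (2, 1, 1, 25.0);
""")

# Summarize the measure by attributes taken from two dimensions.
sales_by_state = conn.execute("""
    SELECT l.state_name, t.year_number, SUM(f.sales_dollar)
    FROM sales_fact AS f
    JOIN location_dim AS l ON l.location_id = f.location_id
    JOIN time_dim AS t ON t.time_id = f.time_id
    GROUP BY l.state_name, t.year_number
    ORDER BY l.state_name
""").fetchall()

print(sales_by_state)
```

Every reporting query follows the same shape: join the fact table to the dimensions it needs, then group by dimension attributes.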

Fact Table

Contain numeric measures of the business

Contains facts and is connected to dimensions

Two types of columns

Facts or measures

Foreign keys to dimension tables

May contain date-stamped data

A fact table might contain either detail level facts or facts that have been
aggregated

Steps in designing Fact Table
Identify a business process for analysis (like sales).
Identify measures or facts (sales dollar).
Identify dimensions for facts (product dimension, location dimension, time dimension,
organization dimension).
List the columns that describe each dimension (region name, branch name, etc.).
Determine the lowest level of summary in a fact table (sales dollar).

Dimension Tables
Contain textual information that represents attributes of the business
Contain relatively static data
Are joined to fact table through a foreign key reference
Are usually smaller than fact tables

Example of Location Dimension

Country Lookup: Country Code (PK), Country Name, Date Time Stamp
State Lookup: State Code (PK), State Name, Date Time Stamp
County Lookup: County Code (PK), County Name, Date Time Stamp
City Lookup: City Code (PK), City Name, Date Time Stamp
Location Dimension: Location Dimension Identifier (PK), Country Name, State Name,
County Name, City Name, Date Time Stamp

Location Dimension

Location Dimension Id  Country Name  State Name  County Name  City Name    Date Time Stamp
1                      USA           New York    Shelby       Manhattan    1/1/2005 11:23:31 AM
2                      USA           Florida     Jefferson    Panama City  1/1/2005 11:23:31 AM
3                      USA           California  Montgomery   San Jose     1/1/2005 11:23:31 AM
4                      USA           New Jersey  Hudson       Jersey City  1/1/2005 11:23:31 AM

Star Schema Design benefits

Easy for users to understand

Fast response to queries

Supports multidimensional analysis

Supported by many front-end tools

Other schema designs are also in practice, viz. Clickstar, Snowflake, etc.

Data Warehouse Star Schema

In contrast, this data model better supports the ease of developing reports and
simple, efficient summarization queries

[Figure: star schema with the Sales fact table at the center, joined to the Customers,
Dates, Channels, Promotions, and Products dimensions.]

ETL

Data Acquisition

Data acquisition is the process of extracting the relevant business information from
the different source systems, transforming the data from one format into another,
integrating the data into a homogeneous format, and loading the data into a warehouse
database.

Data Extraction (E)

Data Transformation (T)

Data Loading (L)

Extraction, Transformation, and Loading (ETL) Processes

The "plumbing" work of data warehousing

Data are moved from source to target databases

A very costly, time consuming part of data warehousing

Sample ETL Process Flow

Data from various sources

Relational databases: Source System 1, Source System 2, Source System 3
File sources: Excel (.xls) files, text (.txt) files, XML (.xml) files, and other files
Other sources: SAP, PeopleSoft, Siebel, and several others

ETL Process

Designing the ETL process: creating source and target repositories; mapping source and
target repositories
Data cleansing
Data profiling
Aggregation
Filtering
Joining
Sorting
Loading: creation and execution of workflows to load data from source to target

Target

Data Warehouse
Data Mart 1
Data Mart 2
Data Mart 3

Data Sources and Types

Primarily from legacy, operational systems

Almost exclusively numerical data at the present time

External data may be included, often purchased from third-party sources

Technology exists for storing unstructured data; expect this to become more
important over time

ETL Process

The ETL process has the following basic steps:

Mapping the data between the source systems and the target database

Cleansing the source data in the staging area

Transforming the cleansed source data and then loading it into the target system
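
The three steps can be sketched end to end in a few lines. This is a minimal illustration, not a production ETL tool: the CSV source, the column mapping, and the target table are all hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; real sources would be files or source databases.
SOURCE_CSV = """cust_name,ctry,amt
 alice ,us,10.50
Bob,IN,20
,us,5
"""

# Step 1: mapping between source columns and target columns (assumed names).
COLUMN_MAP = {"cust_name": "customer_name", "ctry": "country_code", "amt": "sales_dollar"}


def extract(text):
    """Read the source rows as dictionaries keyed by source column name."""
    return list(csv.DictReader(io.StringIO(text)))


def cleanse(rows):
    """Step 2 (staging): trim whitespace and drop rows without a customer name."""
    staged = []
    for row in rows:
        cleaned = {key: (value or "").strip() for key, value in row.items()}
        if cleaned["cust_name"]:
            staged.append(cleaned)
    return staged


def transform(rows):
    """Step 3a: rename columns per the mapping and standardize formats."""
    out = []
    for row in rows:
        mapped = {COLUMN_MAP[key]: value for key, value in row.items()}
        mapped["customer_name"] = mapped["customer_name"].title()
        mapped["country_code"] = mapped["country_code"].upper()
        mapped["sales_dollar"] = float(mapped["sales_dollar"])
        out.append(mapped)
    return out


def load(conn, rows):
    """Step 3b: load the transformed rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer_name TEXT, country_code TEXT, sales_dollar REAL)"
    )
    conn.executemany(
        "INSERT INTO sales VALUES (:customer_name, :country_code, :sales_dollar)", rows
    )


conn = sqlite3.connect(":memory:")
load(conn, transform(cleanse(extract(SOURCE_CSV))))
loaded = conn.execute("SELECT * FROM sales ORDER BY customer_name").fetchall()
print(loaded)
```

Keeping each step in its own function mirrors the slide: the mapping is data, cleansing happens before any business transformation, and loading is the only step that touches the target.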

Data Staging

Often used as an interim step between data extraction and later steps

Accumulates data from asynchronous sources using native interfaces, flat files, FTP
sessions, or other processes

At a predefined cutoff time, data in the staging file is transformed and loaded to the
warehouse

There is usually no end user access to the staging file

An operational data store may be used for data staging

Data Transformation

Transforms the data in accordance with the business rules and standards that have
been established

Examples include format changes, deduplication, splitting up fields, replacement of
codes, derived values, and aggregates
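
Each of these transformation types can be shown on a toy record set. The sketch below assumes invented records, an invented gender code table, and day/month/year source dates; none of these come from the slides.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical source records; the second one is an exact duplicate.
raw = [
    {"name": "Doe, John", "gender": "M", "date": "01/08/2005", "qty": 2, "unit_price": 5.0},
    {"name": "Doe, John", "gender": "M", "date": "01/08/2005", "qty": 2, "unit_price": 5.0},
    {"name": "Roe, Jane", "gender": "F", "date": "02/08/2005", "qty": 1, "unit_price": 7.5},
]
GENDER_CODES = {"M": "Male", "F": "Female"}  # assumed code table


def transform(records):
    seen, out = set(), []
    for rec in records:
        key = tuple(sorted(rec.items()))
        if key in seen:  # deduplication
            continue
        seen.add(key)
        # Splitting up a field: "Last, First" becomes two columns.
        last, first = [part.strip() for part in rec["name"].split(",")]
        out.append({
            "first_name": first,
            "last_name": last,
            # Replacement of codes with descriptions.
            "gender": GENDER_CODES[rec["gender"]],
            # Format change: DD/MM/YYYY to ISO 8601.
            "date": datetime.strptime(rec["date"], "%d/%m/%Y").date().isoformat(),
            # Derived value computed from two source fields.
            "amount": rec["qty"] * rec["unit_price"],
        })
    return out


clean = transform(raw)

# Aggregate: total amount per last name.
totals = defaultdict(float)
for row in clean:
    totals[row["last_name"]] += row["amount"]

print(clean)
print(dict(totals))
```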

Data Loading

Data are physically moved to the data warehouse

The loading takes place within a "load window"

The trend is to near real time updates of the data warehouse as the warehouse is
increasingly used for operational applications

Meta Data

Data about data

Needed by both information technology personnel and users

IT personnel need to know data sources and targets; database, table and column
names; refresh schedules; data usage measures; etc.

Users need to know entity/attribute definitions; reports/query tools available; report
distribution information; help desk contact information, etc.

OLAP is a Data Warehouse Tool

Online analytical processing (OLAP) is a technology designed to provide superior
performance for ad hoc business intelligence queries.

OLAP organizes data warehouse data into multidimensional cubes based on the
dimensional model, and then preprocesses these cubes to provide maximum
performance for queries that summarize data in various ways.

OLAP is not designed to store large volumes of text or binary data, nor is it designed
to support high volume update transactions.

The inherent stability and consistency of historical data in a data warehouse enables
OLAP to provide its remarkable performance in rapidly summarizing information for
analytical queries.
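
The idea of preprocessing a cube can be illustrated by precomputing a total for every combination of dimensions, so that a summary query becomes a lookup. This is a simplified sketch with invented facts and dimension names; real OLAP engines use far more compact storage and do not necessarily materialize every combination.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows with three dimensions and one measure.
facts = [
    {"product": "Widget", "region": "East", "year": 2005, "sales": 100.0},
    {"product": "Widget", "region": "West", "year": 2005, "sales": 50.0},
    {"product": "Gadget", "region": "East", "year": 2005, "sales": 200.0},
]
DIMENSIONS = ("product", "region", "year")


def build_cube(rows):
    """Precompute SUM(sales) for every subset of the dimensions."""
    cube = defaultdict(float)
    for row in rows:
        for size in range(len(DIMENSIONS) + 1):
            for dims in combinations(DIMENSIONS, size):
                key = (dims, tuple(row[d] for d in dims))
                cube[key] += row["sales"]
    return cube


cube = build_cube(facts)

# Summary queries are now dictionary lookups instead of scans.
total_sales = cube[((), ())]
east_sales = cube[(("region",), ("East",))]
widget_east = cube[(("product", "region"), ("Widget", "East"))]

print(total_sales, east_sales, widget_east)
```

Because historical warehouse data rarely changes, the precomputed totals stay valid between loads, which is what makes this trade of storage for query speed worthwhile.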

Data Mining is a Data Warehouse Tool

Data mining is a technology that applies sophisticated and complex algorithms to
analyze data and expose interesting information for analysis by decision makers.

OLAP organizes data in a model suited for exploration by analysts, and data mining
performs analysis on data and provides the results to decision makers.

Thus, OLAP supports model-driven analysis and data mining supports data-driven
analysis.

Conclusion

The key to data warehousing is data design.

Focus on the users, determine what data is needed, locate sources for the data, and
organize the data in a dimensional model that represents the business needs.

Database & Design: An Introduction

Spreadsheet vs. Database
A spreadsheet is a way of describing a table of numeric data and having some of that data
interact.

A spreadsheet is like an accountant's ledger. It has columns and rows. You can enter bits of
information (typically numbers) directly into a 'cell' that can be identified by its column and row
number. You can then manipulate the data and possibly reach some conclusions about what
you've entered, for example by adding columns or rows, computing averages, or multiplying values.
The electronic spreadsheet is a very useful and powerful method of analysis.

A database is a means of storing a lot of information.

A database is a compilation of like things: names, birthdays, weights, ages, etc. If, for example, you
had 100 friends, you could use a database to record various information about each of them: name,
age, address, birthday, height, weight, favorite color, or whatever information you think useful or
interesting. You could then 'sort' that information based on 'filters' that you think important or
interesting. For example, you could rank your friends by age, height, or weight.

Approach to Database design

To define the Scope as the Area of Interest (e.g., the HR Department in an
organization).

To define the "Things of Interest", (e.g., Employees), in the Area of Interest.

To analyze the Things of Interest and identify the corresponding Tables.

To consider cases of 'Inheritance', where there are General Entities and Specific
Entities.
For example, a Customer is a General Entity, and Commercial Customer and Personal Customer
would be Specific Entities.

At this point, a List of Things of Interest can be produced

To establish the relationships between the Tables.


For example, "A Customer can place many Orders", and "A Product can be purchased many
times and appear in many Orders."

Approach to Database design

To determine the characteristics of each Table (e.g., an Employee has a Date-of-Birth).

To identify the Static and Reference Data, such as Country Codes or Customer
Types.

To obtain a small set of Sample Data,
e.g., "John Doe is a Maintenance Engineer, was born on 1 August 1965, and lives at 22
Woodland Street, New Haven."
"He is currently assigned to maintenance of the Air-Conditioning and becomes available in 4
weeks' time."

To review Code or Type Data which is (more or less) constant, which can be
classified as Reference Data.
For example, Currency or Country Codes. Where possible, use standard values, such as ISO
Codes.

To look for 'has a' relationships. These can become Foreign Keys, or 'Parent-Child'
relationships.

Approach to Database design

To define a Primary Key for all Tables.


For Reference Tables, use the 'Code' as the Key, often with only one other field, which is
the Description field. Typically the names of Reference Data Tables all start with 'REF_'.

To confirm the first draft of the Database design against the Sample Data.

To review and obtain from the Users some representative enquiries for the
Database,
e.g., "How many Maintenance Engineers do we have on staff coming available in the next
4 weeks?"

Review the results of Steps 1 to 9 with appropriate people,
e.g., Users, Managers, Development staff, etc., and repeat until the final Database design is
reached.

Define User Scenarios and step through them with some sample data to check that
the Database supports the required functionality.
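
Several of these steps can be tried out on the John Doe sample. The sketch below assumes a hypothetical two-table design (a REF_JOB_TYPE reference table keyed by its code, and an EMPLOYEE table with a primary key and a foreign key), a second invented employee, and a fixed "today" so that the result is reproducible.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Reference table: the code is the key, plus a description field.
CREATE TABLE REF_JOB_TYPE (job_type_code TEXT PRIMARY KEY, description TEXT);

-- 'Employee has a job type' becomes a foreign key.
CREATE TABLE EMPLOYEE (
    employee_id INTEGER PRIMARY KEY,
    name TEXT,
    date_of_birth TEXT,
    job_type_code TEXT REFERENCES REF_JOB_TYPE(job_type_code),
    available_from TEXT
);
""")

today = date(2024, 1, 1)  # assumed reference date, fixed for reproducibility
conn.execute("INSERT INTO REF_JOB_TYPE VALUES ('ME', 'Maintenance Engineer')")
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?, ?)",
    [
        (1, "John Doe", "1965-08-01", "ME", (today + timedelta(weeks=4)).isoformat()),
        # Hypothetical second engineer who is not free within the window.
        (2, "Jane Roe", "1970-03-15", "ME", (today + timedelta(weeks=10)).isoformat()),
    ],
)

# Representative enquiry: Maintenance Engineers coming available in the next 4 weeks.
cutoff = (today + timedelta(weeks=4)).isoformat()
available = conn.execute(
    """
    SELECT COUNT(*)
    FROM EMPLOYEE AS e
    JOIN REF_JOB_TYPE AS r ON r.job_type_code = e.job_type_code
    WHERE r.description = 'Maintenance Engineer'
      AND e.available_from BETWEEN ? AND ?
    """,
    (today.isoformat(), cutoff),
).fetchone()[0]

print(available)
```

Running a representative enquiry against sample data like this is a quick way to confirm that the draft design can answer the questions users actually ask.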

Data Models, Schemas, and Instances

Data Model: A set of concepts to describe the structure of a database (data types and
relationships), and certain constraints that the database should obey.

Provides data abstraction

Data Model Operations: Operations for specifying database retrievals and updates
by referring to the concepts of the data model.
Generic operations: insert, delete, modify, retrieve
User-defined operations

Categories of Data Models
Conceptual (high-level, semantic) data models: Provide concepts that are close to the way many
users perceive data. (Also called entity-based or object-based data models.)
Entity
Attribute
Relationship
Physical (low-level, internal) data models: Provide concepts that describe details of how data is
stored in the computer.
Record formats
Record ordering
Access paths
Implementation (record-oriented) data models: Provide concepts that fall between the above
two, balancing user views with some computer storage details.
Relational
Network
Hierarchical

Schemas, Instances and Database State

Database Schema (meta-data): The description of a database. Includes descriptions
of the database structure and the constraints that should hold on the database.

Schema Diagram: A diagrammatic display of (some aspects of) a database schema.

Database Instance: The actual data stored in a database at a particular moment in
time. Also called database state (or occurrence, snapshot).

Each schema construct has its own current set of instances.


The database schema changes very infrequently. The database state changes every
time the database is updated. Schema is also called intension, whereas state is called
extension.

Schema diagram for University database

[Figure: schema diagram for the University database. Each schema construct shows the
known data: names of record types and data items.]

Student
Name   Student Number  Class  Major
Smith  17              1      CS
Brown  8               2      CS

Course
Course Name                Course Number  Credit Hours  Department
Intro to Computer Science  CS1310         4             CS
Data Structures            CS3320         4             CS
Discrete Mathematics       MATH2410       3             MATH
Database                   CS3380         3             CS

Section
Section Identifier  Course Number  Semester  Year  Instructor
85                  MATH2410       Fall      98    King
92                  CS1310         Fall      98    Anderson
102                 CS3320         Spring    99    Knuth
112                 MATH2410       Fall      99    Chang
119                 CS1310         Fall      99    Anderson
135                 CS3380         Fall      99    Stone

Grade Report
Student Number  Section Identifier  Grade
17              112                 B
17              119                 C
8               85                  A
8               92                  A
8               102                 B
8               135                 A

Prerequisite
Course Number  Prerequisite Number
CS3380         CS3320
CS3380         MATH2410
CS3320         CS1310
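
To see how these record types relate, the sketch below loads the Student, Section, and Grade Report data into an in-memory SQLite database and retrieves one student's grades. The snake_case table and column names and the transcript helper are assumptions made for the example; the rows are the ones listed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (name TEXT, student_number INTEGER PRIMARY KEY, class INTEGER, major TEXT);
CREATE TABLE section (section_identifier INTEGER PRIMARY KEY, course_number TEXT,
                      semester TEXT, year TEXT, instructor TEXT);
CREATE TABLE grade_report (student_number INTEGER, section_identifier INTEGER, grade TEXT);

INSERT INTO student VALUES ('Smith', 17, 1, 'CS'), ('Brown', 8, 2, 'CS');
INSERT INTO section VALUES
    (85, 'MATH2410', 'Fall', '98', 'King'), (92, 'CS1310', 'Fall', '98', 'Anderson'),
    (102, 'CS3320', 'Spring', '99', 'Knuth'), (112, 'MATH2410', 'Fall', '99', 'Chang'),
    (119, 'CS1310', 'Fall', '99', 'Anderson'), (135, 'CS3380', 'Fall', '99', 'Stone');
INSERT INTO grade_report VALUES
    (17, 112, 'B'), (17, 119, 'C'), (8, 85, 'A'), (8, 92, 'A'), (8, 102, 'B'), (8, 135, 'A');
""")


def transcript(student_name):
    """Return (course, semester, year, grade) rows for one student."""
    return conn.execute(
        """
        SELECT sec.course_number, sec.semester, sec.year, g.grade
        FROM student AS s
        JOIN grade_report AS g ON g.student_number = s.student_number
        JOIN section AS sec ON sec.section_identifier = g.section_identifier
        WHERE s.name = ?
        ORDER BY sec.section_identifier
        """,
        (student_name,),
    ).fetchall()


smith = transcript("Smith")
print(smith)
```

The tables and columns are the schema; the inserted rows are one database state (instance) of that schema.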
DBMS Architecture and Data Independence

Defines DBMS schema at three levels:


Internal schema at the internal level to describe data storage structures and access
paths. Typically uses a physical data model.
Conceptual schema at the conceptual level to describe the structure and
constraints for the whole database. Uses a conceptual or an implementation data
model.
External schema at the external level to describe the various user views. Usually
uses the same data model as the conceptual level or high-level data model.

Mappings among schema levels are also needed. Programs refer to an external
schema, and are mapped by the DBMS to the internal schema for execution

The Three-schema architecture

DBMS Interfaces

Stand-alone query language interfaces. (casual end user)


Programmer interfaces for embedding DML in programming languages: (programmer)
Pre-compiler Approach
Procedure (Subroutine) Call Approach
User-friendly interfaces:
Menu-based Interfaces for Browsing.
Forms-based Interfaces.
Graphical User Interfaces.
Natural language Interfaces
Combination of the above
Interfaces for Parametric Users (using function keys)
Interfaces for the DBA:
Creating accounts, granting authorizations
Setting system parameters
Changing schemas or access path

The Database System Environment
DBMS Component Modules

Database System Utilities

To perform certain functions such as:

Loading data stored in files into a database (conversion tools).

Backing up the database periodically on storage.

Reorganizing database file structures.

Report generation utilities.

Performance monitoring utilities.

Other functions, such as sorting, user monitoring, data compression, etc.

Tools, Application Environments, and Communications Facilities

Data dictionary utility:


Used to store schema descriptions and other information such as design decisions,
application program descriptions, user information, usage standards, etc.
Active data dictionary is accessed by DBMS software and users/DBA.
Passive data dictionary is accessed by users/DBA only.

Communications Facilities
Allow users at locations remote from the database system site to access the
database.
DB (DBMS)/DC (Data Communication System)

Classification of Database Management Systems

Based on the data model used:


Data models
Traditional: Relational, Network, Hierarchical
Emerging: Object-oriented, Semantic, Entity- Relationship, other.
Other classifications:
Number of users: Single-user (typically used with personal computers) vs. multi-user
(most DBMSs)
Number of sites: Centralized (uses a single computer) vs. distributed (uses multiple
computers). Homogeneous vs. Heterogeneous
Cost of DBMS software: $10,000~100,000 vs. $100~3,000
Types of access paths used (e.g., inverted file structures)
Purpose: general purpose vs. special purpose
e.g., airline reservations, telephone directory, on-line transaction
processing systems

A Network Schema

[Figure: network schema linking the record types Student, Course, Section,
Prerequisite, and Grade Report through the set types Course Offerings, Student Grades,
Section Grades, Is A, and Has A.]

Database Management System (DBMS)

Collection of interrelated data


Set of programs to access the data
DBMS contains information about a particular enterprise
DBMS provides an environment that is both convenient and efficient to use.
Database Applications:
Banking: all transactions
Airlines: reservations, schedules
Universities: registration, grades
Sales: customers, products, purchases
Manufacturing: production, inventory, orders, supply chain
Human resources: employee records, salaries, tax deductions
Databases touch all aspects of our lives

An architecture for a database system

View level

View 1 View 2 ... View n

Logical level

Physical level

Data Models
A collection of tools for describing
Data
Data relationships
Data semantics
Data constraints
Entity-Relationship model
Relational model
Other models:
Object-oriented model
Semi-structured data models
Older models: network model and hierarchical model

Entity-Relationship Model

Example of schema in the entity-relationship model

[Figure: E-R diagram. The entity Customer (customer id, customer name, customer street,
customer city) is linked through the relationship Depositor to the entity Account
(account number, balance).]

Relational Model

Example of tabular data in the relational model (the column headings are the attributes)

Customer ID  Customer Name  Customer Street  Customer City  Account Number
192-83-7465  Johnson        Alma             Palo Alto      A-101
019-28-3746  Smith          North            Rye            A-215
192-83-7465  Johnson        Alma             Palo Alto      A-201
321-12-3123  Jones          Main             Harrison       A-217
019-28-3746  Smith          North            Rye            A-201

A Sample Relational Database

(a) The Customer Table
Customer ID  Customer name  Customer street  Customer city
192-83-7465  Johnson        12 Alma St.      Palo Alto
019-28-3746  Smith          4 North St.      Rye
677-89-9011  Hayes          3 Main St.       Harrison
182-73-6091  Turner         123 Putnam Ave.  Stamford
321-12-3123  Jones          100 Main St.     Harrison
336-66-9999  Lindsay        175 Park Ave.    Pittsfield
019-28-3746  Smith          72 North St.     Rye

(b) The Account Table
Account number  Balance
A-101           500
A-215           700
A-102           400
A-305           350
A-201           900
A-217           750
A-222           700

(c) The Depositor Table
Customer ID  Account number
192-83-7465  A-101
019-28-3746  A-215
677-89-9011  A-102
182-73-6091  A-305
321-12-3123  A-201
336-66-9999  A-217
019-28-3746  A-222

Entity Relationship Model (Cont.)

E-R model of real world

Entities (objects)

e.g., customers, accounts, bank branch

Relationships between entities

e.g., Account A-101 is held by customer Johnson

Relationship set depositor associates customers with accounts

Widely used for database design

Database design in the E-R model is usually converted to a design in the relational
model (coming up next), which is used for storage and processing

Relational Database

A relational database is a collection of data items organized as a set of formally
described tables from which data can be accessed easily.

A relational database is created using the relational model. The software used in a
relational database is called a relational database management system (RDBMS).

A relational database is the predominant choice for storing data, over other models
like the hierarchical database model or the network model. It consists of any number
of tables, and each table has its own primary key.

Relational database example

Contain tables
Tables contain records (rows)
Records are broken into columns (fields)

ID (PK)  Quote                                                       Source (FK)
1        I don't like that man; I must get to know him better.       4
2        I wish I had an answer to that because I'm tired of         1
         answering that question.
3        Right is right, even if everyone is against it, and wrong   3
         is wrong, even if everyone is for it.
4        People are just as happy as they make up their minds        4
         to be.

Overview of Object-Oriented Concepts

Object-oriented databases let the designer specify:

The structure of complex objects
The operations that can be applied to objects

Object
State(Value)
Behavior(operations)
Transient vs. persistent

Overview of Object-Oriented Concepts (Cont.)

Maintain a direct correspondence between real-world and database objects


A real-world object may have different names for key attributes in different
relations in traditional database systems
e.g., EMP_ID, SSN in different relations

OODBs provide a unique system-generated object identifier for each object

Overview of Object-Oriented Concepts (Cont.)

Objects may have an object structure of arbitrary complexity


Information about a complex object is often scattered over many relations or
records in traditional database systems
1NF in relational databases

Object Identity

Unique identity for each independent object stored in the database

Created by a unique, system-generated object identifier, or OID

Object Identity (Cont.)

Properties of OID

Immutable: the OID value of a particular object should not change

OID should not depend on any physical address or on attribute values of the object

Each OID is used only once.

Most OO database systems allow for the representation of both objects and values
(having no OIDs)

Object Structure

(i, c, v)

i: a unique object identifier (OID)

c: a type constructor

Basic type: atom (integer, real, string, Boolean, ...), as supported by the system

Structured type: tuple

Collection type: array vs. list (ordered; differ in number of elements), set vs. bag
(unordered; distinct vs. duplicate elements)

Example 1: Complex Object

Transaction Management

A transaction is a collection of operations that performs a single logical function in a
database application

Transaction-management component ensures that the database remains in a
consistent (correct) state despite system failures (e.g., power failures and operating
system crashes) and transaction failures.

Concurrency-control manager controls the interaction among the concurrent
transactions, to ensure the consistency of the database.
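
A funds transfer is the classic example of a single logical function made of several operations. The sketch below is an illustration using Python's sqlite3 module; the account table, the CHECK constraint, and the transfer helper are assumptions for the example, with balances borrowed from the sample bank data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (account_number TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A-101", 500), ("A-215", 700)])
conn.commit()


def transfer(conn, source, target, amount):
    """Move funds as one transaction: both updates commit, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE account_number = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE account_number = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit was rolled back too.
        return False


ok = transfer(conn, "A-101", "A-215", 200)
failed = transfer(conn, "A-101", "A-215", 1000)  # would overdraw A-101
balances = dict(conn.execute("SELECT account_number, balance FROM account"))
print(ok, failed, balances)
```

The failed transfer leaves both balances untouched even though its first update had already run, which is the consistent-state guarantee the transaction-management component provides.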

Storage Management

Storage manager is a program module that provides the interface between the low-
level data stored in the database and the application programs and queries
submitted to the system.

The storage manager is responsible for the following tasks:


Interaction with the file manager
Efficient storing, retrieving and updating of data

Overall System Structure

Application Architectures

Two-tier architecture: e.g., client programs using ODBC/JDBC to communicate with
a database

Three-tier architecture: e.g., web-based applications, and applications built using
"middleware"

Planning Overall

What do I need to plan for?

People, hardware, software, obsolescence, maintenance, emergencies.

How far out do I need to plan?

Initially 2-4 years.

How often do I need to review the plans?

Annually.

What if my plan fails or looks undoable?

Nip it in the bud, be proactive, come up with options.

Thank you!

Pristine
702, Raaj Chambers, Old Nagardas Road, Andheri (E), Mumbai-400
069. INDIA
www.edupristine.com
Ph. +91 22 3215 6191