Olap & Oltp - Updated

DATA, DATABASE, DATA WAREHOUSE - OLTP & OLAP

Introduction
HISTORY
Database
1960 - the first database management system
1970 - the first relational model
1980 - distributed database systems and database machines
1990 - object-oriented databases
2000 - XML databases
Data Warehouse
It became a distinct type of computer database during the late 1980s and early 1990s
INTRODUCTION
Database
A collection of related data
A database management system (DBMS) is a collection of programs that enables users to create and maintain a database
Used in many applications
Data warehouse
A record of an enterprise's past transactional and operational information
Designed to favor efficient data analysis and reporting
Data warehousing is not meant for current "live" data
CONTD...
Database
a structured collection of records or data
Data Warehouse
a logical collection of information, gathered from many different
operational databases, that supports business analysis activities and
decision-making tasks
Database models
A database model is the structure or format of a database, described in a formal language supported by the database management system
THE RELATIONAL DATABASE MODEL
There are many types of databases

Databases are
Collections of information
Created with logical structures
With logical ties within the information
With built-in integrity constraints
Databases have many tables
Consider Solomon Enterprises that provides concrete to home and
commercial builders. Tables or files include:
Order
Customer
Concrete Type
Employee
Truck
The relational database model is the most popular
A relational database uses a series of logically related two-dimensional tables or files to store information in the form of a database
DATABASE: COLLECTION OF INFORMATION
DATABASES CONTD
In databases, the row number is irrelevant

In databases, column names are very important. Column names
are created in the data dictionary
Data dictionary contains the logical structure of the information
in a database
Logical ties must exist between the tables or files in a database
Logical ties are created with primary and foreign keys
Primary key: a field (or group of fields in some cases) that uniquely describes each record
Foreign key: a primary key of one file that appears in another file
Foreign keys help you create logical ties within the information in a database
Integrity constraints: rules that help ensure the quality of the information
Examples
Primary keys must be unique
Foreign keys must be present
Sales price cannot be negative
Phone number must have area code
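A minimal sketch of the integrity constraints above, using Python's built-in sqlite3 module. The tables and columns (customer, concrete_order, phone, price) are hypothetical stand-ins for the Solomon Enterprises files, and the NNN-NNN-NNNN phone pattern is an assumed format that makes the area code mandatory.

```python
import sqlite3

# Hypothetical subset of the Solomon Enterprises files; table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when this is enabled
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,   -- primary key: must be unique
    name        TEXT NOT NULL,
    -- assumed format NNN-NNN-NNNN, so the area code is mandatory
    phone       TEXT NOT NULL
                CHECK (phone GLOB '[0-9][0-9][0-9]-[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]')
);
CREATE TABLE concrete_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),  -- foreign key
    price       REAL NOT NULL CHECK (price >= 0)                   -- sales price cannot be negative
);
""")

def try_insert(sql, params):
    """Return True if the row is accepted, False if an integrity constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

ADD_CUSTOMER = "INSERT INTO customer VALUES (?, ?, ?)"
ADD_ORDER = "INSERT INTO concrete_order VALUES (?, ?, ?)"

ok_customer = try_insert(ADD_CUSTOMER, (1, "Triple A Homes", "555-123-4567"))
dup_key = try_insert(ADD_CUSTOMER, (1, "Duplicate Builder", "555-000-0000"))  # primary key reused
no_area_code = try_insert(ADD_CUSTOMER, (2, "B Builders", "123-4567"))       # phone lacks area code
bad_foreign_key = try_insert(ADD_ORDER, (10, 99, 500.0))                     # customer 99 does not exist
negative_price = try_insert(ADD_ORDER, (11, 1, -5.0))                        # negative sales price
ok_order = try_insert(ADD_ORDER, (12, 1, 500.0))
```

Each rejected insert corresponds to one rule in the list: the DBMS, not the application, refuses rows that would lower the quality of the information.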
EXAMPLE: DATABASES WITH LOGICAL TIES WITHIN THE INFORMATION
DATABASE MANAGEMENT SYSTEM TOOLS
A database management system (DBMS) helps you
specify the logical organization of a database
access and use the information within a database
5 software components:
1. DBMS engine
2. Data definition subsystem
3. Data manipulation subsystem
4. Application generation subsystem
5. Data administration subsystem
CONTD
DBMS
DBMS engine: accepts logical requests from the various other DBMS subsystems, converts them into their physical equivalent, and actually accesses the database and data dictionary as they exist on a storage device
With a database, you only concern yourself with your logical view
Data definition subsystem: helps you create and maintain the data dictionary and define the structure of the files in a database
DBMS engine separates the logical from the physical
Physical view: how information is physically arranged, stored, and accessed on some type of storage device
Logical view: how you as a knowledge worker need to arrange and access information
You must create a data dictionary before entering information into a database
Data manipulation subsystem: helps you add, change, and delete information
This is your primary DBMS interface as you work with a database
Views, report generators, QBE tools, SQL
CONTD
View: allows you to see the contents of a database file
Make whatever changes you want
Perform simple sorting
Query to find the location of information
Looks similar to a workbook with no row numbers
CONTD
Report generator: helps you quickly define formats of reports and what information you want to see in a report
You can save report formats and generate reports at any time with up-to-date information
CONTD
Query-by-example (QBE) tool: helps you graphically design the answer to a question
Example question: What driver most often delivers concrete to Triple A Homes?
CONTD
Structured query language (SQL): a standardized fourth-generation language found in most DBMSs
Performs the same task as a QBE tool
But uses a sentence structure instead of a point-and-click interface
SQL is used mostly by IT people
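The QBE example question can also be written as an SQL sentence. This sketch uses Python's sqlite3 module and assumes a hypothetical schema in which every concrete order records the customer and the driver (an employee) who delivered it; all names and rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical schema: each order records the customer and the delivering driver.
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE employee (employee_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE concrete_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customer(customer_id),
    driver_id   INTEGER REFERENCES employee(employee_id)
);
INSERT INTO customer VALUES (1, 'Triple A Homes'), (2, 'Other Builder');
INSERT INTO employee VALUES (1, 'Ana'), (2, 'Ben');
INSERT INTO concrete_order VALUES (1, 1, 1), (2, 1, 2), (3, 1, 2), (4, 2, 1), (5, 2, 1);
""")

# "What driver most often delivers concrete to Triple A Homes?"
QUERY = """
SELECT e.name, COUNT(*) AS deliveries
FROM concrete_order AS o
JOIN customer AS c ON o.customer_id = c.customer_id
JOIN employee AS e ON o.driver_id = e.employee_id
WHERE c.name = 'Triple A Homes'
GROUP BY e.employee_id, e.name
ORDER BY deliveries DESC
LIMIT 1
"""
top_driver = conn.execute(QUERY).fetchone()
print(top_driver)  # ('Ben', 2)
```

Because of LIMIT 1, a tie between two drivers would return only one of them; dropping the LIMIT shows the full ranking.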
Application generation subsystem: contains facilities to help you develop transaction-intensive applications
Data entry screens (called forms)
Programming languages
Used mostly by IT specialists
WHY SEPARATE DATA WAREHOUSE?
High performance for both systems
DBMS tuned for OLTP: access methods, indexing, concurrency control, recovery
Warehouse tuned for OLAP: complex OLAP queries, multidimensional view, consolidation
Different functions and different data:
missing data: decision support (DS) requires historical data which operational DBs do not typically maintain
data consolidation: DS requires consolidation (aggregation, summarization) of data from heterogeneous sources
data quality: different sources typically use inconsistent data representations, codes and formats which have to be reconciled
DATA WAREHOUSE
W. H. Inmon - Father of the data warehouse
Data Warehouse (definition): a subject-oriented, integrated, time-variant, non-updatable collection of data used in support of management decision-making processes
Data Warehousing: the process of constructing and using a data warehouse
Subject-Oriented
Organized around major subjects, such as customer, product, sales
Focusing on the modeling and analysis of data for decision makers, not on daily operations or transaction processing
Provides a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process
Integrated
Constructed by integrating multiple, heterogeneous data sources
Data cleaning and data integration techniques are applied
Ensure consistency in naming conventions, encoding structures,
attribute measures, etc. among different data sources
E.g., Hotel price: currency, tax, breakfast covered, etc.
When data is moved to the warehouse, it is converted into a consistent representation
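A sketch of the conversion step for the hotel-price example, assuming two hypothetical sources: source A reports prices in EUR with tax included and a "Y"/"N" breakfast flag, while source B reports pre-tax USD prices with a boolean flag. The exchange rate and tax rate are placeholder constants, not real figures.

```python
from dataclasses import dataclass

# Placeholder constants (assumptions); a real warehouse would load them from reference data.
EUR_TO_USD = 1.10
TAX_RATE = 0.10

@dataclass(frozen=True)
class HotelPrice:
    """The single, consistent representation used inside the warehouse."""
    hotel: str
    price_usd_incl_tax: float
    breakfast_included: bool

def from_source_a(row: dict) -> HotelPrice:
    # Hypothetical source A: price in EUR with tax included; breakfast flag is "Y"/"N".
    return HotelPrice(row["hotel"], round(row["price_eur"] * EUR_TO_USD, 2), row["bkfst"] == "Y")

def from_source_b(row: dict) -> HotelPrice:
    # Hypothetical source B: pre-tax price in USD; breakfast flag is already a boolean.
    return HotelPrice(row["name"], round(row["net_usd"] * (1 + TAX_RATE), 2), bool(row["breakfast"]))

a = from_source_a({"hotel": "Harbor Inn", "price_eur": 100.0, "bkfst": "Y"})
b = from_source_b({"name": "Harbor Inn", "net_usd": 100.0, "breakfast": False})
```

After conversion both records are in the same currency, include tax, and state breakfast the same way, so the warehouse can compare them directly even though the raw figures came in different encodings.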
CONTD
Time Variant
Every piece of data contained within the warehouse must be associated
with a particular point in time if any useful analysis is to be conducted
with it.
Another aspect of time variance in DW data is that, once recorded, data
within the warehouse cannot be updated or changed.
Non-volatility
Typical activities such as deletes, inserts, and changes that are
performed in an operational application environment are completely
nonexistent in a DW environment.
Only two data operations are ever performed in the DW: data loading
and data access.
Collection of tools - gathering data; cleansing, integrating, querying, reporting,
analysis, data mining, monitoring, administering warehouse
Data-mining tools: software tools used to query information in a data warehouse
Query-and-reporting tools
Intelligent agents
Multidimensional analysis tools
Statistical tools
COMPONENTS OF A DATA WAREHOUSE
Sources - Data Source Interaction
Data Transformation
Data Warehouse (Data Storage)
Reporting (Data Presentation)
Metadata
NEED FOR DATA WAREHOUSING

Integrated, company-wide view of high-quality information (from disparate databases)
Separation of operational and informational systems and data (for improved performance)
WHAT IS A DATA WAREHOUSE?
WHAT ARE DATA-MINING TOOLS?

CONTD
Query-and-reporting tools
similar to QBE tools, SQL, and report generators in the typical
database environment
Intelligent Agents
Use various artificial intelligence tools such as neural networks
and fuzzy logic to form the basis for information discovery and
building business intelligence
Help you find hidden patterns in information
Multidimensional analysis (MDA) tools
Slice-and-dice techniques that allow you to view multidimensional information from different perspectives
Bring new layers to the front
Reorganize rows and columns
Statistical Tools
Help you apply various mathematical models to the information stored in a data warehouse to discover new information
Regression, analysis of variance
DATA WAREHOUSE-ADVANTAGES & DISADVANTAGES

Advantages
complete control over the four main areas of data management
systems:
Clean data
Query processing: multiple options
Indexes: multiple types
Security: data and access
Disadvantages
Adding new data sources takes time and associated high cost
Data owners lose control over their data, raising ownership, security
and privacy issues
Long initial implementation time and associated high cost
Difficult to accommodate changes in data types and ranges, data source
schema, indexes and queries
DATA WAREHOUSE ARCHITECTURES

1. Generic Two-Level Architecture
2. Independent Data Mart
3. Dependent Data Mart and Operational Data Store
4. Logical Data Mart and Active Warehouse
5. Three-Layer Architecture
All involve some form of extraction, transformation and loading (ETL)
Generic two-level data warehousing architecture
One company-wide warehouse
Periodic extraction: data is not completely current in the warehouse
Independent data mart data warehousing architecture

Data marts: mini-warehouses, limited in scope
Separate ETL for each independent data mart
Data access complexity due to multiple data marts
Dependent data mart with operational data store: a three-level architecture
ODS provides option for obtaining current data
Simpler data access
Single ETL for enterprise data warehouse (EDW)
Dependent data marts loaded from EDW
Logical data mart and real-time warehouse architecture
ODS and data warehouse are one and the same
Near real-time ETL for data warehouse
Data marts are NOT separate databases, but logical views of the data warehouse
Easier to create new data marts
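A logical data mart can be sketched as an SQL view over the warehouse. The example below uses Python's sqlite3 module; the sales table, its rows, and the view name europe_sales_mart are hypothetical. Because the mart is a view rather than a copy, a row loaded into the warehouse is visible through the mart at once, with no separate ETL for the mart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical enterprise data warehouse table.
CREATE TABLE sales (
    sale_id INTEGER PRIMARY KEY,
    region  TEXT NOT NULL,
    product TEXT NOT NULL,
    amount  REAL NOT NULL
);
INSERT INTO sales (region, product, amount) VALUES
    ('Europe', 'DVD', 100), ('Europe', 'PC', 250), ('Asia', 'DVD', 80);

-- Logical data mart: a view over the warehouse, not a separate database.
CREATE VIEW europe_sales_mart AS
    SELECT product, amount FROM sales WHERE region = 'Europe';
""")

def mart_total():
    return conn.execute("SELECT SUM(amount) FROM europe_sales_mart").fetchone()[0]

before = mart_total()
# A (near) real-time load into the warehouse is immediately visible through the mart.
with conn:
    conn.execute("INSERT INTO sales (region, product, amount) VALUES ('Europe', 'PC', 50)")
after = mart_total()
print(before, after)  # 350.0 400.0
```

Creating another mart is just another CREATE VIEW, which is why this architecture makes new data marts easier to create.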
DATA MINING AND VISUALIZATION

Knowledge discovery using a blend of statistical, AI, and computer graphics techniques
Goals:
Explain observed events or conditions
Confirm hypotheses
Explore data for new or unexpected relationships
Techniques
Case-based reasoning
Rule discovery
Signal processing
Neural nets
Fractals
Data visualization: representing data in graphical/multimedia formats for analysis
DATA MARTS
Data warehouses can support all of an organization's information
Data marts have subsets of an organization-wide data warehouse
Data mart: a subset of a data warehouse in which only a focused portion of the data warehouse information is kept
BUSINESS INTELLIGENCE
Organizations need business intelligence
Business intelligence (BI): knowledge about customers, competitors, business partners, the competitive environment, and internal operations to make effective, important, and strategic business decisions
IT tools help process information to create business
intelligence according to:
OLTP
OLAP
OLTP VS. OLAP
Online transaction processing (OLTP): the gathering of input information, processing that information, and updating existing information to reflect the gathered and processed information
Online analytical processing (OLAP): the manipulation of information to support decision making
Databases can support some OLAP
Data warehouses only support OLAP, not OLTP
Data warehouses are special forms of databases that support decision making
EXAMPLE: OLTP VS. OLAP
Online Transaction Processing (OLTP) | Online Analytical Processing (OLAP)
Describes processing at operational sites | Describes processing at warehouse
Relational databases: group data using common attributes found in the data set | Objectives are different
Designed for real-time business operations | Designed for analysis of business measures by categories and attributes
Mostly updates | Mostly reads
Many small transactions | Queries are long and complex
MB - GB of data | GB - TB of data
Optimized for a common set of transactions, usually adding or retrieving a single row at a time per table | Optimized for bulk loads and large, complex, unpredictable queries that access many rows per table
Optimized for validation of incoming data during transactions; uses validation data tables | Loaded with consistent, valid data; requires no real-time validation
Supports thousands of concurrent users | Supports few concurrent users relative to OLTP
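To make the contrast in the table concrete, this sketch runs both workload shapes against one in-memory SQLite table: an OLTP-style deposit that updates a single row by key inside a short transaction, and an OLAP-style query that reads and aggregates many rows. The account table and its data are hypothetical, and keeping both workloads in one database is only for illustration; the slides argue for separate, differently tuned systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, branch TEXT, balance REAL)")
# Hypothetical data: 1,000 accounts of 100.0 each, odd ids in "north", even ids in "south".
conn.executemany(
    "INSERT INTO account VALUES (?, ?, ?)",
    [(i, "north" if i % 2 else "south", 100.0) for i in range(1, 1001)],
)
conn.commit()

def deposit(account_id, amount):
    """OLTP shape: a short transaction that updates one row located by its key."""
    with conn:
        conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, account_id))

def balance_by_branch():
    """OLAP shape: a read-only query that scans and aggregates many rows."""
    return dict(conn.execute("SELECT branch, SUM(balance) FROM account GROUP BY branch"))

deposit(1, 50.0)
totals = balance_by_branch()
print(totals)  # {'north': 50050.0, 'south': 50000.0}
```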
WHAT AND WHY OLAP?
OLAP is the dynamic synthesis, analysis, and consolidation of large volumes of multi-dimensional data.
OLAP uses a multi-dimensional view of aggregate data to provide quick access to strategic information for the purposes of advanced analysis.
OLAP enables users to gain a deeper understanding and knowledge about various aspects of their corporate data through fast, consistent, interactive access to a variety of possible views of data.
While OLAP systems can easily answer "who?" and "what?" questions, it is their ability to answer "what if?" and "why?" questions that distinguishes them from general-purpose query tools.
The types of analysis available from OLAP range from basic navigation and browsing (referred to as slicing and dicing), to calculations, to more complex analysis such as time series and complex modeling.
OLAP KEY FEATURES
Multi-dimensional views of data.
Support for complex calculations.
Time Intelligence.
OLAP BENEFITS
Increased productivity of business end-users, IT developers, and consequently the entire organization.
Reduced backlog of applications development for IT staff by making end-users self-sufficient enough to make their own schema changes and build their own models.
Retention of organizational control over the integrity of corporate data, as OLAP applications are dependent on data warehouses and OLTP systems to refresh their source-level data.
Reduced query drag and network traffic on OLTP systems or on the data warehouse.
Improved potential revenue and profitability by enabling the organization to respond more quickly to market demands.
REPRESENTATION OF MULTI-DIMENSIONAL DATA
OLAP database servers use multi-dimensional structures to store data and relationships between data.
Multi-dimensional structures are best visualized as cubes of data, and cubes within cubes of data. Each side of a cube is a dimension.
OLAP, by Dr. Khalil
CONTD
The cube can be expanded to include another dimension, for example, the
number of sales staff in each city.
The response time of a multi-dimensional query depends on how many
cells have to be added on-the-fly.
Multi-dimensional databases are a compact and easy-to-understand way of visualizing and manipulating data elements that have many interrelationships.
As the number of dimensions increases, the number of cube cells increases exponentially.
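The exponential growth follows from the cube's structure: a dense cube has one cell per combination of dimension members, so its cell count is the product of the dimension cardinalities, or n^d for d dimensions of n members each. A small sketch, assuming a hypothetical 10 members per dimension:

```python
from math import prod

def cube_cells(cardinalities):
    """Cells in a dense cube: one per combination of members, i.e. the product of the cardinalities."""
    return prod(cardinalities)

# Hypothetical: each dimension has 10 members; add dimensions one at a time.
growth = [cube_cells([10] * d) for d in range(1, 6)]
print(growth)  # [10, 100, 1000, 10000, 100000]
```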
CONTD
Multi-dimensional OLAP supports common analytical operations, such as:
Consolidation: involves the aggregation of data, such as roll-ups or complex expressions involving interrelated data. For example, branch offices can be rolled up to cities and rolled up to countries.
Drill-down: is the reverse of consolidation and involves displaying the detailed data that comprises the consolidated data.
Slicing and dicing: refers to the ability to look at the data from different viewpoints. Slicing and dicing is often performed along a time axis in order to analyze trends and find patterns.
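A minimal sketch of consolidation (roll-up), drill-down and slicing on a cube held as a Python dictionary. The branch, city and country names and the sales figures are hypothetical; the branch to city to country hierarchy mirrors the example in the text.

```python
from collections import defaultdict

# Hypothetical branch-level facts: (branch, quarter) -> sales amount.
sales = {
    ("B1", "Q1"): 100, ("B1", "Q2"): 120,
    ("B2", "Q1"): 80,  ("B2", "Q2"): 90,
    ("B3", "Q1"): 50,  ("B3", "Q2"): 70,
}
# Hypothetical location hierarchy: branch -> city -> country.
branch_city = {"B1": "Leeds", "B2": "York", "B3": "Paris"}
city_country = {"Leeds": "UK", "York": "UK", "Paris": "France"}

def roll_up(facts, parent_of):
    """Consolidation: replace each location member by its parent and sum the amounts."""
    out = defaultdict(int)
    for (member, quarter), amount in facts.items():
        out[(parent_of[member], quarter)] += amount
    return dict(out)

def drill_down(detail, parent_of, parent):
    """Drill-down: return the detailed cells that make up one consolidated member."""
    return {key: v for key, v in detail.items() if parent_of[key[0]] == parent}

def slice_time(facts, quarter):
    """Slice: fix the time dimension to a single quarter."""
    return {member: v for (member, q), v in facts.items() if q == quarter}

by_city = roll_up(sales, branch_city)
by_country = roll_up(by_city, city_country)
print(slice_time(by_country, "Q2"))  # {'UK': 210, 'France': 70}
```

Drill-down needs the finer view (here by_city) to still be available, which is one reason OLAP servers keep detailed data alongside the aggregates.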
OLAP TOOLS - CATEGORIES

OLAP tools are categorized according to the architecture used to store and process multi-dimensional data.
There are four main categories of OLAP tools, as defined by Berson and Smith (1997) and Pendse and Creeth (2001), including:
Multi-dimensional OLAP (MOLAP)
Relational OLAP (ROLAP)
Hybrid OLAP (HOLAP)
Desktop OLAP (DOLAP)
MULTI-DIMENSIONAL OLAP (MOLAP)
MOLAP tools use specialized data structures and multi-dimensional database management systems (MDDBMS) to organize, navigate, and analyze data.
To enhance query performance, the data is typically aggregated and stored according to predicted usage.
MOLAP data structures use array technology and efficient storage techniques that minimize the disk space requirements through sparse data management.
The development issues associated with MOLAP:
Only a limited amount of data can be efficiently stored and analyzed.
Navigation and analysis of data are limited because the data is designed according to previously determined requirements.
MOLAP products require a different set of skills and tools to build and maintain the database.
RELATIONAL OLAP (ROLAP)
ROLAP is the fastest-growing type of OLAP tools.
ROLAP supports RDBMS products through the use of a metadata layer.
This facilitates the creation of multiple multi-dimensional views of the two-dimensional relation.
To improve performance, some ROLAP products have enhanced SQL engines to support the complexity of multi-dimensional analysis, while others recommend, or require, the use of highly denormalized database designs such as the star schema.
The development issues associated with ROLAP technology:
Performance problems associated with the processing of complex queries that require multiple passes through the relational data.
Development of middleware to facilitate the development of multi-dimensional applications.
Development of an option to create persistent multi-dimensional structures, together with facilities to assist in the administration of these structures.
HYBRID OLAP (HOLAP)
HOLAP tools provide limited analysis capability, either directly against RDBMS products, or by using an intermediate MOLAP server.
HOLAP tools deliver selected data directly from the DBMS or via a MOLAP server to the desktop (or local server) in the form of a data cube, where it is stored, analyzed, and maintained locally.
The issues associated with HOLAP tools:
The architecture results in significant data redundancy and may cause problems for networks that support many users.
The ability of each user to build a custom data cube may cause a lack of data consistency among users.
Only a limited amount of data can be efficiently maintained.
DESKTOP OLAP (DOLAP)
DOLAP tools store the OLAP data in client-based files and support multi-dimensional processing using a client multi-dimensional engine.
DOLAP requires that relatively small extracts of data are held on client machines. This data may be distributed in advance or on demand (possibly through the Web).
The administration of a DOLAP database is typically performed by a central server or processing routine that prepares data cubes or sets of data for each user.
The development issues associated with DOLAP are as follows:
Provision of appropriate security controls to support all parts of the DOLAP environment.
Reduction in the effort involved in deploying and maintaining the DOLAP tools.
Current trends are towards thin client machines.
Figure: Slicing a data cube
Figure: Drill-down (summary report; drill-down with color added). Starting with summary data, users can obtain details for particular cells.
WAREHOUSE MODELS & OPERATORS
Data Models
relations
stars & snowflakes
cubes
Operators
slice & dice
roll-up, drill down
pivoting
other
CONCEPTUAL MODELING OF DATA WAREHOUSES
Modeling data warehouses: dimensions & measures
Star schema: a fact table in the middle connected to a set of dimension tables
Snowflake schema: a refinement of star schema where some dimensional hierarchy is normalized into a set of smaller dimension tables, forming a shape similar to a snowflake
Fact constellations: multiple fact tables share dimension tables, viewed as a collection of stars, therefore called galaxy schema or fact constellation
EXAMPLE OF STAR SCHEMA
Sales Fact Table: time_key, item_key, branch_key, location_key
Measures: units_sold, dollars_sold, avg_sales
time: time_key, day, day_of_the_week, month, quarter, year
item: item_key, item_name, brand, type, supplier_type
branch: branch_key, branch_name, branch_type
location: location_key, street, city, province_or_state, country
EXAMPLE OF SNOWFLAKE SCHEMA
Sales Fact Table: time_key, item_key, branch_key, location_key
Measures: units_sold, dollars_sold, avg_sales
time: time_key, day, day_of_the_week, month, quarter, year
item: item_key, item_name, brand, type, supplier_key
supplier: supplier_key, supplier_type
branch: branch_key, branch_name, branch_type
location: location_key, street, city_key
city: city_key, city, province_or_state, country
EXAMPLE OF FACT CONSTELLATION
Sales Fact Table: time_key, item_key, branch_key, location_key
Measures: units_sold, dollars_sold, avg_sales
Shipping Fact Table: time_key, item_key, shipper_key, from_location, to_location
Measures: dollars_cost, units_shipped
time: time_key, day, day_of_the_week, month, quarter, year
item: item_key, item_name, brand, type, supplier_type
branch: branch_key, branch_name, branch_type
location: location_key, street, city, province_or_state, country
shipper: shipper_key, shipper_name, location_key, shipper_type
STAR
product (prodId, name, price)
p1 | bolt | 10
p2 | nut | 5
sale (orderId, date, custId, prodId, storeId, qty, amt)
o100 | 1/7/97 | 53 | p1 | c1 | 1 | 12
o102 | 2/7/97 | 53 | p2 | c1 | 2 | 11
o105 | 3/8/97 | 111 | p1 | c3 | 5 | 50
customer (custId, name, address, city)
53 | joe | 10 main | sfo
81 | fred | 12 main | sfo
111 | sally | 80 willow | la
store (storeId, city)
c1 | nyc
c2 | sfo
c3 | la
STAR SCHEMA
product (prodId, name, price)
sale (orderId, date, custId, prodId, storeId, qty, amt)
customer (custId, name, address, city)
store (storeId, city)
STAR SCHEMA EXAMPLE
SALES: time_key, product_key, location_key
Measures: units_sold, amount
TIME: time_key, day, day_of_the_week, month, quarter, year
PRODUCT: product_key, product_name, category, brand, color, supplier_name
LOCATION: location_key, store, street_address, city, state, country, region
ADVANTAGES OF STAR SCHEMA
Facts and dimensions are clearly depicted
Dimension tables are relatively static; data is loaded (append mostly) into fact table(s)
Easy to comprehend (and to write queries)
Find total sales per product category in our stores in Europe:
SELECT PRODUCT.category, SUM(SALES.amount)
FROM SALES, PRODUCT, LOCATION
WHERE SALES.product_key = PRODUCT.product_key
  AND SALES.location_key = LOCATION.location_key
  AND LOCATION.region = 'Europe'
GROUP BY PRODUCT.category
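The query can be tried on a tiny instance of the star schema. This sketch uses Python's sqlite3 module with a reduced set of columns and invented rows; the TIME table is omitted because the query does not use it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Reduced, hypothetical instance of the SALES / PRODUCT / LOCATION star.
CREATE TABLE PRODUCT (product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE LOCATION (location_key INTEGER PRIMARY KEY, store TEXT, region TEXT);
CREATE TABLE SALES (time_key INTEGER, product_key INTEGER, location_key INTEGER,
                    units_sold INTEGER, amount REAL);
INSERT INTO PRODUCT VALUES (1, 'DVD player', 'Electronics'), (2, 'Desk', 'Furniture');
INSERT INTO LOCATION VALUES (1, 'Paris-01', 'Europe'), (2, 'Tokyo-01', 'Asia');
INSERT INTO SALES VALUES
    (1, 1, 1, 2, 200.0), (1, 2, 1, 1, 150.0), (2, 1, 1, 1, 100.0), (2, 1, 2, 5, 500.0);
""")

QUERY = """
SELECT PRODUCT.category, SUM(SALES.amount)
FROM SALES, PRODUCT, LOCATION
WHERE SALES.product_key = PRODUCT.product_key
  AND SALES.location_key = LOCATION.location_key
  AND LOCATION.region = 'Europe'
GROUP BY PRODUCT.category
"""
europe_totals = dict(conn.execute(QUERY).fetchall())
print(europe_totals)  # {'Electronics': 300.0, 'Furniture': 150.0}
```

The Tokyo sale is filtered out by the LOCATION predicate, and each fact row reaches its category through a single join, which is what makes star-schema queries easy to write.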
STAR SCHEMA QUERY PROCESSING

Figure: the SALES star schema (dimensions TIME, PRODUCT, LOCATION) annotated for query processing, with the selection σ region=Europe applied to LOCATION and the projection π category applied to PRODUCT.
JOIN-INDEX
Join index: relates the values of the dimensions of a star schema to rows in the fact table.
e.g., a join index on region maintains, for each distinct region, a list of ROW-IDs of the tuples recording the sales in the region
Join indices can span multiple dimensions OR can be implemented as bitmap indexes (per dimension)
use bit-ops for multiple joins
Figure: a bitmap join index from LOCATION.region (Africa, America, Asia, Europe) to SALES rows R102, R117, R118, R124.
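A sketch of per-dimension bitmap join indexes in plain Python. Each distinct dimension value gets an integer bitmap with one bit per fact row, and a lookup over two dimensions becomes a bitwise AND. The row IDs echo the figure, but which region each row belongs to, and the added product category, are hypothetical. For brevity the dimension values are copied onto the fact rows, whereas a real join index would be built by joining SALES to LOCATION and PRODUCT.

```python
# Hypothetical fact rows: (row_id, region of the selling store, product category).
# A real join index is built by joining SALES to LOCATION/PRODUCT; values are inlined here.
SALES = [
    ("R102", "Europe", "Electronics"),
    ("R117", "Asia", "Furniture"),
    ("R118", "Europe", "Furniture"),
    ("R124", "America", "Electronics"),
]

def build_bitmap_index(rows, column):
    """One integer bitmap per distinct value; bit i is set if fact row i has that value."""
    index = {}
    for i, row in enumerate(rows):
        index[row[column]] = index.get(row[column], 0) | (1 << i)
    return index

def row_ids(bitmap, rows):
    """Translate set bits back into fact-table ROW-IDs."""
    return [rows[i][0] for i in range(len(rows)) if (bitmap >> i) & 1]

region_idx = build_bitmap_index(SALES, 1)    # join index on LOCATION.region
category_idx = build_bitmap_index(SALES, 2)  # join index on PRODUCT.category

# Sales of furniture in Europe: AND the two bitmaps instead of performing two joins.
europe_furniture = row_ids(region_idx["Europe"] & category_idx["Furniture"], SALES)
print(europe_furniture)  # ['R118']
```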
DATA CUBE: MULTIDIMENSIONAL VIEW

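Figure: a 3-D sales cube with dimensions Quarter (1Qtr, 2Qtr, 3Qtr, 4Qtr, sum), Region (America, Europe, Asia, sum) and Product (DVD, PC, VCR, sum); e.g., the cell (DVD, America, sum over quarters) holds the total annual sales of DVDs in America.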
DATA CUBE COMPUTATION
Model dependencies among the aggregates:
most detailed view: (product, store, quarter)
(product, quarter), (store, quarter), (product, store)
(quarter), (product), (store)
none (the grand total)
e.g., (product, store) can be computed from view (product, store, quarter) by summing up all quarterly sales
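A sketch of how coarser aggregates in this lattice can be derived from finer ones. The detailed sales figures are hypothetical; aggregate() sums out every dimension not in the keep list, so (product) can be computed either from the base view or from the smaller (product, store) view with the same result.

```python
from collections import defaultdict

# Hypothetical most-detailed view: (product, store, quarter) -> sales.
detailed = {
    ("p1", "s1", "Q1"): 10, ("p1", "s1", "Q2"): 5,
    ("p1", "s2", "Q1"): 7,  ("p2", "s1", "Q1"): 3,
    ("p2", "s2", "Q2"): 4,
}
DIMS = ("product", "store", "quarter")

def aggregate(view, view_dims, keep):
    """Compute a coarser view by summing out every dimension not in `keep`."""
    positions = [view_dims.index(d) for d in keep]
    out = defaultdict(int)
    for key, value in view.items():
        out[tuple(key[p] for p in positions)] += value
    return dict(out)

# (product, store): summing up all quarterly sales of the detailed view.
product_store = aggregate(detailed, DIMS, ("product", "store"))
# A coarser view derived from an intermediate one instead of the base view.
product_only = aggregate(product_store, ("product", "store"), ("product",))
# The "none" node: the grand total over all dimensions.
none_view = aggregate(detailed, DIMS, ())
```

Deriving from the smallest available ancestor reads fewer rows, which is why cube computation models these dependencies explicitly.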
THE MOLAP CUBE
Fact table view:
sale (prodId, storeId, amt)
p1 | s1 | 12
p2 | s1 | 11
p1 | s3 | 50
p2 | s2 | 8
Multi-dimensional cube (dimensions = 2):
      p1   p2
s1    12   11
s2     -    8
s3    50    -
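A sketch of how a MOLAP engine might lay the fact table out as an array, using the four rows from the slide. Stores and products are mapped to array positions, so a cell is found by direct addressing rather than by searching rows. None marks an empty cell; a third of this tiny cube is empty already, which is why MOLAP products rely on sparse-storage techniques.

```python
# Fact rows from the slide: (prodId, storeId, amt).
facts = [("p1", "s1", 12), ("p2", "s1", 11), ("p1", "s3", 50), ("p2", "s2", 8)]

stores = sorted({s for _, s, _ in facts})    # ['s1', 's2', 's3']
products = sorted({p for p, _, _ in facts})  # ['p1', 'p2']
store_pos = {s: i for i, s in enumerate(stores)}
product_pos = {p: j for j, p in enumerate(products)}

# Dense 2-D array: rows = stores, columns = products; None marks an empty cell.
cube = [[None] * len(products) for _ in stores]
for p, s, amt in facts:
    cube[store_pos[s]][product_pos[p]] = amt

def cell(store, product):
    """Direct array addressing: no search or join is needed to find a cell."""
    return cube[store_pos[store]][product_pos[product]]

print(cube)  # [[12, 11], [None, 8], [50, None]]
```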
3-D CUBE
Fact table view:
sale (prodId, storeId, date, amt)
p1 | s1 | 1 | 12
p2 | s1 | 1 | 11
p1 | s3 | 1 | 50
p2 | s2 | 1 | 8
p1 | s1 | 2 | 44
p1 | s2 | 2 | 4
Multi-dimensional cube (dimensions = 3):
day 1:    s1   s2   s3
   p1     12    -   50
   p2     11    8    -
day 2:    s1   s2   s3
   p1     44    4    -
   p2      -    -    -
EXAMPLE
Dimensions: Time, Product, Store
Attributes: Product (upc, price, ...), Store
Hierarchies:
Product → Brand
Day → Week → Quarter
Store → Region → Country
Figure: a cube with Store (NY, SF, LA), Product (Juice, Milk, Coke, Cream, Soap, Bread) and Time (M, T, W, Th, F, S, S), with roll-up to region, roll-up to brand and roll-up to week; e.g., one cell records 56 units of bread sold in LA on Monday.
CUBE AGGREGATION: ROLL-UP

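Example: computing sums over the 3-D cube (day 1, day 2, ...)
Roll-up over date:
      p1   p2
s1    56   11
s2     4    8
s3    50    -
Further roll-up by store: s1 = 67, s2 = 12, s3 = 50; total = 129
Further roll-up by product: p1 = 110, p2 = 19
Drill-down goes in the opposite direction, from these sums back to the detailed cells.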
CUBE OPERATORS FOR ROLL-UP
Using * for "all values" of a dimension, with arguments (store, product, date), on the same 3-D cube:
sale(s1,*,*) = 67 (store s1, all products, all dates)
sale(s2,p2,*) = 8 (store s2, product p2, all dates)
sale(*,*,*) = 129 (grand total)
EXTENDED CUBE
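Each dimension gets an extra value * that holds the aggregate over all of its values:
day 1:    p1   p2    *
   s1     12   11   23
   s2      -    8    8
   s3     50    -   50
   *      62   19   81
day 2:    p1   p2    *
   s1     44    -   44
   s2      4    -    4
   s3      -    -    -
   *      48    -   48
* (all days): p1   p2    *
   s1     56   11   67
   s2      4    8   12
   s3     50    -   50
   *     110   19  129
e.g., sale(*,p2,*) = 19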
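The extended cube can be generated by letting every fact contribute to each generalization of its key in which any subset of coordinates is replaced by *. A sketch in Python using the six facts from the 3-D cube slide, with keys ordered (store, product, date) to match the sale(store, product, date) notation:

```python
from itertools import product

# Facts from the 3-D cube slide, keyed (store, product, date) -> amt.
facts = {
    ("s1", "p1", 1): 12, ("s1", "p2", 1): 11, ("s3", "p1", 1): 50,
    ("s2", "p2", 1): 8,  ("s1", "p1", 2): 44, ("s2", "p1", 2): 4,
}

def extended_cube(facts):
    """Add '*' aggregates: every fact is added to each generalization of its key."""
    cube = {}
    for key, amt in facts.items():
        # Each coordinate is either kept or replaced by '*': 2**3 = 8 target cells per fact.
        for cell in product(*[(value, "*") for value in key]):
            cube[cell] = cube.get(cell, 0) + amt
    return cube

cube = extended_cube(facts)
print(cube[("s1", "*", "*")], cube[("*", "p2", "*")], cube[("*", "*", "*")])  # 67 19 129
```

This is the brute-force form of the CUBE operator: it touches 2^d cells per fact, which is fine for a slide-sized example but is the overhead the MOLAP notes warn about at scale.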
AGGREGATION USING HIERARCHIES
Hierarchy: store → region → country (store s1 in Region A; stores s2, s3 in Region B)
Rolling the 3-D cube up to region (summing over dates):
      region A   region B
p1       56         54
p2       11          8
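A sketch of the region roll-up, using the six facts from the 3-D cube slide and the store-to-region mapping given above (s1 in Region A; s2 and s3 in Region B):

```python
from collections import defaultdict

# Facts from the 3-D cube slide, keyed (store, product, date) -> amt.
facts = {
    ("s1", "p1", 1): 12, ("s1", "p2", 1): 11, ("s3", "p1", 1): 50,
    ("s2", "p2", 1): 8,  ("s1", "p1", 2): 44, ("s2", "p1", 2): 4,
}
# Store -> region level of the store -> region -> country hierarchy.
REGION = {"s1": "region A", "s2": "region B", "s3": "region B"}

def roll_up_to_region(facts):
    """Sum out the date and replace each store by its region."""
    out = defaultdict(int)
    for (store, prod_id, _date), amt in facts.items():
        out[(prod_id, REGION[store])] += amt
    return dict(out)

by_region = roll_up_to_region(facts)
```

Rolling up once more to country would apply a region-to-country mapping in the same way.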
POINTS TO BE NOTICED ABOUT MOLAP

Pre-calculating or pre-consolidating transactional data improves speed.
BUT
When fully pre-consolidating incoming data, MDDs (multi-dimensional databases) require an enormous amount of overhead, both in processing time and in storage. An input file of 200MB can easily expand to 5GB.
MDDs are great candidates for the <50GB department data marts.
Rolling up and drilling down through aggregate data.
With MDDs, application design is essentially the definition of dimensions and calculation rules, while the RDBMS requires that the database schema be a star or snowflake.
HYBRID OLAP (HOLAP)

HOLAP = Hybrid OLAP: best of both worlds
Storing detailed data in RDBMS
Storing aggregated data in MDBMS
User access via MOLAP tools
DATA FLOW IN HOLAP

Figure: data flow in HOLAP. Components: RDBMS server (user data), MDBMS server (multidimensional data, metadata, derived data), and a client with a multidimensional viewer and a relational viewer; connections are labeled SQL-Read, multidimensional access, and SQL reach-through.
