HISTORY
Database
Data Warehouse
The data warehouse became a distinct type of computer database during the late 1980s and early 1990s
INTRODUCTION
Database
collection of related data
database management system (DBMS): a collection of programs that enables users to create and maintain a database
used in many applications
Data warehouse
CONTD...
Database
a structured collection of records or data
Data Warehouse
a logical collection of information, gathered from many different operational databases, that supports business analysis activities and decision-making tasks
Database model
the structure or format of a database, described in a formal language supported by the database management system
DATABASES CONTD
5 software components:
1. DBMS engine
2. Data definition subsystem
3. Data manipulation subsystem
4. Application generation subsystem
5. Data administration subsystem
CONTD
DBMS
With a database, you need to concern yourself only with your logical view of the information, not with how it is physically stored
CONTD
You can save report formats and generate reports at any time with up-to-date information
CONTD
Structured query language (SQL): a standardized fourth-generation language found in most DBMSs
Performs the same task as a QBE (query-by-example) tool, but uses a sentence structure instead of a point-and-click interface
SQL is used mostly by IT specialists
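As an illustration of the "sentence structure", here is a minimal sketch that runs one SQL query through Python's built-in sqlite3 module. The `product` table and its rows are hypothetical. A QBE tool would express the same request by ticking the `name` column and typing a condition on `price` into a grid.

```python
import sqlite3

# In-memory database with a small, hypothetical product table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (prodId TEXT PRIMARY KEY, name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO product VALUES (?, ?, ?)",
    [("p1", "bolt", 10.0), ("p2", "nut", 5.0), ("p3", "washer", 2.0)],
)

# The request is written as a sentence: names of products priced under 8
rows = conn.execute(
    "SELECT name FROM product WHERE price < ? ORDER BY name", (8,)
).fetchall()
print(rows)  # [('nut',), ('washer',)]
conn.close()
```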
Application generation subsystem: contains facilities to help you develop transaction-intensive applications
Data entry screens (called forms)
Programming languages
Used mostly by IT specialists
DATA WAREHOUSE
Subject-Oriented
Integrated
Constructed by integrating multiple, heterogeneous data sources
Data cleaning and data integration techniques are applied to ensure consistency in naming conventions, encoding structures, attribute measures, etc. among different data sources
E.g., hotel price: currency, tax, breakfast covered, etc.
When data is moved to the warehouse, it is converted
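The conversion step for the hotel-price example can be sketched as follows. Two hypothetical sources report prices with different currencies, tax treatment and breakfast encodings, and each record is converted to one warehouse convention on load (net price in EUR, breakfast as a boolean). The exchange rate, the 10% tax, the field names and both source formats are assumptions made up for illustration.

```python
from dataclasses import dataclass

# Hypothetical exchange rates into the warehouse currency (EUR)
RATES_TO_EUR = {"EUR": 1.0, "USD": 0.9}

@dataclass
class HotelPrice:
    hotel: str
    net_price_eur: float       # warehouse convention: net of tax, in EUR
    breakfast_included: bool   # warehouse convention: a real boolean

def from_source_a(rec: dict) -> HotelPrice:
    # Source A (assumed): gross USD price including 10% tax, breakfast as "Y"/"N"
    net_usd = rec["gross_usd"] / 1.10
    return HotelPrice(rec["name"], round(net_usd * RATES_TO_EUR["USD"], 2),
                      rec["bkfst"] == "Y")

def from_source_b(rec: dict) -> HotelPrice:
    # Source B (assumed): net EUR price, breakfast as 0/1
    return HotelPrice(rec["hotel_name"], round(rec["net_eur"] * RATES_TO_EUR["EUR"], 2),
                      bool(rec["breakfast"]))

# Records from both sources end up in one consistent format
warehouse = [
    from_source_a({"name": "Hotel Alpha", "gross_usd": 220.0, "bkfst": "Y"}),
    from_source_b({"hotel_name": "Hotel Beta", "net_eur": 95.0, "breakfast": 0}),
]
print(warehouse)
```

Queries against `warehouse` no longer need to know which source a price came from.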
CONTD
Time-Variant
Every piece of data contained within the warehouse must be associated with a particular point in time if any useful analysis is to be conducted with it.
Another aspect of time variance in DW data is that, once recorded, data within the warehouse cannot be updated or changed.
Non-Volatile
Typical activities such as deletes, inserts, and changes that are performed in an operational application environment are completely nonexistent in a DW environment.
Only two data operations are ever performed in the DW: data loading and data access.
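The time-variant and non-volatile properties can be illustrated with a toy class that exposes only `load` and `query`. Every row carries the point in time it describes, and a new value arrives as a new snapshot rather than as an update. `ToyWarehouse`, `Row` and the sample keys are hypothetical. Real warehouses enforce these rules through their load processes and access rights, not through a Python API.

```python
from datetime import date
from typing import Callable, NamedTuple

class Row(NamedTuple):
    as_of: date      # every fact is tied to a point in time
    key: str
    value: float

class ToyWarehouse:
    """Append-only store: data loading and data access, nothing else."""

    def __init__(self) -> None:
        self._rows: list[Row] = []

    def load(self, as_of: date, batch: dict[str, float]) -> None:
        # Loading appends a time-stamped snapshot; existing rows are never touched
        self._rows.extend(Row(as_of, k, v) for k, v in batch.items())

    def query(self, pred: Callable[[Row], bool]) -> list[Row]:
        # Access returns a new list of matching rows (Row tuples are immutable)
        return [r for r in self._rows if pred(r)]

wh = ToyWarehouse()
wh.load(date(2024, 1, 31), {"balance:acct42": 100.0})
wh.load(date(2024, 2, 29), {"balance:acct42": 80.0})  # new snapshot, not an update
history = wh.query(lambda r: r.key == "balance:acct42")
print(history)
```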
Collection of tools: gathering data; cleansing, integrating, querying, reporting, analysis, data mining, monitoring, administering the warehouse
Query-and-reporting tools
Intelligent agents
Multidimensional analysis tools
Statistical tools
CONTD
Query-and-reporting tools
similar to QBE tools, SQL, and report generators in the typical database environment
Intelligent Agents
use various artificial intelligence tools such as neural networks and fuzzy logic to form the basis for information discovery and building business intelligence
Statistical Tools
Help you apply various mathematical models to the information stored in a data warehouse to discover new information
Disadvantages
Adding new data sources takes time and has an associated high cost
Data owners lose control over their data, raising ownership, security and privacy issues
Long initial implementation time and associated high cost
Difficult to accommodate changes in data types and ranges, data source schema, indexes and queries
Figure: data warehouse architectures (one, companywide warehouse; separate ETL for each independent data mart; near real-time ETL for the data warehouse)
Techniques
Case-based reasoning
Rule discovery
Signal processing
Neural nets
Fractals
DATA MARTS
BUSINESS INTELLIGENCE
OLTP
OLAP
EXAMPLE
Time Intelligence.
OLAP BENEFITS
OLAP, by Dr. Khalil
CONTD
The cube can be expanded to include another dimension, for example, the number of sales staff in each city.
The response time of a multi-dimensional query depends on how many cells have to be added on-the-fly.
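To make the cost concrete, the sketch below answers queries from a hypothetical cube of base cells by adding them up on the fly and counting how many cells it added. A query at a coarse level (all cities, all products) has to add many more cells than one at a fine level (a single city and product). The cities, products, months and cell values are invented for illustration.

```python
from itertools import product

# Hypothetical base cube: sales by (city, product, month), one cell per combination
cities = ["NY", "SF", "LA"]
products = ["juice", "milk"]
months = range(1, 13)
cube = {(c, p, m): 10 for c, p, m in product(cities, products, months)}

def query(city=None, prod=None):
    """Sum matching cells on the fly; None means 'all values' for that dimension."""
    total, added = 0, 0
    for (c, p, _month), value in cube.items():
        if (city is None or c == city) and (prod is None or p == prod):
            total += value
            added += 1  # count of cells actually added into the answer
    return total, added

print(query(city="LA", prod="milk"))  # (120, 12)
print(query())                        # (720, 72)
```

Precomputing and storing aggregates such as the grand total avoids adding those cells at query time, at the cost of extra storage.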
CONTD
MOLAP data structures use array technology and efficient storage techniques that minimize the disk space requirements through sparse data management.
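Sparse data management can be sketched by addressing cells as an array would, with one integer index per dimension member, while storing only the occupied positions. This is a simplified illustration under that assumption, not the storage layout of any particular MOLAP product. `SparseCube` and the member names are hypothetical.

```python
import math

class SparseCube:
    """Array-addressed cube that stores only non-empty cells."""

    def __init__(self, *dimensions: list[str]) -> None:
        # Map each member of each dimension to an array index
        self.index = [{m: i for i, m in enumerate(d)} for d in dimensions]
        self.shape = tuple(len(d) for d in dimensions)
        self.cells: dict[tuple[int, ...], float] = {}  # occupied positions only

    def _pos(self, members: tuple[str, ...]) -> tuple[int, ...]:
        return tuple(ix[m] for ix, m in zip(self.index, members))

    def set(self, members: tuple[str, ...], value: float) -> None:
        self.cells[self._pos(members)] = value

    def get(self, members: tuple[str, ...]) -> float:
        return self.cells.get(self._pos(members), 0.0)  # empty cell reads as 0

    def density(self) -> float:
        return len(self.cells) / math.prod(self.shape)

# 100 products x 50 stores x 365 days, but only three cells are non-empty
cube = SparseCube([f"p{i}" for i in range(100)],
                  [f"s{i}" for i in range(50)],
                  [f"d{i}" for i in range(365)])
cube.set(("p1", "s1", "d1"), 12.0)
cube.set(("p2", "s1", "d1"), 11.0)
cube.set(("p1", "s3", "d2"), 50.0)
print(len(cube.cells), cube.shape)  # 3 (100, 50, 365)
```

A dense array of the same shape would reserve 1,825,000 cells to hold these three values.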
To improve performance, some ROLAP products have enhanced SQL engines to support the complexity of multi-dimensional analysis, while others recommend, or require, the use of highly denormalized database designs such as the star schema.
HOLAP tools deliver selected data directly from the DBMS or via a MOLAP server to the desktop (or local server) in the form of a data cube, where it is stored, analyzed, and maintained locally. HOLAP is the fastest-growing type of OLAP tool.
Issues associated with HOLAP tools:
The architecture results in significant data redundancy and may cause problems for networks that support many users.
The ability of each user to build a custom data cube may cause a lack of data consistency among users.
Only a limited amount of data can be efficiently maintained.
Figures: summary report; drill-down; drill-down with color added
Drill-down: starting with summary data, users can obtain details for particular cells
Data Models
relations
stars & snowflakes
cubes
Operators
Star schema example:
Sales Fact Table: time_key, item_key, branch_key, location_key; measures: units_sold, dollars_sold, avg_sales
time: time_key, day, day_of_the_week, month, quarter, year
item: item_key, item_name, brand, type, supplier_type
branch: branch_key, branch_name, branch_type
location: location_key, street, city, province_or_state, country
Snowflake schema example:
Sales Fact Table: time_key, item_key, branch_key, location_key; measures: units_sold, dollars_sold, avg_sales
time: time_key, day, day_of_the_week, month, quarter, year
item: item_key, item_name, brand, type, supplier_key
supplier: supplier_key, supplier_type
branch: branch_key, branch_name, branch_type
location: location_key, street, city_key
city: city_key, city, province_or_state, country
Fact constellation example (two fact tables sharing dimension tables):
Sales Fact Table: time_key, item_key, branch_key, location_key; measures: units_sold, dollars_sold, avg_sales
Shipping Fact Table: time_key, item_key, shipper_key, from_location, to_location; measures: dollars_cost, units_shipped
time: time_key, day, day_of_the_week, month, quarter, year
item: item_key, item_name, brand, type, supplier_type
branch: branch_key, branch_name, branch_type
location: location_key, street, city, province_or_state, country
shipper: shipper_key, shipper_name, location_key, shipper_type
STAR
product
prodId | name | price
p1 | bolt | 10
p2 | nut | 5
customer
custId | name | address | city
53 | joe | 10 main | sfo
81 | fred | 12 main | sfo
111 | sally | 80 willow | la
store
storeId | city
c1 | nyc
c2 | sfo
c3 | la
sale
custId | prodId | storeId | qty | amt
53 | p1 | c1 | 1 | 12
53 | p2 | c1 | 2 | 11
111 | p1 | c3 | 5 | 50
STAR SCHEMA
sale (fact table): orderId, date, custId, prodId, storeId, qty, amt
product: prodId, name, price
customer: custId, name, address, city
store: storeId, city
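The star join can be tried on the sample rows of this schema with Python's sqlite3 module. Only the sample columns that are recoverable are loaded (the `orderId` and `date` columns of `sale` are left out). The query itself, total sales amount per store city and product, is an illustrative choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product  (prodId TEXT PRIMARY KEY, name TEXT, price REAL);
    CREATE TABLE customer (custId INTEGER PRIMARY KEY, name TEXT, address TEXT, city TEXT);
    CREATE TABLE store    (storeId TEXT PRIMARY KEY, city TEXT);
    -- Fact table: a foreign key to each dimension plus the measures qty and amt
    CREATE TABLE sale (custId INTEGER REFERENCES customer,
                       prodId TEXT REFERENCES product,
                       storeId TEXT REFERENCES store,
                       qty INTEGER, amt REAL);
    INSERT INTO product  VALUES ('p1', 'bolt', 10), ('p2', 'nut', 5);
    INSERT INTO customer VALUES (53, 'joe', '10 main', 'sfo'),
                                (81, 'fred', '12 main', 'sfo'),
                                (111, 'sally', '80 willow', 'la');
    INSERT INTO store    VALUES ('c1', 'nyc'), ('c2', 'sfo'), ('c3', 'la');
    INSERT INTO sale     VALUES (53, 'p1', 'c1', 1, 12),
                                (53, 'p2', 'c1', 2, 11),
                                (111, 'p1', 'c3', 5, 50);
""")

# Star join: the fact table joined to the dimension tables it references
rows = conn.execute("""
    SELECT st.city, p.name, SUM(s.amt) AS total_amt
    FROM sale s
    JOIN store st  ON s.storeId = st.storeId
    JOIN product p ON s.prodId = p.prodId
    GROUP BY st.city, p.name
    ORDER BY st.city, p.name
""").fetchall()
print(rows)  # [('la', 'bolt', 50.0), ('nyc', 'bolt', 12.0), ('nyc', 'nut', 11.0)]
conn.close()
```

Every dimension is one join away from the fact table, which is what keeps such queries simple in a star schema.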
TIME: time_key, day, day_of_the_week, month, quarter, year
SALES: time_key, product_key, location_key; measures: units_sold, amount
PRODUCT: product_key, product_name, category, brand, color, supplier_name
LOCATION: location_key, store, street_address, city, state, country, region
TIME: time_key, day, day_of_the_week, month, quarter, year
SALES: time_key, product_key, location_key; measures: units_sold, amount
PRODUCT: product_key, product_name, category, brand, color, supplier_name
LOCATION: location_key, store, street_address, city, state, country, region
Query: π category, σ region = Europe (over SALES joined with PRODUCT and LOCATION)
JOIN-INDEX between LOCATION and SALES: one column per value of region (region = Africa, region = America, region = Asia, region = Europe); SALES rows R102, R117, R118, R124 each marked 1
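A bitmap join index of this kind can be sketched as one bit vector per LOCATION.region value, with one bit per SALES row, computed once from the SALES-LOCATION join. A query restricted to region = Europe then reads the Europe bitmap instead of performing the join. The row ids follow the figure, but the `sales_location` and `location_region` mappings (and therefore which region each row belongs to) are hypothetical.

```python
# Hypothetical SALES rows (row id -> location_key) and LOCATION dimension
sales_location = {"R102": "L1", "R117": "L2", "R118": "L1", "R124": "L3"}
location_region = {"L1": "Europe", "L2": "Asia", "L3": "Europe"}

row_ids = list(sales_location)  # fixed row order: bit i refers to row_ids[i]

def build_join_index() -> dict[str, int]:
    """Precompute one bitmap (a Python int used as a bit vector) per region."""
    bitmaps: dict[str, int] = {}
    for i, rid in enumerate(row_ids):
        region = location_region[sales_location[rid]]  # the join, done once
        bitmaps[region] = bitmaps.get(region, 0) | (1 << i)
    return bitmaps

def rows_for(bitmaps: dict[str, int], region: str) -> list[str]:
    bits = bitmaps.get(region, 0)
    return [rid for i, rid in enumerate(row_ids) if (bits >> i) & 1]

index = build_join_index()
print(rows_for(index, "Europe"))  # ['R102', 'R118', 'R124']
```

Combining conditions becomes bitwise arithmetic on the integers, e.g. `index["Europe"] | index["Asia"]` for rows in either region.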
Figure: data cube with dimensions Product (DVD, PC, VCR, sum), Quarter (2Qtr, 3Qtr, 4Qtr, sum) and Region (America, Europe, Asia)
Cuboids (group-bys): product, quarter; store, quarter; product, store; quarter; product; store; none
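The lattice of group-bys can be enumerated mechanically. Each subset of the dimensions is one cuboid, from the full three-dimensional group-by down to the empty one ("none", the grand total), so n dimensions give 2^n cuboids. A minimal sketch for the dimensions product, store and quarter:

```python
from itertools import combinations

def cuboids(dims: list[str]) -> list[tuple[str, ...]]:
    """Every subset of dims, largest first; () is the apex cuboid (grand total)."""
    return [c for k in range(len(dims), -1, -1) for c in combinations(dims, k)]

lattice = cuboids(["product", "store", "quarter"])
for c in lattice:
    print(", ".join(c) if c else "none")
```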
Fact table view:
sale
prodId | storeId | amt
p1 | s1 | 12
p2 | s1 | 11
p1 | s3 | 50
p2 | s2 | 8
Multi-dimensional cube (dimensions = 2):
   | s1 | s2 | s3
p1 | 12 |    | 50
p2 | 11 | 8  |
3-D CUBE
Fact table view:
sale
prodId | storeId | date | amt
p1 | s1 | 1 | 12
p2 | s1 | 1 | 11
p1 | s3 | 1 | 50
p2 | s2 | 1 | 8
p1 | s1 | 2 | 44
p1 | s2 | 2 | 4
Multi-dimensional cube (dimensions = 3):
day 1 | s1 | s2 | s3
p1    | 12 |    | 50
p2    | 11 | 8  |
day 2 | s1 | s2 | s3
p1    | 44 | 4  |
p2    |    |    |
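Moving from the fact-table view to the cube view amounts to addressing each `amt` by its coordinates (prodId, storeId, date). The sketch below uses the six rows of the 3-D fact table. Combinations with no sale are absent, which corresponds to the empty cells. `day_slice` is a hypothetical helper name.

```python
# Fact table view: (prodId, storeId, date, amt), the six rows of the 3-D example
sale = [
    ("p1", "s1", 1, 12), ("p2", "s1", 1, 11), ("p1", "s3", 1, 50),
    ("p2", "s2", 1, 8), ("p1", "s1", 2, 44), ("p1", "s2", 2, 4),
]

# Cube view: each coordinate occurs once in this fact table
# (otherwise the amounts for a coordinate would have to be summed)
cube = {(p, s, d): amt for p, s, d, amt in sale}

def day_slice(day):
    """The 2-D product x store slice of the cube for one day."""
    return {(p, s): amt for (p, s, d), amt in cube.items() if d == day}

print(day_slice(2))  # {('p1', 's1'): 44, ('p1', 's2'): 4}
```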
EXAMPLE
Dimensions: Time, Product, Store
Attributes: Product (upc, price, …), Store …
Hierarchies:
Product → Brand
Day → Week → Quarter
Store → Region → Country
Figure: sales cube with axes Product (Juice, Milk, Coke, Cream, Soap, Bread), Store (NY, SF, LA) and Time (M, T, W, Th, F, S, S); arrows mark roll-up to brand, roll-up to region and roll-up to week. Example cell: 56 units of bread sold in LA on M
Figure: roll-up and drill-down between the 3-D cube (day 1, day 2) and its aggregates. Roll-up sums out the date dimension; drill-down goes back to the detail.
Rolled up over date:
    | p1  | p2 | sum
s1  | 56  | 11 | 67
s2  | 4   | 8  | 12
s3  | 50  |    | 50
sum | 110 | 19 | 129
Aggregates, written sale(store, product, date) with * meaning all values:
sale(s1,*,*) = 67
sale(s2,p2,*) = 8
sale(*,*,*) = 129
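The roll-up path and the sale(store, product, date) notation can be reproduced with a small helper in which "*" matches any value. The function name `sale` mirrors the notation but is otherwise hypothetical. The facts are the six rows of the 3-D cube example, rekeyed in (store, product, date) order.

```python
# Facts keyed by (storeId, prodId, date), matching the order in sale(s, p, d)
facts = {
    ("s1", "p1", 1): 12, ("s1", "p2", 1): 11, ("s3", "p1", 1): 50,
    ("s2", "p2", 1): 8, ("s1", "p1", 2): 44, ("s2", "p1", 2): 4,
}

def sale(store="*", prod="*", day="*"):
    """Sum amt over all facts matching the pattern; '*' matches any value."""
    pattern = (store, prod, day)
    return sum(
        amt for key, amt in facts.items()
        if all(want == "*" or want == got for want, got in zip(pattern, key))
    )

# Roll-up from 3-D to 2-D: sum out the date dimension (empty cells sum to 0)
by_store_prod = {(s, p): sale(s, p) for s in ("s1", "s2", "s3") for p in ("p1", "p2")}

print(sale("s1"))        # 67   sale(s1,*,*)
print(sale("s2", "p2"))  # 8    sale(s2,p2,*)
print(sale())            # 129  sale(*,*,*)
```

Drill-down is the reverse: replacing a * with specific values, e.g. `sale("s1", "p1", 1)` and `sale("s1", "p1", 2)` split the 56 of sale(s1,p1,*) back into 12 and 44.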
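EXTENDED CUBE
Each dimension gets an extra member * whose cells hold the aggregate over all values of that dimension.
day 1 | p1  | p2 | *
s1    | 12  | 11 | 23
s2    |     | 8  | 8
s3    | 50  |    | 50
*     | 62  | 19 | 81
day 2 | p1  | p2 | *
s1    | 44  |    | 44
s2    | 4   |    | 4
s3    |     |    |
*     | 48  |    | 48
*     | p1  | p2 | *
s1    | 56  | 11 | 67
s2    | 4   | 8  | 12
s3    | 50  |    | 50
*     | 110 | 19 | 129
Example: sale(*,p2,*) = 19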
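The extended cube can be materialized by treating * as one more member of every dimension and filling each cell with the sum of the facts it covers. A sketch over the same six facts, keyed (store, product, date):

```python
from itertools import product

facts = {
    ("s1", "p1", 1): 12, ("s1", "p2", 1): 11, ("s3", "p1", 1): 50,
    ("s2", "p2", 1): 8, ("s1", "p1", 2): 44, ("s2", "p1", 2): 4,
}
stores, prods, days = ("s1", "s2", "s3"), ("p1", "p2"), (1, 2)

extended = {}
for s, p, d in product(stores + ("*",), prods + ("*",), days + ("*",)):
    # '*' matches any value in its dimension; otherwise the value must be equal
    total = sum(
        amt for (fs, fp, fd), amt in facts.items()
        if s in ("*", fs) and p in ("*", fp) and d in ("*", fd)
    )
    if total:  # keep only non-empty cells, as in the figure
        extended[(s, p, d)] = total

print(extended[("*", "p2", "*")])  # 19
print(extended[("*", "*", 1)])     # 81
print(extended[("*", "*", "*")])   # 129
```

Materializing every * cell answers any such aggregate by lookup, at the cost of storing all of them.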
Hierarchy: store → region → country
The 3-D cube (day 1, day 2), summed over both days and rolled up from store to region:
   | region A | region B
p1 | 56       | 54
p2 | 11       | 8
(store s1 in Region A; stores s2, s3 in Region B)
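Rolling up along the store → region hierarchy regroups the (store, product) totals through a store-to-region mapping. The mapping below is the one stated for this example (s1 in Region A; s2 and s3 in Region B). The variable names are illustrative.

```python
from collections import defaultdict

# (store, product) totals after summing out the date dimension
store_prod = {("s1", "p1"): 56, ("s1", "p2"): 11, ("s2", "p1"): 4,
              ("s2", "p2"): 8, ("s3", "p1"): 50}
region_of = {"s1": "A", "s2": "B", "s3": "B"}

by_region = defaultdict(int)
for (store, prod), amt in store_prod.items():
    by_region[(region_of[store], prod)] += amt  # climb one level: store -> region

print(dict(by_region))
# {('A', 'p1'): 56, ('A', 'p2'): 11, ('B', 'p1'): 54, ('B', 'p2'): 8}
```

Climbing further to country would use a second mapping, region to country, in the same way.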
Figure: Hybrid OLAP architecture. Components: MDBMS server with multidimensional access; multidimensional data, metadata and derived data; user data reached via SQL-Read and SQL Reach-Through; client with a multidimensional viewer and a relational viewer