Sie sind auf Seite 1von 24

CPIT 440

Data Mining and Warehouse

Lab3
www.company.com

CPIT 440
Data Mining and Warehouse

Lab3: Outlines
Introduction to Data Warehouse
What is Data Warehouse ?
Difference between Data Warehouse and Database

Introduction to OLAP operations


Introduction to cubes
Cube structure
OLAP Operations

Exercises

www.company.com

CPIT 440
Data Mining and Warehouse

Data Warehouse
What is Data Warehouse ?
A data warehouse is a repository of an organization's
stored data that is designed for query and analysis
rather than for transaction processing to facilitate
reporting and analysis.
It usually contains historical data derived from
transaction data, but it can include data from other
sources.
It separates analysis workload from transaction
workload and enables an organization to consolidate
data from several sources.

www.company.com

CPIT 440
Data Mining and Warehouse

Data Warehouse

www.company.com

CPIT 440
Data Mining and Warehouse

Difference between Data


Warehouse and Database
A question we often asks out in the field is:
I already have a database, so why do I need a
data warehouse ? What is the difference
between a database vs. a data warehouse?
Database

Data Warehouse

Designed to handle
transactions

It is structured to make
analytics fast and easy.

It isnt designed to handle and


do analytics well.

It exists as a layer on top of


another database or
databases, and takes the data
from all these databases and
creates a layer optimized for
and dedicated to analytics.
www.company.com

CPIT 440
Data Mining and Warehouse

Introduction to OLAP Operations


Introduction to cubes:
A cube is a set of data that is usually constructed from
a subset of a data warehouse and is organized and
summarized into a multidimensional structure defined
by a set of dimensions and measures.
Cubes are the main objects in online analytic
processing (OLAP),
It is a technology that provides fast access to data in a
data warehouse.

www.company.com

CPIT 440
Data Mining and Warehouse

Introduction to OLAP Operations

www.company.com

CPIT 440
Data Mining and Warehouse

Introduction to OLAP Operations


Cube Structure:
Every cube has a schema, which is the set of joined
tables in the data warehouse from which the cube
draws its source data.
The central table in the schema is the fact table, the
source of the cube's measures.
The other tables are dimension tables, the sources of
the cube's dimensions.

www.company.com

CPIT 440
Data Mining and Warehouse

Introduction to OLAP Operations


Cube Structure
A cube's structure is defined by its measures and
dimensions.
They are derived from tables in the cube's data source.
The set of tables from which a cube's measures and
dimensions are derived is called the cube's schema.
Every cube schema consists of a single fact table and one
or more dimension tables.
The cube's measures are derived from columns in the fact
table.
The cube's dimensions are derived from columns in the
dimension tables.
www.company.com

CPIT 440
Data Mining and Warehouse

Introduction to OLAP Operations


Cube Structure
Star schema: A fact table in the middle connected to a
set of dimension tables
Snowflake schema: A refinement of star schema where
some dimensional hierarchy is normalized into a set of
smaller dimension tables, forming a shape similar to
snowflake
Fact constellations: Multiple fact tables share dimension
tables, viewed as a collection of stars, therefore called
galaxy schema or fact constellation
www.company.com

CPIT 440
Data Mining and Warehouse

Introduction to OLAP Operations


OLAP Operations:
Roll up: summarize data / dimension reduction
Roll down: reverse of roll-up
Make detailed data, or introducing new dimensions

Slice and dice


Pivot (rotate)

www.company.com

CPIT 440
Data Mining and Warehouse

Roll up and Roll down

www.company.com

CPIT 440
Data Mining and Warehouse

Slice and Dice

www.company.com

CPIT 440
Data Mining and Warehouse

Pivot (Rotate)

www.company.com

CPIT 440
Data Mining and Warehouse

Exercise 1
Suppose that a data warehouse consists of the
three dimensions: time, doctor, and patient, and
the two measures count and charge, where
charge is the fee that a doctor charges a patient
for a visit.

www.company.com

CPIT 440
Data Mining and Warehouse

Exercise 1
(a) Enumerate three classes of schemas that are
popularly used for modeling data warehouses.
Three classes of schemas popularly used for modeling data
warehouses are
The star schema,
The snowflake schema
The fact constellations schema.

www.company.com

CPIT 440
Data Mining and Warehouse

Exercise 1
(b) Draw a schema diagram for the above data
warehouse using one of the schema classes
listed in part (a).

www.company.com

CPIT 440
Data Mining and Warehouse

Exercise 1
(c) Starting with the base cuboid [day; doctor;
patient], what specific OLAP operations should
be performed in order to list the total fee
collected by each doctor in 2004?
The operations to be performed are:
Roll-up on time from day to year.
Slice for time=2004.
Roll-up on patient from individual patient to all.

www.company.com

CPIT 440
Data Mining and Warehouse

Exercise 2
Suppose that a data warehouse for Big
University consists of the following four
dimensions: student, course, semester, and
instructor, and two measures count and avg.
grade.
When at the lowest conceptual level (e.g.,for a
given student, course, semester, and instructor
combination), the avg. grade measure stores the
actual course grade of the student.
At higher conceptual levels, avg. grade stores
the average grade for the given combination.
www.company.com

CPIT 440
Data Mining and Warehouse

Exercise 2
(a) Draw a snowflake schema diagram for the data
warehouse.

www.company.com

CPIT 440
Data Mining and Warehouse

Exercise 2
(b) Starting with the base cuboid [student; course;
semester; instructor], what specific OLAP
operations should perform in order to list the
average grade of CS courses for each Big
University student.
The specific OLAP operations to be performed are:
Roll-up on course from course id to department.
Roll-up on student from student id to university.
Dice on course, student with department=\CS" and
university = \Big University".
Drill-down on student from university to student
name.
www.company.com

CPIT 440
Data Mining and Warehouse

Exercise 3
Suppose that a data warehouse consists of the
four dimensions; date, spectator, location, and
game, and the two measures, count and charge,
where charge is the fee that a spectator pays
when watching a game on a given date.
Spectators may be students, adults, or seniors,
with each category having its own charge rate.

www.company.com

CPIT 440
Data Mining and Warehouse

Exercise 3
(a) Draw a star schema diagram for the data
warehouse.

www.company.com

CPIT 440
Data Mining and Warehouse

Exercise 3
(b) Starting with the base cuboid [date; spectator;
location; game], what specific OLAP operations
should perform in order to list the total charge
paid by student spectators at GM Place in 2004?
The specific OLAP operations to be performed are:
Roll-up on date from date id to year.
Roll-up on spectator from spectator id to status.
Roll-up on location from location id to location name.
Roll-up on game from game id to all.
Dice with status=\students", location name=\GM
Place", and year=2004.
www.company.com

Das könnte Ihnen auch gefallen