Intelligence
Data Warehousing
Zhangxi Lin
Texas Tech University
Outline
So
Data Warehousing
Definitions and Concepts
Data warehouse
Data mart
Definition
Data Mart: The IMW Case
IMW (Internet Media Works!) is an ASP (application service provider) in real estate information services. It is headquartered in Austin, Texas; its CEO is Gary Anderson.
Web page: http://www.inetworks.com
About IMW
Based
IMW's Services
Public User Application Services
Website Hosting Services
Optional Website Hosting Services
Core Membership Database Services
Optional Membership Database Services
Figure: schema diagram showing tables of measures, dimension tables, and snowflakes
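To make the split between measures and dimensions concrete, here is a minimal star-schema sketch using Python's built-in sqlite3 module. All table and column names (fact_listing, dim_listor, dim_property_type, list_price) and the sample rows are hypothetical, loosely modeled on the IMW fields: the numeric measure sits in the fact table, descriptive attributes sit in the dimension tables, and a query aggregates the measure by slicing along two dimensions.

```python
import sqlite3


def build_star_schema() -> sqlite3.Connection:
    """Create a tiny in-memory star schema: one fact table and two dimensions."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Dimension tables hold descriptive attributes.
        CREATE TABLE dim_listor (
            listor_id   INTEGER PRIMARY KEY,
            listor_name TEXT,
            city        TEXT
        );
        CREATE TABLE dim_property_type (
            prop_type_id INTEGER PRIMARY KEY,
            type_name    TEXT
        );
        -- The fact table holds a measure plus a foreign key to each dimension.
        CREATE TABLE fact_listing (
            property_id  INTEGER PRIMARY KEY,
            listor_id    INTEGER REFERENCES dim_listor(listor_id),
            prop_type_id INTEGER REFERENCES dim_property_type(prop_type_id),
            list_price   REAL
        );
    """)
    conn.executemany("INSERT INTO dim_listor VALUES (?, ?, ?)",
                     [(1, "Listor A", "Austin"), (2, "Listor B", "Lubbock")])
    conn.executemany("INSERT INTO dim_property_type VALUES (?, ?)",
                     [(10, "Office"), (20, "Retail")])
    conn.executemany("INSERT INTO fact_listing VALUES (?, ?, ?, ?)",
                     [(100, 1, 10, 500000.0), (101, 1, 20, 250000.0),
                      (102, 2, 10, 300000.0)])
    return conn


def listings_by_city_and_type(conn: sqlite3.Connection) -> list:
    """Aggregate the fact-table measure by city and property type."""
    return conn.execute("""
        SELECT l.city, t.type_name, COUNT(*) AS n_listings,
               SUM(f.list_price) AS total_price
        FROM fact_listing f
        JOIN dim_listor l ON f.listor_id = l.listor_id
        JOIN dim_property_type t ON f.prop_type_id = t.prop_type_id
        GROUP BY l.city, t.type_name
        ORDER BY l.city, t.type_name
    """).fetchall()


if __name__ == "__main__":
    for row in listings_by_city_and_type(build_star_schema()):
        print(row)
```

In a snowflake variant, a dimension such as dim_property_type would itself be normalized further, for example by moving property subtypes into their own table linked by a key.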
Figure: Membership Database ERD. Recoverable labels: UserID, Property ID / PropID, Listor ID, Listor Name, Property Type (Type Name, Subtype 1, Subtype 2, ..., Subtype n), Address, City, Chapter, UpdateDate, Feature, Functions, Specializations, and company fields (Company ID, Comp Name, Address, Telephone #). Relationships are marked M:1 and M:M. Legend: Primary Key, Secondary Key, Link to a table.
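In a relational design, an M:M relationship like those marked in the diagram is usually resolved with a bridge (junction) table that holds one row per pair, turning one M:M link into two M:1 links. The sketch below assumes, purely for illustration, an M:M link between listors and specializations (one listor may have several specializations, and one specialization may apply to many listors); the table names and rows are hypothetical.

```python
import sqlite3


def build_bridge_schema() -> sqlite3.Connection:
    """Resolve a hypothetical listor <-> specialization M:M link with a bridge table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE listor (listor_id INTEGER PRIMARY KEY, listor_name TEXT);
        CREATE TABLE specialization (spec_id INTEGER PRIMARY KEY, spec_name TEXT);
        -- Bridge table: each row pairs one listor with one specialization.
        -- The composite primary key prevents duplicate pairs.
        CREATE TABLE listor_specialization (
            listor_id INTEGER REFERENCES listor(listor_id),
            spec_id   INTEGER REFERENCES specialization(spec_id),
            PRIMARY KEY (listor_id, spec_id)
        );
        INSERT INTO listor VALUES (1, 'Listor A'), (2, 'Listor B');
        INSERT INTO specialization VALUES (1, 'Office'), (2, 'Industrial');
        INSERT INTO listor_specialization VALUES (1, 1), (1, 2), (2, 1);
    """)
    return conn


def specializations_of(conn: sqlite3.Connection, listor_id: int) -> list:
    """Follow the bridge table from one listor to all of its specializations."""
    rows = conn.execute("""
        SELECT s.spec_name
        FROM listor_specialization ls
        JOIN specialization s ON ls.spec_id = s.spec_id
        WHERE ls.listor_id = ?
        ORDER BY s.spec_name
    """, (listor_id,)).fetchall()
    return [name for (name,) in rows]


if __name__ == "__main__":
    conn = build_bridge_schema()
    print(specializations_of(conn, 1))
```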
Applications
Geographic distribution of property listings
Scorecard for main performance indicators
Dashboard
Questions
How should the data warehouse be modeled?
What is required in data transformation and preprocessing?
Are any dimensions missing from the data warehouse?
How should routine data warehouse updates be performed (frequency, timing, etc.)?
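One common answer to the update question is an incremental (delta) load: on each scheduled run, pull only the source rows whose UpdateDate is later than the latest value already loaded (the "high-water mark"). The sketch below assumes rows arrive as dictionaries with hypothetical keys listor_id and update_date, and that the source system maintains UpdateDate reliably.

```python
from datetime import date


def incremental_load(source_rows, warehouse, high_water_mark):
    """Load rows changed since high_water_mark into warehouse; return the new mark."""
    changed = [r for r in source_rows if r["update_date"] > high_water_mark]
    for row in changed:
        # Keyed upsert: a changed listor replaces its previously loaded row.
        warehouse[row["listor_id"]] = dict(row)
    if changed:
        high_water_mark = max(r["update_date"] for r in changed)
    return high_water_mark


if __name__ == "__main__":
    source = [
        {"listor_id": 1, "listor_name": "Listor A", "update_date": date(2011, 1, 5)},
        {"listor_id": 2, "listor_name": "Listor B", "update_date": date(2011, 2, 1)},
    ]
    warehouse = {}
    mark = incremental_load(source, warehouse, date(2011, 1, 1))  # first run loads both rows
    source[0] = {"listor_id": 1, "listor_name": "Listor A2", "update_date": date(2011, 3, 1)}
    mark = incremental_load(source, warehouse, mark)  # second run loads only listor 1
    print(mark, warehouse[1]["listor_name"])
```

This sketch overwrites the earlier version of a changed row; a warehouse that must keep history would append a new version instead. The strict `>` comparison also assumes no row later arrives carrying a timestamp equal to the current mark.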
Membership Dimension
Figure: Membership and Company dimensions. Recoverable labels: Property ID, Listor ID, Listor Name, PropType, SubName, Address, Company ID, City, Chapter, UpdateDate; a time hierarchy of Year, Quarter, Month, Date; Company Dimension fields Company ID, Comp Name, Address, Telephone #; other labels Features, Specializations, Functions. Legend: Primary Key, Secondary Key, Link to a table.
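The Year, Quarter, Month, Date labels on this slide form a time hierarchy, which is typically materialized as a date (time) dimension table with one row per day so that facts can be rolled up from day to month, quarter, or year. A minimal sketch; the column names and the YYYYMMDD integer surrogate key are assumptions, not taken from the slide.

```python
from collections import Counter
from datetime import date, timedelta


def build_date_dimension(start: date, end: date) -> list:
    """Return one row per calendar day from start to end, inclusive."""
    rows = []
    day = start
    while day <= end:
        rows.append({
            "date_key": day.year * 10000 + day.month * 100 + day.day,  # e.g. 20110315
            "date": day,
            "year": day.year,
            "quarter": (day.month - 1) // 3 + 1,  # months 1-3 -> Q1, 4-6 -> Q2, ...
            "month": day.month,
        })
        day += timedelta(days=1)
    return rows


if __name__ == "__main__":
    dim = build_date_dimension(date(2011, 1, 1), date(2011, 12, 31))
    # Roll the daily rows up to the quarter level of the hierarchy.
    print(Counter(row["quarter"] for row in dim))
```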
Data Warehouse Overview
Data Warehousing Characteristics
Basic
Subject oriented
Integrated
Time variant (time series)
Nonvolatile (data are not changed once loaded)
Others
Web based
Relational/multidimensional
Client/server
Real-time
Include metadata
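The time-variant and nonvolatile characteristics can be illustrated together: rather than updating a record in place, the warehouse appends a new time-stamped snapshot and leaves earlier ones untouched, so any past state can still be queried. A minimal in-memory sketch; the class name, method names, and the sample key are hypothetical.

```python
from datetime import date


class AppendOnlyHistory:
    """Stores time-stamped snapshots; existing snapshots are never modified."""

    def __init__(self):
        self._snapshots = []  # (as_of_date, key, value) tuples, appended in date order

    def record(self, as_of: date, key, value) -> None:
        """Append a snapshot; out-of-order dates are rejected to keep history consistent."""
        if self._snapshots and as_of < self._snapshots[-1][0]:
            raise ValueError("snapshots must be appended in date order")
        self._snapshots.append((as_of, key, value))

    def value_as_of(self, key, as_of: date):
        """Return the latest value recorded for key on or before as_of (None if none)."""
        result = None
        for snap_date, snap_key, value in self._snapshots:
            if snap_date > as_of:
                break  # snapshots are in date order, so no later one can qualify
            if snap_key == key:
                result = value
        return result


if __name__ == "__main__":
    history = AppendOnlyHistory()
    history.record(date(2011, 1, 1), "listings_austin", 120)
    history.record(date(2011, 4, 1), "listings_austin", 135)
    print(history.value_as_of("listings_austin", date(2011, 3, 15)))
```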
Data Warehousing Process Overview
Data
Data Warehousing: More Concepts
Operational
Data Warehousing Process Overview
The components of the data warehousing process:
Data sources
Data extraction
Data loading
Comprehensive database
Metadata
Middleware tools
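A toy end-to-end version of these components: extract rows from a data source, transform them, load them into the comprehensive database, and record metadata about the run. Everything here (the CSV layout, field names, and metadata fields) is hypothetical, and the standard-library modules stand in for real ETL middleware, which would add scheduling, error handling, and much richer metadata.

```python
import csv
import io
import sqlite3
from datetime import datetime, timezone

# Hypothetical raw source data with inconsistent whitespace and casing.
SOURCE_CSV = """listor_id,listor_name,city
1, Listor A ,austin
2,Listor B,LUBBOCK
"""


def extract(text: str) -> list:
    """Data extraction: read raw rows from the source (here an in-memory CSV)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: list) -> list:
    """Clean and standardize: cast types, trim whitespace, normalize city casing."""
    return [(int(r["listor_id"]), r["listor_name"].strip(), r["city"].strip().title())
            for r in rows]


def load(conn: sqlite3.Connection, rows: list) -> None:
    """Data loading: write the cleaned rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS dim_listor "
                 "(listor_id INTEGER PRIMARY KEY, listor_name TEXT, city TEXT)")
    conn.executemany("INSERT OR REPLACE INTO dim_listor VALUES (?, ?, ?)", rows)


def run_etl(conn: sqlite3.Connection, text: str) -> dict:
    """Run extract, transform, load in sequence and return run metadata."""
    raw = extract(text)
    clean = transform(raw)
    load(conn, clean)
    conn.commit()
    # Metadata: what was loaded, how much, and when.
    return {"source": "SOURCE_CSV", "rows_extracted": len(raw),
            "rows_loaded": len(clean),
            "loaded_at": datetime.now(timezone.utc).isoformat()}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(run_etl(conn, SOURCE_CSV))
    print(conn.execute("SELECT * FROM dim_listor").fetchall())
```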
Data Warehouse Architectures
Three-Tier Data Warehouse
Architectures Comparison
Teradata's EDW
Apache Hadoop
The
MapReduce
MapReduce is a framework for processing parallelizable problems across huge datasets using a large number of computers (nodes), collectively referred to as a cluster or a grid.
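The classic illustration of the MapReduce model is word count. The single-process sketch below mimics its phases (map, shuffle/group by key, reduce) in plain Python; it is not Hadoop's API, and on a real cluster the map and reduce calls would run in parallel on different nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document: str) -> list:
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs) -> dict:
    """Shuffle: group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: dict) -> dict:
    """Reduce: combine the values for each key independently of other keys."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents) -> dict:
    # Each document could be mapped on a different node; here they run sequentially.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return reduce_phase(shuffle_phase(mapped))


if __name__ == "__main__":
    print(word_count(["data warehouse data", "Data mart"]))
```

Because each reduce call depends only on the values for its own key, keys can be partitioned across nodes, which is what makes the problem parallelizable in the sense of the definition above.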
2013-12-02
Data Integration
Integration comprises three processes: data access, data federation, and change capture.
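Change capture means identifying what changed in a source since the last integration run. Production tools often read database transaction logs; the simpler snapshot-comparison approach sketched here diffs two snapshots keyed by primary key and classifies each key as inserted, updated, or deleted. The snapshot layout and sample data are hypothetical.

```python
def capture_changes(previous: dict, current: dict) -> dict:
    """Compare two snapshots keyed by primary key; classify the changed keys."""
    prev_keys, curr_keys = set(previous), set(current)
    return {
        "inserted": sorted(curr_keys - prev_keys),  # present now, absent before
        "deleted": sorted(prev_keys - curr_keys),   # present before, absent now
        "updated": sorted(k for k in prev_keys & curr_keys
                          if previous[k] != current[k]),  # same key, different content
    }


if __name__ == "__main__":
    before = {1: {"city": "Austin"}, 2: {"city": "Lubbock"}, 3: {"city": "Dallas"}}
    after = {1: {"city": "Austin"}, 2: {"city": "Houston"}, 4: {"city": "El Paso"}}
    print(capture_changes(before, after))
```

Snapshot diffing is easy to implement but must read both full snapshots on every run, which is why log-based capture is usually preferred for large sources.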
When
Transformation Tools: To Purchase or to Build In-House
Issues that affect whether an organization will purchase data transformation tools or build the transformation process itself:
Data transformation tools are expensive
Data transformation tools may have a long learning curve
It is difficult to measure how the IT organization is doing until it has learned to use the data transformation tools
Important criteria in selecting an ETL tool:
Ability to read from and write to an unlimited number of data source architectures
Automatic capturing and delivery of metadata
A history of conforming to open standards
An easy-to-use interface for the developer and the functional user
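One way to apply these criteria when comparing candidate tools, or a purchased tool against an in-house build, is a weighted scoring matrix. The criterion keys mirror the list above, but the weights, option names, and 1-to-5 scores are made-up placeholders, not an evaluation of any real product.

```python
# Hypothetical weights for the four criteria above; they sum to 1.0.
CRITERIA_WEIGHTS = {
    "source_connectivity": 0.35,
    "metadata_capture": 0.25,
    "open_standards": 0.20,
    "ease_of_use": 0.20,
}


def weighted_score(scores: dict, weights: dict = CRITERIA_WEIGHTS) -> float:
    """Sum of weight * score over all criteria; a missing criterion raises KeyError."""
    return sum(weight * scores[criterion] for criterion, weight in weights.items())


def rank_options(options: dict) -> list:
    """Return (name, score) pairs sorted from best to worst."""
    ranked = [(name, round(weighted_score(s), 2)) for name, s in options.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    options = {  # placeholder scores on a 1-to-5 scale
        "Tool X (purchased)": {"source_connectivity": 5, "metadata_capture": 4,
                               "open_standards": 3, "ease_of_use": 3},
        "In-house build": {"source_connectivity": 3, "metadata_capture": 2,
                           "open_standards": 4, "ease_of_use": 4},
    }
    for name, score in rank_options(options):
        print(f"{name}: {score}")
```

The weights encode the organization's priorities, so they should be agreed on before any option is scored; cost and learning-curve concerns from the issues list could be added as further criteria.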
VM VirtualBox
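Figure: tool set. Microsoft SQL Server: SSIS (SQL Server Integration Services), SSAS (SQL Server Analysis Services), SSRS (SQL Server Reporting Services), BIDS (Business Intelligence Development Studio). SAS: SAS EG (Enterprise Guide), SAS EM (Enterprise Miner).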
Learning Objectives
To gain a general impression of how to use SQL Server 2008 to implement a data mart
Tasks
Create your database with SSMS, named ISQS6339_lastname
Import data from Commrex_2011.xls
Use SSMS to create an ER diagram
Create an SSAS project using BIDS
Define data source, data source view, and cube
Deliverable:
One-page printout of the screenshot of the cube diagram