Sie sind auf Seite 1von 57

Data Warehousing and Online Analytical Processing (OLAP)

Data is good

Things to think about over the course of the session


Where does OLAP technology fit in the soup of reporting and decision support system technologies used (and not used) in industry? Spreadsheets vs. OLAP or spreadsheets & OLAP Who are the users? Who are the OLAP producers? BI job prospects compared to other IT jobs What is Microsofts strategy wrt OLAP?

BI Consumer Pool
Operations research / management science

Data Warehouse Design Considerations


http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnsql2k/html/sql_dwdesign.asp

Microsoft Examples
Excel, Access, Reporting Services, Data Analyzer, Office, 3rd party tools

SQL Server, Access SQL Server, Access DTS

Analysis Services (SQL Server

What is a Data Warehouse?


Data warehouses are databases designed for analysis.

Data is:
Subject oriented Integrated Time-Variant Nonvolatile

Data enters DW from operational environment, transaction processing systems (TPS).

1. Subject Orientation
TPS organized around processes, functions
billing, banking, purchasing, payroll, etc.

DW organized around subjects


customers, vendors, encounters, sales

Transactions
TPS processes transactions DW stores summary info related to transactions

TPS - keeps data needed for transaction DW - keeps data needed for analysis

2. Integration
DW must integrate data from different apps Create consistency across applications
naming conventions measurement of variables (units) data types encoding

DSS analyst - use the data, not worry about credibility/consistency of data
often best person to find subtle data problems

3. Time Variancy
TPS accurate at moment of access DW accurate as of some moment in time
Operational - current value data
Time horizon 60-90 days Key may or may not have an element of time Data can be updated

Data warehouse snapshot data


Time horizon 5-10 years Key contains an element of time Once snapshot made, data cannot be updated

4. Nonvolatility
Change Replace

Insert

Insert
Load

Access

Change

Delete

Operational

Data warehouse

Case for Data Warehousing


Can perform querying/reporting on servers/disks not associated with TPS - performance of both affected Can use query/reporting optimization which would not be appropriate for TPS Less technical expertise needed to create queries/reports Facilitates reporting across different TPSs Facilitates storage of historical data Prevents analysts from requiring access to live TPS
Source: Data Warehouse Information Center, http://www.dwinfocenter.org/

What is a Data Mart


Departmental based DSS database Same architectural foundation as DW Data mart is reconcilable with DW Users are more likely business analysts than IT technicians (some overlap likely)
farmers (know what they want) explorers (not sure what they want) more farmers than explorers

Why Data Marts?


DW gets large, competition among users Data becomes harder to customize in DW Analysis tools for dealing with LARGE datasets may not be as elegant as tools designed for smaller datasets Dept can customize data as it flows into data mart - no need to serve entire corporation Dept can choose amount of historical data to include in data mart. Less concern about impact on DW of computationally intensive analyses Dept can choose own data mart DB software Dept can choose specialized analytical software

Multi-dimensional Data Modeling


Designed to facilitate analysis (not transactions) Extremely common in data warehousing
A de-facto standard Concept has been around for a long time

Intuitive concept of many dimensions or perspectives on business measures or facts


view sales facts (such as total sales) from customer, product and time perspective

Multidimensional conceptual view of data for user


Rich dimensional structuring with hierarchical referencing Efficient specification of dimensions and calculations Separation of structure and representation Underlying database technology will vary - relational, proprietary data cubes

Hypercube an n-sided cube


User can slice, dice, rotate, browse through multidimensional cubes of data

Data Models
Relational vs. Multi-dimensional
Transaction focused Focus on many linked, normalized tables Analysis focused Normalized fact table joined to a few highly non-normalized dimension tables Many simple, intuitive data models Lots of redundancy

One big complex data model Very little redundancy

Data (Hyper) cubes


2-d to 3-d cube

Rotating the cube

A 6-D Multidimensional Type Structure Example


Data generating events happen at the intersection of specific instances of each dimension.

Total Sales(Store5, 65+ Males, Feb, Actual, Shirts)

OLAP Solutions by Thomsen

What is OLAP?
Software tool providing multi-dimensional view of data for business analysis Example of Decision Support or Business Intelligence tool Fast data access and fast computations Interactive, flexible user interface Slice, dice, drill-down Excel Pivot Table and Pivot Chart are examples of simple OLAP tools

Defining OLAP - ANALYSIS


Business logic and statistical analysis relevant to end user Should not require programming for everything Analysis can be via vendors tools or link to generic analytical platform such as spreadsheet Examples include time series analysis, cost allocation, currency translation, goal seeking, ad-hoc multidimensional structural changes (cube building), nonprocedural modeling, exception alerting, and data mining. Capabilities vary widely by vendor and market

Common cube operations


Pivot or Rotate change which dimensions and/or levels within dimensions are shown on row and column axes Roll-up aggregate or combine cells within a dimension according to some mathematical operation
Uses a hierarchy definition for the dimension Commonly this is summation or count

Drill down examine data a greater level of detail


Add another row or column header which is further down the concept hierarchy

Slice select a subset of a cube by constraining the value of some dimension


Ex: Select cells for month = January in time dimension

Dice select a subset of a cube by constraining two or more dimensions Drill through access atomic level detail data

Steps in Multi-dimensional Modeling


Food Mart Example

Choose business process (this will be the cube)


Selling food at stores

Choose grain of process (finest level of detail in the cube)


Total sales of a product on a date to a customer at a store with some sales promotion

Choose dimensions
Product, Customer, Store, Promotion, Time

Choose measured facts


Unit sales, total sales $, total cost $

Bringing Facts and Dimensions Together The Star-Schema


simple symmetric few joins

extendable
Dimensions are highly nonnormalized (i.e. lots of redundancy

Atomic data

From 1st ed of Kimballs Data Warehouse Toolkit, notice no transaction # in the Sales Fact table

Lets Build our First Cube


We are going to use the FoodMart 2000.mdb MS Access database as our source Instead of using the tutorial that comes with AS, we are going to use a web based tutorial that has more details about the various steps and more business context Lets start by visiting the one of the best web based resources for MS AS
http://www.mosha.com/msolap/ Follow the General link

Introduction to Analysis Services - by William Pearson (series of articles) http://www.databasejournal.com/article.php/1459531/


This set of Tutorials are an extremely nice complement to the Tutorials that ship with MS AS Lets do the first one together and create our first cube

Setup Database and Data Source


Now well set up the data source
Many OLE DB providers (a standard for communicating with data sources) Well use local FoodMart2000.mdb file that comes with Analysis Services
Lets first download a copy of the FoodMart2000.mdb file Select Microsoft Jet 4.0 OLE DB Provider (Jet is Access)
C:\Program Files\Microsoft Analysis Services\Samples

Need to do a Copy-Paste to get around lack of Rename capability for databases and hideous path based default name Now have an OLAP database and a valid Data Source On to Creating a Cube

Cube Building and Editing


Cube Wizard steps us through creating a cube
Selecting Fact table and measures Selecting/Creating dimensions and hierarchies Designing storage options Processing the cube

Then we can Browse the cube with the Cube Browser in SQL Server and using Excel Cube Editor allows us to make changes to our cube that we could not do (or did incorrectly, or forgot to do) while in the Cube Wizard Similarly, the Dimension Editor allows to make dimension changes

Microsoft SQL Server 2000 Analysis Services will be our bridge today

OLAP Concepts

OLAP Practice

What is SQL Server?


Microsofts industrial strength database management system Intended for client/server architecture Includes numerous tools for creating, modifying and working with SQL Server databases It is NOT a front end development tool (i.e. no Forms, Reports, VB code modules as in Access) Microsoft battling Oracle and IBM (DB2) for dominance in the corporate database market
SQL Server using price, performance, ease of use and Microsoft ubiquity as marketing levers

Comes in various flavors such as Enterprise, Developer and Personal


You can get Personal through the MSDNAA program here at OU Personal edition (which is installed in this lab) has server and client running on same machine

MS Access Two tools in one


(1) Tools for creating Forms, Reports, Macros, Modules (2) Database engine (Jet) .mdb file is container for both data tables as well as front end elements.

MS SQL Server
SQL Server client side tools such as Enterprise Manager, Query Analyzer, and DTS

Windows apps created with VB, C++, C#, Java

Application software such as Excel or Access

Crystal Reports, Business Objects, etc.

Web applications

SQL Server Databases (1) data file (2) transaction log file

Transact-SQL or T-SQL is SQL Servers SQL dialect

Main Components of SQL Server


(1) On the Server Side

SQL Server database engine


runs as a service on computer acting as the database server server listens for requests from clients

Analysis Services
OLAP and data mining (2) Important Client Tools

Icon in tray indicating SQL Server is running

Enterprise Manager primary tool for managing SQL Server databases, security and other objects Query Analyzer a tool for building and analyzing SQL statements Data Transformation Services (DTS) a tool for manipulating data coming into or going out of a SQL Server database Analysis Manager tool for managing OLAP databases

Understanding MS Analysis Services (AS)


Included in SQL Server 2000
Was called OLAP Services in SQL Server 7 Manage cubes with Analysis Manager
Analysis Manager

Includes cube building capabilities


Draws data from the data warehouse and constructs MD cubes for analysis Cube builder is heavily wizardized

Facilitates DSS app dev

Various dbs
Data warehouse

Decision Support Objects (DSO)


Analysis Services

Cubes get data from warehouse and make data available to reports
OLAP Cube

Includes a few data mining models Runs as a service in Windows AS includes various administrative tools as well as a simple data browsing tool
Its NOT meant to be an OLAP front end pivoting tool

AS is getting another huge facelift in SQL Server 2005 and seems to be gaining strategic importance for Microsoft

Introducing Analysis Manager


Tool for managing OLAP cubes and data mining models in MS SQL Server AS Heavily wizard based SQL Server Books Online Snap In application Start Analysis Manager
Its in the SQL Server program group

SQL Server

Sample database that comes with AS


OLAP cubes Learn AS

Dimensions for analysis BY

One E-R vs. Many Stars


Transaction focus
Analysis focus

One E-R model for all the business process.

One star per modeled business process.

Exploring FoodMarts Dimensions


Customer very wide, many unrelated attributes Product snowflaked design with product_class table Employee used to create a parent-child type dimension to represent link between employees and their supervisors (who are also employees) Store member properties used to enhance the store name level of the dimension Time a typical, though somewhat small example. Notice the surrogate key.

1. Select Business Process to Model


Some natural business activity supported by one or more information systems Examples: purchasing, orders, shipments, inventory, admissions, service calls, installations, bookings, production We represent each activity as a table of facts with many related dimension tables Designing and creating the fact and dimension tables is the critical part of creating a useful multidimensional data model the cube
This is a combination of business analysis and technical design Lets look at the pre-built cubes in FoodMart 2000

2. The Fact Table


Fact table contains
numeric measures describing some aspect of the business process Foreign keys into dimension tables Degenerate dimensions such as transaction numbers, invoice numbers, etc.

Each row is a measurement Each row is at the same grain Many useful facts are numeric and additive
Cubes tend to be quite sum and count centric Well encounter additive, semi-additive and non-additive facts

Fact tables are usually deep (many rows) and narrow (not so many columns) Three common fact table types
Transaction facts [EX: sales] Periodic snapshot facts [EX: daily inventory facts] Accumulating snapshot facts [EX: status of something as it moves through relatively well defined phases or stages such as a product as it moves from order to manufacture to shipment]

2. Declare Grain of the Process


Specify exactly what a fact table row represents
An individual line item on a customers retail sales ticket as measured by a scanner device A line item on a hospital bill A single hotel reservation Daily inventory snapshot for each product in a warehouse Monthly snapshot of a 401k account

This is the lowest level of detail you are choosing to store in the multidimensional data model Table is sparse in sense that rows representing the fact that nothing happened (i.e. values of 0) are usually NOT stored EXAMPLE: Lets check the grain and measures of the fact tables in FoodMart 2000.
Sales Facts Inventory Facts

4. Choose dimensions that apply to each fact table row


How do people describe the data that results from the fact table rows? Dimension richness is what really makes the data warehouse
Lots of them filled with descriptive business information

The business perspectives BY which people describe their data


Time, customers, products, locations, suppliers, prospect type, problem, scenarios Source of query constraints, groupings, report labels

Dimension tables are wide (many fields), denormalized (plenty of redundancy), with usually not too many rows Use surrogate keys (meaningless long integers) instead of natural keys (actual meaningful primary keys from source systems such as customer #s, invoice numbers, etc.)
Robust to changes in source systems Can handle source system key reuse Source keys may be complex strings (slow) with embedded info (hard to use)

So, is something a Measured fact or Dimension attribute?


Is it a measurement that takes on lots of values participates in calculations? Or is it a relatively constant value that participates in constraints?

Dimension Hierarchies
A conceptual relationship between related items in a dimension Region Year
State City Address

Product category Product subcategory Brand name

Quarter Month Date

Manager Supervisor Analyst

Highest level in hierarchy usually has least number of distinct values Hierarchies facilitate roll up of lower levels into higher levels as well as drill down from higher levels to lower levels A dimension can have multiple hierarchies
MS AS restricts to one hierarchy per dimension Other dimension attributes can be designated as Member Properties MS AS 2005 will support multiple hierarchies within a dimension

Time Dimension
Almost all MD models have a time or date dimension
Can build this in advance independent of the application Include many, many attributes that describe dates/times such as various time periods in calendar time as well as fiscal time, holiday indicators, weekday names, etc. Lets explore the smallish time dimension in FoodMart and a slightly richer one in grocer.mdb

Some tools (including AS) will let you create a time dimension from any date/time field
Probably better to create your own fully seeded date table for the time dimension Ensures no time gaps due to no facts

Kimball recommends creating separate date table and time table


Far less space required

Designing Cubes - Facts


Often cubes based on underlying data modeled as a star-schema (or one of its variants) Identifying fact table and associated measures Fact table also contains keys that join to the dimension tables Dimensions often shared across multiple cubes AS currently restricts each cube to a single fact table
Can work around using virtual cubes

Designing Cubes - Dimensions


We are picking a set of dimension tables as well as the fields within each table making up the levels of each dimension. Dimension design involves picking dimension table and fields representing hierarchical levels within the dimension

Precalculated Summaries or Aggregations


Most OLAP cube creation software allows the creation of precalculated summaries or aggregations Idea: Some summary queries will be asked for over and over and over, so lets precalculate them Tradeoff: extra space to store aggregate values vs. query response speedup due to precalculation

Process the Cube


A multi-phase procedure

Filter or Page region

Analysis Manager is NOT a full featured cube browsing and analysis tool Has basic browsing ability to aid in cube creation/modification

Reporting Options for Analysis Services Cubes: MS Excel


Excel can link to OLAP cubes
Can build Excel Pivot Tables based on cubes Need MS Query to connect to a cube

Differences between pivoting on data in Excel vs. data in OLAP cube


Row limits Hierarchies Some limitations in pivot table functionality with OLAP cubes (e.g. no calculated fields or items)

Can create a local cube for offline pivoting


From Pivot Table toolbar | Offline OLAP and then youre prompted to save a .cub file

Microsoft Analysis Services

A Few Products

Part of SQL Server 2000 Create OLAP cubes, some data mining

Hyperion Essbase
Full suite of business intelligence developer and end user tools

Business Objects (Crystal)


Full suite of business intelligence developer and end user tools

Microstrategy Oracle Information Builders


Home of WebFocus, a web based OLAP tool

Pentaho
A new open source business intelligence project http://www.pentaho.org/

Lets OLAP
Download the following from the Downloads section of course web:
CallCenter-DataWarehouse.mdb CallCenterPivot.xls

Lets look at Excel Pivot Tutorial Can even publish Pivot Tables to Web

A Call Center Example


Tech Support for MS Office Technology enabled business processes Massive amount of data captured by ACD Some data analysis done by ACD Difficult operational questions related to staffing/scheduling impact on service level Many call centers in many industries

Steps in Multi-dimensional Modeling


Call Center Example

Choose business process


Servicing technical support calls

Choose grain of process


Individual phone calls

Choose dimensions
Customer, application, problem, time

Choose measured facts


time on hold, service time of call

The Star Schema


A multi-dimensional data model
Customer dimension

Time dimension

Non-normalized

Fact table
Non-normalized

Normalized

Non-normalized

Non-normalized

Application dimension

Problem dimension

Reference Library

BI Resources
The Data Warehousing Institute http://www.tdwi.org/ Kimball and Associates http://www.ralphkimball.com./html/articles.html A Dimensional Modeling Manifesto Kimball, R. http://www.dbmsmag.com/9708d15.html DSS Resources http://dssresources.com/ Data Warehousing Information Center http://www.dwinfocenter.org/ Intelligent Enterprise http://www.intelligententerprise.com/ DM Review http://dmreview.com/ KDNuggets http://www.kdnuggets.com/ IT Toolbox http://www.ittoolbox.com/ http://businessintelligence.ittoolbox.com/ http://datawarehouse.ittoolbox.com/ OLAP Report http://www.olapreport.com/
Some free stuff (nice history of OLAP and commentary on industry trends Other stuff costs $

http://www.mosha.com/msolap/
Awesome set of resources from the lead developer on MS SQL Server Analysis Server team

Some Good Books and Articles


The Data Warehouse Toolkit Kimball, R.
Definitive

OLAP Solutions Thomsen, E.


Definitive, abstract and dense, good

Step by Step: Microsoft SQL Server 2000 Analysis Services Jacobson, R.


Tutorial driven, nice way to learn about AS

Data Warehouse Design Solutions Adamson and Venerable


Multi-D DW designs from lots of different industries Very practical, uses realistic situations to reinforce the concepts

Summers Rubber Company designs its data warehouse


Gorla, Narasimhaiah; Krehbiel, Steve Interfaces; Mar/Apr 1999; 29, 2; ABI/INFORM Global

More AS Tutorials and Resources


http://www.mosha.com/msolap/
This is the granddaddy of MS SQL Server Analysis Services resoures. Mosha Pasumansky is the MS development lead on AS engine. Site is gold mine of information and links regarding AS and related software He participates in microsoft.public.sqlserver.olap Great Blog at http://www.sqljunkies.com/WebLog/mosha/

Introduction to Analysis Services - by William Pearson (series of articles) http://www.databasejournal.com/article.php/1459531/


Very nice series of MS AS tutorials

Best practices for Business Intelligence using the Microsoft Data Warehousing Framework
A white paper from Microsoft

MS AS 2005 - Yukon
Major improvements in AS for SQL Server 2005 http://msdn.microsoft.com/library/default.as p?url=/library/enus/dnsql90/html/OvASDMEnvr.asp

The various MS pivot tools


Pivot Table Service
Included as part of AS but also installed as part of MS Office Client liason to AS Can also create and communicate with local cubes that are not part of AS

Excel Pivot Table


A flexible crosstab type reporting tool which is part of Excel

Office Pivot Table List


Part of Office Web Components Its an Active X control Similar to Excel Pivot Table but with some differences in functionallity

Das könnte Ihnen auch gefallen