Data is good
BI Consumer Pool
Operations research / management science
Microsoft Examples
Excel, Access, Reporting Services, Data Analyzer, Office, 3rd party tools
Data is:
Subject-oriented, integrated, time-variant, nonvolatile
1. Subject Orientation
TPS (transaction processing systems) are organized around processes and functions:
billing, banking, purchasing, payroll, etc.
Transactions
TPS processes transactions; DW stores summary info related to transactions
TPS keeps the data needed for the transaction; DW keeps the data needed for analysis
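The TPS/DW contrast can be illustrated with a minimal Python sketch. It assumes hypothetical transaction records with `customer`, `month` and `amount` fields. The TPS keeps one row per transaction. The warehouse-style summary is organized around a subject (the customer) and holds totals for analysis.

```python
from collections import defaultdict

# Hypothetical TPS rows: one record per transaction, as a billing app might store them.
transactions = [
    {"txn_id": 1, "customer": "C1", "month": "2005-01", "amount": 40.0},
    {"txn_id": 2, "customer": "C1", "month": "2005-01", "amount": 10.0},
    {"txn_id": 3, "customer": "C2", "month": "2005-01", "amount": 25.0},
    {"txn_id": 4, "customer": "C1", "month": "2005-02", "amount": 5.0},
]

def summarize(rows):
    """Roll transactions up to one summary row per (customer, month)."""
    totals = defaultdict(lambda: {"amount": 0.0, "count": 0})
    for row in rows:
        key = (row["customer"], row["month"])
        totals[key]["amount"] += row["amount"]
        totals[key]["count"] += 1
    return dict(totals)

summary = summarize(transactions)
print(summary[("C1", "2005-01")])  # {'amount': 50.0, 'count': 2}
```

The transaction ID is dropped from the summary. Analysis usually needs the totals rather than the individual transactions.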
2. Integration
DW must integrate data from different apps and create consistency across applications in:
naming conventions, measurement of variables (units), data types, encoding
DSS analyst: should be able to use the data without worrying about its credibility or consistency
The DSS analyst is often the best person to find subtle data problems
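Here is a minimal Python sketch of the integration step. The two source applications (`app_a`, `app_b`), their gender codes and their length units are all hypothetical. Each source record is translated into one warehouse convention for naming, data type, encoding and units.

```python
# Hypothetical mapping tables: each source application encodes gender differently.
GENDER_CODES = {
    "app_a": {"m": "M", "f": "F"},
    "app_b": {"1": "M", "0": "F"},
}
# Hypothetical unit conversion: app_a stores length in inches, app_b in centimeters.
TO_CM = {"app_a": 2.54, "app_b": 1.0}

def integrate(source, record):
    """Translate one source record into the warehouse's single convention."""
    return {
        # One naming convention and data type for the key.
        "customer_id": str(record["cust"]).strip().upper(),
        # One encoding for gender, whatever the source used.
        "gender": GENDER_CODES[source][str(record["gender"]).lower()],
        # One unit of measure (centimeters).
        "length_cm": round(float(record["length"]) * TO_CM[source], 2),
    }

a = integrate("app_a", {"cust": " c17 ", "gender": "F", "length": 10})
b = integrate("app_b", {"cust": "C17", "gender": 0, "length": 25.4})
print(a == b)  # True
```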
3. Time Variancy
TPS is accurate at the moment of access; DW is accurate as of some moment in time
Operational: current-value data
Time horizon of 60-90 days; key may or may not have an element of time; data can be updated
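The sketch below contrasts the two in Python. The account data and the warehouse layout are hypothetical. The operational store keeps only the current value, so an update overwrites it. The warehouse-style store puts a time element in the key, so each load adds a dated snapshot and history is preserved.

```python
from datetime import date

# Operational store: current value only; an update overwrites the old balance.
operational = {"ACCT-1": 100}
operational["ACCT-1"] = 250  # the earlier value of 100 is gone

# Warehouse store (hypothetical layout): the key includes a time element,
# so each load adds a dated snapshot instead of overwriting.
warehouse = {}

def load_snapshot(as_of, balances):
    for acct, bal in balances.items():
        warehouse[(acct, as_of)] = bal

load_snapshot(date(2005, 1, 31), {"ACCT-1": 100})
load_snapshot(date(2005, 2, 28), {"ACCT-1": 250})

# The warehouse can answer "what was the balance as of date X?"
history = sorted((d, bal) for (acct, d), bal in warehouse.items() if acct == "ACCT-1")
print(operational["ACCT-1"], history)
```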
4. Nonvolatility
Figure: Operational data undergoes insert, change, replace, and delete operations; data warehouse data is loaded and accessed.
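Nonvolatility can be sketched as a table class that offers only the warehouse operations, load and access. The class name `WarehouseTable` and its methods are hypothetical, used only for illustration. It deliberately has no update or delete method.

```python
class WarehouseTable:
    """Hypothetical nonvolatile table: rows are bulk-loaded and read,
    but there is no update or delete method."""

    def __init__(self):
        self._rows = []

    def load(self, rows):
        # Mass load: append a batch; rows already loaded are never touched.
        self._rows.extend(tuple(r) for r in rows)

    def access(self, predicate=lambda row: True):
        # Read-only access: returns a new list of immutable tuples.
        return [row for row in self._rows if predicate(row)]

sales = WarehouseTable()
sales.load([("2005-01", "P1", 10), ("2005-01", "P2", 4)])
sales.load([("2005-02", "P1", 7)])
print(sales.access(lambda r: r[1] == "P1"))
```

A later correction would arrive as a new load rather than an in-place change to an existing row.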
Data Models
Relational vs. Multi-dimensional
Relational: transaction focused; focus on many linked, normalized tables
Multi-dimensional: analysis focused; a normalized fact table joined to a few highly non-normalized dimension tables; many simple, intuitive data models; lots of redundancy
What is OLAP?
Software tool providing a multi-dimensional view of data for business analysis
Example of a decision support or business intelligence tool
Fast data access and fast computations
Interactive, flexible user interface: slice, dice, drill-down
Excel PivotTable and PivotChart are examples of simple OLAP tools
Dice: select a subset of a cube by constraining two or more dimensions
Drill through: access atomic-level detail data
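The OLAP operations can be sketched in plain Python over a tiny hypothetical cube with `product`, `region` and `quarter` dimensions and a `sales` measure. Dice follows the definition given here: two or more constrained dimensions. Slice uses the common definition of constraining one dimension. Drill-down is shown as regrouping at a finer level of detail. This is only an illustration of the ideas, not how an OLAP server stores cubes.

```python
from collections import defaultdict

# Tiny hypothetical cube: each cell is (product, region, quarter) -> sales.
cells = [
    {"product": "Milk",  "region": "West", "quarter": "Q1", "sales": 10},
    {"product": "Milk",  "region": "East", "quarter": "Q1", "sales": 7},
    {"product": "Bread", "region": "West", "quarter": "Q1", "sales": 5},
    {"product": "Bread", "region": "West", "quarter": "Q2", "sales": 8},
    {"product": "Milk",  "region": "West", "quarter": "Q2", "sales": 6},
]

def subcube(rows, **constraints):
    """Keep cells whose dimension values fall in the given sets.
    One constrained dimension is a slice; two or more is a dice."""
    return [r for r in rows if all(r[d] in vals for d, vals in constraints.items())]

def rollup(rows, *dims):
    """Sum the sales measure grouped by the listed dimensions."""
    totals = defaultdict(int)
    for r in rows:
        totals[tuple(r[d] for d in dims)] += r["sales"]
    return dict(totals)

west = subcube(cells, region={"West"})                     # slice
west_q1 = subcube(cells, region={"West"}, quarter={"Q1"})  # dice
print(rollup(west, "product"))             # summary by product
print(rollup(west, "product", "quarter"))  # drill down to product x quarter
```

Drill through corresponds to returning the underlying cells themselves, such as `west_q1`, rather than their totals.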
Choose dimensions: Product, Customer, Store, Promotion, Time (extendable)
Dimensions are highly non-normalized (i.e., lots of redundancy)
Atomic data
From the 1st edition of Kimball's The Data Warehouse Toolkit; notice there is no transaction # in the Sales Fact table
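Below is a heavily trimmed, hypothetical version of this kind of retail star schema, built with Python's `sqlite3` module. It shows three things. The dimensions are denormalized: brand and category repeat on every product row. The fact table is narrow and keyed only by dimension keys, with no transaction number. A typical query joins the fact table to its dimensions, constrains on dimension attributes, and sums the facts. The table and column names are assumptions, not Kimball's exact design.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
# Hypothetical, trimmed star schema (Customer and Promotion dimensions omitted).
cur.executescript("""
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, description TEXT, brand TEXT, category TEXT);
CREATE TABLE store_dim   (store_key   INTEGER PRIMARY KEY, store_name TEXT, city TEXT, state TEXT);
CREATE TABLE time_dim    (time_key    INTEGER PRIMARY KEY, day TEXT, month TEXT, year INTEGER);
-- Grain: one row per product per store per day; note there is no transaction number.
CREATE TABLE sales_fact (
    time_key    INTEGER REFERENCES time_dim,
    product_key INTEGER REFERENCES product_dim,
    store_key   INTEGER REFERENCES store_dim,
    dollars_sold REAL,
    units_sold   INTEGER,
    PRIMARY KEY (time_key, product_key, store_key)
);
INSERT INTO product_dim VALUES (1, 'Skim Milk 1L', 'Acme', 'Dairy'),
    (2, 'Whole Milk 1L', 'Acme', 'Dairy'), (3, 'Rye Bread', 'Bakers', 'Bakery');
INSERT INTO store_dim VALUES (1, 'Store 1', 'Portland', 'OR'), (2, 'Store 2', 'Seattle', 'WA');
INSERT INTO time_dim VALUES (1, '2005-01-03', '2005-01', 2005), (2, '2005-01-04', '2005-01', 2005);
INSERT INTO sales_fact VALUES (1, 1, 1, 3.0, 2), (1, 2, 1, 4.5, 3), (2, 3, 2, 2.5, 1), (2, 1, 2, 1.5, 1);
""")
# Star join: constrain and group by dimension attributes, sum the additive facts.
rows = cur.execute("""
SELECT p.category, s.state, SUM(f.dollars_sold), SUM(f.units_sold)
FROM sales_fact f
JOIN product_dim p ON p.product_key = f.product_key
JOIN store_dim   s ON s.store_key   = f.store_key
JOIN time_dim    t ON t.time_key    = f.time_key
WHERE t.month = '2005-01'
GROUP BY p.category, s.state
ORDER BY p.category, s.state
""").fetchall()
print(rows)
```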
We need to do a copy-paste to get around the lack of a Rename capability for databases and the hideous path-based default name. Now we have an OLAP database and a valid data source. On to creating a cube.
Then we can browse the cube with the Cube Browser in SQL Server and by using Excel
The Cube Editor allows us to make changes to our cube that we could not make (or made incorrectly, or forgot to make) in the Cube Wizard. Similarly, the Dimension Editor allows us to make dimension changes.
Microsoft SQL Server 2000 Analysis Services will be our bridge today
OLAP Concepts
OLAP Practice
MS SQL Server
SQL Server client-side tools such as Enterprise Manager, Query Analyzer, and DTS
Web applications
SQL Server databases: (1) data file, (2) transaction log file
Analysis Services
OLAP and data mining
Important Client Tools
Enterprise Manager: primary tool for managing SQL Server databases, security, and other objects
Query Analyzer: a tool for building and analyzing SQL statements
Data Transformation Services (DTS): a tool for manipulating data coming into or going out of a SQL Server database
Analysis Manager: tool for managing OLAP databases
Figure: various databases feed the data warehouse; the OLAP cube gets data from the warehouse and makes it available to reports.
Includes a few data mining models; runs as a service in Windows
AS includes various administrative tools as well as a simple data browsing tool
It's NOT meant to be an OLAP front-end pivoting tool
AS is getting another huge facelift in SQL Server 2005 and seems to be gaining strategic importance for Microsoft
SQL Server
Each row is a measurement
Each row is at the same grain
Many useful facts are numeric and additive
Cubes tend to be quite sum- and count-centric
We'll encounter additive, semi-additive, and non-additive facts
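The three kinds of facts behave differently when aggregated. Below is a minimal sketch over a hypothetical daily record for one product in one store. Units and dollars sold are additive. Quantity on hand is semi-additive: it can be summed across products or stores but not across days. Unit price is non-additive: it must be derived as a ratio of additive totals, not summed.

```python
# Hypothetical daily records for one product in one store.
daily = [
    {"day": "2005-01-01", "units_sold": 3, "dollars_sold": 6.00, "qty_on_hand": 40},
    {"day": "2005-01-02", "units_sold": 5, "dollars_sold": 10.00, "qty_on_hand": 35},
    {"day": "2005-01-03", "units_sold": 2, "dollars_sold": 5.00, "qty_on_hand": 33},
]

# Additive: safe to sum across every dimension, including time.
total_units = sum(r["units_sold"] for r in daily)
total_dollars = sum(r["dollars_sold"] for r in daily)

# Semi-additive: summing on-hand quantity over days is meaningless;
# use the average or the period-end value across time instead.
avg_on_hand = sum(r["qty_on_hand"] for r in daily) / len(daily)
ending_on_hand = max(daily, key=lambda r: r["day"])["qty_on_hand"]

# Non-additive: an average unit price is a ratio of additive totals,
# not a sum (or plain average) of the daily prices.
avg_price = total_dollars / total_units
print(total_units, avg_on_hand, ending_on_hand, avg_price)
```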
Fact tables are usually deep (many rows) and narrow (not so many columns)
Three common fact table types:
Transaction facts [EX: sales]
Periodic snapshot facts [EX: daily inventory facts]
Accumulating snapshot facts [EX: status of something as it moves through relatively well-defined phases or stages, such as a product moving from order to manufacture to shipment]
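The accumulating snapshot is the least familiar of the three types. Here is a minimal Python sketch of it, assuming a hypothetical order pipeline. Each order gets one row with a date column per milestone. That row is revisited as the order moves through its stages, so stage-to-stage lags can be computed directly from it.

```python
from datetime import date

# Hypothetical accumulating snapshot: one row per order, one date per milestone.
order_pipeline = {
    "ORD-1": {"ordered": date(2005, 1, 3), "manufactured": None, "shipped": None},
}

def record_milestone(order_id, milestone, when):
    # The existing row is filled in as the order reaches each stage.
    order_pipeline[order_id][milestone] = when

def days_between(order_id, first, second):
    row = order_pipeline[order_id]
    if row[first] is None or row[second] is None:
        return None  # that stage has not been reached yet
    return (row[second] - row[first]).days

record_milestone("ORD-1", "manufactured", date(2005, 1, 10))
lag = days_between("ORD-1", "ordered", "manufactured")
pending = days_between("ORD-1", "manufactured", "shipped")
print(lag, pending)
```

By contrast, a transaction fact row is written once per event, and a periodic snapshot adds one new row per period.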
The grain is the lowest level of detail you choose to store in the multidimensional data model
The table is sparse in the sense that rows representing the fact that nothing happened (i.e., values of 0) are usually NOT stored
EXAMPLE: Let's check the grain and measures of the fact tables in FoodMart 2000.
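Here is a minimal sketch of grain and sparsity, with hypothetical days, products and stores. The grain is (day, product, store), and only combinations where something was sold are stored. A missing row reads as zero, so every cell can still be answered.

```python
from itertools import product as cartesian

days = ["2005-01-01", "2005-01-02"]
products = ["Milk", "Bread", "Eggs"]
stores = ["S1", "S2"]

# Hypothetical fact rows at the grain (day, product, store);
# combinations where nothing was sold are simply not stored.
sales_fact = {
    ("2005-01-01", "Milk", "S1"): 4,
    ("2005-01-02", "Bread", "S2"): 2,
}

possible = len(days) * len(products) * len(stores)
density = len(sales_fact) / possible  # fraction of cells actually stored

def units_sold(day, prod, store):
    # A missing row means "nothing happened", so it reads as 0.
    return sales_fact.get((day, prod, store), 0)

# Every possible cell can still be queried without storing the zeros.
total = sum(units_sold(d, p, s) for d, p, s in cartesian(days, products, stores))
print(possible, density, total)
```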
Sales Facts Inventory Facts
How do people describe the data that results from the fact table rows? Dimension richness is what really makes the data warehouse
Lots of them filled with descriptive business information
Dimension tables are wide (many fields) and denormalized (plenty of redundancy), and usually have not too many rows
Use surrogate keys (meaningless long integers) instead of natural keys (actual meaningful primary keys from source systems, such as customer #s, invoice numbers, etc.)
Surrogate keys are robust to changes in source systems
They can handle source system key reuse
Source keys may be complex strings (slow) with embedded info (hard to use)
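A minimal Python sketch of surrogate key assignment follows. The class `SurrogateKeyMap` and the `retire` step are hypothetical. Each natural key, qualified by its source system, maps to a meaningless sequential integer. When a source reuses a natural key for a different entity, retiring the old mapping lets the next lookup receive a fresh surrogate, while rows already loaded keep the old one.

```python
import itertools

class SurrogateKeyMap:
    """Hypothetical key map: assigns meaningless integer keys to natural keys."""

    def __init__(self):
        self._next = itertools.count(1)
        self._current = {}  # (source_system, natural_key) -> surrogate key

    def key_for(self, source_system, natural_key):
        # Reuse the existing surrogate, or hand out the next integer.
        nk = (source_system, natural_key)
        if nk not in self._current:
            self._current[nk] = next(self._next)
        return self._current[nk]

    def retire(self, source_system, natural_key):
        # Call when the source reuses a natural key for a different entity;
        # rows already loaded keep their old surrogate.
        self._current.pop((source_system, natural_key), None)

keys = SurrogateKeyMap()
a = keys.key_for("billing", "CUST-00042/EAST")  # complex string natural key
b = keys.key_for("crm", "CUST-00042/EAST")      # same string, different system
keys.retire("billing", "CUST-00042/EAST")        # source reused the number
c = keys.key_for("billing", "CUST-00042/EAST")
print(a, b, c)  # 1 2 3
```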
Dimension Hierarchies
A conceptual relationship between related items in a dimension
Example hierarchies (figure): Region > State > City > Address; Year > ...
The highest level in a hierarchy usually has the fewest distinct values
Hierarchies facilitate roll-up of lower levels into higher levels as well as drill-down from higher levels to lower levels
A dimension can have multiple hierarchies
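The sketch below shows roll-up and drill-down along a hypothetical geography hierarchy (City > State > Region), using parent lookup tables. The member names and sales figures are made up. As the slide notes, the higher levels have fewer distinct values.

```python
from collections import defaultdict

# Hypothetical geography hierarchy: each member maps to its parent one level up.
CITY_TO_STATE = {"Portland": "OR", "Salem": "OR", "Seattle": "WA", "Austin": "TX"}
STATE_TO_REGION = {"OR": "West", "WA": "West", "TX": "South"}

city_sales = {"Portland": 120, "Salem": 30, "Seattle": 200, "Austin": 90}

def roll_up(values, parent_of):
    """Aggregate one level up the hierarchy by summing children into parents."""
    totals = defaultdict(int)
    for member, value in values.items():
        totals[parent_of[member]] += value
    return dict(totals)

def drill_down(values, parent_of, member):
    """Return the children of one member at the next lower level."""
    return {child: v for child, v in values.items() if parent_of[child] == member}

state_sales = roll_up(city_sales, CITY_TO_STATE)
region_sales = roll_up(state_sales, STATE_TO_REGION)
print(region_sales)
print(drill_down(state_sales, STATE_TO_REGION, "West"))
```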
MS AS restricts you to one hierarchy per dimension; other dimension attributes can be designated as Member Properties
MS AS 2005 will support multiple hierarchies within a dimension
Time Dimension
Almost all MD models have a time or date dimension
Can build this in advance, independent of the application
Include many, many attributes that describe dates/times, such as various time periods in calendar time as well as fiscal time, holiday indicators, weekday names, etc.
Let's explore the smallish time dimension in FoodMart and a slightly richer one in grocer.mdb
Some tools (including AS) will let you create a time dimension from any date/time field
Probably better to create your own fully seeded date table for the time dimension
Ensures there are no time gaps due to periods with no facts
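A fully seeded date table can be generated ahead of time with Python's `datetime` module. The sketch below is minimal. The column names, the `YYYYMMDD` integer key and the July 1 fiscal-year start are assumptions for illustration. It emits one row per calendar day whether or not any facts exist for that day.

```python
from datetime import date, timedelta

def build_date_dim(start, end, holidays=frozenset()):
    """One row per calendar day from start to end inclusive."""
    rows = []
    day = start
    while day <= end:
        # Hypothetical fiscal calendar: fiscal year starts July 1.
        fiscal_year = day.year + 1 if day.month >= 7 else day.year
        rows.append({
            "date_key": int(day.strftime("%Y%m%d")),  # one common key convention
            "date": day,
            "weekday_name": day.strftime("%A"),
            "month": day.month,
            "quarter": (day.month - 1) // 3 + 1,
            "year": day.year,
            "fiscal_year": fiscal_year,
            "is_weekend": day.weekday() >= 5,  # Saturday = 5, Sunday = 6
            "is_holiday": day in holidays,
        })
        day += timedelta(days=1)
    return rows

dim = build_date_dim(date(2004, 12, 30), date(2005, 1, 2), holidays={date(2005, 1, 1)})
print([r["date_key"] for r in dim])
```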
Analysis Manager is NOT a full-featured cube browsing and analysis tool
It has basic browsing ability to aid in cube creation/modification
A Few Products
Part of SQL Server 2000: create OLAP cubes, some data mining
Hyperion Essbase
Full suite of business intelligence developer and end user tools
Pentaho
A new open source business intelligence project http://www.pentaho.org/
Let's OLAP
Download the following from the Downloads section of course web:
CallCenter-DataWarehouse.mdb, CallCenterPivot.xls
Let's look at the Excel Pivot Tutorial
We can even publish PivotTables to the Web
Choose dimensions: Customer, Application, Problem, Time
Figure: call center star schema with a normalized fact table joined to non-normalized Time, Customer, Application, and Problem dimensions.
Reference Library
BI Resources
The Data Warehousing Institute: http://www.tdwi.org/
Kimball and Associates: http://www.ralphkimball.com./html/articles.html
A Dimensional Modeling Manifesto, Kimball, R.: http://www.dbmsmag.com/9708d15.html
DSS Resources: http://dssresources.com/
Data Warehousing Information Center: http://www.dwinfocenter.org/
Intelligent Enterprise: http://www.intelligententerprise.com/
DM Review: http://dmreview.com/
KDNuggets: http://www.kdnuggets.com/
IT Toolbox: http://www.ittoolbox.com/, http://businessintelligence.ittoolbox.com/, http://datawarehouse.ittoolbox.com/
OLAP Report: http://www.olapreport.com/
Some free stuff (nice history of OLAP and commentary on industry trends); other stuff costs $
http://www.mosha.com/msolap/
Awesome set of resources from the lead developer on MS SQL Server Analysis Server team
Best practices for Business Intelligence using the Microsoft Data Warehousing Framework
A white paper from Microsoft
MS AS 2005 - Yukon
Major improvements in AS for SQL Server 2005: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnsql90/html/OvASDMEnvr.asp