
Data Warehousing & BI:

________________________________________________________________________________
_______________________________________
General Concept DW:
-A giant storehouse for your data
-ALL of your data
-Aggregation of data from multiple systems
General Concept BI:
-Leveraging data you already have to convert knowledge into informed actions
Tools:
-Aggregations
-Trends, e.g. graphs
-Correlations (Data Mining), e.g. probabilities
On a DW :
-Uses OLAP
-Number of tables is reduced, which reduces the number of joins and keeps things simpler
-Data is de-normalized into structures that are easier to work with
-2 types of tables (which require an SK (Surrogate Key)):
-Facts
-Dimensions
SK(Surrogate Key) : Acts like a PK for the new de-normalized table
NK(Natural Key) : Source System key (PK)
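A minimal sketch of the SK/NK relationship, assuming the warehouse hands out its own integer keys while loading. The class name, the source-system labels ("crm", "erp") and the key values are hypothetical and only illustrate the idea:

```python
from itertools import count


class SurrogateKeyMap:
    """Hands out one surrogate key (SK) per natural key (NK)."""

    def __init__(self):
        self._next_sk = count(start=1)  # SKs are plain integers with no business meaning
        self._by_nk = {}                # (source system, NK) -> SK

    def key_for(self, source_system, natural_key):
        nk = (source_system, natural_key)
        if nk not in self._by_nk:
            self._by_nk[nk] = next(self._next_sk)
        return self._by_nk[nk]


keys = SurrogateKeyMap()
crm_sk = keys.key_for("crm", "C-100")
erp_sk = keys.key_for("erp", "C-100")     # same NK value, different source system
crm_again = keys.key_for("crm", "C-100")  # already known, so the same SK comes back
```

Keeping the source system next to the NK is one way to cope with the same key value arriving from several systems; a real load would keep this mapping in the dimension table itself.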
Fact table:
-A fact marks EVENTS AND TIME
-Facts JOIN DIMENSIONS
-Facts also hold numeric measures to quantify the fact, 'how much'.
Dimensions:
-Dimensions hold the values that describe facts
-Look up values
-Dimensions can change over time
-Type of data:
-Static Data: Colors , Car Names
-Changing data that leaves no history: OVERWRITING (e.g. changing someone's name changes it for past dates too) (Type 1 dimension)
-Tracking changes is important: each change is kept as history (Type 2 dimension)
-Separating history from day-to-day data needs: when a dimension changes, the old record is moved to a history table and the current one is copied in (noted here as Type 3, "alternate reality", a HYBRID dimension; in Kimball's numbering, Type 3 keeps the prior value in an extra column)
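A rough sketch of the Type 1 and Type 2 behaviour described above, using plain Python dictionaries in place of a dimension table. The column names (sk, nk, valid_from, valid_to, is_current) and the sample customer are assumptions made for illustration:

```python
from datetime import date


def apply_type1(rows, natural_key, **changes):
    """Type 1: overwrite in place, so no history is kept."""
    for row in rows:
        if row["nk"] == natural_key:
            row.update(changes)


def apply_type2(rows, natural_key, change_date, **changes):
    """Type 2: expire the current row and append a new current row."""
    current = next(r for r in rows if r["nk"] == natural_key and r["is_current"])
    current["is_current"] = False
    current["valid_to"] = change_date
    new_row = {**current, **changes,
               "sk": max(r["sk"] for r in rows) + 1,  # new SK, same NK
               "valid_from": change_date, "valid_to": None, "is_current": True}
    rows.append(new_row)
    return new_row


customers = [{"sk": 1, "nk": "C-100", "name": "Ann Lee", "city": "Oslo",
              "valid_from": date(2020, 1, 1), "valid_to": None, "is_current": True}]

apply_type1(customers, "C-100", name="Ann Berg")                  # name changes for past dates too
apply_type2(customers, "C-100", date(2024, 6, 1), city="Bergen")  # the move is tracked as history
```

After both calls the table holds two rows for the same NK: the expired Oslo row and the current Bergen row, both carrying the overwritten name.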
The Microsoft Toolset:
-ETL:
-Extract -> Transform -> Load
-SSIS - SQL Server Integration Services
-Analytics:
-Aggregation - Trending - Correlations
-SSAS - SQL Server Analysis Services
-Reporting:
-SSRS - SQL Server Reporting Services
-SharePoint PerformancePoint
-PowerPivot (for BI functions):
-Add-in for Microsoft Excel
The Grain:
-Atomic Grain : Per Person , Client , etc.
-Summary Grain (Aggregate) Sum of Persons , Clients , Tickets, etc.
Finding the Grain:
-Understand your business process
-Define the grain at the atomic level if possible
-Analyze your data sources to understand the grain reality
Types of Fact Tables:
-Transaction: one row per transaction
-Periodic Snapshot: summarizes per day, month, year, etc.
-Accumulating Snapshot: summarizes from start date to end date
-Joins out on PK to other star schemas
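To make the difference between the first two fact table types concrete, here is a small sketch that rolls transaction rows up into a monthly periodic snapshot. The account names, amounts and the monthly_snapshot helper are made up for the example:

```python
from collections import defaultdict
from datetime import date

# Transaction fact: one row per transaction
transactions = [
    {"date": date(2024, 1, 5), "account": "A", "amount": 100},
    {"date": date(2024, 1, 20), "account": "A", "amount": -30},
    {"date": date(2024, 2, 3), "account": "A", "amount": 50},
    {"date": date(2024, 1, 9), "account": "B", "amount": 10},
]


def monthly_snapshot(rows):
    """Periodic snapshot: one summarized row per (month, account)."""
    totals = defaultdict(int)
    for row in rows:
        period = (row["date"].year, row["date"].month)
        totals[(period, row["account"])] += row["amount"]
    return dict(totals)


snapshot = monthly_snapshot(transactions)
```

Four transaction rows collapse into three snapshot rows here; the transaction detail is no longer visible at the snapshot grain.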
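Dimension types:
-Role Playing: the same dimension is referenced more than once from the same table, each time in a different role, e.g. a date dimension used for both ActualDate and ModifiedDate
-Outrigger: a dimension that is reached through another dimension table instead of being joined directly to the fact table
-Junk: additional ways to analyze data, usually low-cardinality flags and indicators (often bit type) collected in one dimension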

Dimension table architecture:
-SK
-NK
-Flags and indicators (especially for Type 2) showing that this is the latest modification
Dimension Hierarchy Types
-Natural Hierarchies
-User Hierarchies
________________________________________________________________________________
_________________________________________________________________________
Kimball Book - The Data Warehouse Toolkit, 3rd Edition
Chapter 1 Data Warehousing , BI and Dimensional Modeling Primer
-Most important: WE FIRST NEED TO CONSIDER THE NEEDS OF THE BUSINESS; after that we work backwards through the technology and design
BI "Rules" :
-The DW/BI system must make information easily accessible.
-The DW/BI system must present information consistently. The data in the DW/BI system must be credible. Data must be carefully assembled from a variety of sources, cleansed, quality assured, and released only when it is fit for user consumption.
-The DW/BI system must adapt to change.
-The DW/BI system must present information in a timely way. As the DW/BI system is used more intensively for operational decisions, raw data may need to be converted into actionable information within hours, minutes, or even seconds. The DW/BI team and business users need to have realistic expectations for what it means to deliver data when there is little time to clean or validate it.
-The DW/BI system must be a secure bastion that protects the information assets. An organization's informational crown jewels are stored in the data warehouse. At a minimum, the warehouse likely contains information about what you're selling to whom at what price: potentially harmful details in the hands of the wrong people. The DW/BI system must effectively control access to the organization's confidential information.
-The DW/BI system must serve as the authoritative and trustworthy foundation for improved decision making. The data warehouse must have the right data to support decision making. The most important outputs from a DW/BI system are the decisions that are made based on the analytic evidence presented; these decisions deliver the business impact and value attributable to the DW/BI system. The original label that predates DW/BI is still the best description of what you are designing: a decision support system.
-The business community must accept the DW/BI system to deem it successful. It doesn't matter that you built an elegant solution using best-of-breed products and platforms. If the business community does not embrace the DW/BI environment and actively use it, you have failed the acceptance test. Unlike an operational system implementation where business users have no choice but to use the new system, DW/BI usage is sometimes optional. Business users will embrace the DW/BI system if it is the simple and fast source for actionable information.
Rules of BI summary:
- Easy access to the information
- Data must be consistent, carefully assembled from a variety of sources (VERY CLEARLY)
- Adaptation to change
- Realistic expectations about when data can be displayed
- Security of the information
- Trustworthy data for decision making
- The business community must accept the system (even if you think it's perfect, they will think otherwise if it doesn't feel like a fit for them)
BI/DW developer responsibilities:
Understand the business users:
Understand their job responsibilities, goals, and objectives.
Determine the decisions that the business users want to make with the
help of the DW/BI system.
Identify the best users who make effective, high-impact decisions.
Find potential new users and make them aware of the DW/BI system's capabilities.
Deliver high-quality, relevant, and accessible information and analytics to
the business users:
Choose the most robust, actionable data to present in the DW/BI system,
carefully selected from the vast universe of possible data sources
in your organization.
Make the user interfaces and applications simple and template-driven, explicitly matched to the users' cognitive processing profiles.
Make sure the data is accurate and can be trusted, labeling it consistently
across the enterprise.
Continuously monitor the accuracy of the data and analyses.
Adapt to changing user profiles, requirements, and business priorities,
along with the availability of new data sources.

Sustain the DW/BI environment:
Take a portion of the credit for the business decisions made using the
DW/BI system, and use these successes to justify staffing and ongoing
expenditures.
Update the DW/BI system on a regular basis.
Maintain the business users' trust.
Keep the business users, executive sponsors, and IT management
happy.
Fact table: The fact table in a dimensional model stores the performance measurements resulting from an organization's business process events.
The term fact represents a business measure.
Each row in a fact table corresponds to a measurement event. The data on each row is at a specific level of detail, referred to as the grain, such as one row per product sold on a sales transaction.
It is important that you do not try to fill the fact table with zeros representing no activity because these zeros would overwhelm most fact tables.
Dimensions provide the entry points to the data, and the final labels and groupings on all DW/BI analyses.
You often decide whether a numeric column is a fact or a dimension attribute by asking whether the column is a measurement that takes on lots of values and participates in calculations (making it a fact) or is a discretely valued description that is more or less constant and participates in constraints and row labels (making it a dimensional attribute).
Kimball steps of DW/BI architecture:
1) Operational Source Systems
2) ETL
3) Presentation Area to support BI
4) BI Applications

The most finely grained data must be available in the presentation area so that
users can ask the most precise questions possible.
You should think dimensionally at other critical junctures of a DW/BI project.
Four-Step Dimensional Design Process
The four key decisions made during the design of a dimensional model include:
1. Select the business process.
2. Declare the grain.
3. Identify the dimensions.
4. Identify the facts.
The grain establishes exactly what a single fact table row represents.

The grain must be declared before choosing dimensions or facts because every candidate dimension or fact must be consistent with the grain. Different grains must not be mixed in the same fact table.
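One practical reading of "declare the grain" is that the grain columns should identify a single fact row. The sketch below checks that, assuming a grain of one row per product per sales ticket; the grain_violations helper and the sample rows are hypothetical:

```python
from collections import Counter


def grain_violations(rows, grain_columns):
    """Return grain-key values that appear on more than one row."""
    seen = Counter(tuple(row[col] for col in grain_columns) for row in rows)
    return [key for key, n in seen.items() if n > 1]


sales = [
    {"ticket": 1, "product": "apple", "quantity": 2},
    {"ticket": 1, "product": "pear", "quantity": 1},
    {"ticket": 2, "product": "apple", "quantity": 5},
    {"ticket": 2, "product": "apple", "quantity": 1},  # second row at the same grain
]

violations = grain_violations(sales, ["ticket", "product"])
```

A non-empty result suggests either a data quality problem or that the real grain of the source is finer than the one declared.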

A row in a transaction fact table corresponds to a measurement event at a point in space and time. These fact tables always contain a foreign key for each associated dimension, and optionally contain precise time stamps and degenerate dimension keys. The measured numeric facts must be consistent with the transaction grain.
Factless Fact Tables
Durable supernatural key: like the NK, it remains the same if we modify the row

Dealing with Dimension Hierarchies:


-Fixed Depth Positional Hierarchies
-Slightly Ragged/Variable Depth Hierarchies
-Ragged/Variable Depth Hierarchies with Hierarchy Bridge Tables
-Ragged/Variable Depth Hierarchies with Pathstring Attributes
Advanced Fact Table Techniques
-Fact Table Surrogate Keys
If a stable numeric value is used predominantly for filtering and grouping, it should be treated as a dimension attribute.
In order to tie a business initiative to a business process representing a project-sized unit of work for the DW/BI team, you need to decompose the business initiative into the underlying processes.

Dimensions = How do business people describe the data resulting from the business process measurement events?
Facts = What is the process measuring? Measures are additive, semi-additive, or non-additive. Additive measures can be summed across every dimension, e.g. debits and credits in a bank. Semi-additive measures can be summed across some dimensions but not all, e.g. a balance. Non-additive measures, such as ratios, cannot be aggregated by summing.
ONLY NUMERIC ATTRIBUTES THAT CHANGE FREQUENTLY GO INTO THE FACT TABLE
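A small worked example of the three kinds of measures, with made-up numbers. The table and column names are assumptions, not taken from the book:

```python
# Additive: deposits can be summed across days and across accounts.
deposits = [
    {"day": 1, "account": "A", "amount": 100},
    {"day": 2, "account": "A", "amount": 20},
    {"day": 1, "account": "B", "amount": 50},
]
total_deposits = sum(r["amount"] for r in deposits)

# Semi-additive: balances can be summed across accounts for one day,
# but summing the same account across days gives a meaningless number.
balances = [
    {"day": 1, "account": "A", "balance": 100},
    {"day": 1, "account": "B", "balance": 50},
    {"day": 2, "account": "A", "balance": 120},
    {"day": 2, "account": "B", "balance": 50},
]
bank_total_day2 = sum(r["balance"] for r in balances if r["day"] == 2)
a_rows = [r["balance"] for r in balances if r["account"] == "A"]
a_average_balance = sum(a_rows) / len(a_rows)  # average over time instead of sum

# Non-additive: a ratio is recomputed from its additive parts, not summed.
sales = [
    {"store": "N", "revenue": 200.0, "profit": 20.0},
    {"store": "S", "revenue": 100.0, "profit": 40.0},
]
overall_margin = sum(r["profit"] for r in sales) / sum(r["revenue"] for r in sales)
summed_margins = sum(r["profit"] / r["revenue"] for r in sales)  # not a valid margin
```

Here the overall margin is 60 / 300, while adding the two store margins gives a much larger figure that does not describe either store or the total.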

________________________________________________________________________________
________________________________________________________________________________
_____
Training Kit: Implementing a Data Warehouse with Microsoft SQL Server 2012
Data Warehouse and Logical Design
Data warehouse (DW): a DW is a centralized data silo for an enterprise that contains merged, cleansed, and historical data.
Every piece of information must be stored exactly once. This way, you can enforce data integrity.
If you were to form a proposition from a row in a fact table, you might express it with a sentence such as, "Customer A purchased product B on date C in quantity D for amount E."
This proposition is a fact; this is how the fact table got its name.
A Star schema represents a multidimensional hypercube.
Dimensions with connections to multiple fact tables are called shared or conformed dimensions.
Auditing and Lineage
In addition to tables for reports, a data warehouse may also include auditing tables. For every update, you should audit who made the update, when it was made, and how many rows were transferred to each dimension and fact table in your DW.
Dimensions give context to measures.
Types of columns in dimensions:
- Keys: used to identify entities
- Name columns: used for human names of entities
- Attributes: used for pivoting in analyses
- Member properties: used for labels in a report
- Lineage columns: used for auditing, and never exposed to end users
Type of columns in facts:
-FK
-Measures
-Lineage columns(optional)
-Business key columns
SEARCH MORE ON ADDITIVE MEASURES
Check more on indexes dw/bi

The DW data is not online, real-time data. You do not need to back up the transaction log for your data warehouse, as you would in an LOB database. Therefore, the recovery model for your data warehouse should be Simple.

________________________________________________________________________________
______________________________________________
Talend:
Connection types between components:
-Row connection:
-Main
This type of row connection is the most commonly used connection. It passes on data flows from one component to the other, iterating on each row and reading input data according to the component properties setting (schema).

________________________________________________________________________________
_______________________________________
Chapter 3. SSIS

-LOB and DW Management are different


Based on the level of complexity, data movement scenarios can be divided into two groups:
- Simple data movements, where data is moved from the source to the destination as-is (unmodified)
- Complex data movements, where the data needs to be transformed before it can be stored, and where additional programmatic logic is required to accommodate the merging of the new and/or modified data, arriving from the source, with existing data, already present at the destination
All fact tables in a drill-across query must use conformed dimensions.
The actual drill-across query consists of a multi-pass set of separate requests to the target fact tables followed by a simple sort/merge on the identical row headers returned from each request.
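The multi-pass idea can be sketched in a few lines, assuming two fact tables that share a conformed product dimension. The fact tables, measure names and the totals_by helper are invented for the example:

```python
from collections import defaultdict


def totals_by(rows, header, measure):
    """One pass against one fact table: sum a measure per row header."""
    out = defaultdict(int)
    for row in rows:
        out[row[header]] += row[measure]
    return out


sales_fact = [
    {"product": "apple", "units_sold": 5},
    {"product": "pear", "units_sold": 2},
    {"product": "apple", "units_sold": 3},
]
inventory_fact = [
    {"product": "apple", "units_on_hand": 40},
    {"product": "plum", "units_on_hand": 7},
]

# Pass 1 and pass 2: separate requests, same row header (product)
sold = totals_by(sales_fact, "product", "units_sold")
on_hand = totals_by(inventory_fact, "product", "units_on_hand")

# Sort/merge on the identical row headers returned from each request
report = [
    {"product": p, "units_sold": sold.get(p), "units_on_hand": on_hand.get(p)}
    for p in sorted(set(sold) | set(on_hand))
]
```

The merge only lines up because both passes label their rows with the same product values, which is what the conformed dimension guarantees; products missing from one pass show up with None for that measure.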