End users will get much better query performance from a data mart than from a data
warehouse.
End users will have a much easier time navigating through data marts.
Star Schema
A star schema is composed of two basic kinds of tables:
One fact table and
Multiple dimension tables.
The star schema is called a star because of its appearance, with the fact table as the
actual star and the dimension tables as the light rays emanating from it.
Figure. A simple star schema modeling sales information
PRODUCT
product_key
prod_nm
prod_line_nm
div_nm
flavor_nm
units_of_meas_nm
units_per_pkg
intro_date
add_date
update_date
active_fg
SALES
date_key (FK)
customer_key (FK)
product_key (FK)
invoice_number
units_sold
unit_price
unit_cost
CUSTOMER
customer_key
customer_name
street_addr
city
state
zip
add_dt
update_dt
active_fg
DT
date_key
dt
day_of_wk
month_cd
holiday_name
add_dt
update_dt
active_fg
Fact Table
The fact table contains the actual transactions or values being analyzed.
Dimension Table
The dimension table contains descriptive information about the transactions or values.
The Design Process
Fact Table Design
Each record in a fact table contains a primary key made up of a concatenation of foreign
keys to dimension tables and the facts or measures uniquely identified by that key.
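The key structure described above can be sketched as follows. This is a minimal illustration, not the source's own DDL: it uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for the warehouse database, keeps only a few columns from the sales figure, and leaves out invoice_number for brevity. All row values are made up.

```python
import sqlite3

# In-memory SQLite stands in for the warehouse database (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
    CREATE TABLE product  (product_key  INTEGER PRIMARY KEY, prod_nm TEXT);
    CREATE TABLE customer (customer_key INTEGER PRIMARY KEY, customer_name TEXT);
    CREATE TABLE dt       (date_key     TEXT    PRIMARY KEY, dt TEXT);

    CREATE TABLE sales (
        date_key     TEXT    NOT NULL REFERENCES dt(date_key),
        customer_key INTEGER NOT NULL REFERENCES customer(customer_key),
        product_key  INTEGER NOT NULL REFERENCES product(product_key),
        units_sold   INTEGER,
        unit_price   REAL,
        unit_cost    REAL,
        -- The primary key is the concatenation of the dimension foreign keys.
        PRIMARY KEY (date_key, customer_key, product_key)
    );

    -- Illustrative rows only.
    INSERT INTO product  VALUES (14, 'Widget');
    INSERT INTO customer VALUES (1714, 'Acme');
    INSERT INTO dt       VALUES ('000701', '2000-07-01');
    INSERT INTO sales    VALUES ('000701', 1714, 14, 16, 2.50, 1.75);
""")

def try_insert(row):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO sales VALUES (?, ?, ?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

duplicate_ok = try_insert(("000701", 1714, 14, 5, 2.50, 1.75))  # same key again
orphan_ok = try_insert(("000701", 1714, 99, 5, 2.50, 1.75))     # unknown product_key
print(duplicate_ok, orphan_ok)
```

Both inserts are rejected: the first repeats an existing key combination, and the second points at a product that has no dimension record.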
People frequently (always?) refer to star schemas as denormalized, meaning that the
rules of normalization are intentionally ignored.
A denormalized structure resembles a flat file. Is a fact table really denormalized? No.
In fact, the fact table is a highly normalized structure.
Each row consists of a number of attributes (that is, the measures) that are all
attributable to only one primary key. It's certainly in first normal form, as it has
no repeating groups.
It satisfies the conditions for second normal form in that all the attributes are
dependent on the whole primary key.
Finally, none of the attributes depend on other non-key attributes. Thus, the
fact table is even in third normal form.
So, where is the denormalization? It's actually the star's dimension tables that are
denormalized, not the fact table.
The level of detail captured in a fact table is sometimes called the level of granularity,
or grain, of that table. Storing data at the most detailed level possible is recommended.
While this can result in much higher disk requirements, capturing the finest
level of detail has a significant advantage.
It's always possible to re-create aggregate summary data from detail data, but you
can't create details from summaries.
The finest level of detail is frequently called the atomic level of detail. Just as you
can't split an atom into its component parts and still recognize the element they came
from, you can't break an atomic-level transaction apart any further and have it
maintain its identity as a transaction.
Dimension Table
Alone, a fact table is pretty useless. Dimension tables provide meaning to each fact.
What does it mean that we sold 16 units of product_key 14 to customer_key 1714 on
date_key 000701? It means nothing until we decode those foreign keys.
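Decoding those foreign keys amounts to joining the fact row to each dimension table. The sketch below uses Python's sqlite3 module with an in-memory database as a stand-in for the warehouse. Only the key values (14, 1714, 000701) and the 16 units come from the text; the product name, customer name, and the reading of 000701 as 1 July 2000 are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product  (product_key  INTEGER PRIMARY KEY, prod_nm TEXT);
    CREATE TABLE customer (customer_key INTEGER PRIMARY KEY, customer_name TEXT);
    CREATE TABLE dt       (date_key TEXT PRIMARY KEY, dt TEXT, day_of_wk TEXT);
    CREATE TABLE sales    (date_key TEXT, customer_key INTEGER,
                           product_key INTEGER, units_sold INTEGER);

    -- Hypothetical dimension rows; only the keys come from the text.
    INSERT INTO product  VALUES (14, 'Cherry Cola 12-pack');
    INSERT INTO customer VALUES (1714, 'Corner Market');
    INSERT INTO dt       VALUES ('000701', '2000-07-01', 'Saturday');
    INSERT INTO sales    VALUES ('000701', 1714, 14, 16);
""")

# Join the fact row to every dimension so each key turns into a description.
decoded = conn.execute("""
    SELECT s.units_sold, p.prod_nm, c.customer_name, d.dt, d.day_of_wk
    FROM sales s
    JOIN product  p ON p.product_key  = s.product_key
    JOIN customer c ON c.customer_key = s.customer_key
    JOIN dt       d ON d.date_key     = s.date_key
""").fetchone()

print(decoded)
# (16, 'Cherry Cola 12-pack', 'Corner Market', '2000-07-01', 'Saturday')
```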
Dimension Table Features
Some of the features of the dimension tables are:
Denormalized
Wide
Short
Use Surrogate Keys
Contains Links to Corresponding Records in Source Tables
Contains Additional Date and Active Flag Fields
Denormalized
Dimension tables are usually highly denormalized. Although people refer to star schemas
as being denormalized, they are actually referring to the dimension tables, not to the fact
table.
Thus, all the information regarding each dimension element appears on a single record in
the dimension table. It's as though you took a normalized design that describes products
and joined all the tables together to produce this single, denormalized dimension table. In
fact, this is generally how you build a dimension table.
When loading the dimension tables, we join many source tables together and put the
results into one flat, denormalized table. Each record in this table fully describes a
dimension element.
In the end, this helps end users' query performance because when users run their queries,
the work needed to join these tables together has already been done. In essence, we are
pre-joining tables together, satisfying query-time resource requirements with load-time
resources.
Remember that loading is usually done at night when no one needs access anyway.
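The pre-join at load time might look like the following sketch, written with Python's sqlite3 module and an in-memory database as a stand-in for the real source and warehouse systems. The table names src_product and product_dim, the reduced column set, and all sample rows are hypothetical; the source does not show its own load statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized source tables (hypothetical sample rows).
    CREATE TABLE division     (div_cd TEXT PRIMARY KEY, div_nm TEXT);
    CREATE TABLE product_line (prod_line_cd TEXT PRIMARY KEY, div_cd TEXT,
                               prod_line_nm TEXT);
    CREATE TABLE flavor       (flavor_cd TEXT PRIMARY KEY, flavor_nm TEXT);
    CREATE TABLE std_units    (units_of_measure_cd TEXT PRIMARY KEY,
                               units_of_measure_nm TEXT);
    CREATE TABLE src_product  (product_code TEXT PRIMARY KEY, product_name TEXT,
                               prod_line_cd TEXT, flavor_cd TEXT,
                               units_of_measure TEXT, units_per_pkg INTEGER);

    INSERT INTO division     VALUES ('BEV', 'Beverages');
    INSERT INTO product_line VALUES ('COLA', 'BEV', 'Colas');
    INSERT INTO flavor       VALUES ('CH', 'Cherry');
    INSERT INTO std_units    VALUES ('CAN', 'Can');
    INSERT INTO src_product  VALUES ('P-100', 'Cherry Cola', 'COLA', 'CH', 'CAN', 12);

    -- Flat dimension table with a generated surrogate key.
    CREATE TABLE product_dim (
        product_key      INTEGER PRIMARY KEY AUTOINCREMENT,
        prod_cd          TEXT,
        prod_nm          TEXT,
        prod_line_nm     TEXT,
        div_nm           TEXT,
        flavor_nm        TEXT,
        units_of_meas_nm TEXT,
        units_per_pkg    INTEGER
    );

    -- Load step: do the joins once so that queries do not have to.
    INSERT INTO product_dim (prod_cd, prod_nm, prod_line_nm, div_nm,
                             flavor_nm, units_of_meas_nm, units_per_pkg)
    SELECT p.product_code, p.product_name, pl.prod_line_nm, d.div_nm,
           f.flavor_nm, u.units_of_measure_nm, p.units_per_pkg
    FROM src_product p
    JOIN product_line pl ON pl.prod_line_cd = p.prod_line_cd
    JOIN division     d  ON d.div_cd = pl.div_cd
    JOIN flavor       f  ON f.flavor_cd = p.flavor_cd
    JOIN std_units    u  ON u.units_of_measure_cd = p.units_of_measure;
""")

dim_row = conn.execute("""
    SELECT prod_nm, prod_line_nm, div_nm, flavor_nm, units_of_meas_nm, units_per_pkg
    FROM product_dim
""").fetchone()
print(dim_row)
# ('Cherry Cola', 'Colas', 'Beverages', 'Cherry', 'Can', 12)
```

After the load, a query against product_dim reads one table with no joins, which is the query-time saving the text describes.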
Wide
The dimension table is wider than most tables in traditional database applications. By
wide, we mean that it has a lot of columns. The more columns you put into your
dimension tables, the more descriptive they will be.
Short
Dimension tables are generally far shorter than fact tables; that is, they have far fewer rows.
Figure. Dimension data in normalized (OLTP) and denormalized (dimension table) form
Normalized
PRODUCT
product_code
flavor_cd (FK)
units_of_measure (FK)
prod_line_cd (FK)
product_name
prod_line_nm
units_per_pkg
intro_date
add_date
update_date
active_fg
PRODUCT_LINE
product_line_cd
div_cd (FK)
prod_line_nm
DIVISION
div_cd
div_nm
FLAVOR
flavor_cd
flavor_nm
STD_UNITS
units_of_measure_cd
units_of_measure_nm
Denormalized
PRODUCT
product_code
prod_cd
prod_nm
prod_line_nm
div_nm
flavor_nm
units_of_meas_nm
units_per_pkg
intro_date
add_dt
update_dt
active_fg
For one, links back to the source records allow analysts to audit dimension tables to
understand where their data came from.
More importantly, these links aid the dimension table update process. When the
refresh process looks at the source tables for changes, having this link information
makes it quite easy to tell which source system records generated which
dimension table records.
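One way a refresh process could use such a link is sketched below with Python's sqlite3 module and an in-memory database. The column name src_product_code, the table names, and the sample rows are hypothetical, and the sketch simply overwrites changed values in place; the source does not say how its refresh handles changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Source (OLTP) table and dimension table. src_product_code is the link
    -- back to the source record (a hypothetical column name).
    CREATE TABLE src_product (product_code TEXT PRIMARY KEY, product_name TEXT);
    CREATE TABLE product_dim (
        product_key      INTEGER PRIMARY KEY,   -- surrogate key
        src_product_code TEXT UNIQUE,           -- link to the source record
        prod_nm          TEXT
    );
    INSERT INTO src_product VALUES ('P-100', 'Cherry Cola');
    INSERT INTO src_product VALUES ('P-200', 'Lime Soda');
    -- The dimension holds an outdated name for P-100 and lacks P-200.
    INSERT INTO product_dim (src_product_code, prod_nm) VALUES ('P-100', 'Chery Cola');
""")

def refresh_product_dim(conn):
    """Update changed dimension rows and insert new ones, matched by source link."""
    updated = inserted = 0
    source_rows = conn.execute(
        "SELECT product_code, product_name FROM src_product").fetchall()
    for code, name in source_rows:
        row = conn.execute(
            "SELECT prod_nm FROM product_dim WHERE src_product_code = ?",
            (code,)).fetchone()
        if row is None:
            # No dimension record was generated from this source record yet.
            conn.execute(
                "INSERT INTO product_dim (src_product_code, prod_nm) VALUES (?, ?)",
                (code, name))
            inserted += 1
        elif row[0] != name:
            # The source record changed since the last load.
            conn.execute(
                "UPDATE product_dim SET prod_nm = ? WHERE src_product_code = ?",
                (name, code))
            updated += 1
    return updated, inserted

first_run = refresh_product_dim(conn)
second_run = refresh_product_dim(conn)
print(first_run, second_run)
# (1, 1) (0, 0)
```

The first run corrects one row and adds one; the second run finds nothing to do because every source record already maps to a matching dimension record.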
For example, we might build a table that contains summaries of the sales fact records but
only at month-level granularity. The SQL to create this summary table might look
something like the following:
-- one row per customer, product, month, and price/cost combination
create table sales_month_sum as
select sales.customer_key, sales.product_key, dt.month_cd,
       sum(sales.units_sold) units_sold, sales.unit_price, sales.unit_cost
from sales, dt
where sales.date_key = dt.date_key
group by sales.customer_key, sales.product_key, dt.month_cd, sales.unit_price,
         sales.unit_cost;
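To see the effect of such a summary, the sketch below runs an equivalent statement against a few made-up detail rows, using Python's sqlite3 module and an in-memory SQLite database as a stand-in for Oracle. The sample dates, quantities, and the month_cd format are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dt    (date_key TEXT PRIMARY KEY, month_cd TEXT);
    CREATE TABLE sales (date_key TEXT, customer_key INTEGER, product_key INTEGER,
                        units_sold INTEGER, unit_price REAL, unit_cost REAL);

    -- Hypothetical detail rows: two sales in July, one in August.
    INSERT INTO dt VALUES ('000701', '2000-07');
    INSERT INTO dt VALUES ('000715', '2000-07');
    INSERT INTO dt VALUES ('000802', '2000-08');
    INSERT INTO sales VALUES ('000701', 1714, 14, 16, 2.5, 1.75);
    INSERT INTO sales VALUES ('000715', 1714, 14,  4, 2.5, 1.75);
    INSERT INTO sales VALUES ('000802', 1714, 14, 10, 2.5, 1.75);

    -- Month-level summary built from the detail rows.
    CREATE TABLE sales_month_sum AS
    SELECT s.customer_key, s.product_key, d.month_cd,
           SUM(s.units_sold) AS units_sold, s.unit_price, s.unit_cost
    FROM sales s
    JOIN dt d ON s.date_key = d.date_key
    GROUP BY s.customer_key, s.product_key, d.month_cd, s.unit_price, s.unit_cost;
""")

monthly = conn.execute(
    "SELECT month_cd, units_sold FROM sales_month_sum ORDER BY month_cd").fetchall()
print(monthly)
# [('2000-07', 20), ('2000-08', 10)]
```

The three detail rows collapse into two summary rows. The July total of 20 can always be rebuilt from the detail, but the individual sales of 16 and 4 cannot be recovered from the summary.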
Materialized Views
You can create summary tables either by creating the tables manually or by using Oracle's
materialized view functionality.
Materialized view is another term for Oracle's snapshot feature. Implementing your
aggregates as materialized views has a few advantages.