Data Warehousing Tips

1. What are Data Marts?
A data mart is a segment of a data warehouse that provides data for reporting and analysis on one section, unit, department or operation of the company, e.g. sales, payroll or production. Data marts are sometimes complete, individual data warehouses in their own right, usually smaller than the corporate data warehouse.
2. What are slowly changing dimensions?
Dimensions whose attributes change over time are called Slowly Changing Dimensions. For instance, a product's price changes over time, people change their names, and country and state names may change. These are a few examples of Slowly Changing Dimensions, since changes happen to them over a period of time.
Handling a slowly changing dimension may require a change to the dimension's schema, depending on the business requirement.
E.g. the dimension table Product has Product ID and Price. If the price changes and we simply update Price in the dimension, we lose the history. In this case we can add a column such as Date of Change. Then, when the price changes on a given date, one new record is added to the dimension and the history stays intact.
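The Product example above can be sketched in a few lines of Python. This is only an illustration under assumed names: product_dim, change_price and price_as_of are hypothetical, and a real warehouse would do this in its ETL tool or in SQL.

```python
from datetime import date

# Hypothetical Product dimension: one row per price version.
product_dim = [
    {"product_id": 1, "price": 10.00, "date_of_change": date(2020, 1, 1)},
]

def change_price(rows, product_id, new_price, changed_on):
    """Append a new row instead of overwriting, so older prices survive."""
    rows.append({"product_id": product_id, "price": new_price,
                 "date_of_change": changed_on})

def price_as_of(rows, product_id, as_of):
    """Return the price in effect on `as_of`, or None if none applies yet."""
    candidates = [r for r in rows
                  if r["product_id"] == product_id
                  and r["date_of_change"] <= as_of]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r["date_of_change"])["price"]

change_price(product_dim, 1, 12.50, date(2021, 6, 1))
print(price_as_of(product_dim, 1, date(2020, 12, 31)))  # price before the change
print(price_as_of(product_dim, 1, date(2021, 6, 1)))    # price after the change
```

Because rows are only ever appended, a report for any past date can still find the price that applied at that time.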
3. Differences between star and snowflake schemas
A star schema is created when all the dimension tables link directly to the fact table. It is called a star schema because its graphical representation resembles a star. Note that the foreign keys in the fact table link to the primary keys of the dimension tables. The sample provides a star schema for a sales_fact table for the year 1998. The dimensions created are Store, Customer, Product_class and time_by_day. The Product table links to the product_class table through the primary key, and indirectly to the fact table. The fact table contains foreign keys that link to the dimension tables.
The snowflake schema is a schema in which the fact table is linked indirectly to a number of dimension tables. The dimension tables are normalized to remove redundant data and are split into several related dimension tables for ease of maintenance. An example of the snowflake schema is splitting the Product dimension into a product_category dimension and a product_manufacturer dimension.
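A minimal sketch of a star join, using Python's built-in sqlite3 module and an in-memory database. The table and column names (store, product, sales_fact, sales_amount) and the sample rows are assumptions made for illustration; they are not the schema of the sample referred to above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE store (store_id INTEGER PRIMARY KEY, store_name TEXT);
CREATE TABLE product (product_id INTEGER PRIMARY KEY, product_name TEXT);
-- The fact table holds one foreign key per dimension plus the measure.
CREATE TABLE sales_fact (
    store_id INTEGER REFERENCES store(store_id),
    product_id INTEGER REFERENCES product(product_id),
    sales_amount REAL
);
INSERT INTO store VALUES (1, 'North'), (2, 'South');
INSERT INTO product VALUES (10, 'Widget'), (11, 'Gadget');
INSERT INTO sales_fact VALUES (1, 10, 5.0), (1, 11, 7.5), (2, 10, 2.5);
""")

# One join per dimension: the fact table sits at the centre of the star.
sales_by_store = conn.execute("""
    SELECT s.store_name, SUM(f.sales_amount)
    FROM sales_fact AS f
    JOIN store AS s ON s.store_id = f.store_id
    GROUP BY s.store_name
    ORDER BY s.store_name
""").fetchall()
print(sales_by_store)
```

In a snowflake variant, the product table would carry a key to a further table (for example a category table), and reaching the category name would take one more join.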
4. Why are OLTP database designs not generally a good idea for a Data Warehouse?
OLTP designs are built to record the details of daily transactions and generally do not keep historical information about the organization. A data warehouse, by contrast, is a huge store of historical information, obtained from different data marts, that is used for making intelligent decisions about the organization.
5. What is a dimension table?
A dimension table is a collection of hierarchies and categories along which the user can drill down and drill up. Apart from its key, it contains only descriptive (textual) attributes.
6. What are Aggregate tables?
An aggregate table contains a summary of existing warehouse data, grouped to certain levels of the dimensions. Retrieving the required data from the detail table, which may have millions of records, takes more time and also affects server performance. To avoid this, we can aggregate the table to the required level and query the aggregate instead. Such tables reduce the load on the database server, improve query performance and return results much faster.
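As a rough illustration, the following Python sketch builds a monthly aggregate from detail rows. The rows and the names sales_fact and build_monthly_aggregate are hypothetical; in practice the aggregate would usually be a table populated by a GROUP BY query during the load.

```python
from collections import defaultdict

# Hypothetical detail-level fact rows: (sale_date, product, amount).
sales_fact = [
    ("2024-01-03", "Widget", 5.0),
    ("2024-01-17", "Widget", 7.0),
    ("2024-01-20", "Gadget", 4.0),
    ("2024-02-02", "Widget", 3.0),
]

def build_monthly_aggregate(rows):
    """Summarize detail rows to one row per (month, product)."""
    totals = defaultdict(float)
    for sale_date, product, amount in rows:
        month = sale_date[:7]  # 'YYYY-MM' prefix of an ISO date string
        totals[(month, product)] += amount
    return dict(totals)

monthly_sales = build_monthly_aggregate(sales_fact)
print(monthly_sales[("2024-01", "Widget")])
```

A monthly report can then read the few aggregate rows instead of scanning every detail row.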
7. What is a level of Granularity of a fact table?
Level of granularity means the level of detail that you put into the fact table in a data warehouse. For example, based on the design you can decide to store the sales data for each transaction. The level of granularity then determines how much detail you are willing to keep for each transactional fact: whether to record product sales for every transaction within a minute, or to aggregate them up to the minute and store that.
It also means that we can have, for example, data aggregated for a year for a given product, and that the data can be drilled down to a monthly, weekly and daily basis. The lowest level is known as the grain; the further down into the details you go, the finer the granularity.
8. What are Semi-additive and factless facts and in which scenario will you use
such kinds of fact table?
Semi-additive: Semi-additive facts are facts that can be summed up for some of the dimensions in the fact table, but not the others. For example, suppose Current_Balance and Profit_Margin are the facts. Current_Balance is a semi-additive fact: it makes sense to add it up across all accounts (what is the total current balance for all accounts in the bank?), but it does not make sense to add it up through time (adding up the current balances of a given account for each day of the month does not give us any useful information). Profit_Margin, being a ratio, cannot be meaningfully summed along any dimension, so it is non-additive.
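The Current_Balance example can be sketched as follows. The snapshot rows and function names are assumptions for illustration; the choice of an average across time is one common option (the latest balance is another), not the only one.

```python
# Hypothetical daily balance snapshots: (day, account, current_balance).
balances = [
    ("2024-03-01", "A", 100.0), ("2024-03-01", "B", 50.0),
    ("2024-03-02", "A", 120.0), ("2024-03-02", "B", 50.0),
]

def total_balance_on(rows, day):
    """Summing across accounts for one day is meaningful."""
    return sum(bal for d, _, bal in rows if d == day)

def average_balance_for(rows, account):
    """Across time, use an average (or the latest value), not a sum."""
    values = [bal for _, acct, bal in rows if acct == account]
    return sum(values) / len(values)

print(total_balance_on(balances, "2024-03-02"))
print(average_balance_for(balances, "A"))
```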
Factless: A factless fact table captures the many-to-many relationships between dimensions, but contains no numeric or textual facts. Such tables are often used to record events or coverage information. Common examples of factless fact tables include:
- Identifying product promotion events (to determine promoted products that didn't sell)
- Tracking student attendance or registration events
- Tracking insurance-related accident events
- Identifying building, facility, and equipment schedules for a hospital or university
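The first use in the list, finding promoted products that did not sell, might look like this in Python. All names and rows here are hypothetical; the point is only that the coverage table holds key combinations and no measures.

```python
# Hypothetical factless fact table: one (date, product, promotion) key
# combination per promotion event, with no measures at all.
promotion_coverage = {
    ("2024-05-01", "Widget", "SPRING10"),
    ("2024-05-01", "Gadget", "SPRING10"),
    ("2024-05-01", "Gizmo", "SPRING10"),
}

# Hypothetical sales fact rows: (date, product, amount).
sales_fact = [
    ("2024-05-01", "Widget", 30.0),
    ("2024-05-01", "Gizmo", 12.0),
]

def promoted_but_unsold(coverage, sales):
    """Products that were on promotion on a date but have no sale that date."""
    sold = {(day, product) for day, product, _ in sales}
    promoted = {(day, product) for day, product, _ in coverage}
    return sorted(promoted - sold)

unsold = promoted_but_unsold(promotion_coverage, sales_fact)
print(unsold)
```

The sales fact table alone cannot answer this question, because an unsold product leaves no row in it.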
9. Why should you put your data warehouse on a different system than your OLTP system?
A data warehouse is part of OLAP (On-Line Analytical Processing). It is the source from which BI tools fetch data for analytical, reporting or data mining purposes. It generally contains data covering the whole life cycle of the company or product. A DWH contains historical, integrated, denormalized, subject-oriented data.
An OLTP system, on the other hand, contains data that is generally limited to the last couple of months, or a year at most. The nature of data in OLTP is current, volatile and highly normalized. Since the two systems differ in nature and functionality, we should keep them on different systems, so that heavy analytical queries do not compete with day-to-day transactions for the same resources.
10. What are non-additive facts?
Non-additive facts are facts that cannot be summed up for any of the dimensions present in the fact table. Examples: temperature, bill number, etc.
A fact table typically has two types of columns: those that contain numeric facts (often called measurements), and those that are foreign keys to dimension tables.

A fact table contains either detail-level facts or facts that have been aggregated. Fact tables that contain aggregated facts are often called summary tables. A fact table usually contains facts with the same level of aggregation.

Though most facts are additive, they can also be semi-additive or non-additive. Additive facts can be aggregated by simple arithmetical addition. A common example of this is sales. Non-additive facts cannot be added at all. An example of this is averages. Semi-additive facts can be aggregated along some of the dimensions and not along others. An example of this is inventory levels, which can be summed across products or locations but not across time.
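To see why averages are non-additive, consider this small Python sketch. The store figures are made up for illustration. Rolling up the averages directly gives a wrong answer, whereas rolling up the additive components (total and count) and dividing afterwards gives the right one.

```python
# Hypothetical per-store summaries: (store, order_count, total_sales).
store_summaries = [
    ("North", 2, 100.0),   # average order value 50.0
    ("South", 8, 160.0),   # average order value 20.0
]

def naive_average_of_averages(rows):
    """Incorrect roll-up: treats per-store averages as if they were additive."""
    averages = [total / count for _, count, total in rows]
    return sum(averages) / len(averages)

def correct_overall_average(rows):
    """Roll up the additive components first, then derive the average."""
    total_sales = sum(total for _, _, total in rows)
    total_orders = sum(count for _, count, _ in rows)
    return total_sales / total_orders

print(naive_average_of_averages(store_summaries))
print(correct_overall_average(store_summaries))
```

This is why a fact table is usually better off storing the sum and the count, leaving the average to be computed at query time.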
11. What is Normalization, First Normal Form, Second Normal Form, Third Normal Form?
Normalization can be defined as splitting a table into two or more tables so as to avoid duplication of values.
Normalization is a step-by-step process of removing redundancies and dependencies of attributes in a data structure.

The condition of the data at the completion of each step is described as a normal form.
Need for normalization:
Improves database design.
Ensures minimum redundancy of data.
Reduces the need to reorganize data when the design is modified or enhanced.
Removes anomalies for database activities.

First normal form:
A table is in first normal form when it contains no repeating groups.
The repeating columns or fields in an unnormalized table are removed from the table and put into tables of their own.
Such a table becomes dependent on the parent table from which it is derived.
The key to this table is called a concatenated key, with the key of the parent table forming a part of it.

Second normal form:
A table is in second normal form if it is in first normal form and all its non-key fields are fully dependent on the whole key.
This means that each field in the table must depend on the entire key.
Fields that do not depend on the combination key are moved to another table, on whose key they do depend.
Structures which do not contain combination keys are automatically in second normal form.

Third normal form:
A table is said to be in third normal form if it is in second normal form and all its non-key fields are independent of all other non-key fields of the same table.
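The third normal form rule can be illustrated with a small Python sketch. The rows and names are hypothetical: customer_city depends on customer_id, which is itself a non-key field of the order rows, so the customer attributes are moved to their own table.

```python
# Hypothetical denormalized rows: customer_city depends on customer_id,
# not on the key order_id (a non-key field depending on another non-key field).
flat_orders = [
    {"order_id": 1, "customer_id": "C1", "customer_city": "Austin", "amount": 20.0},
    {"order_id": 2, "customer_id": "C1", "customer_city": "Austin", "amount": 35.0},
    {"order_id": 3, "customer_id": "C2", "customer_city": "Boston", "amount": 15.0},
]

def to_third_normal_form(rows):
    """Move the customer attributes into their own table keyed by customer_id."""
    customers = {}
    orders = []
    for row in rows:
        customers[row["customer_id"]] = {"customer_city": row["customer_city"]}
        orders.append({"order_id": row["order_id"],
                       "customer_id": row["customer_id"],
                       "amount": row["amount"]})
    return customers, orders

customers, orders = to_third_normal_form(flat_orders)
print(customers)
```

After the split, a customer's city is stored once, so changing it touches one row instead of every order for that customer.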
12. What is SCD1, SCD2, SCD3? (Slowly Changing Dimensions)
SCD Type 1: the attribute value is overwritten with the new value, obliterating the historical attribute values. For example, when the product roll-up changes for a given product, the roll-up attribute is merely updated with the current value.
SCD Type 2: a new record with the new attributes is added to the dimension table. Historical fact table rows continue to reference the old dimension key with the old roll-up attribute; going forward, the fact table rows will reference the new surrogate key with the new roll-up, thereby perfectly partitioning history.
SCD Type 3: attributes are added to the dimension table to support two simultaneous roll-ups, perhaps the current product roll-up as well as current version minus one, or current version and original.
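The three types can be compared side by side in a short Python sketch. Everything here is an illustrative assumption: the row layout, the is_current flag (a common convention, not something stated above) and the previous_rollup column name.

```python
import copy

# Hypothetical dimension rows keyed by surrogate key (sk).
base_dim = [{"sk": 1, "product_id": "P1", "rollup": "Toys", "is_current": True}]

def scd_type1(dim, product_id, new_rollup):
    """Type 1: overwrite in place; the old value is lost."""
    for row in dim:
        if row["product_id"] == product_id:
            row["rollup"] = new_rollup

def scd_type2(dim, product_id, new_rollup):
    """Type 2: expire the current row and add a row with a new surrogate key."""
    next_sk = max(row["sk"] for row in dim) + 1
    for row in dim:
        if row["product_id"] == product_id and row["is_current"]:
            row["is_current"] = False
    dim.append({"sk": next_sk, "product_id": product_id,
                "rollup": new_rollup, "is_current": True})

def scd_type3(dim, product_id, new_rollup):
    """Type 3: keep the previous value in an extra column on the same row."""
    for row in dim:
        if row["product_id"] == product_id:
            row["previous_rollup"] = row["rollup"]
            row["rollup"] = new_rollup

type1, type2, type3 = (copy.deepcopy(base_dim) for _ in range(3))
scd_type1(type1, "P1", "Games")
scd_type2(type2, "P1", "Games")
scd_type3(type3, "P1", "Games")
print(len(type1), len(type2), len(type3))
```

Only Type 2 grows the table: old fact rows keep pointing at surrogate key 1 and new fact rows point at surrogate key 2.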
13. What are conformed dimensions?
They are dimension tables in a star schema data mart that adhere to a common structure, and therefore allow queries to be executed across star schemas. For example, the Calendar dimension is commonly needed in most data marts. By making this Calendar dimension adhere to a single structure, regardless of which data mart in your organization it is used in, you can query by date/time from one data mart to another.
Conformed dimensions are dimensions which are common to several cubes. (Cubes here are the schemas that contain the fact and dimension tables.)

Suppose Cube-1 contains F1, D1, D2, D3 and Cube-2 contains F2, D1, D2, D4, where the F are facts and the D are dimensions. Here D1 and D2 are the conformed dimensions.
If a table is used as a dimension table for more than one fact table, then that dimension table is called a conformed dimension.
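A rough Python sketch of querying across two fact tables through a shared Calendar dimension. The date keys, amounts and function names are hypothetical; the two fact lists stand in for fact tables from two different data marts.

```python
# Hypothetical conformed Calendar dimension shared by both fact tables.
calendar_dim = {20240101: "2024-01", 20240102: "2024-01", 20240201: "2024-02"}

# Two hypothetical fact tables from different data marts, both keyed by date_key.
sales_fact = [(20240101, 100.0), (20240102, 50.0), (20240201, 70.0)]
returns_fact = [(20240102, 10.0), (20240201, 5.0)]

def total_by_month(fact_rows, calendar):
    """Roll a fact table up to the month attribute of the shared dimension."""
    totals = {}
    for date_key, amount in fact_rows:
        month = calendar[date_key]
        totals[month] = totals.get(month, 0.0) + amount
    return totals

def drill_across(sales_rows, returns_rows, calendar):
    """Combine both marts on the conformed month attribute."""
    sales = total_by_month(sales_rows, calendar)
    returns = total_by_month(returns_rows, calendar)
    return {m: (sales.get(m, 0.0), returns.get(m, 0.0))
            for m in sorted(set(sales) | set(returns))}

report = drill_across(sales_fact, returns_fact, calendar_dim)
print(report)
```

The combined report only lines up because both fact tables use the same date keys with the same meaning.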
14. What is ODS?
ODS stands for Operational Data Store.
It is the final integration point in the ETL process before loading the data into the data warehouse.
It contains near-real-time data. In a typical data warehouse architecture, the ODS is sometimes used for analytical reporting as well as being a source for the data warehouse.
An Operational Data Store is a hybrid structure that has some aspects of a data warehouse and other aspects of an operational system:
Contains integrated data.
It can support DSS processing.
It can also support high-volume transaction processing.
Placed between the warehouse and the web to support web users.
15. What is a CUBE in data warehousing?
Cubes are a logical representation of multidimensional data. The edges of the cube contain dimension members and the body of the cube contains data values.
Cubes are a multi-dimensional view of the DW or data marts. A cube is designed in a logical way to support drilling and slice-and-dice. Every part of the cube is a logical representation of a combination of facts and dimension attributes.
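As a toy model, a cube can be pictured as a mapping from one member per dimension to a data value. The dimensions, members and values below are invented for illustration, and real OLAP engines store and index cubes very differently.

```python
# Hypothetical cube: each cell is addressed by one member per dimension.
DIMENSIONS = ("product", "region", "month")
cube = {
    ("Widget", "East", "Jan"): 10.0,
    ("Widget", "West", "Jan"): 4.0,
    ("Gadget", "East", "Jan"): 6.0,
    ("Widget", "East", "Feb"): 8.0,
}

def slice_cube(cells, dimension, member):
    """Slice: keep only the cells where one dimension equals a fixed member."""
    axis = DIMENSIONS.index(dimension)
    return {key: value for key, value in cells.items() if key[axis] == member}

def roll_up(cells, dimension):
    """Roll up: total the data values for each member of one dimension."""
    axis = DIMENSIONS.index(dimension)
    totals = {}
    for key, value in cells.items():
        totals[key[axis]] = totals.get(key[axis], 0.0) + value
    return totals

january = slice_cube(cube, "month", "Jan")
print(roll_up(january, "product"))
```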
16. Difference between Snowflake and Star Schema. What are situations where Snowflake Schema is better?
Star schema and snowflake schema both serve the purpose of dimensional modeling when it comes to data warehouses.
A star schema is a dimensional model with a (large) fact table and a set of (small) dimension tables. The whole set-up is totally denormalized.
However, where the dimension tables are split into many tables, so that the schema leans slightly towards normalization (reducing redundancy and dependency), we get the snowflake schema.

The nature and purpose of the data that is to be fed to the model is the key to the question of which is better.
------------------------
A star schema contains the dimension tables mapped around one or more fact tables.
It is a denormalized model.
No need to use complicated joins.
Queries return results fast.
Snowflake schema:
It is the normalized form of the star schema.
It contains deeper joins, because the tables are split into many pieces. We can easily make modifications directly in the tables.
We have to use complicated joins, since we have more tables.
There will be some delay in processing the query.
------------------------
Star schema means:
A centralized fact table surrounded by different dimensions.
Snowflake means:
In the same star schema, dimensions are split into further dimensions.
A star schema contains highly denormalized data.
A snowflake schema contains partially normalized data.
A star schema cannot have parent tables,
but a snowflake schema contains parent tables.
Why go for a star schema:
1) It needs fewer joins
2) Simpler database
3) Supports drilling-up options
Why go for a snowflake schema:
Sometimes we need to provide separate dimensions derived from existing dimensions; in that case we go for a snowflake schema.
Disadvantage of snowflake:
Query performance is lower because more joins are needed.
------------------------
17. What is ER Diagram?
ER stands for entity relationship. An ER diagram is the first step in the design of a data model, which will later lead to the physical database design of either an OLTP or an OLAP database.
An E-R diagram (entity-relationship diagram) shows how the different database tables relate to each other, what the primary keys and foreign keys are, and how they are related.
Building the E-R diagram is the first step of any database project.
18. What is a degenerate dimension?
Degenerate dimensions: values stored in the fact table that are neither foreign keys to a dimension table nor measures are called degenerate dimensions. Example: invoice ID, empno.
19. What is data mining?
Data mining is the process of extracting hidden trends from within a data warehouse. For example, an insurance data warehouse can be used to mine data for the highest-risk people to insure in a certain geographical area.
20. What is Data Warehousing?
Data warehousing is the process of creating, populating and querying a data warehouse. It includes a number of discrete technologies, such as:
Identifying sources
The process of ECCD, ETL, which includes data cleansing, data transforming and data loading to targets.
21. What is a Data Warehouse?
A data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data that enables decision making across disparate groups of users.
A data warehouse is a repository containing a subject-oriented, integrated, time-variant and non-volatile collection of data, used for the company's decision support system requirements.
A data warehouse contains a collection of historic (history of data), integrated, non-volatile data, which is used for analysis and for developing forecasting reports.
22. Explain the advantages of RAID 1, 1/0, and 5. What type of RAID setup would you put your TX logs on?
RAID 0 - Makes several physical hard drives look like one hard drive. No redundancy but very fast. May be used for temporary space where loss of the files will not result in loss of committed data.

RAID 1 - Mirroring. Each hard drive in the drive array has a twin. Each twin holds an exact copy of the other twin's data, so if one hard drive fails, the other is used to pull the data. RAID 1 is roughly half the speed of RAID 0 for writes, since every write goes to both drives of a pair, but read and write performance are still good.

RAID 1/0 - Mirrored pairs (RAID 1) that are then striped (RAID 0). Similar to RAID 1 and sometimes faster, depending on the vendor implementation.

RAID 5 - Great for read-only systems. Write performance is about a third of RAID 1, because parity has to be updated on every write, but read performance is about the same as RAID 1. RAID 5 is great for DW but not good for OLTP.

Hard drives are cheap now, so I always recommend RAID 1.
22. What type of indexing mechanism do we need to use for a typical data warehouse?
On the fact table it is best to use bitmap indexes. Dimension tables can use bitmap and/or the other types of clustered/non-clustered, unique/non-unique indexes.
To my knowledge, SQL Server does not support bitmap indexes, whereas Oracle does.
It generally depends on the data you have in the table. If a particular column has few distinct values, it is usually best to build a bitmap index on it rather than another type. On dimension tables we generally have ordinary indexes.
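To show why bitmap indexes suit columns with few distinct values, here is a toy Python sketch that uses plain integers as bitmaps. It illustrates the idea only and does not reflect how any particular database implements it; the column data and function names are hypothetical.

```python
# Hypothetical low-cardinality column values, one per fact row.
region_column = ["East", "West", "East", "North", "West", "East"]
channel_column = ["Web", "Web", "Store", "Web", "Store", "Web"]

def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is set if row i has it."""
    index = {}
    for row_number, value in enumerate(values):
        index[value] = index.get(value, 0) | (1 << row_number)
    return index

def rows_from_bitmap(bitmap):
    """Decode a bitmap back into the list of matching row numbers."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

region_index = build_bitmap_index(region_column)
channel_index = build_bitmap_index(channel_column)

# WHERE region = 'East' AND channel = 'Web' becomes one bitwise AND.
matching_rows = rows_from_bitmap(region_index["East"] & channel_index["Web"])
print(matching_rows)
```

With few distinct values there are few bitmaps to keep, and combining filter conditions is a cheap bitwise operation.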
23. What is a lookup table?
A lookup table is simply a reference table: it supplies values to the table that references it. It is used at run time, and it saves joins and space in transformations. For example, a lookup table called states provides the actual state name ('Texas') in place of TX in the output.
The lookup table provides detailed information about an attribute. For example, the lookup table for the quarter attribute would include a list of all the quarters available in the data warehouse, i.e., the first quarter of 2001 may be represented as "Q1 2001" or "2001 Q1".
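The states example could be sketched like this in Python. The dictionary contents, the resolve_state helper and the "Unknown" default for unmatched codes are all assumptions for illustration.

```python
# Hypothetical lookup table mapping state codes to state names.
states = {"TX": "Texas", "CA": "California", "NY": "New York"}

def resolve_state(row, lookup, default="Unknown"):
    """Return a copy of the row with the state code replaced by its name."""
    resolved = dict(row)
    resolved["state"] = lookup.get(row["state"], default)
    return resolved

source_rows = [{"customer": "Ann", "state": "TX"},
               {"customer": "Bob", "state": "ZZ"}]
output_rows = [resolve_state(row, states) for row in source_rows]
print(output_rows)
```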
24.
