DWH

1) What is a data warehouse?
A data warehouse is a database which:

1. Maintains history of data
2. Contains integrated data (data from multiple business lines)
3. Contains heterogeneous data (data from different source formats)
4. Contains aggregated data
5. Allows only SELECT for end users, to restrict data manipulation
6. Stores data in de-normalized format

Definition of a data warehouse:

1. Subject-oriented
2. Integrated
3. Non-volatile
4. Time-variant

Main usage of a data warehouse:
1. Data analysis
2. Decision making
3. Planning or forecasting
2) What is a dimension?
A dimension table contains only non-quantifying data: the categories of
information that are key for analysis. A dimension table contains a primary key
and non-quantifying columns. If the source table has no primary key, a surrogate
key is used instead.
3) What are the types of dimension?
Based on the type of data stored, there are two major types of dimension table:
1. Conformed dimension
2. Junk dimension
Based on where it is derived from, there is one dimension category:
3. Degenerate dimension
Based on how frequently the data changes, dimensions can be divided into two types:
4. Rapidly Changing Dimension (RCD)
5. Slowly Changing Dimension (SCD)
4) What is a fact and what are the types of fact?
A fact is a column or attribute that is quantifiable or measurable and is used
as a key analysis factor. It is also called a measure.
Types of fact:
1. Additive
2. Semi-additive
3. Non-additive
5) What does a fact table contain?
A table that contains facts is called a fact table. Typically, a fact table has
facts and foreign keys to dimension tables.
Fact table structure:
Foreign_key1
Foreign_keyN
Fact1
FactN
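The structure above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table and column names (dim_product, dim_date, fact_sales, quantity, amount) and the sample rows are assumptions made for the example, not part of the original notes.

```python
import sqlite3

# In-memory database, used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, calendar_date TEXT);

    -- Fact table: foreign keys to the dimensions plus the measurable facts.
    CREATE TABLE fact_sales (
        product_key INTEGER REFERENCES dim_product(product_key),
        date_key    INTEGER REFERENCES dim_date(date_key),
        quantity    INTEGER,
        amount      REAL
    );

    INSERT INTO dim_product VALUES (1, 'Laptop'), (2, 'Mouse');
    INSERT INTO dim_date    VALUES (20240101, '2024-01-01'), (20240102, '2024-01-02');
    INSERT INTO fact_sales  VALUES (1, 20240101, 2, 1800.0),
                                   (1, 20240102, 1, 900.0),
                                   (2, 20240101, 5, 100.0);
""")

# Analyse the facts by a dimension attribute.
sales_by_product = conn.execute("""
    SELECT p.product_name, SUM(f.quantity), SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY p.product_name
    ORDER BY p.product_name
""").fetchall()

print(sales_by_product)
```

The fact table holds nothing descriptive itself; the product name comes from the dimension through the foreign key.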
6) What are the types of a fact table?
Transactional
The fact table contains data at the most detailed level, without any
rollup/aggregation, the same way a transactional database stores it.
Accumulating
An accumulating fact table tracks a single record through its workflow: the row
holds multiple entries (typically one per milestone), which are filled in as the
record moves through each stage.
Periodic snapshot
The data is extracted and loaded for a particular period of time. Each row
describes the state of the record in that specific period.
Factless fact table
A fact table that does not have any facts is called a factless fact table. It
has only foreign keys to dimension tables.
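As a rough illustration of the factless case, the sketch below records attendance events using only foreign keys. The table name fact_attendance, its columns and the sample rows are hypothetical; analysis is done by counting rows rather than summing a measure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Factless fact table: only foreign keys, no measure columns.
    CREATE TABLE fact_attendance (
        student_key INTEGER,
        class_key   INTEGER,
        date_key    INTEGER
    );
    INSERT INTO fact_attendance VALUES (1, 10, 20240101),
                                       (2, 10, 20240101),
                                       (1, 11, 20240102);
""")

# With no fact to aggregate, the row count itself answers the question.
attendance_per_class = conn.execute("""
    SELECT class_key, COUNT(*)
    FROM fact_attendance
    GROUP BY class_key
    ORDER BY class_key
""").fetchall()

print(attendance_per_class)
```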
7) Why is a staging table required?
1. To reduce the complexity of the job (moving data directly from source to
target makes the job more complex).
2. To avoid updating the source database.
3. To perform any calculations.
4. To perform data cleansing as per business need.
5. To recover from a bad load: when data is corrupted in the target after the
load, we can delete the corrupted data from the target database and then reload
just the unloaded/deleted data into the target from the staging database.
8)What is a surrogate key?
In most of the table, the primary key will be loaded from source schema, but
some source table
might not have a primary key in such has by using sequence generator the
primary key will be
created, such keys are called Surrogate key.
In terms of usage, there is no difference between these two types of keys.
Both differ in the
way ofloading primary key loaded from the source table, whereas surrogate key
loaded by the
sequence generator.
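A minimal sketch of the idea, assuming a simple in-process counter stands in for the sequence generator. The function name assign_surrogate_keys, the column name customer_key and the sample rows are invented for the example; ETL tools and databases provide their own sequence mechanisms.

```python
from itertools import count


def assign_surrogate_keys(rows, start=1):
    """Return copies of the source rows with a generated customer_key added."""
    sequence = count(start)  # plays the role of the sequence generator
    return [{"customer_key": next(sequence), **row} for row in rows]


# Source rows that arrive without any primary key.
source_rows = [
    {"name": "Asha", "city": "Pune"},
    {"name": "Ben", "city": "Leeds"},
]

keyed_rows = assign_surrogate_keys(source_rows)
for row in keyed_rows:
    print(row)
```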
9) OLTP vs DW database
1. OLTP: Dedicated database for a specific subject area or business application.
   DW: Integrated from different business applications.
2. OLTP: Does not keep history.
   DW: Keeps history of data for analyzing past performance.
3. OLTP: Allows users to perform all DML operations (Select, Insert, Update, Delete).
   DW: Allows only Select for end users.
4. OLTP: Main purpose is day-to-day transactions.
   DW: Purpose is analysis and reporting.
5. OLTP: Data volume is comparatively small.
   DW: Data volume is huge.
6. OLTP: Data stored in normalized format.
   DW: Data stored in de-normalized format.
Operational Data Store (ODS) vs Staging database
1. ODS: Holds a limited period of data (30 to 90 days).
   Staging: Stores incremental data or the full volume of data, depending on the
   type of load.
2. ODS: Used for operational processing.
   Staging: Temporary data storage, used for data cleansing and other
   calculations.
3. ODS: Integrated from different business lines.
   Staging: Based on business need; normally each business line has its own
   dedicated staging area.
10) Explain about star schema?
In this type of schema, the fact table sits in the center. The fact table
contains references to dimension tables, so it is surrounded by dimension tables
linked through foreign keys. A dimension table does not reference any other
dimension.
11) Explain about snowflake schema?
This type of schema also has a fact table in the center, with references to
dimension tables. Here, however, a dimension table can reference another
dimension, so the data is stored in a more normalized form.
12) What is the difference between star and snowflake?
1. Star: There are no relationships between dimensions, so query performance is
   generally higher.
   Snowflake: The multiple links between dimensions generally lower query
   performance.
2. Star: Fewer joins, which keeps query complexity low.
   Snowflake: More joins, which makes query complexity high.
3. Star: Consider a Project dimension that has a Role column. The role name is
   stored against each project, so the table is larger.
   Snowflake: The role information is stored in a separate table and referenced
   from the Project dimension, which reduces the table size.
4. Star: Data is stored in de-normalized format in dimension tables.
   Snowflake: Data is stored in a more normalized format in dimension tables.
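The Project/Role comparison can be sketched side by side with sqlite3. All table names, column names and rows below are assumptions for illustration only. The star version reaches the role name with one join, while the snowflake version needs two.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fact_effort (project_key INTEGER, hours INTEGER);
    INSERT INTO fact_effort VALUES (1, 40), (2, 30), (3, 30);

    -- Star: the role name is repeated on every project row.
    CREATE TABLE star_dim_project (
        project_key INTEGER PRIMARY KEY, project_name TEXT, role_name TEXT);
    INSERT INTO star_dim_project VALUES
        (1, 'Alpha', 'Developer'), (2, 'Beta', 'Developer'), (3, 'Gamma', 'Tester');

    -- Snowflake: the role name lives in its own table, referenced by key.
    CREATE TABLE snow_dim_role (role_key INTEGER PRIMARY KEY, role_name TEXT);
    CREATE TABLE snow_dim_project (
        project_key INTEGER PRIMARY KEY, project_name TEXT,
        role_key INTEGER REFERENCES snow_dim_role(role_key));
    INSERT INTO snow_dim_role VALUES (1, 'Developer'), (2, 'Tester');
    INSERT INTO snow_dim_project VALUES (1, 'Alpha', 1), (2, 'Beta', 1), (3, 'Gamma', 2);
""")

# Star schema: fact -> dimension (one join).
star_rows = conn.execute("""
    SELECT p.role_name, SUM(f.hours)
    FROM fact_effort AS f
    JOIN star_dim_project AS p ON p.project_key = f.project_key
    GROUP BY p.role_name ORDER BY p.role_name
""").fetchall()

# Snowflake schema: fact -> dimension -> sub-dimension (two joins).
snowflake_rows = conn.execute("""
    SELECT r.role_name, SUM(f.hours)
    FROM fact_effort AS f
    JOIN snow_dim_project AS p ON p.project_key = f.project_key
    JOIN snow_dim_role AS r ON r.role_key = p.role_key
    GROUP BY r.role_name ORDER BY r.role_name
""").fetchall()

print(star_rows)
print(snowflake_rows)
```

Both queries return the same totals; the difference is the extra join on one side and the repeated role names on the other.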
13) What is data cleansing?
Data cleansing is the process of removing irrelevant and redundant data and
correcting incorrect and incomplete data. It is also called data cleaning or
data scrubbing.
Organizations are growing rapidly amid heavy competition, and they take business
decisions based on their past performance data and future projections, so that
data needs to be clean.
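A small sketch of what such cleansing could look like in plain Python. The rules shown (drop rows with no customer id, trim and normalize names, default a missing city to "UNKNOWN", remove duplicates) and the field names are assumptions chosen for the example; real rules depend on the business need.

```python
def cleanse(records):
    """Remove unusable and duplicate rows and repair incomplete values."""
    seen = set()
    cleaned = []
    for record in records:
        customer_id = record.get("customer_id")
        if customer_id is None:
            continue  # irrelevant row: cannot be tied to any customer
        name = (record.get("name") or "").strip().title()   # fix spacing and casing
        city = (record.get("city") or "UNKNOWN").strip()    # fill in a missing value
        key = (customer_id, name, city)
        if key in seen:
            continue  # redundant row: already kept an identical one
        seen.add(key)
        cleaned.append({"customer_id": customer_id, "name": name, "city": city})
    return cleaned


raw = [
    {"customer_id": 1, "name": "  asha rao ", "city": "Pune"},
    {"customer_id": 1, "name": "Asha Rao", "city": "Pune"},
    {"customer_id": 2, "name": "ben lee", "city": None},
    {"customer_id": None, "name": "ghost", "city": "Nowhere"},
]

clean = cleanse(raw)
for row in clean:
    print(row)
```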
14) What is data masking?
Organizations never want to disclose highly confidential information to all
users. Access to sensitive data is restricted in all environments other than
production. The process of masking, hiding or encrypting sensitive data is
called data masking.
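Two simple masking approaches are sketched below, using only the Python standard library. The function names, the "show last four characters" rule and the salt value are assumptions for illustration. The second function uses one-way SHA-256 hashing rather than encryption, so the original value cannot be recovered from it.

```python
import hashlib


def mask_account_number(account_number, visible=4):
    """Replace all but the last `visible` characters with asterisks."""
    digits = str(account_number)
    if len(digits) <= visible:
        return "*" * len(digits)
    return "*" * (len(digits) - visible) + digits[-visible:]


def pseudonymize(value, salt="demo-salt"):
    """Replace a value with a stable, irreversible SHA-256 digest."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


print(mask_account_number("1234567890"))  # ******7890
print(pseudonymize("asha@example.com"))
```

Partial masking keeps the value recognizable for support staff, while hashing keeps joins between masked tables working because the same input always produces the same digest.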
15) Why data mart?
1. The data warehouse contains integrated data for all business lines. For
example, a banking data warehouse contains data from the savings, credit and
loan account databases.
2. Reporting access is given to a person who has the authority or the need to
compare data across all three types of accounts.
3. A loan account branch manager, meanwhile, does not need to see the savings
and credit card details; he wants to see only the past performance of loan
accounts.
4. For his analysis, we would need to apply data-level security to protect the
savings and credit information in the data warehouse.
5. At the same time, end users across all three account types would access the
same data warehouse, which would result in poor performance.
6. To avoid these issues, a separate database, called a data mart, is built on
top of the data warehouse. Access is given only to the resources of the
respective business line, not to everyone.
16) What is data purging and archiving?
Data purging means deleting data from a database once it has passed the defined
retention time.
Archiving means moving data that has passed the defined retention time to
another database (the archival database).
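The difference can be sketched with sqlite3. For brevity a second table (orders_archive) stands in for the separate archival database, and the table names, cutoff dates and rows are all assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders         (order_id INTEGER PRIMARY KEY, order_date TEXT);
    CREATE TABLE orders_archive (order_id INTEGER PRIMARY KEY, order_date TEXT);
    INSERT INTO orders VALUES (1, '2015-03-01'), (2, '2019-07-15'), (3, '2024-01-10');
""")


def archive_older_than(conn, cutoff):
    """Archiving: move rows past the retention cutoff into the archive."""
    with conn:  # copy and delete in one transaction
        conn.execute(
            "INSERT INTO orders_archive SELECT * FROM orders WHERE order_date < ?",
            (cutoff,))
        conn.execute("DELETE FROM orders WHERE order_date < ?", (cutoff,))


def purge_older_than(conn, cutoff):
    """Purging: delete rows past the retention cutoff, with no copy kept."""
    with conn:
        conn.execute("DELETE FROM orders_archive WHERE order_date < ?", (cutoff,))


archive_older_than(conn, "2020-01-01")  # orders 1 and 2 move to the archive
purge_older_than(conn, "2016-01-01")    # order 1 is deleted for good

live_ids = [r[0] for r in conn.execute("SELECT order_id FROM orders ORDER BY order_id")]
archived_ids = [r[0] for r in conn.execute(
    "SELECT order_id FROM orders_archive ORDER BY order_id")]
print(live_ids, archived_ids)
```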
17) What are the types of SCD?
SCD Type 1
- Modifications are made on the same record
- No history of changes is maintained
SCD Type 2
- The existing record is marked as expired with an is_active flag or
  Expired_date column, and the new value is stored as a new record
- This type allows tracking the history of changes
SCD Type 3
- The new value is tracked in an additional column
- Limited history is maintained (typically only the previous value)
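The contrast between Type 1 and Type 2 can be sketched on an in-memory list of rows. The function names, the customer/city columns and the way the next surrogate key is derived are assumptions for illustration; the is_active and expired_date columns follow the description above.

```python
from datetime import date


def apply_scd_type1(dimension, customer_id, new_city):
    """Type 1: overwrite the value in place, so no history is kept."""
    for row in dimension:
        if row["customer_id"] == customer_id:
            row["city"] = new_city


def apply_scd_type2(dimension, customer_id, new_city, change_date):
    """Type 2: expire the active row and append a new active row."""
    for row in dimension:
        if row["customer_id"] == customer_id and row["is_active"]:
            row["is_active"] = False
            row["expired_date"] = change_date
    dimension.append({
        "customer_key": max((r["customer_key"] for r in dimension), default=0) + 1,
        "customer_id": customer_id,
        "city": new_city,
        "is_active": True,
        "expired_date": None,
    })


type1_dim = [{"customer_key": 1, "customer_id": "C1", "city": "Pune"}]
apply_scd_type1(type1_dim, "C1", "Mumbai")

type2_dim = [{"customer_key": 1, "customer_id": "C1", "city": "Pune",
              "is_active": True, "expired_date": None}]
apply_scd_type2(type2_dim, "C1", "Mumbai", date(2024, 1, 1))

print(type1_dim)
for row in type2_dim:
    print(row)
```

After the change, the Type 1 dimension still has one row with the new city, while the Type 2 dimension has two rows: the expired "Pune" row and the active "Mumbai" row.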
18) What type of schema and SCD type are used in your project?
In my current project, we are using SCD Type 2 to keep the history of changes.
