2) What is a dimension?
A dimension table stores the descriptive attributes (for example, customer, product, or date details) that are used to filter, group, and label the facts.
Based on the type of data it stores, there are two major types of dimension tables:
1. Conformed dimension
2. Junk dimension
Based on where it is derived from, there is one dimension category:
3. Degenerate dimension
Based on how frequently the data in the dimension changes, dimensions can be divided into two types:
4. Rapidly Changing Dimension (RCD)
5. Slowly Changing Dimension (SCD)
A table that contains facts is called a fact table. Typically, a fact table has facts and foreign keys of dimension tables.
Fact table structure:
Foreign_key1
...
Foreign_keyN
Fact1
...
FactN
Transactional
The fact table contains data at the most detailed level, without any rollup or aggregation, the same way a transactional database stores it.
Accumulating
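Accumulating refers to tracking the changes of a single record throughout the workflow. The record stores multiple entries (typically one date column per milestone, for example ordered, shipped, and delivered), and the same row is updated as each milestone is reached.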
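A minimal sketch of an accumulating snapshot row, assuming a hypothetical order workflow with three milestones. The milestone names and the `new_order_row`/`record_milestone` helpers are illustrative only; the point is that one row per order is kept and filled in as the order moves through the workflow.

```python
from datetime import date

# Hypothetical milestones of an order workflow; one date column per milestone.
MILESTONES = ("ordered_date", "shipped_date", "delivered_date")

def new_order_row(order_id: int) -> dict:
    """Create the single fact row that will accumulate the order's milestones."""
    row = {"order_id": order_id}
    row.update({m: None for m in MILESTONES})
    return row

def record_milestone(row: dict, milestone: str, when: date) -> None:
    """Update the same row in place when the order reaches a milestone."""
    if milestone not in MILESTONES:
        raise ValueError(f"unknown milestone: {milestone}")
    row[milestone] = when

order = new_order_row(1001)
record_milestone(order, "ordered_date", date(2024, 1, 2))
record_milestone(order, "shipped_date", date(2024, 1, 5))
print(order)  # delivered_date is still None: the order has not reached that milestone
```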
Periodic snapshot
The data is extracted and loaded for a particular period of time (for example, daily or monthly). It describes the state of the record in that specific period.
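To contrast the two grains, here is a small sketch that starts from transactional rows (one row per transaction, no aggregation) and derives a monthly periodic snapshot of each account's closing balance. The sample data and the `monthly_snapshot` helper are hypothetical; a production snapshot would usually also emit a row for months with no activity, which this sketch does not do.

```python
from collections import defaultdict

# Hypothetical transactional rows: (account_id, "YYYY-MM", amount), one row per transaction.
transactions = [
    (1, "2024-01", 500.0),
    (1, "2024-01", -200.0),
    (1, "2024-02", 100.0),
    (2, "2024-01", 50.0),
]

def monthly_snapshot(rows):
    """Return {(account_id, month): closing_balance} for each month an account had activity."""
    net = defaultdict(float)
    for account_id, month, amount in rows:
        net[(account_id, month)] += amount
    snapshot = {}
    running = defaultdict(float)
    # Sorted keys visit each account's months in order, so the balance carries forward.
    for account_id, month in sorted(net):
        running[account_id] += net[(account_id, month)]
        snapshot[(account_id, month)] = running[account_id]
    return snapshot

print(monthly_snapshot(transactions))
```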
A fact table that does not have any facts is called a factless fact table. It has only foreign keys of dimension tables.
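A factless fact table still answers questions, because each row records that an event happened. The sketch below assumes a hypothetical student attendance table (the table and column names are illustrative) and uses an in-memory SQLite database; with no measure columns, the analysis is done by counting rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical factless fact table: only keys that would reference dimension tables.
conn.execute("""
    CREATE TABLE attendance_fact (
        student_key INTEGER NOT NULL,
        class_key   INTEGER NOT NULL,
        date_key    INTEGER NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO attendance_fact VALUES (?, ?, ?)",
    [(1, 10, 20240101), (2, 10, 20240101), (1, 11, 20240101), (1, 10, 20240102)],
)
# No measure column exists, so attendance is measured by counting rows per class.
attendance_per_class = dict(
    conn.execute(
        "SELECT class_key, COUNT(*) FROM attendance_fact GROUP BY class_key ORDER BY class_key"
    ).fetchall()
)
print(attendance_per_class)
```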
Reasons for using a staging database:
1. To reduce the complexity of the job (it becomes more complex when we move data directly from source to target).
2. To avoid updating the source database.
3. To perform any calculations.
4. To perform data cleansing as per business needs.
5. When the data in the target is corrupted after a load, we can delete the corrupted data from the target database and then reload only the deleted data from the staging database into the target.
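The reload in item 5 can be sketched as follows, assuming (hypothetically) that both the staging and target tables carry a `load_batch` column identifying each load. The table names and the `reload_batch` helper are illustrative; SQLite in memory stands in for the real databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables: both carry a load_batch column identifying each load.
conn.execute("CREATE TABLE staging_sales (sale_id INTEGER, amount REAL, load_batch INTEGER)")
conn.execute("CREATE TABLE target_sales (sale_id INTEGER, amount REAL, load_batch INTEGER)")

rows = [(1, 100.0, 1), (2, 250.0, 2), (3, 75.0, 2)]
conn.executemany("INSERT INTO staging_sales VALUES (?, ?, ?)", rows)
conn.execute("INSERT INTO target_sales SELECT * FROM staging_sales")

# Simulate corruption of batch 2 in the target after the load.
conn.execute("UPDATE target_sales SET amount = -1 WHERE load_batch = 2")

def reload_batch(conn, batch):
    """Delete a corrupted batch from the target, then reload only that batch from staging."""
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute("DELETE FROM target_sales WHERE load_batch = ?", (batch,))
        conn.execute(
            "INSERT INTO target_sales SELECT * FROM staging_sales WHERE load_batch = ?",
            (batch,),
        )

reload_batch(conn, 2)
print(conn.execute("SELECT * FROM target_sales ORDER BY sale_id").fetchall())
```

Batch 1 is never touched, which is the benefit the list describes: only the corrupted portion is reloaded, and the source database is not queried again.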
In most tables, the primary key is loaded from the source schema. However, some source tables might not have a primary key; in such cases, a primary key is created by using a sequence generator. Such keys are called surrogate keys.
In terms of usage, there is no difference between these two types of keys. They differ only in how they are loaded: a primary key is loaded from the source table, whereas a surrogate key is generated by the sequence generator.
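A minimal sketch of a sequence generator assigning surrogate keys to source rows that arrive without a primary key. `itertools.count` plays the role of the sequence here; the row fields and the `assign_surrogate_keys` helper are hypothetical. In a database, a sequence object or an identity column usually does this job.

```python
from itertools import count

# Hypothetical source rows that arrive without any primary key.
source_rows = [
    {"customer_name": "Asha", "city": "Pune"},
    {"customer_name": "Ben", "city": "Leeds"},
]

# A simple in-memory sequence generator starting at 1.
customer_key_seq = count(start=1)

def assign_surrogate_keys(rows, seq):
    """Return copies of the rows, each with a new surrogate key drawn from the sequence."""
    return [{"customer_key": next(seq), **row} for row in rows]

dim_customer = assign_surrogate_keys(source_rows, customer_key_seq)
print(dim_customer)
```

Because the same sequence object is reused across loads, later rows continue from the last issued value instead of restarting at 1.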
9) OLTP vs DW database
1. OLTP: A dedicated database available for a specific subject area or business application.
   DW: Integrated from different business applications.
Star schema: This type of schema has the fact table in the center position. Because the fact table contains references to the dimension tables, it is surrounded by dimension tables linked through foreign keys. A dimension table does not have a reference to any other dimension table.
Snowflake schema: This type also has the fact table in the center position, and the fact table references the dimension tables. However, a dimension table can have a reference to another dimension table, so the data is stored in a more normalized form.
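The practical difference shows up in queries. The sketch below builds both shapes in an in-memory SQLite database, using hypothetical product and category tables: in the star version the category name sits directly in the product dimension, while in the snowflake version the product dimension references a separate category dimension, so the same question needs one extra join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Star: the category name is stored directly in the product dimension.
    CREATE TABLE dim_product_star (product_key INTEGER PRIMARY KEY, product_name TEXT, category_name TEXT);

    -- Snowflake: the product dimension references a separate category dimension.
    CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY, category_name TEXT);
    CREATE TABLE dim_product_snow (
        product_key INTEGER PRIMARY KEY,
        product_name TEXT,
        category_key INTEGER REFERENCES dim_category(category_key)
    );

    -- Fact table shared by both: a foreign key plus a fact.
    CREATE TABLE sales_fact (product_key INTEGER, amount REAL);

    INSERT INTO dim_product_star VALUES (1, 'Pen', 'Stationery'), (2, 'Tea', 'Grocery'), (3, 'Ink', 'Stationery');
    INSERT INTO dim_category VALUES (10, 'Stationery'), (20, 'Grocery');
    INSERT INTO dim_product_snow VALUES (1, 'Pen', 10), (2, 'Tea', 20), (3, 'Ink', 10);
    INSERT INTO sales_fact VALUES (1, 5.0), (2, 3.0), (3, 7.0), (1, 5.0);
""")

# Star: one join from the fact table to the dimension.
star = conn.execute("""
    SELECT p.category_name, SUM(f.amount)
    FROM sales_fact f JOIN dim_product_star p ON f.product_key = p.product_key
    GROUP BY p.category_name ORDER BY p.category_name
""").fetchall()

# Snowflake: an extra join from the product dimension to the category dimension.
snowflake = conn.execute("""
    SELECT c.category_name, SUM(f.amount)
    FROM sales_fact f
    JOIN dim_product_snow p ON f.product_key = p.product_key
    JOIN dim_category c ON p.category_key = c.category_key
    GROUP BY c.category_name ORDER BY c.category_name
""").fetchall()
print(star, snowflake)
```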
What does data masking mean? Organizations never want to disclose highly confidential information to all users. Access to sensitive data is restricted in all environments other than production. The process of masking, hiding, or encrypting sensitive data is called data masking.
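A minimal sketch of one masking technique, character masking that keeps only the last few characters visible, for example before copying production rows into a test environment. The rows, column names, and the `mask_value`/`mask_rows` helpers are hypothetical; encryption or tokenization would be different techniques.

```python
def mask_value(value: str, visible: int = 4, mask_char: str = "X") -> str:
    """Replace all but the last `visible` characters with `mask_char`."""
    if visible <= 0:
        return mask_char * len(value)
    hidden = max(len(value) - visible, 0)
    return mask_char * hidden + value[hidden:]

def mask_rows(rows, sensitive_columns):
    """Return copies of the rows with the listed sensitive columns masked."""
    return [
        {col: mask_value(str(val)) if col in sensitive_columns else val for col, val in row.items()}
        for row in rows
    ]

print(mask_rows([{"customer": "Asha", "account_no": "9876543210"}], {"account_no"}))
```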
1. The data warehouse database contains integrated data for all business lines. For example, a banking data warehouse contains data from the savings, credit card, and loan account databases.
2. Reporting access is given to a person who has the authority, or the need, to compare data across all three types of accounts.
3. Meanwhile, a loan account branch manager does not need to see savings and credit card details; he wants to see only the past performance of loan accounts.
4. In that case, for his analysis, we need to apply data-level security to protect the savings and credit card information in the data warehouse.
5. At the same time, when end users from all three account types access the same data warehouse, it ends up with poor performance.
6. To avoid these issues, a separate database, named the data mart, is built on top of the data warehouse. Access is given only to the resources of the respective business line, not to everyone.
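A minimal sketch of the loan data mart from the example above, assuming a hypothetical `dw_accounts` warehouse table with an `account_type` column. A second in-memory SQLite database attached as `loan_mart_db` stands in for the separate data mart database; granting access to it only for the loan business line is DBMS-specific and is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Attach a second database to act as the separate data mart (before any transaction starts).
conn.execute("ATTACH DATABASE ':memory:' AS loan_mart_db")

# Hypothetical integrated warehouse table covering all three business lines.
conn.execute("CREATE TABLE dw_accounts (account_id INTEGER, account_type TEXT, balance REAL)")
conn.executemany(
    "INSERT INTO dw_accounts VALUES (?, ?, ?)",
    [(1, "saving", 1200.0), (2, "credit", -300.0), (3, "loan", 5000.0), (4, "loan", 7500.0)],
)

# Build the data mart on top of the warehouse with only the loan business line.
conn.execute("""
    CREATE TABLE loan_mart_db.loan_accounts AS
    SELECT account_id, balance FROM dw_accounts WHERE account_type = 'loan'
""")

loan_rows = conn.execute("SELECT * FROM loan_mart_db.loan_accounts ORDER BY account_id").fetchall()
print(loan_rows)
```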
Data purging means deleting data that has exceeded the defined retention period from a database.
Archiving means moving data that has exceeded the defined retention period to another database (an archival database).
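The difference between the two operations can be sketched with SQLite, assuming a hypothetical `orders` table, an archival database attached as `archive`, and a retention cutoff of 2020-01-01 chosen only for illustration. ISO-format date strings compare correctly as text, which the `<` filter relies on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical archival database attached alongside the main one.
conn.execute("ATTACH DATABASE ':memory:' AS archive")
conn.execute("CREATE TABLE main.orders (order_id INTEGER, order_date TEXT)")
conn.execute("CREATE TABLE archive.orders (order_id INTEGER, order_date TEXT)")
conn.executemany(
    "INSERT INTO main.orders VALUES (?, ?)",
    [(1, "2015-03-01"), (2, "2019-07-15"), (3, "2024-01-10")],
)

def archive_expired(conn, cutoff):
    """Archiving: copy rows older than the cutoff to the archive, then delete them from main."""
    with conn:  # run both statements in one transaction
        conn.execute("INSERT INTO archive.orders SELECT * FROM main.orders WHERE order_date < ?", (cutoff,))
        conn.execute("DELETE FROM main.orders WHERE order_date < ?", (cutoff,))

def purge_expired(conn, cutoff):
    """Purging: delete rows older than the cutoff without keeping a copy."""
    with conn:
        conn.execute("DELETE FROM main.orders WHERE order_date < ?", (cutoff,))

archive_expired(conn, "2020-01-01")
print(conn.execute("SELECT * FROM main.orders").fetchall())
print(conn.execute("SELECT * FROM archive.orders ORDER BY order_id").fetchall())
```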
SCD Type 1
-Modifications are made on the same record
-No history of changes is maintained
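A minimal SCD Type 1 sketch, assuming a hypothetical customer dimension held as a Python dictionary keyed by the business key `customer_id`: the change overwrites the existing record, so the old value is lost.

```python
# Hypothetical customer dimension keyed by the business key.
dim_customer = {101: {"customer_id": 101, "name": "Asha", "city": "Pune"}}

def scd_type1_update(dim, customer_id, **changes):
    """Overwrite attributes on the existing record; the previous values are not kept."""
    dim[customer_id].update(changes)

scd_type1_update(dim_customer, 101, city="Mumbai")
print(dim_customer[101])
```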
SCD Type 2
-The existing record is marked as expired with an is_active flag or an Expired_date column
-A new record with the changed values is inserted
-This type allows tracking the full history of changes
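A minimal SCD Type 2 sketch, assuming a hypothetical customer dimension stored as a list of rows with a surrogate key. For illustration it keeps both an `is_active` flag and an `expired_date` column, although either one alone is enough to identify the current row.

```python
from datetime import date
from itertools import count

# Hypothetical sequence generator for the surrogate key.
customer_key_seq = count(start=1)

# Hypothetical customer dimension: surrogate key, business key, tracked attribute, validity columns.
dim_customer = [
    {"customer_key": next(customer_key_seq), "customer_id": 101, "city": "Pune",
     "start_date": date(2020, 1, 1), "expired_date": None, "is_active": True},
]

def scd_type2_update(dim, seq, customer_id, new_city, change_date):
    """Expire the customer's active row, then insert a new active row with the new value."""
    for row in dim:
        if row["customer_id"] == customer_id and row["is_active"]:
            row["is_active"] = False
            row["expired_date"] = change_date
    dim.append({
        "customer_key": next(seq), "customer_id": customer_id, "city": new_city,
        "start_date": change_date, "expired_date": None, "is_active": True,
    })

scd_type2_update(dim_customer, customer_key_seq, 101, "Mumbai", date(2024, 6, 1))
for row in dim_customer:
    print(row)
```

The new surrogate key lets fact rows loaded before the change keep pointing at the Pune version, while later facts point at the Mumbai version.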
SCD Type 3
-The previous value is kept in a separate column alongside the new (current) value
-Only limited history of changes (typically the one prior value) is maintained
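A minimal SCD Type 3 sketch, assuming a hypothetical customer dimension with `current_city` and `previous_city` columns. Two successive changes show the limit of this type: after the second update, the original value (Pune) is no longer stored anywhere.

```python
# Hypothetical customer dimension with one extra column for the prior value.
dim_customer = {101: {"customer_id": 101, "current_city": "Pune", "previous_city": None}}

def scd_type3_update(dim, customer_id, new_city):
    """Shift the current value into the previous-value column, then overwrite the current value."""
    row = dim[customer_id]
    row["previous_city"] = row["current_city"]
    row["current_city"] = new_city

scd_type3_update(dim_customer, 101, "Mumbai")
scd_type3_update(dim_customer, 101, "Delhi")
print(dim_customer[101])
```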