All About Datawarehouse

SOME TERMINOLOGIES USED IN DW
============================================================================
INTRODUCTION TO DATA WAREHOUSE
============================================================================
DATA WAREHOUSE
- A system that periodically retrieves and consolidates data from the source systems into a dimensional or normalized data store.
- It usually keeps years of history and is queried for business intelligence or other analytical activities.
- It is typically updated in batches, not every time a transaction happens in the source system.
===========================================================================
OLTP (Online Transaction Processing)
- A system whose main purpose is to capture and store business transactions.
===========================================================================
PROFILER
- A tool that can analyze data, for example finding out how many rows are in each table, how many rows contain NULL values, and so on.
===========================================================================
ETL (Extract, Transform and Load)
- Brings data from various source systems into a staging area.
- Also a system that can connect to the source systems, read the data, transform the data, and load it into a target system (the target system does not have to be a data warehouse).
===========================================================================
DDS (Dimensional Data Store)
- A database that stores the data warehouse data in a different format than OLTP.
===========================================================================
Reasons why the DDS is queried instead of the source system
- Data is arranged in a dimensional format that is more suitable for analysis.
- It contains integrated data from several source systems.
===========================================================================
METADATA
- A database containing information about the data structure, the data meaning, the data usage, the data quality rules, and other information about the data.
===========================================================================
DATA QUALITY RULES
DATA CLEANSING
- The process of identifying and correcting dirty data.
- It is implemented using data quality rules that define what dirty data is.
3 options when data is incorrect
1. rejected
2. corrected
3. allowed
===========================================================================
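The three options above (reject, correct, allow) can be sketched as a small rule engine. This is only an illustrative sketch: the `Rule` class, the `apply_rules` function, the column names, and the sample rules are all hypothetical and do not come from any particular data quality tool.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# The three possible outcomes when a rule finds dirty data.
REJECT, CORRECT, ALLOW = "reject", "correct", "allow"

@dataclass
class Rule:
    name: str
    is_dirty: Callable[[dict], bool]               # True when the row violates the rule
    action: str                                    # REJECT, CORRECT or ALLOW
    fix: Optional[Callable[[dict], dict]] = None   # only used when action is CORRECT

def apply_rules(rows, rules):
    """Return (rows to load, rejected rows, log of rule hits)."""
    loaded, rejected, log = [], [], []
    for row in rows:
        keep = True
        for rule in rules:
            if not rule.is_dirty(row):
                continue
            log.append((rule.name, rule.action))
            if rule.action == REJECT:
                keep = False
                break
            if rule.action == CORRECT:
                row = rule.fix(row)
            # ALLOW: the row is loaded unchanged, but the hit stays in the log
        (loaded if keep else rejected).append(row)
    return loaded, rejected, log

# Hypothetical rules for a hypothetical customer feed.
rules = [
    Rule("missing_id", lambda r: r.get("customer_id") is None, REJECT),
    Rule("untrimmed_name", lambda r: r["name"] != r["name"].strip(), CORRECT,
         lambda r: {**r, "name": r["name"].strip()}),
    Rule("missing_phone", lambda r: not r.get("phone"), ALLOW),
]

rows = [
    {"customer_id": 1, "name": " Ann ", "phone": "555-0100"},
    {"customer_id": None, "name": "Bob", "phone": "555-0101"},
    {"customer_id": 3, "name": "Cy", "phone": ""},
]

loaded, rejected, log = apply_rules(rows, rules)
```

Keeping a log of every rule hit, including the allowed ones, is one way to make the data quality visible later; whether that log belongs in the metadata database is a design choice left open here.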
Another approach is ELT (Extract, Load and Transform)
- Data is first loaded into the data warehouse in its raw format. The transformations, lookups, deduplications, and so on are performed inside the data warehouse.
- Unlike the ETL approach, the ELT approach does not need an ETL server.
- This approach is used to take advantage of powerful data warehouse database engines such as massively parallel processing (MPP) systems.
===========================================================================
THINGS TO CONSIDER IN CONSOLIDATION OF TRANSACTIONAL DATA
***************************************************************************
DATA AVAILABILITY
- Some data is available in several systems but not in others.
Solution:
- You need to be aware of unavailable columns and missing levels in the hierarchy.
===========================================================================
TIME RANGE
- Data in different systems has different validity periods.
Solution:
- Always examine what time period is applicable to which data before you consolidate the data.
- Otherwise, you risk having inaccurate data in the warehouse because you mixed different time periods.
===========================================================================
DEFINITIONS
- The term "total weekly revenue" in one system may have a different meaning from "total weekly revenue" in other systems.
Solution:
- Always examine the meaning of each piece of data. Just because two items have the same name doesn't mean they are the same.
  This is important because you could have inaccurate or meaningless data in the data warehouse if you consolidate data with different meanings.
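
As a rough illustration of the TIME RANGE check above, the sketch below finds the period that every source system covers, so that only data from that common period is consolidated. The function name `common_period`, the system names, and the dates are hypothetical, and treating each system as having a single validity period is a simplifying assumption.

```python
from datetime import date

def common_period(periods):
    """Return the (start, end) range covered by every source, or None if there is no overlap.

    `periods` maps a source-system name to its inclusive (valid_from, valid_to) dates.
    """
    start = max(p[0] for p in periods.values())   # latest start among all sources
    end = min(p[1] for p in periods.values())     # earliest end among all sources
    return (start, end) if start <= end else None

# Hypothetical validity periods for two hypothetical source systems.
periods = {
    "billing": (date(2020, 1, 1), date(2023, 12, 31)),
    "crm": (date(2021, 7, 1), date(2024, 6, 30)),
}

overlap = common_period(periods)
```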
===========================================================================
CONVERSION
- Different systems may have different units of measure or currencies.
Solution:
- Convert the values to a common unit of measure or currency before consolidating them.
===========================================================================
MATCHING
- Merging data based on common identifiers between different systems.
Solution:
- The logic for determining a match can simply be based on the equals sign (=) to identify an exact match. It can also be based on fuzzy logic or matching rules.
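
The two matching styles described above can be sketched as follows: an exact match on a shared identifier, with a fuzzy comparison of names as a fallback. This is an assumption-laden sketch, not a prescribed method: the `tax_id` and `name` fields, the `is_match` function, and the 0.85 threshold are hypothetical, and `difflib.SequenceMatcher` from the Python standard library stands in for whatever fuzzy logic a real matching tool would use.

```python
from difflib import SequenceMatcher

def normalize(name):
    """Lower-case the name and collapse repeated whitespace."""
    return " ".join(name.lower().split())

def is_match(a, b, threshold=0.85):
    """Exact match on the identifier (=), otherwise fuzzy match on the name."""
    if a["tax_id"] and a["tax_id"] == b["tax_id"]:
        return True
    ratio = SequenceMatcher(None, normalize(a["name"]), normalize(b["name"])).ratio()
    return ratio >= threshold

# Hypothetical records from two hypothetical source systems.
crm_row = {"tax_id": None, "name": "Acme  Corporation"}
billing_row = {"tax_id": "GB123", "name": "ACME Corporaton"}   # misspelled in the source
other_row = {"tax_id": "GB999", "name": "Zenith Foods"}

same_company = is_match(crm_row, billing_row)
different_company = is_match(crm_row, other_row)
```

The threshold trades false matches against missed matches, so it would normally be tuned against a sample of records that have been matched by hand.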
============================================================================
PERIODICALLY
- You can determine the period of data retrieval and consolidation based on the business requirements and the frequency of data updates in the source system.
- The data retrieval interval needs to be the same as the source system's data update frequency.
- If the source system is updated once a day, set the data retrieval to once a day. There is no point extracting the data from that source system several times a day.
- Always make sure the data retrieval interval satisfies the business requirements.
============================================================================
DIMENSIONAL DATA STORE
=======================
DDS
- Contains a collection of data marts.
- It is denormalized, and the dimensions are conformed.
Sample of Dimensional Schema
===========================================
a. Star schema
- A dimension does not have a subtable (subdimension).
characteristic: it is simpler than the snowflake and galaxy schemas
==========================================================================
b. Snowflake schema
- A dimension can have a subdimension.
characteristic:
==========================================================================
c. Galaxy schema or fact constellation schema
- You have two or more related fact tables surrounded by common dimensions.
characteristic:
==========================================================================
TABLE PARTITIONING
======================
- A method to split a table by rows into several parts and store each part in a different file to increase data loading and query performance.
PARALLEL QUERYING
=====================
- A process where a single query is split into smaller parts and each part is given to an independent query-processing module.
==========================================================================
SCD (SLOWLY CHANGING DIMENSION)
- A technique used in dimensional modeling for preserving historical information about dimensional data.
SCD TYPE 1 - you don't keep the historical information
SCD TYPE 2 - you keep the historical information in rows
SCD TYPE 3 - you keep the historical information in columns
==========================================================================
SNAPSHOT
- A copy of one or more master tables taken at a certain time.
==========================================================================
PERIODIC SNAPSHOT
- A snapshot that is taken at a regular interval.
CUSTOMER PROFITABILITY
PREDICTIVE ANALYSIS
WHAT IF SCENARIOS
SLICE AND DICE
ANALYTICAL EXERCISES
==============================================================================
LAST TOPIC WHERE I STOPPED: CHAPTER 1 - INTRODUCTION TO DATA WAREHOUSING (DATA MINING)
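
To make the difference between SCD Type 1 and Type 2 described earlier more concrete, here is a minimal in-memory sketch. The dimension is a plain list of dictionaries, and the column names (`customer_id`, `city`, `valid_from`, `valid_to`, `is_current`) as well as the function names are hypothetical; a real warehouse would do this with SQL against the dimension table.

```python
from datetime import date

def apply_scd1(dimension, key, new_attrs):
    """Type 1: overwrite the attributes in place, so no history is kept."""
    for row in dimension:
        if row["customer_id"] == key:
            row.update(new_attrs)

def apply_scd2(dimension, key, new_attrs, today):
    """Type 2: expire the current row and append a new version, so history is kept."""
    current = next(
        (r for r in dimension if r["customer_id"] == key and r["is_current"]), None
    )
    if current is not None:
        if all(current[k] == v for k, v in new_attrs.items()):
            return  # nothing changed, so no new version is needed
        current["valid_to"] = today
        current["is_current"] = False
    dimension.append({
        "customer_id": key, **new_attrs,
        "valid_from": today, "valid_to": None, "is_current": True,
    })

# Hypothetical customer dimension with one current row.
dim = [{"customer_id": 1, "city": "Leeds",
        "valid_from": date(2020, 1, 1), "valid_to": None, "is_current": True}]

# The customer moves: Type 2 keeps the old row and adds a new one.
apply_scd2(dim, 1, {"city": "York"}, date(2024, 3, 1))
```

Type 3 would instead add a column such as a hypothetical `previous_city` to the same row, which keeps only a limited amount of history.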
