Data Warehouse Concepts
What is a Data Warehouse?
A DWH is a collection of Data Marts representing historical data from different
operational data sources (OLTP systems). The data from these OLTP systems is structured
and optimized for querying and data analysis in the DWH.
A Data Warehouse is a relational database that is designed for query and analysis
rather than for transaction processing. It usually contains historical data derived from
transactional data, but it can include data from other sources. It separates the analysis
workload from the transaction workload and enables an organization to consolidate data
from several sources. In addition to a relational database, a DWH environment includes
an ETL solution, an OLAP engine, client analysis tools and other applications that manage
the process of gathering data and delivering it to business users. The characteristics
of a DWH are:
Subject-Oriented: Information in the data warehouse should revolve around a
subject and should give all the information regarding that subject. DWHs are designed to
help you analyze data. For example, to learn more about the company's sales data, you
can build a warehouse that concentrates on sales. This ability to define a DWH by subject
matter, sales in this case, makes the DWH subject-oriented.
Integrated: Data loaded from different heterogeneous systems should be transformed
consistently. Integration is closely related to subject orientation.
DWHs put data from disparate sources into a consistent format. They must resolve such
problems as naming conflicts and inconsistencies among units of measure. When they
achieve this, they are said to be integrated.
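The integration step described above can be sketched in a few lines. This is only an illustration: the two source systems (a CRM and an ERP), their field names, and the target field names are all hypothetical and do not come from any particular product.

```python
# Hypothetical rows from two source systems that name and measure
# the same things differently (naming conflict + unit conflict).
crm_rows = [{"cust_id": 1, "weight_lb": 10.0}]
erp_rows = [{"CustomerID": 2, "weight_kg": 3.0}]

LB_TO_KG = 0.45359237  # exact pounds-to-kilograms conversion factor


def from_crm(row):
    """Map a CRM row to the warehouse format (kilograms, customer_id)."""
    return {
        "customer_id": row["cust_id"],
        "weight_kg": round(row["weight_lb"] * LB_TO_KG, 3),
    }


def from_erp(row):
    """Map an ERP row to the warehouse format; only the key is renamed."""
    return {"customer_id": row["CustomerID"], "weight_kg": row["weight_kg"]}


def integrate(crm, erp):
    """Return all rows in one consistent warehouse format."""
    return [from_crm(r) for r in crm] + [from_erp(r) for r in erp]


integrated = integrate(crm_rows, erp_rows)
for row in integrated:
    print(row)
```

After this step every row uses the same key name and the same unit, which is what "integrated" means in the sense used here.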
Nonvolatile: It means that, once entered into the warehouse, data should not
change. This is logical because the purpose of a warehouse is to enable you to analyze
what has occurred and whatever once happened never changes.
Time-Variant: In order to discover trends, analysts need large amounts of data.
This is very much in contrast to OLTP systems, where performance requirements demand
that historical data be moved to an archive. A DWH's focus on change over time is what is
meant by the term time-variant.
What are the uses of a Data Warehouse?
It separates the analysis workload from transaction processing and enables an
organization to consolidate data from several sources.
It manages the process of gathering data and delivering it to business users.
It is used to analyze data.
It puts data from disparate sources into a consistent format.
vanuguard@gmail.com
What is a Data Mart?
A Data Mart is a focused subset of a DWH that deals with a single area of data
and is organized for quick analysis. It contains summarized data from the warehouse
and is referred to as a High Performance Query Structure. Data marts consist of
Materialized Views and special indexes. In some businesses the data marts are maintained
within the warehouse, whereas in other scenarios they are maintained apart
from the DWH.
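As a rough sketch of this idea, the example below builds a small summarized mart from a warehouse table. It uses Python's built-in sqlite3 module. SQLite has no materialized views, so a summary table created with CREATE TABLE ... AS SELECT stands in for one here; the table names, column names and sample rows are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical detailed warehouse table
CREATE TABLE warehouse_sales (
    sale_date TEXT, region TEXT, department TEXT, amount REAL
);
INSERT INTO warehouse_sales VALUES
    ('2024-01-05', 'North', 'Sales', 100.0),
    ('2024-01-06', 'North', 'Sales', 50.0),
    ('2024-01-06', 'South', 'Sales', 70.0),
    ('2024-01-07', 'North', 'HR', 20.0);

-- The mart: one subject area (Sales), summarized per region.
-- A plain summary table stands in for a materialized view.
CREATE TABLE sales_mart AS
    SELECT region,
           SUM(amount) AS total_amount,
           COUNT(*)    AS sale_count
    FROM warehouse_sales
    WHERE department = 'Sales'
    GROUP BY region;

-- An index on the mart to speed up lookups by region
CREATE INDEX idx_sales_mart_region ON sales_mart (region);
""")

mart_rows = conn.execute(
    "SELECT region, total_amount, sale_count FROM sales_mart ORDER BY region"
).fetchall()
print(mart_rows)
```

Queries against sales_mart read a handful of pre-summarized rows instead of scanning the detailed warehouse table, which is the reason a mart is organized for quick analysis.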
What are the differences between a Database, a Data Warehouse and a Data Mart?
A Database is an organized collection of data.
A DWH is a very large database with a special set of tools to extract and cleanse data from
operational systems and to analyze data.
A Data Mart is a focused subset of a DWH that deals with a single area of data and is
organized for quick analysis.
What is Dimension Modeling?
Dimension Modeling is a high-level methodology used to implement the Star Schema
structure during data modeling.
Or
A Dimension Model is composed of one table with a multipart key, called the fact table,
and a set of smaller tables called dimension tables. Each dimension table has a
single-part primary key that corresponds exactly to one of the components of the
multipart key in the fact table.
Or
Dimension Modeling means maintaining the relationship between the dimension
tables and the fact table using primary keys and foreign keys.
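The fact and dimension relationship described above can be illustrated with a minimal star schema. The sketch below uses Python's built-in sqlite3 module; the table names (dim_product, dim_date, fact_sales), columns and rows are hypothetical examples, not part of any standard.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
-- Dimension tables: each has a single-part primary key
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT
);
CREATE TABLE dim_date (
    date_key      INTEGER PRIMARY KEY,
    calendar_date TEXT
);

-- Fact table: multipart key, each part is a foreign key to one dimension
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product (product_key),
    date_key    INTEGER REFERENCES dim_date (date_key),
    quantity    INTEGER,
    PRIMARY KEY (product_key, date_key)
);

INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO dim_date VALUES (20240101, '2024-01-01'), (20240102, '2024-01-02');
INSERT INTO fact_sales VALUES (1, 20240101, 5), (1, 20240102, 3), (2, 20240101, 7);
""")

# A typical star-schema query: join the fact table to a dimension and aggregate
totals = conn.execute("""
    SELECT p.product_name, SUM(f.quantity)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY p.product_name
    ORDER BY p.product_name
""").fetchall()
print(totals)
```

Each component of the fact table's key (product_key, date_key) matches the primary key of exactly one dimension table, which is the correspondence the second definition describes.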
What is meant by OLTP?
OLTP stands for On-Line Transaction Processing. An OLTP system uses a standard,
normalized database structure and is designed for day-to-day
transactions. An OLTP database typically has hundreds of users connected to it. These
databases are normalized to reduce data redundancy and to improve performance when
inserting data. More records are inserted than are updated or deleted. OLTP systems
are not designed for analysis, reporting and decision support. Examples: ATMs,
online shopping, online application filling, and online railway reservations.
What is meant by OLAP? What are the types of OLAP?
OLAP stands for On-Line Analytical Processing. An OLAP system stores data in
multidimensional databases. Users access these databases to perform financial and
statistical analysis on different combinations of the data. An OLAP database is generally
used to analyze data, and it is optimized so that users can retrieve data quickly. An OLAP
database is generally created from the information stored in an OLTP database.
OLAP products can be grouped into 3 categories.
MOLAP: (Multidimensional OLAP)
Data is stored in multidimensional arrays (cubes) so that it can be viewed in a
multidimensional manner.
Multidimensional arrays provide efficiency in storage and operations. Examples:
Oracle Express Server, Essbase by Hyperion Software, PowerPlay by Cognos.
MOLAP does not support ad-hoc queries because it is optimized for
multidimensional operations.
It can perform complex calculations. All the calculations are pre-generated
when the cube is created.
Retrieval is fast.
Storage is very efficient.