DATA WAREHOUSING CONCEPTS

Data Warehouse: A data warehouse is a subject-oriented, integrated, time-variant and non-volatile
collection of data in support of management's decision making process.

Data Warehousing: The process of designing, building, and maintaining a data warehouse
system.

Data Mart: It is a logical subset of an enterprise data warehouse, organized around a
single business process. It is less expensive and much smaller than a full-blown corporate data
warehouse. It is also known as a local data warehouse.

Data Model: It is an abstraction of some aspect of the real world or system. It helps to visualize
the business. Based on the model the data warehouse will be implemented.

Staging Area: It accepts data from different sources so that the source data can be cleansed
before it is loaded into the warehouse.

Data dictionary: It is a collection of definitions and specifications for data categories and their
relationships. It is a database of data about data (metadata).

Levels of Data Modeling

Conceptual Modeling: It describes the data requirements from a business point of view without
technical details.

Features of conceptual data model include:

• Includes the important entities and the relationships among them.
• No attribute is specified.
• No primary key is specified.

Logical Modeling: At this level, the data modeler attempts to describe the data in as much detail
as possible, without regard to how they will be physically implemented in the database. It is data
structure oriented.

Features of logical data model include:

• Includes all entities and relationships among them.
• All attributes for each entity are specified.
• The primary key for each entity is specified.
• Foreign keys (keys identifying the relationship between different entities) are specified.
• Normalization occurs at this level.

Physical Modeling: At this level, the data modeler will specify how the logical data model will be
realized in the actual database schema.

Features of physical data model include:

• Specification of all tables and columns.
• Foreign keys are used to identify relationships between tables.
• Denormalization may occur based on user requirements.
• Physical considerations may cause the physical data model to be quite different from the
logical data model.

Dimensional Modeling: A type of data modeling suited for data warehousing. It uses three basic
concepts: measures, facts, and dimensions.

Slowly changing dimension: A dimension whose attributes remain almost constant over time
requiring relatively minor alterations to represent the evolved state.

Measures: They are calculated values stored in a fact table. They are numeric attributes of a
fact representing the performance or behavior of the business relative to dimensions.

e.g.- Sales Count, Sales Price, Cost, Discount, and Profit

Fact: It is a collection of related data items, consisting of measures and context data.

e.g.- quantities, prices etc.

Additive facts: Facts that can be added along all the dimensions. These are discrete
numerical measures.

e.g. Retail sales

Semi-additive facts: Facts that can be added along some dimensions but not others. Typically
they are not additive along the time dimension.

e.g. Account balance

Non-additive facts: Numeric measures that cannot be added across any dimension.

e.g. Room temperature
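The three kinds of facts above can be contrasted with a small sketch. The rows, account names, and temperature readings below are hypothetical illustration data, not taken from any real system.

```python
from statistics import mean

# Hypothetical daily account facts: (day, account, deposits, closing_balance).
rows = [
    ("2024-01-01", "A", 100, 600),
    ("2024-01-01", "B", 50, 250),
    ("2024-01-02", "A", 20, 620),
    ("2024-01-02", "B", 0, 250),
]

# Additive: deposits can be summed across every dimension (day and account).
total_deposits = sum(deposits for _, _, deposits, _ in rows)

# Semi-additive: balances can be summed across accounts for a single day...
balance_on_day_2 = sum(bal for day, _, _, bal in rows if day == "2024-01-02")

# ...but not across days; for time, keep the latest balance per account instead.
latest_balance = {}
for day, account, _, balance in sorted(rows):
    latest_balance[account] = balance

# Non-additive: summing temperatures is meaningless; an average is used instead.
readings = [20.5, 22.0, 21.5]
average_temperature = mean(readings)
```

Summing `closing_balance` over both days would double count the money in each account, which is why the time dimension is handled separately here.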

Dimension: It is a collection of members or units of the same type of information. It is one of the
perspectives that can be used to analyze the data.

e.g. - For a Sales database, the dimensions could include Product, Time, Store, and Promotion.

Fact table: A table that is used to store business information. There are two types of fields in a
fact table:

1. The fields storing the foreign keys which connect each particular fact to the appropriate
value in each dimension.

2. The fields storing the individual facts (or measures) - such as number, amount, or price.

Dimension table: A table used to store qualitative data about fact records, such as who, what, when,
where, and why.

Conformed Dimension: A dimension that has exactly the same meaning and content when
referred to from different fact tables.

Multidimensional analysis: A process of analysis that involves organizing and summarizing
data in multiple dimensions.

Star Schema: In the star schema design, a single object (the fact table) sits in the middle and is
radially connected to other surrounding objects (dimension lookup tables) like a star. The
dimension tables are denormalised. The fact table primary key is the composite of the foreign
keys.
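A minimal sketch of a star schema, using Python's built-in sqlite3 module. The table names, column names, and rows (dim_product, dim_store, fact_sales) are assumptions made up for illustration. The fact table holds foreign keys to each dimension plus the measures, and its primary key is the composite of those foreign keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Denormalised dimension tables (hypothetical names and columns).
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT, family TEXT);
CREATE TABLE dim_store   (store_key   INTEGER PRIMARY KEY, store_name   TEXT, region TEXT);

-- Fact table: foreign keys to each dimension plus the measures.
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    store_key   INTEGER REFERENCES dim_store(store_key),
    sales_count INTEGER,
    profit      REAL,
    PRIMARY KEY (product_key, store_key)
);

INSERT INTO dim_product VALUES (1, 'Cola', 'Drink'), (2, 'Bread', 'Food');
INSERT INTO dim_store   VALUES (1, 'Store 1', 'East'), (2, 'Store 2', 'West');
INSERT INTO fact_sales  VALUES (1, 1, 10, 5.0), (2, 1, 4, 2.0), (1, 2, 7, 3.5);
""")

# Analyze the profit measure from the perspective of the Store dimension.
profit_by_region = dict(conn.execute("""
    SELECT s.region, SUM(f.profit)
    FROM fact_sales AS f
    JOIN dim_store AS s ON f.store_key = s.store_key
    GROUP BY s.region
""").fetchall())
```

Every analytical query follows the same shape: join the fact table to one or more dimension tables, then group by dimension attributes.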

Snowflake Schema: Here normalized dimension tables surround the single fact table.

Cube: It is the fundamental data structure for multidimensional analysis. A cube contains
dimensions, hierarchies, levels, and measures. Each individual point in a cube is referred to as a
cell.

Slicing and dicing: Slicing and dicing refers to the ability to combine and re-combine the
dimensions to see different slices of the information. Picture slicing a three-dimensional cube of
information, in order to see what values are contained in the middle layer. Slicing and dicing a
cube allows an end-user to do the same thing with multiple dimensions.
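As a rough illustration of slicing, a cube can be modeled as a dictionary whose keys are cell coordinates. The dimension order (product, region, month), the helper names slice_cube and roll_up, and the values are all assumptions made for this sketch.

```python
# Hypothetical three-dimensional cube: (product, region, month) -> profit.
cube = {
    ("Cola", "East", "Jan"): 5.0,
    ("Cola", "West", "Jan"): 3.5,
    ("Bread", "East", "Jan"): 2.0,
    ("Cola", "East", "Feb"): 6.0,
}

def slice_cube(cube, axis, member):
    """Keep only the cells whose coordinate on `axis` equals `member`."""
    return {cell: value for cell, value in cube.items() if cell[axis] == member}

def roll_up(cube, axis):
    """Summarize the cells by the members of one axis, summing the measure."""
    totals = {}
    for cell, value in cube.items():
        totals[cell[axis]] = totals.get(cell[axis], 0.0) + value
    return totals

# Slice out the January layer, then view it by region.
january = slice_cube(cube, axis=2, member="Jan")
january_by_region = roll_up(january, axis=1)
```

Applying `slice_cube` repeatedly on different axes narrows the view further, which is one way to read "dicing".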

Cross tab: A process or function that combines or summarizes data from one or more sources
into a two dimensional format for analysis or reporting.
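A cross tab can be sketched in a few lines of plain Python. The function name cross_tab and the sample records (product family, region, profit) are hypothetical.

```python
from collections import defaultdict

# Hypothetical input records: (row_key, column_key, value).
records = [
    ("Drink", "East", 5.0),
    ("Food", "East", 2.0),
    ("Drink", "West", 3.5),
    ("Drink", "East", 1.0),
]

def cross_tab(records):
    """Summarize records into a two-dimensional table: rows x columns -> sum of values."""
    table = defaultdict(lambda: defaultdict(float))
    for row_key, column_key, value in records:
        table[row_key][column_key] += value
    # Convert to plain dicts so the result is easy to print and compare.
    return {row: dict(columns) for row, columns in table.items()}

table = cross_tab(records)
```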

Drill up: Changing the view of the data to a higher level of aggregation.

Drill down: Changing the view of the data to a greater level of detail.

Hierarchies: Organization of data into a logical tree structure.

Dimensions can have one or more hierarchies. A Time dimension, for example, could have a
Calendar hierarchy and a Fiscal hierarchy. Hierarchies contain levels, which organize data into a
logical structure.

It is the combination of a multidimensional view with a hierarchical view in Business Intelligence
software that allows users to grasp large amounts of data. If each member in a level has 5 to 10
children that are members at the next lower level, the user has a better chance of understanding
the significance of the data.

Moving between the levels of a hierarchy is called drilling up and drilling down.
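Drilling up and down can be sketched as re-aggregating the same facts at different levels of a calendar hierarchy. The sales data, the LEVELS mapping, and the view_at function are assumptions made for this example.

```python
from datetime import date

# Hypothetical sales facts: (sale date, quantity sold).
sales = [
    (date(2024, 1, 15), 10),
    (date(2024, 2, 3), 5),
    (date(2024, 4, 20), 8),
    (date(2025, 1, 2), 2),
]

# Each level of the calendar hierarchy maps a date to a key at that level.
LEVELS = {
    "year": lambda d: (d.year,),
    "quarter": lambda d: (d.year, (d.month - 1) // 3 + 1),
    "month": lambda d: (d.year, d.month),
}

def view_at(level, sales):
    """Aggregate the quantity at the requested hierarchy level."""
    totals = {}
    for day, quantity in sales:
        key = LEVELS[level](day)
        totals[key] = totals.get(key, 0) + quantity
    return totals

by_year = view_at("year", sales)        # drilled up: least detail
by_quarter = view_at("quarter", sales)  # drilled down one level
by_month = view_at("month", sales)      # drilled down again: most detail
```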

Level: A position in a hierarchy.

e.g. A time dimension might have a hierarchy that represents data at the month, quarter, and year
levels.

Aggregation: Information stored in a data warehouse in a summarized form.

Instead of recording the date and time each time a certain product is sold, the data warehouse
could store the quantity of the product sold each hour, each day, or each week.

Aggregations are used for two primary reasons:

1. To save storage space. Data warehouses can get large. The use of aggregations greatly reduces
the space needed to store data.

2. To improve the performance of business intelligence tools. When queries run faster they take up
less processing time and the users get their information back more quickly.
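The storage saving can be illustrated with a small sketch that collapses individual sale transactions into one row per product per day. The transaction data is hypothetical.

```python
from collections import Counter

# Hypothetical detail rows: (timestamp, product, quantity).
transactions = [
    ("2024-01-01 09:15", "Cola", 1),
    ("2024-01-01 09:40", "Cola", 2),
    ("2024-01-01 13:05", "Bread", 1),
    ("2024-01-02 10:00", "Cola", 4),
    ("2024-01-02 17:30", "Cola", 1),
]

# Aggregate to the daily level: one row per (day, product).
daily_quantity = Counter()
for timestamp, product, quantity in transactions:
    day = timestamp[:10]  # keep only the YYYY-MM-DD part
    daily_quantity[(day, product)] += quantity

rows_before = len(transactions)
rows_after = len(daily_quantity)
```

The trade-off is that the time of each individual sale is no longer available once only the aggregate is stored.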

Primary Key: It is the key that is used to uniquely identify a record.

Foreign key: A column (or set of columns) in one table that refers to the primary key of another
table.

Surrogate key: It is a system-generated key for dimensions. A surrogate key has no meaning
by itself. Surrogate keys are integers, so less space is required.
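A minimal sketch of surrogate key assignment, assuming a simple in-memory generator. The class name SurrogateKeyGenerator and the natural keys such as "CUST-0042" are hypothetical.

```python
from itertools import count

class SurrogateKeyGenerator:
    """Assigns a meaningless integer key to each distinct natural (business) key."""

    def __init__(self):
        self._next_key = count(1)  # 1, 2, 3, ...
        self._assigned = {}        # natural key -> surrogate key

    def key_for(self, natural_key):
        # Reuse the existing surrogate key if this natural key was seen before.
        if natural_key not in self._assigned:
            self._assigned[natural_key] = next(self._next_key)
        return self._assigned[natural_key]

generator = SurrogateKeyGenerator()
first = generator.key_for("CUST-0042")
second = generator.key_for("CUST-0007")
repeat = generator.key_for("CUST-0042")
```

In a real warehouse the mapping would usually live in the dimension table itself rather than in memory.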

Normalization: The process of organizing data in accordance with the rules of a relational
database.

In a completely de-normalized database the customer name and address information would be
stored every time a customer made a purchase.

In a normalized database each customer's name and address would be stored only once, in a
separate table. Every purchase record would have a reference to the customer table to indicate
which customer was involved.
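The customer example above can be sketched as follows. The field names and sample rows are assumptions for illustration only.

```python
# De-normalized: name and address are repeated on every purchase (hypothetical data).
denormalized = [
    {"purchase_id": 1, "customer_name": "Ann", "address": "1 Main St", "amount": 20.0},
    {"purchase_id": 2, "customer_name": "Bob", "address": "9 Oak Ave", "amount": 5.0},
    {"purchase_id": 3, "customer_name": "Ann", "address": "1 Main St", "amount": 12.5},
]

# Normalized: each customer is stored once; purchases refer to it by customer_id.
customer_ids = {}  # (name, address) -> customer_id
purchases = []
for row in denormalized:
    identity = (row["customer_name"], row["address"])
    customer_id = customer_ids.setdefault(identity, len(customer_ids) + 1)
    purchases.append(
        {"purchase_id": row["purchase_id"], "customer_id": customer_id, "amount": row["amount"]}
    )

customers = [
    {"customer_id": cid, "customer_name": name, "address": address}
    for (name, address), cid in customer_ids.items()
]
```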

OLTP (On-Line Transaction Processing): The use of computers to run the on-going operation
of a business.

OLAP (On-Line Analytical Processing): The use of computers to analyze an organization's
data.

"OLAP" is the most widely used term for multidimensional analysis software. The term "On-Line
Analytical Processing" was developed to distinguish data warehousing activities from "On-Line
Transaction Processing" - the use of computers to run the on-going operation of a business.

In its broadest usage the term "OLAP" is used as a synonym of "data warehousing". In a
narrower usage, the term OLAP refers to the tools used for multidimensional analysis.

Relational On-Line Analytical Processing (ROLAP): OLAP that stores data and aggregations
in a relational database.

Cuboids: The relational tables used in ROLAP are called cuboids. Each cuboid represents a
particular view.

Multidimensional On-Line Analytical Processing (MOLAP): OLAP that stores data and
aggregations in a multidimensional database structure.

Hybrid OLAP (HOLAP): A combined use of Relational OLAP (ROLAP) and Multidimensional
OLAP (MOLAP).

In HOLAP, the source data is usually stored using a ROLAP strategy and aggregations are stored
using a MOLAP strategy. This combination usually results in the least amount of storage space
and the fastest cube processing.

Web-based On-Line Analytical Processing (WOLAP): WOLAP uses a browser to deliver OLAP.
The delivery capability of the web, coupled with the business intelligence tools of OLAP,
allows a broader range of business analysts to benefit from the software.

Decision Support System (DSS): A computer system designed to assist an organization in
making decisions. Databases, data warehouses, and data marts, in conjunction with reporting and
analysis software optimized to support timely business decision making, together make up a
DSS.

Data Mining: Techniques for finding patterns and trends in large data sets.

Operational data store (ODS): It may keep data for a shorter time period than the warehouse,
and it may feed the warehouse.

Extraction: It is the process of getting data out of a data source.

Cleansing: It is the process of resolving inconsistencies and fixing anomalies in source data.

Transformation: It is the process of modifying the extracted data.

Validation: It is the process of verifying metadata definitions and configuration parameters.

Loading: It is the insertion of the transformed data into a different data store.
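The extraction, cleansing, transformation, and loading steps can be chained as in the sketch below. The CSV source, the cleansing rule (drop rows with a missing amount), and all function names are assumptions chosen for illustration; real pipelines apply rules specific to their sources.

```python
import csv
import io

# Hypothetical source data with inconsistent names and a missing amount.
SOURCE = "id,name,amount\n1, alice ,10.5\n2,BOB,\n3,carol,7\n"

def extract(text):
    """Get the raw rows out of the source."""
    return list(csv.DictReader(io.StringIO(text)))

def cleanse(rows):
    """Resolve inconsistencies: drop rows with no amount, tidy up names."""
    cleaned = []
    for row in rows:
        if not row["amount"].strip():
            continue  # assumed rule: a row without an amount is unusable
        cleaned.append({**row, "name": row["name"].strip().title()})
    return cleaned

def transform(rows):
    """Modify the extracted data: typed id, amount converted to integer cents."""
    return [
        {"id": int(r["id"]), "name": r["name"], "amount_cents": round(float(r["amount"]) * 100)}
        for r in rows
    ]

def load(rows, target):
    """Insert the transformed rows into a different data store (here, a list)."""
    target.extend(rows)
    return len(rows)

warehouse_table = []
loaded = load(transform(cleanse(extract(SOURCE))), warehouse_table)
```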

Attribute: Attributes represent a single type of information in a dimension. For example, year is
an attribute in the Time dimension.

Granularity: The level of detail of the facts stored in a data warehouse.

Grain: The fundamental atomic level of data to be represented in the fact table.

e.g. When considering time, the grain could be day, week, month, or year.

Business Intelligence: The use of a data warehouse to help make business decisions
and recommendations. Information and data rules engines are leveraged
here to help make these decisions, along with statistical analysis tools
and data mining tools.

Multidimensional Expressions (MDX): The query language for OLAP cubes.

e.g. The following query returns a cellset with the names of the store regions on the columns, the names
of product families on the rows, and the profit displayed in the cells:

select
    [Stores].[Region].Members on columns,
    [Products].[Product Family].Members on rows
from SalesCube
where ([Measures].[Profit])

Ad Hoc Query: Any query that cannot be determined prior to the moment the query is issued.

Update window: The length of time available for updating a warehouse.

Change data capture: A process of capturing changes made to a production data source.
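One simple way to capture changes, assuming two full snapshots of the source keyed by primary key are available, is to compare them. This snapshot comparison is a stand-in chosen for brevity; production systems often read the database transaction log or use triggers instead. The function name capture_changes and the sample rows are hypothetical.

```python
def capture_changes(old_snapshot, new_snapshot):
    """Compare two snapshots (primary key -> row) and list the changes between them."""
    changes = []
    for key, row in new_snapshot.items():
        if key not in old_snapshot:
            changes.append(("insert", key, row))
        elif old_snapshot[key] != row:
            changes.append(("update", key, row))
    for key, row in old_snapshot.items():
        if key not in new_snapshot:
            changes.append(("delete", key, row))
    return changes

yesterday = {1: {"name": "Ann", "city": "Oslo"}, 2: {"name": "Bob", "city": "Rome"}}
today = {1: {"name": "Ann", "city": "Bergen"}, 3: {"name": "Cy", "city": "Paris"}}
changes = capture_changes(yesterday, today)
```

Only the rows listed in `changes` need to be sent to the warehouse, which keeps the update window short.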

Data administration: The processes and procedures by which the integrity and concurrency of
the data in the warehouse are maintained.

Pivoting: A transformation where each record in an input stream is converted to many
records in the appropriate table in the data warehouse.
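A minimal sketch of a pivoting transformation, assuming an input record that carries one column per month. The function name pivot_record and the field names are hypothetical.

```python
def pivot_record(record, key_field, measure_fields):
    """Convert one wide input record into one output record per measure column."""
    return [
        {key_field: record[key_field], "period": name, "value": record[name]}
        for name in measure_fields
    ]

# One input record with a column per month...
wide = {"store": "S1", "jan": 10, "feb": 12, "mar": 9}

# ...becomes three records, one per month.
narrow = pivot_record(wide, "store", ["jan", "feb", "mar"])
```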

Partition: The division of a large table or data set into smaller, separately managed
segments (for example, by date range) in the data warehouse.
