Data Warehousing: The process of designing, building, and maintaining a data warehouse
system.
Data Mart: It is a logical subset of an enterprise data warehouse, organized around a
single business process. It is less expensive and much smaller than a full-blown corporate data
warehouse. It is also known as a local data warehouse.
Data Model: It is an abstraction of some aspect of the real world or system. It helps to visualize
the business. The data warehouse is implemented based on this model.
Staging Area: It accepts data from different sources so that the source data can be cleansed.
Data dictionary: It is a collection of definitions and specifications for data categories and their
relationships. It is a database of data about data (metadata).
Conceptual Modeling: It describes the data requirements from a business point of view without
technical details.
Logical Modeling: At this level, the data modeler attempts to describe the data in as much detail
as possible, without regard to how they will be physically implemented in the database. It is data
structure oriented.
Physical Modeling: At this level, the data modeler will specify how the logical data model will be
realized in the actual database schema.
Dimensional Modeling: A type of data modeling suited for data warehousing. It uses three basic
concepts: measures, facts, and dimensions.
Slowly changing dimension: A dimension whose attributes remain almost constant over time
requiring relatively minor alterations to represent the evolved state.
Measures: They are calculated values stored in fact tables. They are numeric attributes of a
fact representing the performance or behavior of the business relative to dimensions.
Fact: It is a collection of related data items, consisting of measures and context data.
Additive facts: Facts that can be added along all the dimensions. These are discrete
numerical measures.
Semi-additive facts: Facts that can be added along some dimensions but not along the time
dimension.
Non-additive facts: Numeric measures that cannot be added across any dimension.
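e.g. - The three kinds of additivity can be sketched in a few lines of Python. All rows, names,
and figures below are hypothetical and chosen only for illustration: the sales amount is treated
as additive, an account balance as semi-additive, and a margin ratio as non-additive.

```python
# Hypothetical daily fact rows: (day, account, sales_amount, balance)
rows = [
    ("2024-01-01", "A", 100, 500),
    ("2024-01-01", "B", 200, 300),
    ("2024-01-02", "A", 50, 550),
    ("2024-01-02", "B", 150, 250),
]

# Additive: sales_amount can be summed along every dimension.
total_sales = sum(r[2] for r in rows)

# Semi-additive: balance can be summed across accounts for a single day...
balance_day2 = sum(r[3] for r in rows if r[0] == "2024-01-02")

# ...but not across days; one common alternative is the latest value per account.
latest_balance_a = [r[3] for r in rows if r[1] == "A"][-1]

# Non-additive: a ratio has to be recomputed from its additive components.
profit, revenue = [20, 90], [100, 300]        # hypothetical per-store figures
overall_margin = sum(profit) / sum(revenue)   # ratio of the sums
naive_average = (20 / 100 + 90 / 300) / 2     # average of ratios, a different number
```

The last two values differ, which is why a ratio is usually stored as its components rather than
summed or averaged directly.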
Dimension: It is a collection of members or units of the same type of information. It is one of the
perspectives that can be used to analyze the data.
e.g. - For a Sales database, the dimensions could include Product, Time, Store, and Promotion.
Fact tables: A table that is used to store business information. There are two types of fields in a
fact table:
1. The fields storing the foreign keys which connect each particular fact to the appropriate
value in each dimension.
2. The fields storing the individual facts (or measures) - such as number, amount, or price.
Dimension table: Tables used to store qualitative data about fact records like who, what, when,
where, why.
Conformed Dimension: A dimension that has exactly the same meaning and content when
referenced from different fact tables.
Star Schema: In the star schema design, a single object (the fact table) sits in the middle and is
radially connected to other surrounding objects (dimension lookup tables) like a star. The
dimension tables are denormalised. The fact table primary key is the composite of the foreign
keys.
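e.g. - A minimal star schema sketch using Python's built-in sqlite3 module. The table names,
column names, and data are hypothetical. The dimension tables keep descriptive attributes such as
category and region in the same table (denormalised), and the fact table's primary key is the
composite of its foreign keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_store   (store_key   INTEGER PRIMARY KEY, city TEXT, region TEXT);
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    store_key   INTEGER REFERENCES dim_store(store_key),
    amount      REAL,
    PRIMARY KEY (product_key, store_key)   -- composite of the foreign keys
);
INSERT INTO dim_product VALUES (1, 'Pen', 'Stationery'), (2, 'Lamp', 'Home');
INSERT INTO dim_store   VALUES (1, 'Pune', 'West'), (2, 'Delhi', 'North');
INSERT INTO fact_sales  VALUES (1, 1, 10.0), (1, 2, 20.0), (2, 1, 30.0);
""")

# Analyze the measure by one dimension: join the fact table to a dimension table.
sales_by_region = conn.execute("""
    SELECT s.region, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_store s ON s.store_key = f.store_key
    GROUP BY s.region
    ORDER BY s.region
""").fetchall()
```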
Snowflake Schema: Here normalized dimension tables surround the single fact table.
Cube: It is the fundamental data structure for multidimensional analysis. A cube contains
dimensions, hierarchies, levels, and measures. Each individual point in a cube is referred to as a
cell.
Slicing and dicing: Slicing and dicing refers to the ability to combine and re-combine the
dimensions to see different slices of the information. Picture slicing a three-dimensional cube of
information, in order to see what values are contained in the middle layer. Slicing and dicing a
cube allows an end-user to do the same thing with multiple dimensions.
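e.g. - A small sketch of slicing and dicing on a cube held as a Python dictionary. The cube
contents and the function names are hypothetical. The sketch uses the common distinction (an
assumption beyond the definition above) that a slice fixes one dimension to a single value,
while a dice keeps a chosen subset of values on several dimensions.

```python
# Hypothetical cube: each cell is keyed by (product, region, year) -> sales
cube = {
    ("Pen", "West", 2023): 10,
    ("Pen", "East", 2023): 20,
    ("Pen", "West", 2024): 30,
    ("Lamp", "West", 2023): 40,
    ("Lamp", "East", 2024): 50,
}

def slice_cube(cube, axis, value):
    """Fix one dimension to a single value and drop it from the cell key."""
    return {k[:axis] + k[axis + 1:]: v for k, v in cube.items() if k[axis] == value}

def dice_cube(cube, allowed):
    """Keep cells whose coordinate on each listed axis is in the allowed set."""
    return {
        k: v
        for k, v in cube.items()
        if all(k[axis] in values for axis, values in allowed.items())
    }

slice_2023 = slice_cube(cube, 2, 2023)               # two-dimensional view for 2023
diced = dice_cube(cube, {0: {"Pen"}, 1: {"West"}})   # only Pen sold in West
```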
Cross tab: A process or function that combines or summarizes data from one or more sources
into a two-dimensional format for analysis or reporting.
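e.g. - A cross tab can be sketched as a nested dictionary built from flat records. The records
and the function name below are hypothetical; rows are products, columns are regions, and each
cell holds the summed value.

```python
from collections import defaultdict

# Hypothetical flat records: (product, region, amount)
records = [
    ("Pen", "West", 10),
    ("Pen", "East", 20),
    ("Lamp", "West", 40),
    ("Pen", "West", 5),
]

def crosstab(records):
    """Summarize (row, column, value) records into a row -> column -> total table."""
    table = defaultdict(lambda: defaultdict(int))
    for row_key, col_key, value in records:
        table[row_key][col_key] += value
    return {row: dict(cols) for row, cols in table.items()}

result = crosstab(records)
```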
Drill up: Changing the view of the data to a higher level of aggregation.
Drill down: Changing the view of the data to a greater level of detail.
Dimensions can have one or more hierarchies. A Time dimension, for example, could have a
Calendar hierarchy and a Fiscal hierarchy. Hierarchies contain levels, which organize data into a
logical structure.
Moving between the levels of a hierarchy is called drilling up and drilling down.
e.g. - A time dimension might have a hierarchy that represents data at the month, quarter, and
year levels.
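Drilling along such a time hierarchy can be sketched as follows. The monthly figures and the
function names are hypothetical; drilling up re-aggregates to a higher level, and drilling down
returns the more detailed cells behind one aggregated value.

```python
from collections import defaultdict

# Hypothetical month-level data: (year, month) -> amount
monthly = {(2024, 1): 10, (2024, 2): 20, (2024, 4): 30, (2025, 1): 40}

def quarter_of(month):
    """Map a month number (1-12) to its calendar quarter (1-4)."""
    return (month - 1) // 3 + 1

def drill_up_to_quarter(monthly):
    """Month level -> quarter level."""
    out = defaultdict(int)
    for (year, month), amount in monthly.items():
        out[(year, quarter_of(month))] += amount
    return dict(out)

def drill_up_to_year(monthly):
    """Month level -> year level."""
    out = defaultdict(int)
    for (year, _month), amount in monthly.items():
        out[year] += amount
    return dict(out)

def drill_down(monthly, year, quarter):
    """Return the month-level cells behind one quarter."""
    return {
        k: v for k, v in monthly.items()
        if k[0] == year and quarter_of(k[1]) == quarter
    }

by_quarter = drill_up_to_quarter(monthly)
by_year = drill_up_to_year(monthly)
detail = drill_down(monthly, 2024, 1)
```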
Aggregation: Information stored in a data warehouse in a summarized form.
Instead of recording the date and time each time a certain product is sold, the data warehouse
could store the quantity of the product sold each hour, each day, or each week.
Aggregations are used for two reasons:
1. To save storage space. Data warehouses can get large. The use of aggregations greatly
reduces the space needed to store data.
2. To improve the performance of business intelligence tools. When queries run faster they
take up less processing time and the users get their information back more quickly.
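e.g. - A sketch of aggregating individual sales into daily totals per product. The sample
transactions and the function name are hypothetical.

```python
from collections import Counter
from datetime import datetime

# Hypothetical transaction-level records: (timestamp, product, quantity)
sales = [
    ("2024-03-01 09:15:00", "Pen", 2),
    ("2024-03-01 17:40:00", "Pen", 3),
    ("2024-03-02 11:05:00", "Pen", 1),
    ("2024-03-02 12:30:00", "Lamp", 4),
]

def aggregate_daily(sales):
    """Summarize transactions into quantity sold per (day, product)."""
    totals = Counter()
    for timestamp, product, quantity in sales:
        day = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S").date().isoformat()
        totals[(day, product)] += quantity
    return dict(totals)

daily = aggregate_daily(sales)
```

Here four transaction rows collapse into three summary rows; with realistic volumes the
reduction is far larger.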
Foreign key: A column in one table that holds the primary key of another table, linking the
two tables.
Surrogate key: It is the system generated key for dimensions. A surrogate key has no meaning
by itself. These are integer keys, so less space is required.
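e.g. - One way a load process might hand out surrogate keys is sketched below. The class name
and the natural keys are hypothetical. Each new natural (business) key gets the next integer,
and a repeated natural key gets the same surrogate key back.

```python
import itertools

class SurrogateKeyGenerator:
    """Assigns meaningless integer keys to natural keys, one per distinct value."""

    def __init__(self):
        self._next = itertools.count(1)
        self._by_natural_key = {}

    def key_for(self, natural_key):
        # Allocate a new integer only the first time a natural key is seen.
        if natural_key not in self._by_natural_key:
            self._by_natural_key[natural_key] = next(self._next)
        return self._by_natural_key[natural_key]

keys = SurrogateKeyGenerator()
a = keys.key_for("CUST-00017-UK")
b = keys.key_for("CUST-00042-DE")
c = keys.key_for("CUST-00017-UK")   # same customer, same surrogate key as a
```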
Normalization: The process of organizing data in accordance with the rules of a relational
database.
In a completely de-normalized database the customer name and address information would be
stored every time a customer made a purchase.
In a normalized database each customer's name and address would be stored only once, in a
separate table. Every purchase record would have a reference to the customer table to indicate
which customer was involved.
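e.g. - The customer example above can be sketched in Python. The sample rows, field names, and
function name are hypothetical. The de-normalized input repeats the name and address on every
purchase; the normalized output stores each customer once and gives every purchase a reference
to it.

```python
# Hypothetical de-normalized purchase rows
denormalized = [
    {"customer": "Asha", "address": "12 Hill Rd", "item": "Pen", "amount": 10},
    {"customer": "Asha", "address": "12 Hill Rd", "item": "Lamp", "amount": 40},
    {"customer": "Ravi", "address": "7 Lake St", "item": "Pen", "amount": 10},
]

def normalize(rows):
    """Split rows into a customer table and a purchase table that references it."""
    customers = {}   # customer_id -> {"name": ..., "address": ...}
    ids = {}         # (name, address) -> customer_id
    purchases = []
    for row in rows:
        identity = (row["customer"], row["address"])
        if identity not in ids:
            ids[identity] = len(ids) + 1
            customers[ids[identity]] = {"name": row["customer"], "address": row["address"]}
        purchases.append(
            {"customer_id": ids[identity], "item": row["item"], "amount": row["amount"]}
        )
    return customers, purchases

customers, purchases = normalize(denormalized)
```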
OLTP (On-Line Transaction Processing): The use of computers to run the on-going operation
of a business.
OLAP (On-Line Analytical Processing): The most widely used term for multidimensional analysis
software. The term was developed to distinguish data warehousing activities from On-Line
Transaction Processing.
In its broadest usage the term "OLAP" is used as a synonym of "data warehousing". In a narrower
usage, the term refers to the tools used for multidimensional analysis.
Relational On-Line Analytical Processing (ROLAP): OLAP that stores data and aggregations
in a relational database.
Cuboids: The relational tables used in ROLAP are called cuboids. Each cuboid represents a
particular view.
Multidimensional On-Line Analytical Processing (MOLAP): OLAP that stores data and
aggregations in multidimensional database structures.
Hybrid OLAP (HOLAP): A combined use of Relational OLAP (ROLAP) and Multidimensional
OLAP (MOLAP).
In HOLAP, the source data is usually stored using a ROLAP strategy and aggregations are stored
using a MOLAP strategy. This combination usually results in the least amount of storage space
and the fastest cube processing.
Web-based On-Line Analytical Processing (WOLAP): WOLAP uses a browser to deliver OLAP.
This is quite powerful. The delivery capability of the web coupled with the business intelligence
tools of OLAP will allow a broader number of business analysts to benefit from the software.
Data Mining: Techniques for finding patterns and trends in large data sets.
Operational data store (ODS): It may keep data for a shorter time period than the warehouse,
and it may feed the warehouse.
Cleansing: It is the process of resolving inconsistencies and fixing anomalies in source data.
Loading: It is the insertion of the transformed data into a different data store.
Attribute: Attributes represent a single type of information in a dimension. For example, year is
an attribute in the Time dimension.
Grain: The fundamental atomic level of data to be represented in the fact table.
e.g. - When considering time, the grain could be day, week, month, or year.
select
[Stores].[Region].Members on columns
from SalesCube
where ([Measures].[Profit])
Ad Hoc Query: Any query that cannot be determined prior to the moment the query is issued.
Change data capture: A process of capturing changes made to a production data source.
Data administration: The processes and procedures by which the integrity and concurrency of
the data in the warehouse are maintained.
Pivoting: A transformation where each record in an input stream is converted to many
records in the appropriate table in the data warehouse.
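e.g. - A sketch of pivoting one wide input record into many narrow records. The record layout
and the function name are hypothetical.

```python
def pivot_record(record, key_field):
    """Turn one wide record into one (key, column, value) record per remaining column."""
    key = record[key_field]
    return [
        (key, column, value)
        for column, value in record.items()
        if column != key_field
    ]

# Hypothetical input record with one column per month
wide = {"store": "S1", "jan": 10, "feb": 20, "mar": 15}
narrow = pivot_record(wide, "store")
```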