
DATA WAREHOUSING

Different people have different definitions for a data warehouse. The most popular definition came
from Bill Inmon, who provided the following:

A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision-making process.

Subject-Oriented: A data warehouse can be used to analyze a particular subject area. For
example, "sales" can be a particular subject.

Integrated: A data warehouse integrates data from multiple data sources. For example, source A
and source B may have different ways of identifying a product, but in a data warehouse, there will
be only a single way of identifying a product.
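The integration idea above can be sketched in a few lines. This is a minimal illustration, not a real ETL pipeline. It assumes two hypothetical sources: source A identifies products by an SKU string and source B by a numeric product number. The mapping tables (`sku_to_key`, `number_to_key`), the product keys `P1`/`P2` and the `integrate` helper are all made up for the example.

```python
# Hypothetical source records: each system identifies the same products differently.
source_a = [{"sku": "WID-001", "qty": 5}, {"sku": "GAD-002", "qty": 2}]
source_b = [{"product_no": 1001, "qty": 3}, {"product_no": 1002, "qty": 7}]

# Hypothetical mapping tables maintained during integration:
# every source identifier maps to one warehouse product key.
sku_to_key = {"WID-001": "P1", "GAD-002": "P2"}
number_to_key = {1001: "P1", 1002: "P2"}

def integrate(a_rows, b_rows):
    """Return rows from both sources, all identified by the unified product key."""
    unified = []
    for row in a_rows:
        unified.append({"product_key": sku_to_key[row["sku"]], "qty": row["qty"], "source": "A"})
    for row in b_rows:
        unified.append({"product_key": number_to_key[row["product_no"]], "qty": row["qty"], "source": "B"})
    return unified

warehouse_rows = integrate(source_a, source_b)

# With a single identifier, totals per product no longer depend on the source system.
totals = {}
for row in warehouse_rows:
    totals[row["product_key"]] = totals.get(row["product_key"], 0) + row["qty"]

print(totals)  # {'P1': 8, 'P2': 9}
```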

Time-Variant: Historical data is kept in a data warehouse. For example, one can retrieve data
from 3 months, 6 months or 12 months ago, or even older data, from a data warehouse. This contrasts
with a transaction system, where often only the most recent data is kept. For example, a
transaction system may hold only the most recent address of a customer, whereas a data warehouse
can hold all addresses associated with a customer.

Non-Volatile: Once data is in the data warehouse, it does not change. Historical data in a data
warehouse should therefore never be altered.
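The customer-address example for the time-variant and non-volatile properties can be sketched as follows. This is a simplified illustration under stated assumptions: the names `current_address`, `address_history`, `record_move` and `address_as_of` are hypothetical, and a real warehouse would store the history in a table rather than a Python list. The transaction-system side overwrites the address; the warehouse side only appends immutable rows, so past addresses remain queryable.

```python
from datetime import date

# Transaction system (sketch): one current address per customer; an update overwrites it.
current_address = {}

# Data warehouse (sketch): append-only history. Rows are immutable tuples
# (customer_id, address, valid_from) and are never updated or deleted.
address_history = []

def record_move(customer_id, new_address, valid_from):
    """Hypothetical helper that applies one address change to both stores."""
    current_address[customer_id] = new_address                       # previous value is lost
    address_history.append((customer_id, new_address, valid_from))  # previous rows are kept

def address_as_of(customer_id, day):
    """Return the address in effect on `day`, answered from the history alone."""
    matches = [(valid_from, addr) for cid, addr, valid_from in address_history
               if cid == customer_id and valid_from <= day]
    return max(matches)[1] if matches else None

record_move("C42", "12 Oak St", date(2021, 1, 15))
record_move("C42", "7 Elm Ave", date(2022, 6, 1))

print(current_address["C42"])                    # 7 Elm Ave
print(address_as_of("C42", date(2021, 12, 31)))  # 12 Oak St
```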

Ralph Kimball provided a more concise definition of a data warehouse:

A data warehouse is a copy of transaction data specifically structured for query and
analysis.

OLAP (On-Line Analytical Processing): A method by which multidimensional analysis occurs.

Multidimensional Analysis: The ability to manipulate information by a variety of relevant
categories, or “dimensions”, to facilitate analysis and understanding of the underlying data. It is
also sometimes referred to as “drilling down”, “drilling across” and “slicing and dicing”.
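Drilling down and slicing and dicing can be illustrated on a tiny in-memory "cube". This sketch uses only the Python standard library and is not how an OLAP server is implemented. The sales facts, the dimension names (region, product, quarter) and the helpers `aggregate`, `slice_` and `dice` are hypothetical. Drilling down means grouping by more dimensions; a slice fixes one dimension to a single value; a dice restricts several dimensions at once. Drilling across is not modeled here.

```python
from collections import defaultdict

# Hypothetical sales facts: three dimensions (region, product, quarter) and one measure (amount).
facts = [
    ("North", "Tea",    "Q1", 100),
    ("North", "Coffee", "Q1", 150),
    ("North", "Tea",    "Q2", 120),
    ("South", "Tea",    "Q1",  80),
    ("South", "Coffee", "Q2", 200),
]
DIMS = {"region": 0, "product": 1, "quarter": 2}

def aggregate(rows, by):
    """Sum the amount measure, grouped by the named dimensions."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[DIMS[d]] for d in by)
        totals[key] += row[3]
    return dict(totals)

def slice_(rows, dim, value):
    """Slice: fix one dimension to a single value (underscore avoids shadowing built-in slice)."""
    return [r for r in rows if r[DIMS[dim]] == value]

def dice(rows, **allowed):
    """Dice: keep rows whose values fall in the allowed set for each named dimension."""
    return [r for r in rows if all(r[DIMS[d]] in vals for d, vals in allowed.items())]

by_region = aggregate(facts, ["region"])                     # summary level
by_region_product = aggregate(facts, ["region", "product"])  # drill down one level
q1_by_region = aggregate(slice_(facts, "quarter", "Q1"), ["region"])
tea_north = dice(facts, region={"North"}, product={"Tea"})

print(by_region)     # {('North',): 370, ('South',): 280}
print(q1_by_region)  # {('North',): 250, ('South',): 80}
```

Drilling down does not change the grand total; it only spreads the same amounts over finer groups.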

‘Data warehouse’ and ‘OLAP’ are terms that are often used interchangeably. Actually,
they refer to two different components of a decision support system. While a data
warehouse holds the historical data of the organization, stored for end-user analysis,
OLAP is a technology that enables a data warehouse to be used effectively for online
analysis using complex analytical queries. The differences between OLAP and a data
warehouse are tabulated below:

| Data Warehouse | Online Analytical Processing (OLAP) |
|---|---|
| Data from different data sources is stored in a relational database for end-user analysis. | A tool to evaluate and analyze the data in the data warehouse using analytical queries. |
| Data is organized in summarized, aggregated, subject-oriented, non-volatile patterns. | A tool that helps organize data in the data warehouse using multidimensional models of data aggregation and summarization. |
| Data in a data warehouse is a consolidated, flexible collection of data. Supports analysis of data but does not support online analysis of data. | Supports the data analyst in real time and enables online analysis of data with speed and flexibility. |

Data Mining: What is Data Mining?


Overview

Generally, data mining (sometimes called data or knowledge discovery) is the process of
analyzing data from different perspectives and summarizing it into useful information:
information that can be used to increase revenue, cut costs, or both. Data mining
software is one of several analytical tools for analyzing data. It allows users to
analyze data from many different dimensions or angles, categorize it, and summarize the
relationships identified. Technically, data mining is the process of finding correlations or
patterns among dozens of fields in large relational databases.

Continuous Innovation

Although data mining is a relatively new term, the technology is not. Companies have
used powerful computers to sift through volumes of supermarket scanner data and
analyze market research reports for years. However, continuous innovations in computer
processing power, disk storage, and statistical software are dramatically increasing the
accuracy of analysis while driving down the cost.

Example

For example, one Midwest grocery chain used the data mining capabilities of Oracle
software to analyze local buying patterns. They discovered that when men bought diapers
on Thursdays and Saturdays, they also tended to buy beer. Further analysis showed that
these shoppers typically did their weekly grocery shopping on Saturdays. On Thursdays,
however, they only bought a few items. The retailer concluded that they purchased the
beer to have it available for the upcoming weekend. The grocery chain could use this
newly discovered information in various ways to increase revenue. For example, they
could move the beer display closer to the diaper display. They could also make sure beer
and diapers were sold at full price on Thursdays.
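A finding such as "shoppers who buy diapers also tend to buy beer" is commonly expressed as an association rule {diapers} → {beer} and scored with support, confidence and lift. The source does not say which measures the Oracle software computed, so this is only a sketch of those standard measures. The baskets below are hypothetical toy data, not the grocery chain's data, and the helper names are made up for the example.

```python
# Hypothetical shopping baskets (NOT data from the grocery chain in the example).
baskets = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"milk", "bread"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) = support(A ∪ C) / support(A)."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

def lift(antecedent, consequent, transactions):
    """Confidence divided by the consequent's own support; > 1 means co-occurrence above chance."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)

rule_support = support({"diapers", "beer"}, baskets)
rule_confidence = confidence({"diapers"}, {"beer"}, baskets)
rule_lift = lift({"diapers"}, {"beer"}, baskets)

print(rule_support)                # 0.4
print(round(rule_confidence, 3))   # 0.667
print(round(rule_lift, 2))         # 1.11
```

Algorithms such as Apriori search many candidate itemsets for rules whose support and confidence exceed chosen thresholds; this sketch scores a single rule only.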

Distributed database management system


A distributed database is a collection of multiple, logically interrelated databases distributed
over a computer network. Sometimes "distributed database system" is used to refer jointly to
the distributed database and the distributed DBMS.

Overview
A distributed database management system (DDBMS) is software for managing databases stored on
multiple computers in a network. A distributed database is a set of databases stored on
multiple computers that typically appears to applications as a single database. Consequently,
an application can simultaneously access and modify the data in several databases in a
network.
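The idea that several databases can appear to an application as one can be illustrated with a single-process toy. This is a sketch under loud assumptions: two in-memory SQLite databases stand in for the "sites", the class `ToyDistributedDB` and its placement rule (store each customer at the site named after their city) are hypothetical, and there is no network, replication, distributed query optimization or distributed transaction handling, all of which a real DDBMS provides.

```python
import sqlite3

class ToyDistributedDB:
    """Hypothetical facade: several SQLite databases ("sites") presented as one logical table."""

    def __init__(self, site_names):
        self.sites = {name: sqlite3.connect(":memory:") for name in site_names}
        for conn in self.sites.values():
            conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")

    def _site_for(self, city):
        # Assumed placement rule: each row lives at the site named after its city.
        return self.sites[city]

    def insert_customer(self, cust_id, name, city):
        conn = self._site_for(city)
        conn.execute("INSERT INTO customer VALUES (?, ?, ?)", (cust_id, name, city))
        conn.commit()

    def all_customers(self):
        # The application sees one table; the facade collects the fragments from every site.
        rows = []
        for conn in self.sites.values():
            rows.extend(conn.execute("SELECT id, name, city FROM customer"))
        return sorted(rows)

db = ToyDistributedDB(["Berlin", "Paris"])
db.insert_customer(1, "Ada", "Berlin")
db.insert_customer(2, "Blaise", "Paris")
db.insert_customer(3, "Carl", "Berlin")

print(db.all_customers())
# [(1, 'Ada', 'Berlin'), (2, 'Blaise', 'Paris'), (3, 'Carl', 'Berlin')]
```

The calling code never names a site; only the facade knows where each row is stored, which is the location transparency the paragraph above describes.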

RELATIONAL ALGEBRA IN DBMS

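Consider two relations R and S. For union, intersection and difference, R and S must be
union-compatible: they must have the same number of attributes, and corresponding attributes
must have the same domains.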

• UNION of R and S
The union of two relations is a relation that includes all the tuples that are
either in R or in S or in both R and S. Duplicate tuples are eliminated.
• INTERSECTION of R and S
The intersection of R and S is a relation that includes all tuples that are
in both R and S.

• DIFFERENCE of R and S
The difference of R and S is the relation that contains all the tuples that are
in R but not in S.

UNION Example

Figure: UNION

INTERSECTION Example

Figure: INTERSECTION

DIFFERENCE Example

Figure: DIFFERENCE
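Since the figures for these examples are not available, here is a small stand-in. It models each relation as a Python set of tuples; a set cannot hold duplicates, which matches the rule that the union eliminates duplicate tuples. The relations R and S and their attributes (student_id, name) are hypothetical, chosen so that the two relations are union-compatible.

```python
# Hypothetical union-compatible relations, both with attributes (student_id, name).
# Python sets of tuples: duplicate tuples cannot occur, as in a relation.
R = {(1, "Ann"), (2, "Bob"), (3, "Cem")}
S = {(2, "Bob"), (4, "Dia")}

union = R | S         # tuples in R, in S, or in both
intersection = R & S  # tuples in both R and S
difference = R - S    # tuples in R but not in S

print(sorted(union))         # [(1, 'Ann'), (2, 'Bob'), (3, 'Cem'), (4, 'Dia')]
print(sorted(intersection))  # [(2, 'Bob')]
print(sorted(difference))    # [(1, 'Ann'), (3, 'Cem')]
```

Union and intersection are commutative, but difference is not: here S - R is {(4, 'Dia')}, which differs from R - S.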

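CARTESIAN PRODUCT
The Cartesian product is also an operator that works on two relations. It is sometimes called the
CROSS PRODUCT or CROSS JOIN.
It combines each tuple of one relation with every tuple of the other relation, so if R has m tuples
and S has n tuples, R × S has m × n tuples. Unlike union, intersection and difference, it does not
require the two relations to be union-compatible.
CARTESIAN PRODUCT Example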
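As a stand-in for a Cartesian product example, the sketch below pairs every tuple of one relation with every tuple of another, using `itertools.product`, and then checks the result against SQL's `CROSS JOIN` in an in-memory SQLite database. The relations EMPLOYEE(emp_id, name) and PROJECT(proj_id, title) and their contents are hypothetical; note that they have different attributes, which the Cartesian product allows.

```python
import sqlite3
from itertools import product

# Hypothetical relations with different attributes.
employee = [(1, "Ann"), (2, "Bob")]                                # EMPLOYEE(emp_id, name)
project = [("P1", "Payroll"), ("P2", "Website"), ("P3", "Audit")]  # PROJECT(proj_id, title)

# Cartesian product: concatenate every employee tuple with every project tuple.
cross = [e + p for e, p in product(employee, project)]

for row in cross:
    print(row)
# (1, 'Ann', 'P1', 'Payroll')
# (1, 'Ann', 'P2', 'Website')
# (1, 'Ann', 'P3', 'Audit')
# (2, 'Bob', 'P1', 'Payroll')
# (2, 'Bob', 'P2', 'Website')
# (2, 'Bob', 'P3', 'Audit')

# The same result in SQL, using CROSS JOIN.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER, name TEXT)")
conn.execute("CREATE TABLE project (proj_id TEXT, title TEXT)")
conn.executemany("INSERT INTO employee VALUES (?, ?)", employee)
conn.executemany("INSERT INTO project VALUES (?, ?)", project)
sql_rows = conn.execute(
    "SELECT * FROM employee CROSS JOIN project ORDER BY emp_id, proj_id"
).fetchall()
print(sql_rows == cross)  # True
```

With 2 employee tuples and 3 project tuples, the result has 2 × 3 = 6 tuples.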
