
• Index page
• Introduction to data warehousing
• Graphical representation of data warehousing
• History of data warehousing
• Uses and advantages of data warehousing
• Disadvantages of data warehousing
• Meta data
• Advantages of meta data
• Data mining
• Functions of data mining
• Graphical representation of data mining
• Advantages of data mining
• Disadvantages of data mining
• Complete process of warehousing, meta data, data mining

DATA WAREHOUSE
What is a data warehouse?

Data warehouses are computer-based information systems that hold "secondhand" data: data
that originated in another application or in an external system or source.

Fig 1 Data warehousing analysis

HISTORY
In the late 1980s, IBM researchers Barry Devlin and Paul Murphy developed the "business data
warehouse". The data warehousing concept was intended to provide an architectural model for
the flow of data from operational systems to decision support environments. The concept
attempted to address the various problems associated with this flow - mainly, the high costs
associated with it. In the absence of a data warehousing architecture, an enormous amount of
redundancy was required to support multiple decision support environments. In larger
corporations it was typical for multiple decision support environments to operate independently.

As a result, separate computer databases began to be built that were specifically designed to
support management information and analysis purposes. These data warehouses were able to
bring in data from a range of different sources, such as mainframe computers,
minicomputers, personal computers, and office automation software such as
spreadsheets, and integrate this information in a single place. This capability, coupled with
user-friendly reporting tools and freedom from operational impacts, has led to the growth of
this type of computer system.
Data warehouses often hold large amounts of information, which is sometimes subdivided into
smaller logical units called dependent data marts. Dependent data marts allow for easier
reporting by keeping relevant data together in one location.

As technology improved (lower cost for more performance) and user requirements increased
(faster data load cycle times and more features), data warehouses have evolved through several
fundamental stages:
• Offline Operational Databases - Data warehouses in this initial stage are developed by
simply copying the database of an operational system to an off-line server where the
processing load of reporting does not impact on the operational system's performance.
• Offline Data Warehouse - Data warehouses in this stage of evolution are updated on a
regular time cycle (usually daily, weekly or monthly) from the operational systems and
the data is stored in an integrated reporting-oriented data structure.
• Real Time Data Warehouse - Data warehouses at this stage are updated on a transaction
or event basis, every time an operational system performs a transaction (e.g., an order, a
delivery, or a booking).
• Integrated Data Warehouse - Data warehouses at this stage are used to generate activity
or transactions that are passed back into the operational systems for use in the daily
activity of the organization.
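
The difference between the offline (scheduled) stage and the real-time (event-based) stage can be illustrated with a minimal sketch. The `Warehouse` class, its method names, and the sample orders are hypothetical and exist only for illustration; a real warehouse would load into an integrated reporting structure rather than a dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class Warehouse:
    """Toy warehouse keyed by order id (hypothetical structure)."""
    rows: dict = field(default_factory=dict)

    def batch_refresh(self, operational_rows):
        """Offline stage: copy everything over on a fixed schedule."""
        for row in operational_rows:
            self.rows[row["order_id"]] = dict(row)

    def on_transaction(self, row):
        """Real-time stage: apply a single event as soon as it occurs."""
        self.rows[row["order_id"]] = dict(row)


operational = [
    {"order_id": 1, "status": "ordered"},
    {"order_id": 2, "status": "ordered"},
]

offline = Warehouse()
real_time = Warehouse()
offline.batch_refresh(operational)      # scheduled load (e.g. nightly)
real_time.batch_refresh(operational)    # initial load

# A delivery happens in the operational system after the load.
event = {"order_id": 2, "status": "delivered"}
operational[1] = event
real_time.on_transaction(event)         # pushed immediately

offline_status = offline.rows[2]["status"]      # stale until the next cycle
real_time_status = real_time.rows[2]["status"]  # already current
print(offline_status, real_time_status)
```

Until its next scheduled refresh, the offline warehouse still reports the old status, while the event-driven one already reflects the delivery.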

USES OF DATA WAREHOUSE:


• A data warehouse provides a common data model for all data of interest regardless of the
data's source. This makes it easier to report and analyze information than it would be if
multiple data models were used to retrieve information such as sales invoices, order
receipts, general ledger charges, etc.
• Prior to loading data into the data warehouse, inconsistencies are identified and resolved.
This greatly simplifies reporting and analysis.
• Information in the data warehouse is under the control of data warehouse users so that,
even if the source system data is purged over time, the information in the warehouse can
be stored safely for extended periods of time.

• Because they are separate from operational systems, data warehouses provide retrieval of
data without slowing down operational systems.
• Data warehouses can work in conjunction with and, hence, enhance the value of
operational business applications, notably customer relationship management (CRM)
systems.
• Data warehouses facilitate decision support system applications such as trend reports
(e.g., the items with the most sales in a particular area within the last two years),
exception reports, and reports that show actual performance versus goals.
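
Several of the points above (a common data model, resolving inconsistencies before loading, and trend reports) can be sketched together. This is only an illustration under stated assumptions: the two source layouts, the `to_common` helper, the `sales` table, and all sample values are made up, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Two hypothetical source systems with different field names.
invoices = [
    {"item": "Widget", "region": "north", "amount": 120.0, "date": "2023-03-01"},
    {"item": "Gadget", "region": "North ", "amount": 80.0, "date": "2023-06-15"},
    {"item": "Widget", "region": "South", "amount": 50.0, "date": "2023-07-01"},
]
order_receipts = [
    {"product": "widget", "area": "NORTH", "total": 30.0, "day": "2024-01-10"},
    {"product": "Gadget", "area": "north", "total": 40.0, "day": "2019-05-05"},
]


def to_common(item, area, amount, sale_date):
    """Map a source record onto the common model and resolve
    inconsistencies (letter case, stray whitespace) before loading."""
    return (item.strip().title(), area.strip().title(), float(amount), sale_date)


rows = [to_common(r["item"], r["region"], r["amount"], r["date"]) for r in invoices]
rows += [to_common(r["product"], r["area"], r["total"], r["day"]) for r in order_receipts]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (item TEXT, area TEXT, amount REAL, sale_date TEXT)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)

# Trend report: best-selling items in one area within a two-year window.
top_items = conn.execute(
    """
    SELECT item, SUM(amount) AS total
    FROM sales
    WHERE area = ? AND sale_date BETWEEN ? AND ?
    GROUP BY item
    ORDER BY total DESC
    """,
    ("North", "2022-02-01", "2024-02-01"),
).fetchall()
conn.close()

for item, total in top_items:
    print(f"{item}: {total:.2f}")
```

Because the area names are normalized before loading, the report query needs only one spelling of "North", and it runs against the warehouse copy rather than the operational systems.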

DISADVANTAGES
• Because data must be extracted, transformed and loaded into the warehouse, there is an
element of latency in data warehouse data.
• Over their life, data warehouses can have high costs. Maintenance costs are high.
• Data warehouses can get outdated relatively quickly. There is a cost of delivering
suboptimal information to the organization.
• Data owners lose control over their data, raising ownership (responsibility and
accountability), security and privacy issues.
• Limited flexibility of use and types of users; requires multiple separate data marts for
multiple uses and types of users.
• Typically, data is static and dated.
• Typically, no data drill-down capabilities.
• Difficult to accommodate changes in data types and ranges, data source schema, indexes
and queries.
• Typically, cannot actively monitor changes in data.
Meta data
Metadata, often called ‘data about data’, can be defined as a structured description of the
content, quality, condition or other characteristics of data. Metadata needs to accompany data;
otherwise the data being transmitted or communicated cannot be understood.

It is well accepted in the world of statistics and large databases that metadata leads to better data.
This is because it enables everyone collecting, using and exchanging data to share the same
understanding of its meaning and representation.
There are two basic types of metadata: technical metadata and business metadata.
Technical metadata consists of technical descriptions of data such as tables,
attributes, indexes and so forth. These technical types of metadata are found in data
dictionaries, directories, and repositories. Business metadata consists of non-technical
definitions, formulae, descriptions, and so forth, and relies on context to give the data
its meaning and shades of meaning. In addition, valuable reference tables are stored in
master data management facilities.
The value of metadata is often not apparent.

Advantages
• Meta data is most useful when integrated and end-to-end, promoting efficient data
warehouse development and maintenance.
• The simplest and most practical use of meta data is to provide business descriptions of
the data to BI tools and analytic applications.
• Metadata is used to facilitate the understanding, usage, and management of data, by both
humans and computers.
• Metadata is used to speed up and enrich searching for resources. In general, search
queries using metadata can save users from performing more complex filter operations
manually.
• Metadata provides additional information to users of the data it describes. This
information may be descriptive or algorithmic.
• Metadata helps to bridge the semantic gap. By telling a computer how data items are
related and how these relations can be evaluated automatically, it becomes possible to
process even more complex filter and search operations.
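
As a rough illustration of the two metadata types and of metadata-driven search, the sketch below keeps technical and business metadata side by side in a small catalog. The `ColumnMetadata` class, the `search` helper, and every table and column name are hypothetical; real metadata repositories are considerably richer.

```python
from dataclasses import dataclass


@dataclass
class ColumnMetadata:
    # Technical metadata: how the data is physically stored.
    table: str
    column: str
    data_type: str
    indexed: bool
    # Business metadata: what the data means to its users.
    business_name: str
    description: str


catalog = [
    ColumnMetadata("sales", "amt", "REAL", False,
                   "Sale amount", "Invoice value in local currency"),
    ColumnMetadata("sales", "cust_id", "INTEGER", True,
                   "Customer", "Identifier of the purchasing customer"),
    ColumnMetadata("customers", "seg", "TEXT", False,
                   "Customer segment", "Marketing segment of the customer"),
]


def search(entries, term):
    """Find columns whose business metadata mentions the term."""
    term = term.lower()
    return [
        f"{e.table}.{e.column}"
        for e in entries
        if term in e.business_name.lower() or term in e.description.lower()
    ]


matches = search(catalog, "customer")
print(matches)
```

A user who knows only the business term "customer" can locate the cryptically named columns `cust_id` and `seg` without inspecting the tables by hand.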

Data mining
Data mining (sometimes called data or knowledge discovery) is the process of analyzing data
from different perspectives and summarizing it into useful information - information that can be
used to increase revenue, cut costs, or both. Data mining software is one of a number of
analytical tools for analyzing data. It allows users to analyze data from many different
dimensions or angles, categorize it, and summarize the relationships identified. Technically, data
mining is the process of finding correlations or patterns among dozens of fields in large
relational databases.
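
Finding correlations among fields can be sketched with the Pearson correlation coefficient. This is one simple technique chosen for illustration, not necessarily what any particular data mining tool uses; the field names and values are made up.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up fields from a hypothetical sales table.
fields = {
    "price": [10, 12, 14, 16, 18],
    "units_sold": [50, 44, 38, 32, 26],
    "store_size": [3, 1, 4, 1, 5],
}

# Correlate every pair of fields.
correlations = {
    (a, b): pearson(fields[a], fields[b])
    for a, b in combinations(fields, 2)
}

# The pair with the largest absolute correlation is the strongest pattern.
strongest = max(correlations, key=lambda pair: abs(correlations[pair]))
print(strongest, round(correlations[strongest], 3))
```

In this toy data, units sold fall steadily as price rises, so that pair shows the strongest (negative) correlation. With dozens of fields the same loop simply covers more pairs.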
Data mining consists of five major elements:
• Extract, transform, and load transaction data onto the data warehouse system.

• Store and manage the data in a multidimensional database system.
• Provide data access to business analysts and information technology professionals.
• Analyze the data by application software.
• Present the data in a useful format, such as a graph or table.
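
The five elements above can be walked through in a small sketch. As a loud assumption, an in-memory dictionary keyed by dimension values stands in for the multidimensional database system; the `roll_up` helper, the dimension names, and the transactions are all hypothetical.

```python
from collections import defaultdict

# 1. Extract, transform, load: raw transactions (made-up data).
raw = [
    ("2024-01", "widget", "north", "120"),
    ("2024-01", "gadget", "north", "80"),
    ("2024-02", "widget", "south", "50"),
    ("2024-02", "widget", "north", "30"),
]
facts = [(month, product.title(), region.title(), float(amount))
         for month, product, region, amount in raw]

# 2. Store: a tiny "cube" keyed by (month, product, region).
cube = defaultdict(float)
for month, product, region, amount in facts:
    cube[(month, product, region)] += amount

# 3. Provide access: roll the cube up along chosen dimensions.
DIMENSIONS = ("month", "product", "region")


def roll_up(cube, *keep):
    """Aggregate the cube, keeping only the named dimensions."""
    positions = [DIMENSIONS.index(name) for name in keep]
    result = defaultdict(float)
    for key, amount in cube.items():
        result[tuple(key[p] for p in positions)] += amount
    return dict(result)


# 4. Analyze: total sales per product.
by_product = roll_up(cube, "product")

# 5. Present: a plain-text table.
for (product,), total in sorted(by_product.items()):
    print(f"{product:<10}{total:>10.2f}")
```

The same `roll_up` call with `"region"` or `"month", "region"` gives analysts a different angle on the same stored data, which is the point of keeping it in a multidimensional form.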

Functions of data mining


Data mining is primarily used today by retail, financial, communication, and marketing
organizations with a strong consumer focus.
It enables these companies to determine relationships among "internal" factors such as price,
product positioning, or staff skills, and "external" factors such as economic indicators,
competition, and customer demographics.
It enables them to determine the impact on sales, customer satisfaction, and corporate profits.

Fig 2 Data mining

Advantages
• Marketing/Retailing: Data mining can aid direct marketers by providing them with
useful and accurate trends about their customers’ purchasing behavior.
• Banking/Crediting: Data mining can assist financial institutions in areas such as
credit reporting and loan information.
• Law enforcement: Data mining can aid law enforcers in identifying criminal
suspects as well as apprehending these criminals by examining trends in location,
crime type, habit, and other patterns of behavior.
• Researchers: Data mining can assist researchers by speeding up their data analysis
process, thus allowing them more time to work on other projects.

Disadvantages
• Privacy issues: For example, according to The Washington Post, in 1998 CVS sold its
patients’ prescription purchase data to a different company.
• Security issues: Although companies hold a lot of personal information about us
online, they often do not have sufficient security systems in place to protect that
information.
• Misuse of information: Some companies decide how quickly to answer your phone call
based on your purchase history. If you have spent a lot of money or bought a lot of
products from a company, your call will be answered sooner. So you should not assume
that your call is being answered in the order in which it was received.

• Whole picture of warehousing, meta data, data mining

[Figure: overall process diagram; the only legible labels are "Historical data" and "Nature of data"]

Thanks