
Prepared by

Chetan kumar.T Email

Praveen babu .U Email

Prepared For
Peddasettipalli(village),PRODDATUR.Kadapa dt.A.P


Organizations today suffer from a malaise of data overflow. Developments in
transaction processing technology have given rise to a situation where the
amount and rate of data capture are very high, but the processing of this data
into information that can be used for decision making is not developing at the
same pace. Data warehousing and data mining (of both data and text) provide a
technology that enables decision-makers in the corporate sector and government
to process this huge amount of data in a reasonable amount of time and to
extract intelligence and knowledge in near real time.
The data warehouse allows data to be stored in a format that facilitates
access. However, if tools for deriving information and knowledge from the
data, and for presenting them in a format that is useful for decision making,
are not provided, the whole rationale for the warehouse disappears. Various
technologies for extracting new insight from the data warehouse have emerged,
which we loosely classify as "Data Mining Techniques".
Our paper focuses on the need for information repositories and the discovery
of knowledge, and hence gives an overview of the much-hyped technologies of
Data Warehousing and Data Mining.
"Knowledge [no longer information] is not only power, but also a significant
competitive advantage."
Organizations have lately realized that simply processing transactions and
information faster and more efficiently no longer gives them a competitive
advantage over their rivals in achieving business excellence. Information
technology (IT) tools that are oriented towards knowledge processing can
provide the edge that organizations need to survive and thrive in the current
era of fierce competition. Increasing competitive pressure and the desire to
leverage information technology have led many organizations to explore the
benefits of two emerging technologies: Data Warehousing and Data Mining.
Introduction to Data Warehousing:
The age of the industrial revolution has ended and the world has entered the
age of information technology. The need for data warehouse applications is one
of the manifestations of this age. A data warehouse has become more of a
necessity than an accessory for a progressive, competitive, and focused
organization.
A data warehouse supports business analysis and decision-making by creating an
enterprise-wide integrated database of summarized, historical information. It
integrates data from multiple, incompatible sources. By transforming data into
meaningful information, a data warehouse allows the manager to perform more
substantive, accurate and consistent analysis.
The data warehouse is not a normal database as we usually understand the term
"database". A data warehouse is a database that is maintained separately from
an organization's operational databases. A warehouse holds read-only
What is Data Warehousing?

A data warehouse is a subject-oriented, integrated, time-variant, non-volatile
collection of data in support of management's decision-making process. The
data stored in the warehouse are not just a copy of the data at the sources.
Instead, they can be thought of as a stored view or materialized view of the
data at the sources.
The most basic component of a data warehouse is a relational database.
Relational databases are designed to insert new data and locate existing data
efficiently using a standardized query language. Underneath the database is a
maze of connections and transformations linking the data warehouse with other
systems. Because data in a company is often created and stored in functionally
specific systems (e.g. a payroll system), the data may need to be replicated
and moved between a data warehouse and these other
Functions of a data warehouse:
The main function of a data warehouse is to get enterprise-wide data into a
format that is most useful to end-users, regardless of their locations.
Data warehousing is used for:
• Increasing the speed and flexibility of analysis.
• Providing a foundation for enterprise-wide integration and
• Improving or re-inventing business
• Gaining a clear understanding of customer behavior.
Architecture of Data Warehouse:
Data warehouses and their architectures vary depending upon the specifics of
an organization's situation.
Three common architectures are:
• Data Warehouse Architecture (Basic)
• Data Warehouse Architecture (with a Staging Area)
• Data Warehouse Architecture (with a Staging Area & Data Marts)
Data Warehouse Architecture (Basic):
The basic architecture is simple. End users directly access data derived from
several source systems through the data warehouse. The metadata and raw data
of a traditional online transaction processing (OLTP) system are present, as
is an additional type of data, summary data. Summaries are very valuable in
data warehouses because they pre-compute long operations in advance. In
Oracle, a summary is called a materialized view.
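
The idea of pre-computing a summary can be sketched as follows. This is only a
minimal illustration, not Oracle's mechanism: it uses Python's built-in
sqlite3 module, which has no materialized views, so an ordinary table built
with CREATE TABLE ... AS SELECT stands in for one. The table and column names
(sales, sales_summary, region, amount) are assumptions made for the example.

```python
import sqlite3


def build_summary(conn):
    """Pre-compute total sales per region into a summary table.

    The summary table plays the role of a materialized view: queries
    read the stored totals instead of re-aggregating the raw rows.
    """
    conn.execute("DROP TABLE IF EXISTS sales_summary")
    conn.execute(
        "CREATE TABLE sales_summary AS "
        "SELECT region, SUM(amount) AS total_amount, COUNT(*) AS n_sales "
        "FROM sales GROUP BY region"
    )


# Hypothetical raw (detail) data held in the warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 100.0), ("north", 50.0), ("south", 70.0)],
)

build_summary(conn)

# End users query the small summary table rather than the raw data.
summary = {
    region: total
    for region, total, _ in conn.execute(
        "SELECT region, total_amount, n_sales FROM sales_summary"
    )
}
print(summary)
```

Unlike a true materialized view, this stand-in table is not refreshed
automatically; build_summary would have to be re-run after each load.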
Data Warehouse Architecture (with a Staging Area):
Operational data usually needs to be cleaned and processed before it is put
into the warehouse. We can do this programmatically, although most data
warehouses use a staging area instead. A staging area simplifies building
summaries and general warehouse management.

Data Warehouse Architecture (with a Staging Area & Data Marts):
We may want to customize our warehouse's architecture for different groups
within our organization. We can do this by adding data marts, which are
systems designed for a particular line of business.
Processes within a Data Warehouse:
• Extract and load the data
• Clean and transform data into a form that can cope with large data volumes
and provide good query performance
• Backup and archive data
• Manage queries, and direct them to the appropriate data sources
Data warehouses are not just large databases; they are large, complex
environments that integrate many different technologies. As such, they
require a lot of maintenance and management.
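
The extract, clean/transform and load processes listed above might be sketched
as below. This is a toy illustration under stated assumptions, not a
production pipeline: the "source system" is a plain Python list, the warehouse
is an in-memory SQLite database, and the names fact_orders, customer and
amount are hypothetical.

```python
import sqlite3


def extract(rows):
    """Extract: copy the rows out of the (here, in-memory) source system."""
    return list(rows)


def clean_and_transform(rows):
    """Clean: normalize names and drop rows whose amount cannot be parsed."""
    cleaned = []
    for row in rows:
        name = row["customer"].strip().title()
        try:
            amount = float(row["amount"])
        except (TypeError, ValueError):
            continue  # unusable amount, so the row is discarded
        if name:
            cleaned.append((name, amount))
    return cleaned


def load(conn, records):
    """Load: append the cleaned records to a warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders (customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?)", records)
    conn.commit()


# Hypothetical rows as they might arrive from an operational system.
source_rows = [
    {"customer": " alice ", "amount": "10.5"},
    {"customer": "BOB", "amount": "n/a"},
    {"customer": "carol", "amount": 3},
]

warehouse = sqlite3.connect(":memory:")
load(warehouse, clean_and_transform(extract(source_rows)))

loaded = warehouse.execute(
    "SELECT customer, amount FROM fact_orders ORDER BY customer"
).fetchall()
print(loaded)
```

Keeping the three steps as separate functions mirrors the list above and
makes it easy to insert a staging step between extraction and loading.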
Data Mining
Database mining, or data mining (DM), formally termed Knowledge Discovery in
Databases (KDD), is a process that aims to use existing data to discover new
facts and to uncover new relationships previously unknown even to experts
thoroughly familiar with the data. It is like extracting precious metals (such
as gold) or gems, hence the term "mining". It is based on the filtration and
assaying of a mountain of data "ore" in order to get "nuggets" of knowledge.
The data mining process is shown diagrammatically in the figure below.
Data mining with
• The goal of a data warehouse is to support decision making with data.
• Data mining can be used in conjunction with a data warehouse to help with
certain types of decisions.
• Data mining can be applied to operational databases with individual
transactions.
• To make data mining more efficient, the data warehouse should have an
aggregated or summarized collection of data.
• Data mining helps in extracting meaningful new patterns that cannot
necessarily be found by merely querying or processing data or metadata in data
The knowledge discovery process comprises four phases:
• Data selection: data about specific items or categories of items, or from
stores in a specific region or area of the country, may be selected.
• Data cleansing: this process may then correct invalid zip codes or eliminate
records with incorrect phone prefixes.
• Enrichment: this typically enhances the data with additional sources of
information.
• Data transformation and encoding: these may be done to reduce the amount of
data.
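
The four phases can be chained as in the following sketch. It is only an
illustration: the sample records, the five-digit zip rule and the
ZIP_TO_DISTRICT lookup are all hypothetical, and cleansing here drops invalid
records rather than correcting them.

```python
# Hypothetical sales records from several stores.
records = [
    {"item": "milk", "region": "south", "zip": "51636", "amount": 40},
    {"item": "milk", "region": "north", "zip": "11001", "amount": 25},
    {"item": "bread", "region": "south", "zip": "5163", "amount": 15},
    {"item": "milk", "region": "south", "zip": "51601", "amount": 60},
]

# Hypothetical external source used for enrichment.
ZIP_TO_DISTRICT = {"51636": "D1", "51601": "D1"}


def select(rows, region):
    """Data selection: keep only records from one region."""
    return [r for r in rows if r["region"] == region]


def cleanse(rows):
    """Data cleansing: drop records whose zip code is not five digits."""
    return [r for r in rows if len(r["zip"]) == 5 and r["zip"].isdigit()]


def enrich(rows, lookup):
    """Enrichment: add a district taken from an additional data source."""
    return [{**r, "district": lookup.get(r["zip"], "unknown")} for r in rows]


def transform(rows):
    """Transformation: reduce the data to totals per (district, item)."""
    totals = {}
    for r in rows:
        key = (r["district"], r["item"])
        totals[key] = totals.get(key, 0) + r["amount"]
    return totals


result = transform(enrich(cleanse(select(records, "south")), ZIP_TO_DISTRICT))
print(result)
```

Four input records shrink to a single aggregated entry, which is the kind of
reduction the transformation phase aims for.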
Goals of Data Mining:
The goals of data mining fall into the following classes:
• Prediction: Data mining can show how certain attributes within the data will
behave in the future.
• Identification: Data patterns can be used to identify the existence of an
item, an event, or an activity.
• Classification: Data mining can partition the data so that different classes
or categories can be identified based on combinations of parameters.
• Optimization: One eventual goal of data mining may be to optimize the use of
limited resources such as time, space, money, or material and to maximize
output variables such as sales or profits under a given set of constraints.
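
As one illustration of the classification goal, the sketch below assigns a
customer to a class based on a combination of two parameters. The technique
shown, a nearest-centroid classifier, is chosen here only for brevity and is
not prescribed by this paper; the features (visits per month, monthly spend)
and the class labels are hypothetical.

```python
def centroids(labelled):
    """Compute the mean point (centroid) of each class."""
    sums = {}
    for (x, y), label in labelled:
        sx, sy, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, n + 1)
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}


def classify(point, cents):
    """Return the label whose centroid is closest to the given point."""
    px, py = point
    return min(
        cents,
        key=lambda label: (cents[label][0] - px) ** 2
        + (cents[label][1] - py) ** 2,
    )


# Hypothetical training data: (visits per month, monthly spend) -> class.
training = [
    ((1, 20), "occasional"),
    ((2, 30), "occasional"),
    ((10, 200), "frequent"),
    ((12, 260), "frequent"),
]

cents = centroids(training)
label = classify((9, 180), cents)
print(label)
```

In practice the features would be scaled first, since the spend values here
dominate the distance.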
Applications of Data Mining:
Data mining is applied in areas such as:
• Customer relationship management (CRM) software for solving business
decision problems
• Privacy of data in insurance companies and government agencies
• Fraud detection in telecommunications and stock exchanges
• Medical diagnosis, to detect abnormal patterns
• Airline reservations, to maximize seat utilization
• Intelligence agencies, to detect abnormal behavior by their employees
Comprehensive data warehouses that integrate operational data with customer,
supplier, and market information have resulted in an explosion of information.
Competition requires timely and sophisticated analysis of an integrated view
of the data. A new technological leap is needed to structure and prioritize
information for specific end-user problems, and data mining tools can make
this leap. Data warehousing and data mining play an important role in storing
data and in sorting out the particular data that is needed. It has become very
easy for users to get the information they want through this mining.
Quantifiable business benefits have been proven through the integration of
data mining with current information systems, and new products are on the
horizon that will bring this integration to an even wider audience of users.