


Presented by:-
ROLL NO: 05C71A0547 ROLL NO: 05C71A1218
CONTACT NO: 9966952101 CONTACT NO: 9885522506












We live in the age of information. Data is the most valuable resource of an enterprise. In today’s competitive global business environment, understanding and managing enterprise-wide information is crucial for making timely decisions and responding to changing business conditions. Many companies are realizing a business advantage by leveraging one of their key assets: business data. A tremendous amount of data is generated by day-to-day operational business applications. In addition, valuable data is available from external sources such as market research organizations, independent surveys and quality testing labs. Studies indicate that the amount of data in a given organization doubles every 5 years.
Data warehousing has emerged as an increasingly popular and powerful concept of applying information technology to turn these huge islands of data into meaningful information for better business. Data mining, the extraction of hidden predictive information from large databases, is a powerful new technology with great potential to help companies focus on the most important information in their data warehouses. Data mining tools predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions.
This paper describes the practicalities and constraints of data mining and data warehousing, and their advancement over earlier technologies.


Data Warehousing

• A data warehouse can be defined as any centralized data repository which can be queried for business benefit.
• Warehousing makes it possible to:
o Extract archived operational data
o Overcome inconsistencies between different legacy data formats
o Integrate data throughout an enterprise, regardless of location, format, or communication requirements
o Incorporate additional or expert information

Data Mining

Data mining is not a business “intelligence” tool or framework. Business intelligence, typically drawn from an enterprise data warehouse, is used to analyze and uncover information about past performance on an aggregate level. Data warehousing and business intelligence provide a method for users to anticipate future trends from analyzing past patterns in organizational data. Data mining is more intuitive, allowing for increased insight beyond data warehousing. An implementation of data mining in an organization will serve as a guide to uncover inherent trends and tendencies in historical information, as well as allow for statistical predictions, groupings and classification of data.

Typical data warehousing implementations in organizations will allow users to ask and answer questions such as “How many sales were made, by territory, by sales person, between the months of May and June in 1999?” Data mining will allow business decision makers to ask and answer questions such as “Who is my core customer that purchases a particular product we sell?” or “Geographically, how well would a line of products sell in a particular region and who would purchase them, given the sale of similar products in that region?”
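
The warehouse-style question quoted above (sales by territory and sales person between May and June 1999) amounts to a date filter followed by a grouped count. The following is a minimal sketch in Python, assuming a hypothetical list of sale records; the names, territories and dates are invented for illustration and do not come from any real system.

```python
from collections import Counter
from datetime import date

# Hypothetical sale records: (territory, sales person, date of sale).
sales = [
    ("North", "Asha", date(1999, 5, 3)),
    ("North", "Asha", date(1999, 6, 17)),
    ("North", "Ravi", date(1999, 5, 21)),
    ("South", "Meena", date(1999, 6, 2)),
    ("South", "Meena", date(1999, 7, 9)),  # outside the requested period
]

def count_sales(records, start, end):
    """Count sales per (territory, sales person) with start <= date <= end."""
    counts = Counter()
    for territory, person, sold_on in records:
        if start <= sold_on <= end:
            counts[(territory, person)] += 1
    return counts

report = count_sales(sales, date(1999, 5, 1), date(1999, 6, 30))
for (territory, person), n in sorted(report.items()):
    print(territory, person, n)
```

A query of this kind only summarizes what has already happened; the data mining questions quoted above would need a predictive model on top of such data.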


Generally, data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information: information that can be used to increase revenue, cut costs, or both. Data mining software is one of a number of analytical tools for analyzing data. It allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases.

Although data mining is a relatively new term, the technology is not. Companies have used powerful computers to sift through volumes of supermarket scanner data and analyze market research reports for years. However, continuous innovations in computer processing power, disk storage, and statistical software are dramatically increasing the accuracy of analysis while driving down the cost.


Dramatic advances in data capture, processing power, data transmission, and storage capabilities are enabling organizations to integrate their various databases into data warehouses. Data warehousing is defined as a process of centralized data management and retrieval. Data warehousing, like data mining, is a relatively new term, although the concept itself has been around for years. Data warehousing represents an ideal vision of maintaining a central repository of all organizational data. Centralization of data is needed to maximize user access and analysis. Dramatic technological advances are making this vision a reality for many companies. And equally dramatic advances in data analysis software are allowing users to access this data freely. The data analysis software is what supports data mining.

According to Bill Inmon, author of Building the Data Warehouse and widely considered to be the originator of the data warehousing concept, there are generally four characteristics that describe a data warehouse:

• Subject-oriented: data are organized according to subject instead of application, e.g. an insurance company using a data warehouse would organize their data by customer, premium, and claim, instead of by different products (auto, life, etc.). The data organized by subject contain only the information necessary for decision support processing.
• Integrated: When data resides in many separate applications in the operational environment, encoding of data is often inconsistent. For instance, in one application gender might be coded as "m" and "f", in another by 0 and 1. When data are moved from the operational environment into the data warehouse, they assume a consistent coding convention, e.g. gender data is transformed to "m" and "f".
• Time-variant: The data warehouse contains a place for storing data that are five to 10 years old, or older, to be used for comparisons, trends, and forecasting. These data are not updated.
• Non-volatile: Data are not updated or changed once they enter the data warehouse; they are only loaded and accessed.
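
The gender example above can be sketched as a small translation step applied while records are loaded into the warehouse. This is a minimal illustration, not a prescribed method: the application names `billing` and `claims` are hypothetical, and the assumption that 0 stands for "m" and 1 for "f" is made only for the example, since the text does not say which is which.

```python
# Hypothetical source applications and their gender encodings.
# Assumption for illustration only: in "claims", 0 means "m" and 1 means "f".
GENDER_CODES = {
    "billing": {"m": "m", "f": "f"},
    "claims": {0: "m", 1: "f"},
}

def integrate(source, records):
    """Return copies of the records with gender in the warehouse convention."""
    mapping = GENDER_CODES[source]
    return [{**rec, "gender": mapping[rec["gender"]]} for rec in records]

billing_rows = [{"customer": 101, "gender": "f"}]
claims_rows = [{"customer": 102, "gender": 0}, {"customer": 103, "gender": 1}]

# After integration every record uses the same "m"/"f" coding.
warehouse_rows = integrate("billing", billing_rows) + integrate("claims", claims_rows)
print(warehouse_rows)
```

An unknown code raises a `KeyError` here, which is deliberate: a silent default would reintroduce the inconsistency the warehouse is meant to remove.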

An Overview of Data Mining Techniques

This overview provides a description of some of the most common data mining algorithms in use today. We have broken the discussion into two sections, each with a specific theme:

1) Classical Techniques such as statistics, neighborhoods and clustering, and
2) Next Generation Techniques such as trees, networks and rules.

Each section will describe a number of data mining algorithms at a high level, focusing on the "big picture" so that the reader will be able to understand how each algorithm fits into the landscape of data mining techniques.


Extracting meaningful information from numerous databases and cross-referencing it to find patterns, trends and correlations that might otherwise be overlooked is called "data mining." Assembling the information in one place is called "data warehousing."

• All the information is stored in information repositories.
• The data warehouse takes the cleaned and integrated data.
• The data taken by the data warehouse is selected and transformed, and the useful data is sent through data mining.
• The data which is sent through data mining is evaluated and presented.

Data Warehousing

• Insulate data - i.e. the current operational information
o Preserves the security and integrity of mission-critical OLTP applications
o Gives access to the broadest possible base of data
• Retrieve data - from a variety of heterogeneous operational databases
o Data is transformed and delivered to the data warehouse/store based on a selected model (or mapping definition)
o Metadata - information describing the model and definition of the source data elements
• Data cleansing - removal of certain aspects of operational data, such as low-level transaction information, which slow down the query times.
• Transfer - processed data transferred to the data warehouse, a large database on a high performance box.

• Enhances end-user access to a wide variety of data.
• Business decision makers can obtain various kinds of trend reports, e.g. the item with the most sales in a particular area / country for the last two years.

A data warehouse can be a significant enabler of commercial business applications, most notably Customer Relationship Management.

Data Mining

• Medicine - drug side effects, hospital cost analysis, genetic sequence analysis, prediction etc.
• Finance - stock market prediction, credit assessment, fraud detection etc.
• Marketing/sales - product analysis, buying patterns, sales prediction, target mailing, identifying `unusual behavior' etc.
• Knowledge Acquisition
• Scientific discovery - superconductivity research, etc.
• Engineering - automotive diagnostic expert systems, fault detection etc.

Data mining systems rely on databases to supply the raw data for input, and this raises problems in that databases tend to be dynamic, incomplete, noisy, and large. Other problems arise as a result of the adequacy and relevance of the information stored.
Limited Information

A database is often designed for purposes different from data mining, and sometimes the properties or attributes that would simplify the learning task are not present, nor can they be requested from the real world. Inconclusive data causes problems because if some attributes essential to knowledge about the application domain are not present in the data, it may be impossible to discover significant knowledge about a given domain. For example, one cannot diagnose malaria from a patient database if that database does not contain the red blood cell count of the patients.

Missing data can be treated by discovery systems in a number of ways, such as:

• Simply disregard missing values
• Omit the corresponding records
• Infer missing values from known values
• Treat missing data as a special value to be included additionally in the attribute domain
• Or average over the missing values using Bayesian techniques.
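
Three of the treatments listed above are easy to sketch in a few lines of Python. The example assumes missing values are represented by `None` and that "inferring from known values" means substituting the mean of the known values, which is only one simple choice; the field names and records are hypothetical, and the Bayesian averaging option is not shown.

```python
def omit_records(records):
    """Omit every record that has at least one missing value."""
    return [r for r in records if None not in r.values()]

def infer_from_known(records, field):
    """Fill missing numeric values with the mean of the known values."""
    known = [r[field] for r in records if r[field] is not None]
    mean = sum(known) / len(known)
    return [{**r, field: mean if r[field] is None else r[field]} for r in records]

def as_special_value(records, field, marker="unknown"):
    """Treat missing data as an extra value in the attribute domain."""
    return [{**r, field: marker if r[field] is None else r[field]} for r in records]

# Hypothetical patient records with gaps.
patients = [
    {"age": 30, "city": "Pune"},
    {"age": None, "city": "Delhi"},
    {"age": 50, "city": None},
]

complete_only = omit_records(patients)
with_ages = infer_from_known(patients, "age")
with_cities = as_special_value(patients, "city")
print(complete_only, with_ages, with_cities, sep="\n")
```

Omitting records is the simplest option but discards the most data; here it keeps one record out of three, while the other two treatments keep all of them.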


The future of data mining lies in predictive analytics. The technology innovations in data mining since 2000 have been truly Darwinian and show promise of consolidating and stabilizing around predictive analytics. Variations, novelties and new candidate features have been expressed in a proliferation of small start-ups that have been ruthlessly culled from the herd by a perfect storm of bad economic news. Nevertheless, the emerging market for predictive analytics has been sustained by professional services, service bureaus (rent a recommendation) and profitable applications in verticals such as retail, consumer finance, telecommunications, travel and leisure, and related analytic applications. Predictive analytics have successfully proliferated into applications to support customer recommendations, customer value and churn management, campaign optimization, and fraud detection. On the product side, success stories in demand planning, just-in-time inventory and market basket optimization are a staple of predictive analytics. Predictive analytics should be used to get to know the customer, segment and predict customer behavior, and forecast product demand and related market dynamics. Be realistic about the required complex mixture of business acumen, statistical processing and information technology support, as well as the fragility of the resulting predictive model; but make no assumptions about the limits of predictive analytics. Breakthroughs often occur in the application of the tools and methods to new commercial opportunities.


Comprehensive data warehouses that integrate operational data with customer, supplier, and market information have resulted in an explosion of information. Competition requires timely and sophisticated analysis on an integrated view of the data. However, there is a growing gap between more powerful storage and retrieval systems and the users’ ability to effectively analyze and act on the information they contain. Both relational and OLAP technologies have tremendous capabilities for navigating massive data warehouses, but brute force navigation of data is not enough. A new technological leap is needed to structure and prioritize information for specific end-user problems. The data mining tools can make this leap. Quantifiable business benefits have been proven through the integration of data mining with current information systems, and new products are on the horizon that will bring this integration to an even wider audience of users.

• Data mining has a lot of potential
• Diversity in the field of
• Estimated market for data mining is $500 million

