
Data Mining & Data Warehousing 1

DATA MINING
&
DATA WAREHOUSING

Submitted By
-K.A.Sha Rajesh Rajan,

-G.Sivakumar,
(DON BOSCO ARTS & SCIENCE),
DHARMAPURI-636 809.

Abstract

In this topic we surveyed the important discipline of data mining, which uses database technology to discover additional knowledge or patterns in the data. We gave an illustrative example of knowledge discovery in databases (KDD), which has a wider scope than data mining. Among the various data mining techniques, we focused on the details of association rule mining, classification, and clustering. Data warehouses, in turn, provide storage, functionality, and responsiveness to queries beyond the capabilities of transaction-oriented databases.

A data warehouse is a collection of related data; like a database system, a data warehousing system combines that data with the database software that manages it. Data warehousing provides data access to the whole enterprise.
CONTENTS:
- Overview
- Goals
- Scope
- Fundamental Information Systems
- Architecture
- Working
- Profitable Applications
- Future Developments
- Conclusion

An Introduction to Data Mining:

The term data mining refers loosely to the process of semi-automatically analyzing large databases to find useful patterns. Data mining deals with “knowledge discovery in databases” (KDD). It is an information extraction activity whose goal is to discover hidden facts contained in databases. Typical applications include market segmentation, customer profiling, fraud detection, evaluation of retail promotions, and credit risk analysis.

An Introduction to Data Warehousing:

A data warehouse is a collection of data gathered and organized so that it can easily be analyzed, extracted, synthesized, and otherwise used for the purpose of further understanding the data. It may be contrasted with data that is gathered to meet immediate business objectives, such as order and payment transactions, although this data would also usually become part of a data warehouse.
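To illustrate the contrast drawn above between transaction data and warehouse data, here is a minimal sketch, assuming hypothetical order records (the field names `region`, `month` and `amount` and the helper `summarize_by` are invented for illustration). The operational system keeps one row per order; a warehouse-style summary reorganizes the same data so it can be analyzed.

```python
from collections import defaultdict

# Hypothetical operational records: one row per order transaction,
# as an order-processing system would store them.
orders = [
    {"order_id": 1, "region": "North", "month": "2024-01", "amount": 120.0},
    {"order_id": 2, "region": "South", "month": "2024-01", "amount": 80.0},
    {"order_id": 3, "region": "North", "month": "2024-02", "amount": 200.0},
    {"order_id": 4, "region": "North", "month": "2024-01", "amount": 30.0},
]

def summarize_by(rows, *keys):
    """Total the order amounts for each combination of the given keys."""
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[k] for k in keys)] += row["amount"]
    return dict(totals)

print(summarize_by(orders, "region", "month"))
print(summarize_by(orders, "region"))
```

The same rows can be regrouped by any combination of attributes without touching the order-processing system, which is the kind of flexible analysis a warehouse is organized for.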
Overview:

Data mining, the extraction of hidden predictive information from large databases, is a powerful new technology with great potential to help companies focus on the most important information in their data warehouses.

The Scope of Data Warehousing:

Data warehousing has grown out of the repeated attempts of various researchers and organizations to give their organizations flexible, effective and efficient means of getting at the sets of data that have come to represent one of the organization's most critical and valuable assets.

Goal of Data Mining:

• Simplification and automation of the overall statistical process, from data source(s) to model application.
• Replace statistician → better models, less grunge work.
• Statistical expertise required to compare different techniques.
• Build intelligence into the software.

Goal of Data Warehousing:

The goals of data warehousing are:
• To free the information that is locked up in the operational databases.
• To mix it with information from other, often external, sources of data.
• To use data models and/or server technologies that speed up querying and reporting and that are not appropriate for transaction processing.
• To provide a repository of transaction processing system data that contains data from a longer span of time than can efficiently be held in a transaction processing system, and/or to be able to generate reports "as was" as of a previous point in time.
• To prevent persons who only need to query and report transaction processing system data from having any access whatsoever to transaction processing system databases and the logic used to maintain those databases.
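The "as was" reporting goal above relies on keeping history instead of overwriting it. A minimal sketch, assuming a hypothetical effective-dated history list (the `history` data and the `value_as_of` helper are illustrative, not taken from any particular warehouse product): each change is stored with the date it took effect, and a report for a past date picks the latest entry on or before that date.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical effective-dated history for one customer's credit limit.
# Each entry is (valid_from, value), sorted by valid_from.
history = [
    (date(2020, 1, 1), 1000),
    (date(2021, 6, 1), 2500),
    (date(2023, 3, 15), 4000),
]

def value_as_of(history, as_of):
    """Return the value in effect on `as_of`, or None if nothing was recorded yet."""
    dates = [valid_from for valid_from, _ in history]
    # bisect_right counts entries that took effect on or before `as_of`.
    i = bisect_right(dates, as_of)
    if i == 0:
        return None
    return history[i - 1][1]

print(value_as_of(history, date(2022, 1, 1)))
```

A transaction processing system typically keeps only the current value; storing every version with its start date is what lets the warehouse answer "what did this look like then?".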

An Architecture for Data Mining:

The ideal starting point is a data warehouse containing a combination of internal data tracking all customer contact, coupled with external market data about competitor activity. To be applied effectively, these advanced techniques must be fully integrated with a data warehouse as well as with flexible interactive business analysis tools. Many data mining tools currently operate outside of the warehouse, requiring extra steps for extracting, importing, and analyzing the data.

An Architecture for Data Warehousing:

A Data Warehouse Architecture (DWA) is a way of representing the overall structure of data, communication, processing and presentation that exists for end-user computing within the enterprise. The architecture is made up of a number of interconnected parts:

Operational Database / External Database Layer:
Operational systems process data to support critical operational needs. In order to do that, operational databases have historically been created to provide an efficient processing structure for a relatively small number of well-defined business transactions.

Information Access Layer:

The Information Access Layer of the Data Warehouse Architecture is the layer that the end-user deals with directly. In particular, it represents the tools that the end-user normally uses day to day, e.g., Excel, Lotus 1-2-3, Focus, Access, SAS, etc. This layer also includes the hardware and software involved in displaying and printing reports, spreadsheets, graphs and charts for analysis and presentation.

Data Access Layer:

The Data Access Layer of the Data Warehouse Architecture is involved with allowing the Information Access Layer to talk to the Operational Layer. It is responsible for interfacing between Information Access tools and Operational Databases.

Data Directory (Metadata) Layer:

In order to provide for universal data access, it is absolutely necessary to maintain some form of data directory or repository of meta-data information. Meta-data is the data about data within the enterprise.

In order to have a fully functional warehouse, it is necessary to have a variety of meta-data available: data about the end-user views of data and data about the operational databases. Ideally, end-users should be able to access data from the data warehouse (or from the operational databases) without having to know where that data resides or the form in which it is stored.

Process Management Layer:

The Process Management Layer is involved in scheduling the various tasks that must be accomplished to build and maintain the data warehouse and data directory information. The Process Management Layer can be thought of as the scheduler or the high-level job control for the many processes (procedures) that must occur to keep the Data Warehouse up-to-date.
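The scheduling role described for the Process Management Layer can be sketched as dependency-ordered job control. This is a minimal sketch, assuming Python 3.9+ (for the standard-library `graphlib` module) and a hypothetical set of job names; a real scheduler would also handle run times, retries and failures.

```python
from graphlib import TopologicalSorter

# Hypothetical warehouse maintenance jobs and their prerequisites:
# each job maps to the set of jobs that must finish before it runs.
jobs = {
    "extract_orders": set(),
    "extract_customers": set(),
    "clean_and_combine": {"extract_orders", "extract_customers"},
    "load_warehouse": {"clean_and_combine"},
    "refresh_metadata": {"load_warehouse"},
}

def run_order(dependencies):
    """Return one valid execution order that respects every prerequisite."""
    return list(TopologicalSorter(dependencies).static_order())

for job in run_order(jobs):
    print("running", job)
```

The two extract jobs have no dependency on each other, so their relative order is not fixed; everything downstream is guaranteed to run only after its inputs are ready.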
Application Messaging Layer:

The Application Messaging Layer has to do with transporting information around the enterprise computing network. Application Messaging is also referred to as "middleware", but it can involve more than just networking protocols. Application Messaging can also be used to collect transactions or messages and deliver them to a certain location at a certain time. Application Messaging is the transport system underlying the Data Warehouse.

Data Warehouse (Physical) Layer:

The (core) Data Warehouse is where the actual data used primarily for informational uses resides. In a Physical Data Warehouse, copies, in some cases many copies, of operational and/or external data are actually stored in a form that is easy to access and is highly flexible.

Data Staging Layer:

The final component of the Data Warehouse Architecture is Data Staging. Data Staging is also called copy management or replication management, but in fact, it includes all of the processes necessary to select, edit, summarize, combine and load data warehouse and information access data from operational and/or external databases. Data Staging may also involve data quality analysis programs and filters that identify patterns and data structures within existing operational data.

Central Data Warehouses:

The central data warehouse is a single physical database that contains all of the data for a specific functional area, department, division, or enterprise. Central Data Warehouses are often selected where there is a common need for informational data and there are large numbers of end-users already connected to a central computer or network.
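The Data Staging steps described in the Data Staging Layer section (select, edit, summarize, combine and load, plus data quality filters) can be sketched as a small pipeline. This is a minimal sketch under assumed, hypothetical source rows and field names (`cust_id`, `country`, `amount`); the "load" step here only returns the rows that would be written to the warehouse, together with the rows the quality filters rejected.

```python
# Hypothetical raw rows from two operational sources.
crm_rows = [
    {"cust_id": "C1", "name": " Alice ", "country": "de"},
    {"cust_id": "C2", "name": "Bob", "country": ""},
]
billing_rows = [
    {"cust_id": "C1", "amount": "19.99"},
    {"cust_id": "C1", "amount": "5.01"},
    {"cust_id": "C2", "amount": "not-a-number"},
]

def stage(crm_rows, billing_rows):
    """Select, edit, summarize and combine source rows into rows ready to load."""
    rejected = []
    # Select the needed columns and edit them (trim names, upper-case
    # country codes); a simple quality filter rejects rows with no country.
    customers = {}
    for row in crm_rows:
        if not row["country"]:
            rejected.append(("crm", row))
            continue
        customers[row["cust_id"]] = {
            "name": row["name"].strip(),
            "country": row["country"].upper(),
        }
    # Summarize: total billed amount per customer, filtering unparsable amounts.
    totals = {}
    for row in billing_rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            rejected.append(("billing", row))
            continue
        totals[row["cust_id"]] = totals.get(row["cust_id"], 0.0) + amount
    # Combine: join the two sources on cust_id to form the rows to load.
    loaded = [
        {"cust_id": cid, **attrs, "total_billed": round(totals.get(cid, 0.0), 2)}
        for cid, attrs in customers.items()
    ]
    return loaded, rejected

loaded, rejected = stage(crm_rows, billing_rows)
print(loaded)
print(len(rejected), "rows rejected")
```

Keeping the rejected rows, rather than silently dropping them, is what allows the data quality analysis mentioned above to find recurring problems in the operational sources.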
Distributed Data Warehouses:

Distributed Data Warehouses are just what their name implies. They are data warehouses in which certain components of the data warehouse are distributed across a number of different physical databases.

How Data Mining Works:

The technique that is used to perform such feats in data mining is called modeling. Suppose, for example, that you are the marketing director of a telecommunications company. You have access to a lot of information about all of your customers: their age, sex, credit history and long distance calling usage, and you also have a lot of information about your prospective customers. For instance, a simple model for a telecommunications company might be: 98% of the customers who make more than $60,000/year spend more than $80/month on long distance. This model could then be applied to the prospect data to try to tell something about the proprietary information that this telecommunications company does not currently have access to. With this model in hand, new customers can be selectively targeted. Test marketing is an excellent source of data for this kind of modeling. Table 2 shows another common scenario for building models: predict what is going to happen in the future.

Profitable Application:

One of the important current applications is CREDIT CARD FRAUD DETECTION & RECOVERY. Online credit card fraud against merchants can be broken out into three major categories:
• Organized Fraud
• Opportunistic Fraud
• Cardholder Fraud

Future Developments:

Data Warehousing is such a new field that it is difficult to estimate which new developments are likely to affect it most. Clearly, the development of parallel DB servers with improved query engines is likely to be one of the most important. Parallel servers will make it possible to access huge databases in much less time. Another new technology is data warehouses that allow for the mixing of traditional numbers, text and multi-media. The availability of improved tools for data visualization (business intelligence) will allow users to see things that could never be seen before.
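To make the modeling idea from How Data Mining Works concrete: a model such as the $60,000/$80 rule is checked on customers whose long distance spend is known and then applied to prospects whose spend is not. This is a minimal sketch with invented data; every record, prospect ID and helper name is hypothetical, and only the two dollar thresholds come from the text. Here the model is a single hand-written rule whose confidence (the share of high-income customers who are also high spenders) is measured on the customer data, not a rule learned automatically.

```python
# Hypothetical customer records: (annual_income, monthly_long_distance_spend).
customers = [
    (72_000, 95.0), (65_000, 110.0), (80_000, 60.0), (61_000, 85.0),
    (45_000, 30.0), (30_000, 90.0), (90_000, 120.0),
]
# Hypothetical prospects: income is known, long distance spend is not.
prospects = {"P1": 70_000, "P2": 40_000, "P3": 100_000}

INCOME_THRESHOLD = 60_000  # "more than $60,000/year"
SPEND_THRESHOLD = 80.0     # "more than $80/month"

def rule_confidence(records):
    """Fraction of high-income customers who are also high spenders."""
    high_income = [spend for income, spend in records if income > INCOME_THRESHOLD]
    if not high_income:
        return 0.0
    high_spenders = sum(1 for spend in high_income if spend > SPEND_THRESHOLD)
    return high_spenders / len(high_income)

def targets(prospects):
    """Apply the rule to prospects: pick those predicted to be high spenders."""
    return sorted(pid for pid, income in prospects.items() if income > INCOME_THRESHOLD)

print(f"confidence = {rule_confidence(customers):.0%}")
print("target:", targets(prospects))
```

On this toy data the rule holds for 4 of the 5 high-income customers, so its confidence is 80% rather than the 98% quoted in the example; the point is the two-step pattern of measuring the model where the answer is known and applying it where it is not.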

Conclusion

• The value of warehousing and mining in effective decision making based on concrete evidence from historical data.
• Challenges of heterogeneity and scale in warehouse construction and maintenance.
• Grades of data analysis tools: straight querying, reporting tools, multidimensional analysis and mining.
