
DATA MINING AND DATA WAREHOUSING

PRESENTED BY:

R.V. Ravi Kiran
Computer science engineering
Email id: u2kirancse@gmail.com
Ph no: 9440966469

P. Nagesh
Computer science engineering (3/4 B.tech)
Email id: nagesh_1866@yahoo.co.in
Ph no: 9701706236

S.k.chaitanya
Computer science engineering (3/4 B.tech)
Email id:kirsch.srigiriraju@gmail.com
Ph no: 9491893712

Gayathri Vidya Parishad College Of Engineering

Visakhapatnam.
ABSTRACT: Data Mining is a concept that is taking off in the commercial sector as a means of finding useful information in gigabytes of data. While products for the commercial environment are starting to become available, tools for a scientific environment are much rarer (or even non-existent). Yet scientists have long had to search through reams of printouts and rooms full of tapes to find the gems that make up scientific discovery.

This paper will explore some of the ad hoc methods generally used for Data Mining in the scientific community, including such things as scientific visualization, and outline how some of the more recently developed products used in the commercial environment can be adapted to scientific Data Mining.

A Data Warehouse is a repository of data gathered from multiple sources and stored under a unified schema at a single site. In this paper, we discuss Data Warehouse design using star and snowflake schemas. The Star schema is the one most frequently used, as it has more advantages than the other schemas. Snowflake schemas normalize dimensions to eliminate redundancy.

Both Data Mining and Data Warehousing are important in today's competitive market, with applications such as Customer Retention, Marketing, Risk Assessment and Fraud detection.

FIG. DATA ANALYSIS

INTRODUCTION

In today's fiercely competitive marketplace, companies have an insatiable need for information. Customer data, financial data and Internet click-stream data are powerful assets, provided they can be integrated and used to enhance customer experiences.

DEF: A Database is a collection of non-redundant data which is sharable between different applications.

The ability to access meaningful data, and to move and share it throughout an organization between departments, officers and business partners in a timely, efficient manner through familiar query and analytical tools, is critical.

FIG. HOW DATA IS SHARED

WHAT IS DATA MINING?

With the current trend toward centralizing an organization's data in large databases, particularly in a commercial environment, the process of extracting useful information has become more formalized, and the term Data Mining has been coined for it. In one of the first papers on commercial Data Mining, Evangelos Simoudis of IBM defined it as:

"The process of extracting previously unknown, comprehensible and actionable information from large databases and using it to make crucial business decisions."

This definition has a definite business flavor, and much of IBM's development of Data Mining has been in this direction. In practice, Data Mining is a process that can take different approaches depending on the type of data involved and the objectives desired. As this is still very much an evolving discipline, much work is being undertaken to determine standard processes for the varied environments. Further, as the context in which the data is gathered is often an important component, it must be factored into any analysis.

FIG. DATA MINING PROCESS

Data Mining is defined as "the non-trivial extraction of implicit, previously unknown, potentially useful and understandable knowledge from data". It is the process of finding correlations or patterns among dozens of fields in large relational databases.

LATEST TRENDS IN TECHNOLOGIES AND METHODS:

There are many Data Mining trends, in terms of both technologies and methodologies, that are currently being developed and tested. The trends identified include the following.

DISTRIBUTED/COLLECTIVE DATA MINING:

Mining information that is located in different places, at different physical locations, is generally known as distributed Data Mining. Distributed Data Mining (DDM) offers an alternative to traditional analysis approaches by combining localized data analysis with a "global data model".

FIG. DISTRIBUTED DATA MINING

SPATIAL AND GEOGRAPHIC DATA MINING:

"The extraction of implicit knowledge, spatial relationships or other patterns not explicitly stored in spatial databases" is known as spatial Data Mining. Its applications are useful in remote sensing, medicine, navigation and related areas.
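As a minimal sketch of what "relationships not explicitly stored" can mean, the Python code below derives a "near" relationship between locations purely from their stored coordinates. The site names, the planar kilometre coordinates and the distance threshold are hypothetical illustrations, and real spatial mining systems use far richer relationships (containment, adjacency, direction) and spatial indexes rather than checking every pair.

```python
import math
from itertools import combinations

# Hypothetical sites with planar (x, y) coordinates in kilometres.
# Only positions are stored; no "near" relationship is recorded anywhere.
SITES = {
    "a": (0.0, 0.0),
    "b": (3.0, 4.0),
    "c": (10.0, 0.0),
    "d": (10.0, 1.0),
}


def near_pairs(sites, max_km):
    """Derive the implicit 'near' relationship: every pair of sites whose
    straight-line distance is at most max_km (brute force over all pairs)."""
    pairs = []
    for (name1, p1), (name2, p2) in combinations(sorted(sites.items()), 2):
        if math.dist(p1, p2) <= max_km:  # math.dist needs Python 3.8+
            pairs.append((name1, name2))
    return pairs


if __name__ == "__main__":
    print(near_pairs(SITES, max_km=6.0))  # [('a', 'b'), ('c', 'd')]
```

The brute-force pair check keeps the idea visible; it grows quadratically with the number of sites, which is why practical systems rely on spatial indexing.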
FIG. SPATIAL DATA MINING

TIME SERIES/SEQUENCE DATA MINING:

Another important area in Data Mining centers on the mining of time series and sequence-based data, that is, data whose records are ordered in a sequence. Sequential pattern mining focuses on the identification of such sequences.

FIG. SEQUENTIAL DATA MINING

HYPERTEXT & HYPERMEDIA DATA MINING:

Hypertext and Hypermedia Data Mining can be characterized as mining data that includes text, hyperlinks and text markup.

FIG. DATA MINING

PHENOMENAL DATA MINING:

Phenomenal Data Mining focuses on the relationships between data and the phenomena that are inferred from the data. This approach has not yet worked well in data warehouse projects.

APPLICATIONS OF DATA MINING:

Data Mining is used in areas such as:
♣ Data Mining and customer relationship management (CRM) software for solving business decision problems
♣ Privacy of data in insurance companies and government agencies
♣ Fraud detection in telecommunications and stock exchanges
♣ Medical diagnosis, to detect abnormal patterns
♣ Airline reservations, to maximize seat utilization

FIG. APPLICATIONS OF DATA MINING

WHAT IS DATA WAREHOUSING?

A Data Warehouse is a single, complete and consistent store of data obtained from a variety of different sources and made available to end users in a form they can understand and use in a business context.

A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision-making process.

FIG. DATA WAREHOUSING
A Data Warehouse is a relational database that is designed for query and analysis rather than for transaction processing. It contains historical data derived from transaction data. The characteristics of a Data Warehouse are:

♣ Subject oriented
♣ Integrated
♣ Non-volatile
♣ Time-variant

FIG. DATA WAREHOUSE

SUBJECT ORIENTED:

The data in the warehouse is defined in business terms and is grouped, through data modeling, under business-oriented subject headings such as customers, products, sales analysis reports and marketing campaigns.

INTEGRATED:

Data Warehouses must put data from disparate sources into a consistent format. They must resolve problems such as naming conflicts and inconsistencies among units of measure. When they achieve this, they are said to be integrated.

NON-VOLATILE:

Once loaded into the Data Warehouse, the data is not updated. The warehouse therefore acts as a stable resource for consistent reporting and comparative analysis.
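The integration requirement described under INTEGRATED (resolving naming conflicts and inconsistent units of measure) can be sketched in Python as a per-source mapping onto one warehouse format. The two source systems, their field names and the rupee/paise units are hypothetical examples chosen for illustration, not part of any particular product.

```python
from dataclasses import dataclass

# Hypothetical source systems: each names the customer key differently
# and records amounts in a different unit of measure.
# source -> (customer-id field, amount field, divisor that converts to rupees)
SOURCE_RULES = {
    "billing": ("cust_id", "amount_rupees", 1),
    "web_shop": ("customer_no", "amount_paise", 100),
}


@dataclass(frozen=True)
class Sale:
    """One sale in the warehouse's single, consistent format."""
    customer_id: str
    amount_rupees: float


def integrate(source, row):
    """Map a raw row from the named source onto the warehouse format,
    resolving the naming conflict and the unit-of-measure mismatch."""
    id_field, amount_field, divisor = SOURCE_RULES[source]
    customer_id = str(row[id_field]).strip().upper()  # one convention for keys
    amount = round(row[amount_field] / divisor, 2)     # one unit of measure
    return Sale(customer_id, amount)


if __name__ == "__main__":
    print(integrate("billing", {"cust_id": " c42 ", "amount_rupees": 150.0}))
    print(integrate("web_shop", {"customer_no": "C42", "amount_paise": 15000}))
```

Both rows describe the same sale, and after integration they compare equal, which is what allows the warehouse to treat the two sources as one.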
TIME-VARIANT:

All data in the Data Warehouse is time-stamped at the time of entry into the warehouse, or when it is summarized within the warehouse, so that it acts as a chronological record and supports historical and trend analysis.

FIG. DATA WAREHOUSE ARCHITECTURE

ARCHITECTURE OF DATA WAREHOUSE:

Three common data warehouse architectures are:
♣ Data Warehouse Architecture (Basic)
♣ Data Warehouse Architecture (with a Staging Area)
♣ Data Warehouse Architecture (with a Staging Area and Data Marts)

DATA WAREHOUSE ARCHITECTURE (BASIC):

The metadata and raw data of a traditional online transaction processing (OLTP) system are present, as is an additional type of data, summary data. A summary in Oracle is called a materialized view.

FIG. PROCESS OF DATA WAREHOUSING

DATA WAREHOUSE ARCHITECTURE WITH A STAGING AREA:

Operational data must be cleaned and processed before it is put into the warehouse. Rather than doing this programmatically, most data warehouses use a staging area. A staging area simplifies building summaries and general warehouse management.
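The staging-area flow described above (land extracted data in a staging area, clean it on the way into the warehouse, then build summaries), together with the time-stamping described under TIME-VARIANT, can be sketched with Python's built-in sqlite3 module. The table and column names are hypothetical, and because SQLite has no materialized views, a table created from a query stands in for Oracle's materialized view here.

```python
import sqlite3


def load_through_staging(raw_rows, load_time):
    """Minimal staging-area pipeline (a sketch, not a production ETL tool).

    raw_rows:  (region, amount) pairs exactly as extracted, amount as text
    load_time: timestamp stamped on every loaded row (time-variant data)
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE staging_sales (region TEXT, amount TEXT);
        CREATE TABLE sales_fact (
            region    TEXT NOT NULL,
            amount    REAL NOT NULL,
            loaded_at TEXT NOT NULL
        );
        """
    )
    # 1. Land the extracted rows in the staging area unchanged.
    conn.executemany("INSERT INTO staging_sales VALUES (?, ?)", raw_rows)

    # 2. Clean while moving staging -> warehouse: normalize region names,
    #    cast amounts to numbers, skip rows with no amount, time-stamp rows.
    conn.execute(
        """
        INSERT INTO sales_fact (region, amount, loaded_at)
        SELECT UPPER(TRIM(region)), CAST(amount AS REAL), ?
        FROM staging_sales
        WHERE amount IS NOT NULL AND TRIM(amount) <> ''
        """,
        (load_time,),
    )

    # 3. Build summary data (stand-in for a materialized view).
    conn.execute(
        """
        CREATE TABLE sales_by_region AS
        SELECT region, SUM(amount) AS total_amount, COUNT(*) AS n_sales
        FROM sales_fact
        GROUP BY region
        """
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    raw = [(" north", "100.5"), ("NORTH ", "49.5"), ("south", "20"),
           ("south", None), ("east", "")]
    conn = load_through_staging(raw, "2024-01-31T23:00:00")
    print(conn.execute("SELECT * FROM sales_by_region ORDER BY region").fetchall())
    # [('NORTH', 150.0, 2), ('SOUTH', 20.0, 1)]
```

Keeping the raw rows in `staging_sales` means a faulty cleaning rule can be fixed and the load rerun without extracting from the sources again.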

FIG. DATA WAREHOUSE WITH STAGING AREA

DATA WAREHOUSE ARCHITECTURE WITH STAGING AREA & DATA MARTS:

We may want to customize our warehouse's architecture for different groups within our organization. We can do this by adding data marts, which are systems designed for a particular line of business.

FIG. DATA WAREHOUSE WITH STAGING AREA & DATA MARTS

PROCESSES WITHIN A DATA WAREHOUSE:

♣ Extract and load the data
♣ Clean and transform the data into a form that can cope with large data volumes and provide good query performance
♣ Back up and archive data
♣ Manage queries, and direct them to the appropriate data sources
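A data mart for one line of business can be approximated, for illustration, as a view over warehouse tables laid out in the star form described in SCHEMAS IN DATA WAREHOUSE: a central fact table joined on its foreign keys to dimension-table primary keys (a star join). The sketch below uses Python's sqlite3 module; the table names, the "Electronics" product line and the sample rows are hypothetical, and real data marts are often separate physical systems rather than views.

```python
import sqlite3


def build_electronics_mart():
    """Create a tiny star schema and a view acting as a data mart."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Dimension tables: the points of the star.
        CREATE TABLE product_dim (
            product_id INTEGER PRIMARY KEY, name TEXT, line TEXT);
        CREATE TABLE store_dim (
            store_id INTEGER PRIMARY KEY, city TEXT);

        -- Fact table: the centre of the star (foreign keys + a measure).
        CREATE TABLE sales_fact (
            product_id INTEGER REFERENCES product_dim(product_id),
            store_id   INTEGER REFERENCES store_dim(store_id),
            amount     REAL);

        INSERT INTO product_dim VALUES
            (1, 'Laptop', 'Electronics'), (2, 'Phone', 'Electronics'),
            (3, 'Sofa', 'Furniture');
        INSERT INTO store_dim VALUES (10, 'Visakhapatnam'), (20, 'Hyderabad');
        INSERT INTO sales_fact VALUES
            (1, 10, 500.0), (2, 10, 200.0), (2, 20, 250.0), (3, 20, 300.0);

        -- Hypothetical data mart for one line of business: a star join
        -- (fact foreign keys -> dimension primary keys) filtered to
        -- the electronics line.
        CREATE VIEW electronics_mart AS
        SELECT s.city, p.name AS product, f.amount
        FROM sales_fact AS f
        JOIN product_dim AS p ON f.product_id = p.product_id
        JOIN store_dim   AS s ON f.store_id = s.store_id
        WHERE p.line = 'Electronics';
        """
    )
    return conn


if __name__ == "__main__":
    conn = build_electronics_mart()
    print(conn.execute(
        "SELECT city, SUM(amount) FROM electronics_mart "
        "GROUP BY city ORDER BY city").fetchall())
    # [('Hyderabad', 250.0), ('Visakhapatnam', 700.0)]
```

Users of the electronics group query only `electronics_mart` and never see furniture sales, which is the customization per group that data marts provide.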
SCHEMAS IN DATA WAREHOUSE:

A schema is a collection of database objects, including tables, views, indexes and synonyms. Commonly used schemas are the Star schema and the Snowflake schema.

STAR SCHEMA:

The star schema is the simplest schema. Its entity-relationship diagram resembles a star: the center of the star is a large fact table, and the points of the star are the dimension tables. A Star schema is characterized by one or more fact tables and a set of dimension tables.

The main advantages of star schemas are:
• They provide a direct and intuitive mapping between the business entities being analyzed by end users and the schema design.
• They are widely supported by a large number of business intelligence tools.

A star join is a primary key to foreign key join of the dimension tables to a fact table.

SNOWFLAKE SCHEMA:

The Snowflake schema is a more complex data warehouse model than a star schema, and is a type of star schema. Its diagram resembles a snowflake. Snowflake schemas normalize dimensions to eliminate redundancy.

CONCLUSION:

Data Mining is a new term and formalism for a process that has been undertaken by scientists for generations. The massive increase in the volume of data collected or generated for analysis with the use of computers has made it an essential tool. However, despite the more formal approach, Data Mining is something that scientists perform on an ad hoc basis and can easily adapt to. Many of the methods used for the analysis of the data were originally developed to process scientific data and are used unchanged.

A Data Warehouse usually contains historical data derived from transaction data, but it can include data from other sources. The choice of schema model for a Data Warehouse should be based on the requirements and preferences of the Data Warehouse project team. Star schemas are widely supported by a large number of business intelligence tools, whereas Snowflake schemas normalize dimensions to eliminate redundancy.

As a final point, the biggest data source of all, the Internet, is becoming more and more important, and while it holds useful information, extracting that information from the terabytes being added daily is an enormous task. The techniques of Data Mining are applicable here more than in any other domain. However, making use of them takes time, effort and, above all, people with knowledge of the field to differentiate the true solutions from the infeasible ones.
