
SUBMITTED BY:

P.HARISH(II C.S.E)
(harish2harish_89@yahoo.co.in)
G.THEJA CHOUDARY(II C.S.E)
(theja_547@yahoo.co.in)

MALINENI LAKSHMAIAH ENGINEERING COLLEGE


S.KONDA-523101
ANDHRA PRADESH
ABSTRACT

One may claim that the exponential growth in the amount of data provides great opportunities for
data mining. In many real-world applications, however, the number of sources over which this information is
fragmented grows at an even faster rate, resulting in barriers to the widespread application of data mining. A
data warehouse is designed especially for decision support queries.
Data warehousing is the process of extracting and transforming operational data into
informational data and loading it into a central data store or warehouse.
The idea behind data mining, then, is the "non-trivial process of identifying valid, novel,
potentially useful, and ultimately understandable patterns in data".
Data mining is concerned with the analysis of data and the use of software techniques for finding
patterns and regularities in sets of data. The potential of data mining can be enhanced if the appropriate data has
been collected and stored in a data warehouse.
Data warehousing provides the means to change raw data into information for making effective
business decisions; the emphasis is on information, not data. The data warehouse is the hub for decision
support data.
This paper also explains the relation between operational data, the data warehouse and data marts.
1. ABSTRACT
2. INTRODUCTION
2.1. NEED FOR DATA WAREHOUSE
3. STRUCTURE OF DATA WAREHOUSE
3.1. PHYSICAL DATA WAREHOUSE
3.2. LOGICAL DATA WAREHOUSE
3.3. DATA MARTS
4. DATA WAREHOUSE ARCHITECTURE
4.1. LOAD MANAGER
4.2. WAREHOUSE MANAGER
4.3. QUERY MANAGER
5. DATA WAREHOUSE AND BACKEND PROCESS
6. DESIGNING A DATA WAREHOUSE
7. META DATA
7.1. HUMAN META DATA
7.2. COMPUTER BASED META DATA FOR PEOPLE TO USE
7.3. COMPUTER BASED META DATA FOR COMPUTER TO USE
8. INTRODUCTION TO DATA MINING
9. INTERNAL PROCESS OF DATA MINING AND WAREHOUSING
10. ADVANTAGES
11. APPLICATIONS
12. CRITICAL ISSUES
13. CONCLUSION
DATA WAREHOUSE & DATA MINING
Every day organizations, both large and small, generate billions of bytes of data related to all
aspects of their business. But locked up in a variety of systems, most of this data is extremely difficult to
access. Only a very small part of the data that is captured, processed and stored is available to decision makers.

INTRODUCTION:
What is data warehouse?
A data warehouse, in its simplest perception, is no more than a collection of
the key pieces of information used to manage and direct the business for the most
profitable outcome.
A large amount of the right information is the key to survival in today's competitive
environment, and this kind of information can be available only if there is a totally integrated enterprise
data warehouse.
A data warehouse is a repository of integrated information, available for queries and analysis. For
such a repository, data and information are extracted from heterogeneous sources and consolidated in a
single source. This makes it much easier and more efficient to query the data.
There are two fundamentally different types of information systems in enterprises:
operational systems and informational systems.
Operational systems run the daily operations of the enterprise, for example ERP (enterprise resource planning).
Informational systems analyze the data to make decisions on how the enterprise will operate. Not only do
informational systems have a different focus from operational ones, they often have a different scope
altogether.
There are some specific rules that govern the basic structure of a data warehouse, namely that such a structure
should be:
Time dependent, that is, containing information collected over time, which implies there must always be a
connection between the information in the warehouse and the time when it was entered. This is one of the
most important aspects of the warehouse as it relates to data mining, because information can then be stored
according to period.
Non-volatile, that is, data in a data warehouse is never updated but used only for queries. Thus such data can only be
loaded from other databases such as the operational database. End users who want to update data must use
the operational databases, as only the latter can be updated, changed and deleted. This means that a data warehouse
will always be filled with historical data.
Subject oriented, that is, built around all the existing applications of the operational data. Not all the information in
the operational database is useful for a data warehouse, since the data warehouse is designed specially for
decision support while the operational database contains information for day-to-day use.
Integrated, that is, it reflects the business information of the organization. In an operational data environment
you will find many types of information being used in a variety of applications, and some applications will be
using different names for the same entities. However, in a data warehouse it is essential to integrate this
information and make it consistent; only one name must exist to describe each entity.
A data warehouse is designed especially for decision support queries;
therefore only data that is needed for decision support is extracted from the operational data
and stored in the warehouse.
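The properties above can be illustrated with a minimal sketch. The class name MiniWarehouse, the NAME_MAP table and all field names are hypothetical and chosen only for illustration; a real warehouse would enforce these rules in the database itself.

```python
from datetime import date

# Hypothetical mapping from application-specific field names to the single
# warehouse name (the "integrated" property).
NAME_MAP = {"cust_no": "customer_id", "client_id": "customer_id",
            "amt": "amount", "total": "amount"}


class MiniWarehouse:
    """Append-only store: rows can be loaded and queried, never updated."""

    def __init__(self):
        self._rows = []

    def load(self, record, loaded_on):
        # Integrated: rename source fields to the one agreed warehouse name.
        row = {NAME_MAP.get(key, key): value for key, value in record.items()}
        # Time dependent: every row carries the date it entered the warehouse.
        row["loaded_on"] = loaded_on
        self._rows.append(row)

    def query(self, start, end):
        # Non-volatile: return copies so callers cannot change stored rows.
        return [dict(r) for r in self._rows if start <= r["loaded_on"] <= end]


wh = MiniWarehouse()
wh.load({"cust_no": 7, "amt": 120.0}, date(2009, 1, 15))    # e.g. from billing
wh.load({"client_id": 7, "total": 80.0}, date(2009, 2, 3))  # e.g. from a web shop
january = wh.query(date(2009, 1, 1), date(2009, 1, 31))
print(january)
```

Both source systems end up under the same names (customer_id, amount), and the class offers no update or delete method, so the stored history can only grow.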
Need for Data Warehouse:
1. To summarize large volumes of data.
2. To integrate data from different sources.
3. To give decision makers access to past data.
4. To enable people to make informed decisions.
Users:
From the definition we can infer that a typical data warehouse user is as follows:
1. This person's job involves drawing conclusions from, and making decisions
based on, large masses of data.
2. This person doesn't want to get involved with finding and organizing the
data for this purpose.
3. This person also doesn't want to access a database in a highly technical fashion.

STRUCTURE OF DATA WAREHOUSE:


Data warehousing is one of the hottest industry trends, and for good reason. The structure of a data
warehouse consists of the following:
• Physical data warehouse
• Logical data warehouse
• Data marts
The physical data warehouse is a physical database in which all the data for the data warehouse are stored, along with
meta data and processing logic for scrubbing, organizing, packaging and processing the detail data.
The logical data warehouse, like the physical data warehouse, contains meta data, but it does not contain the actual data. Instead it
contains the information necessary to access the data wherever they reside.
A data mart is a subset of an enterprise-wide data warehouse, which typically supports one
element of the enterprise.

DATA MARTS:
Data marts are partitions of the overall data warehouse, and they may contain overlapping data. The
task of implementing a data warehouse can be a very big effort, taking a significant amount of
time. One feasible option is to start with a set of data marts, one for each component or department.
A set of smaller, manageable databases of this kind is called data marts. One can have a stand-alone
data mart or a dependent data mart.
Stand-alone data mart: a data mart with minimal or no impact on the enterprise operational
databases.
Dependent data mart: similar to a stand-alone data mart, except that management of the data sources
by the enterprise database is required. These data sources include operational databases and external sources of
data.

DATA WAREHOUSE ARCHITECTURE:
The architecture of an information system refers to the way its pieces are laid out, what types of
tasks are allocated to each piece, how the pieces interact with each other and how they interact with the
outside world. The architecture of a data warehouse is shown in the figure.

[Figure: operational data and external data enter through the load manager; the warehouse manager holds
detailed information, summary information and meta data; the query manager serves data dippers, info dippers
and OLAP tools. The three stages are labelled DATA, INFORMATION and DECISION.]

FIGURE 1: DATA WAREHOUSE ARCHITECTURE


The architecture consists of the following components:
1. Load manager
2. Warehouse manager
3. Query manager
Each component has some specific processes.
Load Manager
• It is constructed using a combination of off-the-shelf tools, bespoke coding,
C programs and shell scripts.
• It extracts the data from the source systems.
• It fast-loads the extracted data into a temporary data store.
• It performs simple transformations into a structure similar to the one in the data warehouse.
Warehouse Manager
• It is constructed using a combination of third-party systems management software, bespoke
coding, C programs and shell scripts.
• It supports warehouse management processes such as transforming data, creating backups and archiving
data in the data warehouse.
Query Manager
• It is constructed using a combination of user access tools, specialist data warehousing monitoring
tools, native database facilities, bespoke coding, C programs and shell scripts.
• It directs queries to the appropriate tables.
• It schedules the execution of user queries.
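As an illustration of how a query manager might direct a query to the appropriate table, the following minimal sketch answers a query from a pre-computed summary when possible and falls back to the detailed data otherwise. The table contents and the function names choose_table and total_sales are hypothetical and not taken from any particular product.

```python
# Hypothetical tables: detailed facts and a pre-computed monthly summary.
DETAIL = [
    {"month": "2009-01", "product": "pen", "amount": 10},
    {"month": "2009-01", "product": "ink", "amount": 5},
    {"month": "2009-02", "product": "pen", "amount": 7},
]
SUMMARY_BY_MONTH = {"2009-01": 15, "2009-02": 7}


def choose_table(group_by):
    """Direct a query to the cheapest table that can answer it."""
    # Assumption: the summary table only holds totals per month.
    return "summary" if group_by == ("month",) else "detail"


def total_sales(group_by):
    """Answer a 'total amount grouped by ...' query."""
    if choose_table(group_by) == "summary":
        return dict(SUMMARY_BY_MONTH)
    totals = {}
    for row in DETAIL:
        key = tuple(row[col] for col in group_by)
        totals[key] = totals.get(key, 0) + row["amount"]
    return totals


by_month = total_sales(("month",))      # served from the summary table
by_product = total_sales(("product",))  # needs the detailed table
print(by_month, by_product)
```

The user asks the same kind of question in both cases; only the query manager knows which table was actually read.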

DATA WAREHOUSE AND BACKEND PROCESS:


Data Extraction
- which gathers data from multiple, heterogeneous, external sources.
Data Cleaning
- which detects errors in the data and rectifies them when possible.
Data Transformation
- which converts data from legacy or host format to data warehouse format.
Loading
- which sorts, summarizes, consolidates, computes views, checks integrity and builds indices
and partitions.
Refresh
- which propagates the updates from the data sources to the
warehouse.
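The back end steps above can be sketched as a chain of small functions. This is only a minimal illustration under stated assumptions: the record fields (name, amount), the warehouse format (amounts in whole cents) and the in-memory dictionary standing in for the warehouse are all hypothetical.

```python
def extract(sources):
    """Gather raw records from several heterogeneous sources."""
    return [record for source in sources for record in source]


def clean(records):
    """Detect errors and rectify them where possible; reject the rest."""
    cleaned = []
    for rec in records:
        name = str(rec.get("name", "")).strip().title()
        try:
            amount = float(rec.get("amount"))
        except (TypeError, ValueError):
            continue  # cannot be rectified, so the record is rejected
        if name:
            cleaned.append({"name": name, "amount": amount})
    return cleaned


def transform(records):
    """Convert to the (assumed) warehouse format: amounts in whole cents."""
    return [{"customer": r["name"], "cents": round(r["amount"] * 100)}
            for r in records]


def load(warehouse, records):
    """Sort, store, and build a simple index on customer name."""
    warehouse["rows"] = sorted(warehouse["rows"] + records,
                               key=lambda r: r["customer"])
    warehouse["index"] = {}
    for pos, row in enumerate(warehouse["rows"]):
        warehouse["index"].setdefault(row["customer"], []).append(pos)
    return warehouse


def refresh(warehouse, sources):
    """Propagate source data into the warehouse."""
    return load(warehouse, transform(clean(extract(sources))))


legacy = [{"name": " alice ", "amount": "12.50"},
          {"name": "bob", "amount": "n/a"}]  # second record has a bad amount
web = [{"name": "Carol", "amount": 3}]
warehouse = refresh({"rows": [], "index": {}}, [legacy, web])
print(warehouse["rows"])
```

In this sketch refresh simply reprocesses whatever the sources contain; a real refresh step would propagate only the changes made since the previous load.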

DESIGNING A DATA WAREHOUSE:


Designing a data warehouse requires specialist knowledge of data design, because the data
model must serve users who want access at high speed, and so the data design for a
warehouse can be quite different from that of operational databases.
In a data warehouse, an end user may want to join many tables, and this can
place tremendous demands on the system. For that reason, the data warehouse requires a high-speed
machine and a wide variety of optimization processes.

META DATA:
In setting up a data warehouse, the end user and the administrator must have access to all the
information about the tables and attributes. They will want to know a number of things, such as where the data
is located, what data exists, what data type or format it is in, how this data relates to other data in other
databases, where the data comes from and whom it belongs to. For these reasons, another database
containing the so-called meta data is needed, which describes the structure and contents of the databases.
Meta data can exist in any of three forms:
1. Human meta data
2. Computer based meta data for people to use
3. Computer based meta data for computer to use
Human meta data:
People always have some sort of meta data in their heads or in their files.
Computer based meta data for people to use:
Data warehouse developers often store the descriptive data in a database of its own. This provides a
comprehensive guide to the data resource.
Computer based meta data for computer to use:
If the meta data items for people to use are stored in a well-structured, computer-readable form, they can
be read by a DBMS. This smooths the interaction between users and the warehouse.
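A minimal sketch of meta data kept in a structured, computer-readable form is given below. The ColumnMeta record, the catalog entries and the two helper functions are hypothetical; they only illustrate that the same entries can serve people (as a readable guide) and programs (for example, to find tables that can be joined).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnMeta:
    table: str        # where the data is located
    column: str       # what data exists
    dtype: str        # what data type or format it is in
    source: str       # where the data comes from
    owner: str        # whom the data belongs to
    description: str  # human-readable note for people to use


# Hypothetical catalog entries.
CATALOG = [
    ColumnMeta("sales", "customer_id", "int", "billing_db", "finance",
               "Customer key shared with the customers table"),
    ColumnMeta("sales", "amount", "float", "billing_db", "finance",
               "Invoice amount"),
    ColumnMeta("customers", "customer_id", "int", "crm_db", "marketing",
               "Primary customer key"),
]


def describe(table):
    """For people: a readable guide to one table."""
    return [f"{m.column} ({m.dtype}) from {m.source}: {m.description}"
            for m in CATALOG if m.table == table]


def related_tables(column):
    """For programs: tables sharing a column, i.e. candidates for a join."""
    return sorted({m.table for m in CATALOG if m.column == column})


print("\n".join(describe("sales")))
print(related_tables("customer_id"))
```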
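[Figure: data from databases and other data inputs/new data is cleaned and reformatted into the data
warehouse, which holds data and meta data and feeds OLAP, DSS, EIS and data mining; back flushing returns
cleaned data to the sources.]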

FIGURE 2: STRUCTURAL CONTENTS OF A DATABASE

Acquisition of data for the warehouse involves the following steps:

• The data must be extracted from multiple, heterogeneous sources.
• Data must be formatted for consistency within the warehouse. Names, meanings and domains of data
from unrelated sources must be reconciled.
• The data must be cleaned to ensure validity. For input data, cleaning must occur before the data
are loaded into the warehouse.
• Recognizing erroneous and incomplete data is difficult to automate, and cleaning that requires
automatic error correction can be even tougher. The managers of the source data will likely want to upgrade
their data with the cleaned data. The process of returning cleaned data to the source is called back flushing.
• The data must be fitted into the data model of the warehouse. Data from the various sources must be
installed in the data model of the warehouse. Data may have to be converted from relational, object-oriented or
legacy databases to a multidimensional model.
• The data must be loaded into the warehouse. The sheer volume of data in the warehouse makes
loading the data a significant task.
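Back flushing, as described in the steps above, can be sketched as follows. The cleaning rule (trimming and capitalizing a city name), the 'id' key and both function names are assumptions made only for this example.

```python
def clean_with_backflush(source_rows):
    """Clean rows for loading and collect the corrections that were made."""
    clean_rows, corrections = [], []
    for row in source_rows:
        fixed = dict(row)
        fixed["city"] = row["city"].strip().title()  # assumed cleaning rule
        clean_rows.append(fixed)
        if fixed != row:
            corrections.append(fixed)
    return clean_rows, corrections


def back_flush(source_table, corrections):
    """Return cleaned values to the source, matched on the 'id' key."""
    by_id = {row["id"]: row for row in corrections}
    return [by_id.get(row["id"], row) for row in source_table]


source = [{"id": 1, "city": " ongole"}, {"id": 2, "city": "Guntur"}]
to_load, fixes = clean_with_backflush(source)
source = back_flush(source, fixes)  # the source now holds the cleaned value
print(fixes)
```

Only the rows that were actually changed are sent back, so the source system receives a small list of corrections and not a full copy of the warehouse load.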
Two basic techniques are used to build a data warehouse, known as the 'top down' and 'bottom up' approaches. In the
'top down' approach, we first build a data warehouse and then select from it the information needed to design a
data mart. In the 'bottom up' approach, the data marts are designed first, and the data warehouse is then
designed from them. Either can be developed through the application of the Extraction, Transformation and
Transportation (ETT) process.
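The difference between the two approaches can be sketched in a few lines. The row layout (dept, item) and the function names top_down and bottom_up are hypothetical; the sketch only shows the direction in which data flows.

```python
def top_down(warehouse_rows, department):
    """Top down: select a data mart out of an existing warehouse."""
    return [row for row in warehouse_rows if row["dept"] == department]


def bottom_up(data_marts):
    """Bottom up: combine existing data marts into one warehouse."""
    warehouse, seen = [], set()
    for mart in data_marts:
        for row in mart:
            key = (row["dept"], row["item"])
            if key not in seen:  # marts may hold overlapping data
                seen.add(key)
                warehouse.append(row)
    return warehouse


sales_mart = [{"dept": "sales", "item": "order-1"}]
stock_mart = [{"dept": "stock", "item": "bin-9"},
              {"dept": "sales", "item": "order-1"}]  # overlaps with sales_mart
warehouse = bottom_up([sales_mart, stock_mart])
rebuilt_sales = top_down(warehouse, "sales")
print(warehouse, rebuilt_sales)
```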

FIGURE 3: TOP-DOWN FLOW FROM DATA WAREHOUSE TO DATAMARTS


FIGURE 4: BOTTOM-UP FLOW FROM DATA MARTS TO DATA WAREHOUSE

The relationship between operational data, a data warehouse and data marts
[Figure: operational data, extracted from several databases, feeds the data warehouse and the data marts.]

FIGURE 5: RELATIONSHIP AMONG OPERATIONAL DATA, DATA WAREHOUSE AND DATA MARTS
Functionality: The data warehouse access component supports enhanced spreadsheet functionality, efficient
query processing, structured queries, ad hoc queries, data mining and materialized views. In particular,
enhanced spreadsheet functionality includes support for state-of-the-art spreadsheet applications as well as
for OLAP application programs.
These offer preprogrammed functionalities such as the following:
• ROLL-UP: data is summarized with increasing generalization
• DRILL-DOWN: increasing levels of detail are revealed
• PIVOT: cross tabulation is performed
• SELECTION: data is available by value or range
• DERIVED ATTRIBUTES: attributes are computed by operations on stored and derived values
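The roll-up, drill-down, pivot, selection and derived-attribute operations listed above can be sketched on a small in-memory fact table. The sales figures, the tuple layout and the helper function aggregate are made up for illustration; an OLAP tool would run the same operations over much larger, multidimensional data.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, quarter, region, sales)
FACTS = [
    (2008, "Q1", "North", 100), (2008, "Q1", "South", 80),
    (2008, "Q2", "North", 120), (2008, "Q2", "South", 90),
]


def aggregate(rows, key):
    """Sum sales grouped by the given key function."""
    totals = defaultdict(int)
    for row in rows:
        totals[key(row)] += row[3]
    return dict(totals)


# ROLL-UP: summarize quarters into years.
by_year = aggregate(FACTS, lambda r: r[0])
# DRILL-DOWN: reveal the quarters inside each year.
by_quarter = aggregate(FACTS, lambda r: (r[0], r[1]))
# PIVOT: cross-tabulate region (rows) against quarter (columns).
pivot = defaultdict(dict)
for (region, quarter), total in aggregate(FACTS, lambda r: (r[2], r[1])).items():
    pivot[region][quarter] = total
# SELECTION: keep rows whose sales fall in a range.
selected = [r for r in FACTS if 90 <= r[3] <= 110]
# DERIVED ATTRIBUTE: each region's share of total sales.
by_region = aggregate(FACTS, lambda r: r[2])
share = {region: total / sum(by_region.values())
         for region, total in by_region.items()}

print(by_year, by_quarter, dict(pivot), selected, share, sep="\n")
```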

INTRODUCTION TO DATA MINING:


Data mining, or knowledge discovery in databases, is the nontrivial extraction of implicit,
previously unknown and potentially useful information from data. This encompasses a number of
technical approaches, such as clustering, data summarization, finding dependency networks, classification,
analyzing changes and detecting anomalies. Data mining searches for the relationships and global patterns
that exist in large databases but are hidden among the vast amount of data, such as the relationship between patient data
and medical diagnosis. These relationships represent valuable knowledge about the database and the objects in
the database, if the database is a faithful mirror of the real world registered by the database. Data mining refers to
using a variety of techniques to identify nuggets of information or decision-making knowledge in the
database and extracting these in such a way that they can be put to use in areas such as decision support,
prediction, forecasting and estimation. One particular example is finding associations between items in a database of
customer transactions; market basket analysis is a technique used to group such items together.
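As a minimal sketch of market basket analysis, the code below counts how often pairs of items appear in the same transaction (support) and how often one item is bought given another (confidence). The transactions and the threshold are invented for illustration, and this brute-force pair counting is a simplification, not a full association rule mining algorithm.

```python
from collections import Counter
from itertools import combinations

# Hypothetical customer transactions.
TRANSACTIONS = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
]


def frequent_pairs(transactions, min_support):
    """Return item pairs bought together in at least min_support of baskets."""
    counts = Counter()
    for basket in transactions:
        counts.update(combinations(sorted(basket), 2))
    n = len(transactions)
    return {pair: count / n for pair, count in counts.items()
            if count / n >= min_support}


def confidence(transactions, antecedent, consequent):
    """Estimate how often the consequent is bought when the antecedent is."""
    with_a = [b for b in transactions if antecedent in b]
    return sum(consequent in b for b in with_a) / len(with_a)


pairs = frequent_pairs(TRANSACTIONS, min_support=0.5)
bread_to_butter = confidence(TRANSACTIONS, "bread", "butter")
print(pairs, bread_to_butter)
```

Here bread and butter, and butter and milk, each occur together in half of the baskets, and two out of the three baskets containing bread also contain butter.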

INTERNAL PROCESS OF DATA MINING AND WAREHOUSING:

The figure below shows that what data mining discovers are hypotheses about patterns and
relationships. Those patterns and relationships are then subject to interpretation and evaluation before they
can be called knowledge.

FIGURE 6: INTERNAL PROCESS OF DATA MINING AND WAREHOUSING


ADVANTAGES:
- Data warehouses are free from the restrictions of the transactional environment, so
there is increased efficiency in query processing.
- Artificial intelligence techniques, which may include genetic algorithms and neural
networks, are used for classification and are employed to discover knowledge from the
data warehouse that may be unexpected or difficult to specify in queries.

APPLICATIONS:
Data warehousing can be a key differentiator in many industries. At present, some of the most
popular data warehouse applications include:
• Sales and marketing analysis across all industries.
• Inventory turn and product tracking in manufacturing.
• Category management, vendor analysis, and marketing program effectiveness analysis
in retail.
• Profitability analysis or risk assessment in banking.
• Claims analysis or fraud detection in insurance.
Data mining has many and varied fields of application, such as:
a. Retail/Marketing
• Identify buying patterns of customers.
• Find associations among customer demographic characteristics.
• Predict response to mailing campaigns.
• Market basket analysis.
b. Banking
• Detect patterns of fraudulent credit card use.
• Identify 'loyal' customers.
• Determine credit card spending by customer groups.
• Find hidden correlations between different financial indicators.
c. Medicine
• Characterize patient behavior to predict office visits.
• Identify successful medical therapies for different illnesses.
d. Transportation
• Determine the distribution schedule among outlets.
• Analyze loading patterns.
e. Insurance and Health Care
• Claims analysis, i.e. which medical procedures are claimed
together.
• Predict which customers will buy new policies.
• Identify behavior patterns of risky customers.
• Identify fraudulent behavior.
HOW DATA WAREHOUSE & DATA MINING ARE USEFUL IN GOVERNMENT
A large number of data warehouses can be identified from existing data sources within the
central government ministries. Potential areas in which data warehouses may be developed, now
and in future, are:
CENSUS DATA, AGRICULTURE, RURAL DEVELOPMENT, HEALTH PLANNING,
EDUCATION, COMMERCE AND TRADE.

OTHER SECTORS:
Tourism, Program implementation, Revenue, Economic affairs, Audit and Accounts.

CRITICAL ISSUES:
Data warehousing helps businesses make informed decisions. But there are a few critical issues
that must be faced head on while designing and implementing a data warehouse. These issues are as
follows.
• Capacity planning
• Security backup and recovery
• Service level agreement
• Performance tuning
• Testing
• Implementation obstacle

CONCLUSION:
Data warehousing provides the means to change raw data into information for making effective
business decisions; the emphasis is on information, not data. The data warehouse is the hub for decision
support data. Comprehensive data warehouses that integrate operational data with customer, supplier and
market information have resulted in an explosion of information. Competition requires timely and
sophisticated analysis on an integrated view of the data.
Data mining tools can enhance the inference process and speed up the design cycle, but cannot be a substitute for
statistical and domain expertise. Data mining allows for the creation of a self-learning organization.
The future of data warehouses lies in their accessibility from the internet. Successful
implementation of a data warehouse and data mining requires a high-performance, scalable combination
of hardware and software which can integrate easily with existing systems, so that customers can use the data
warehouse to improve their decision making and their competitive advantage.
A good data warehouse provides the RIGHT data… to the RIGHT people… at the RIGHT
time… RIGHT now! While data warehousing organizes data for business analysis, the internet has emerged as
the standard for information sharing.

REFERENCES:
Data mining technologies – Arun K Pujari
Data warehousing, Data mining and OLAP – Berson & Smith, McGraw-Hill
Data mining techniques, tools and trends – Bhavani Thuraisingham
Data Base Systems – Elmasri, Tata McGraw-Hill
