ABSTRACT:
Fast, accurate and scalable data analysis techniques are needed to extract useful
information from huge volumes of data. A data warehouse is a single, integrated source of
decision support information formed by collecting data from multiple sources, internal to the
organization as well as external, and transforming and summarizing this information to enable
improved decision making. A data warehouse is designed to give users easy access to large
amounts of information, and data access is typically supported by specialized analytical tools
and applications. Typical applications include decision support systems and executive
information systems.
Data mining is the exploration and analysis of large quantities of data in order to
discover valid, novel, potentially useful, and ultimately understandable patterns in data. It is
an information extraction activity whose goal is to discover hidden facts contained in
databases.
It can also be described as the process of extracting valid, previously unknown,
comprehensible and actionable information from large databases and using it to make crucial
business decisions.
The project, entitled Website Data Mining, is an application of data mining
built to help website developers create websites on the Internet more
effectively.
Data mining finds patterns and subtle relationships in data and infers rules that allow
the prediction of future results. It produces output values for an assigned set of input values.
Typical applications include market segmentation, customer profiling, fraud detection,
evaluation of retail promotions, and credit risk analysis.
DATA WAREHOUSING
Increasingly, organizations are analyzing current and historical data to
identify useful patterns and support business strategies.
A large amount of the right information is the key to survival in today's competitive
environment, and this kind of information can be made available only if there is a totally
integrated enterprise data warehouse.
What is data warehousing?
A data warehouse is a subject-oriented, integrated, non-volatile and time-variant
collection of data in support of management's decisions.
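The non-volatile and time-variant properties in this definition can be illustrated with a minimal sketch. It assumes an in-memory store in which rows are only appended and each row carries the date it became valid; the names `Fact`, `Warehouse`, `load` and `as_of`, and the sample customer data, are hypothetical and not taken from any particular product.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Fact:
    """One immutable warehouse row, stamped with the date it became valid."""
    subject: str      # subject area, e.g. "customer" (subject-oriented)
    key: str          # business key shared across source systems (integrated)
    value: str
    valid_from: date  # every row carries its own date (time-variant)


class Warehouse:
    def __init__(self):
        self._rows = []

    def load(self, fact):
        # Non-volatile: rows are only appended, never updated or deleted.
        self._rows.append(fact)

    def as_of(self, subject, key, when):
        """Return the value that was current for (subject, key) on `when`."""
        candidates = [r for r in self._rows
                      if r.subject == subject and r.key == key
                      and r.valid_from <= when]
        if not candidates:
            return None
        return max(candidates, key=lambda r: r.valid_from).value


wh = Warehouse()
wh.load(Fact("customer", "C1", "Chennai", date(2020, 1, 1)))
wh.load(Fact("customer", "C1", "Mumbai", date(2022, 6, 1)))
print(wh.as_of("customer", "C1", date(2021, 3, 15)))  # Chennai
print(wh.as_of("customer", "C1", date(2023, 1, 1)))   # Mumbai
```

Because old rows are kept rather than overwritten, the same question can be answered for any past date, which an operational system that updates records in place typically cannot do.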
Need for Data Warehousing:
IT or business staff spend a lot of time developing special reports for decision-makers.
Many PC-based or small server systems obtain extracts of data but cannot present a
holistic view of the entire gamut of information.
The same data is present on different systems and in different departments, and users may be
unaware of this fact.
Meaningful information is difficult to obtain in a timely manner.
Multiple systems give different answers to the same business questions.
Decision makers and policy planners do less analysis because sophisticated tools and easily
decipherable, timely and comprehensive information are not available.
[Figure: Typical architecture of a data warehouse. Labeled components: operational data
sources 1 to n; operational data store (ODS); load manager; warehouse manager; query
manager; DBMS; meta-data; detailed data; lightly summarized data; highly summarized data;
archive/backup data; end-user access tools (reporting, query, application development, and
EIS (executive information system) tools; OLAP (online analytical processing) tools; data
mining tools).]
Main Components:
Operational data sources: data for the DW is supplied from mainframe operational data held
in first-generation hierarchical and network databases, departmental data held in
proprietary file systems, private data held on workstations and private servers, and external
systems such as the Internet, commercially available databases, or databases associated with
an organization's suppliers or customers.
Load manager: also called the front-end component, it performs all the operations
associated with the extraction and loading of data into the warehouse. These operations
include simple transformations of the data to prepare it for entry into the warehouse.
Warehouse manager: performs all the operations associated with the management of the
data in the warehouse. The operations performed by this component include analysis of
data to ensure consistency, transformation and merging of source data, creation of indexes
and views, generation of denormalizations and aggregations, and archiving and backing up
data.
End-user access tools: these fall into five main groups, namely data reporting and
query tools, application development tools, executive information system (EIS) tools,
online analytical processing (OLAP) tools, and data mining tools.
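The roles of the load manager and the warehouse manager described above can be sketched with Python's built-in `sqlite3` module. This is only an illustration under stated assumptions: the two source extracts, the table `sales_detail`, the view `sales_by_region` and the function names are hypothetical, and a real warehouse would use a dedicated DBMS and far richer transformations.

```python
import sqlite3

# Hypothetical extracts from two operational sources with inconsistent formatting.
SOURCE_A = [("north ", "2024-01-05", "120.50"), ("South", "2024-01-06", "80")]
SOURCE_B = [("NORTH", "2024-01-07", "40.25")]


def load_manager(conn, *sources):
    """Extract rows, apply simple transformations, and load them."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales_detail "
                 "(region TEXT, sale_date TEXT, amount REAL)")
    for source in sources:
        # Simple transformations: trim and lowercase the region, parse the amount.
        cleaned = [(region.strip().lower(), day, float(amount))
                   for region, day, amount in source]
        conn.executemany("INSERT INTO sales_detail VALUES (?, ?, ?)", cleaned)
    conn.commit()


def warehouse_manager(conn):
    """Create an index and an aggregated view over the detailed data."""
    conn.execute("CREATE INDEX IF NOT EXISTS idx_region "
                 "ON sales_detail (region)")
    conn.execute("CREATE VIEW IF NOT EXISTS sales_by_region AS "
                 "SELECT region, SUM(amount) AS total, COUNT(*) AS n "
                 "FROM sales_detail GROUP BY region")
    conn.commit()


conn = sqlite3.connect(":memory:")
load_manager(conn, SOURCE_A, SOURCE_B)
warehouse_manager(conn)
summary = conn.execute(
    "SELECT region, total, n FROM sales_by_region ORDER BY region").fetchall()
print(summary)
```

After loading, the rows labelled "north " and "NORTH" in the two sources are merged under one region, which is the kind of consistency the warehouse is meant to provide to end-user access tools.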
b. Cleansing
c. Transformation
After the critical steps, loading the results into target system can be carried out either
by separate products, or by a single, categories:
Code generators
Applications:
Online Transaction Processing:
OLTP systems are one of the major kinds of enterprise applications.
Examples:
Order entry systems, Inventory control systems, Reservation systems, Point-of-sale
systems, Tracking systems, etc.
indications of performance
Decision Support Systems (DSS) :
They ideally present information in graphical and tabular form, providing the user
with the ability to drill down on selected information. Note the increased detail and
data manipulation options presented.
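Drill-down can be sketched as moving from a summary figure to the more detailed rows behind it. The example below is a minimal illustration; the sales records and the helper names `summarize` and `drill_down` are hypothetical.

```python
from collections import defaultdict

# Hypothetical sales records: (region, city, amount)
SALES = [
    ("North", "Delhi", 100),
    ("North", "Jaipur", 50),
    ("South", "Chennai", 70),
    ("South", "Kochi", 30),
]


def summarize(records, level):
    """Total the amount by the first `level` columns of each record."""
    totals = defaultdict(int)
    for record in records:
        totals[record[:level]] += record[-1]
    return dict(totals)


def drill_down(records, region):
    """Show the city-level detail behind one region's summary figure."""
    selected = [r for r in records if r[0] == region]
    return summarize(selected, level=2)


top = summarize(SALES, level=1)      # summary by region
detail = drill_down(SALES, "North")  # detail for the selected region
print(top)
print(detail)
```

The city-level totals for a region always add up to that region's summary figure, so the user sees more detail without the numbers changing meaning.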
DATA MINING
What is data mining?
Data mining refers to the process of analyzing data from different perspectives and
summarizing it into useful information. Data mining software is one of a number of tools
used for analyzing data from many different dimensions or angles, categorizing it, and
summarizing the relationships identified.
Definition:
Data mining is the process of finding correlations or patterns among fields
in large
Different Types of Data Mining: Business, Scientific and Internet Data Mining
Five major elements of Data Mining:
1. Extract, transform, and load transaction data onto the data warehouse system.
2. Store and manage the data in a multidimensional database system.
3. Provide data access to business analysts and IT professionals.
4. Analyze the data with application software.
5. Present the data in a useful format, such as a graph or table.
Methods of Data Mining:
1. Classification
2. Regression
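The two methods listed above can be illustrated in a few lines of plain Python. This is a sketch under assumptions, not the method used in the project: classification is shown with a 1-nearest-neighbor rule (one simple classifier among many) and regression with an ordinary least-squares line fit. The customer data and function names are hypothetical.

```python
def nearest_neighbor_classify(training, point):
    """Classification: return the label of the closest training example."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, label = min(training,
                   key=lambda pair: squared_distance(pair[0], point))
    return label


def fit_line(xs, ys):
    """Regression: least-squares slope and intercept for y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical data: (monthly visits, purchases) -> customer segment
training = [((2, 0), "casual"), ((3, 1), "casual"),
            ((20, 8), "loyal"), ((25, 10), "loyal")]
label = nearest_neighbor_classify(training, (18, 7))

# Hypothetical data lying on the line y = 2x + 1
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(label, slope, intercept)
```

Classification predicts a category (here a customer segment), while regression predicts a numeric value; both produce output values for a given set of input values.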
CONCLUSION
Data warehousing provides the means to change raw data into information for
making effective business decisions; the emphasis is on information, not data. The data
warehouse is the hub for decision support data.
Data mining is a useful tool with multiple algorithms that can be tuned for specific
tasks. It can benefit business, medicine, and science. It needs more efficient algorithms to
speed up the data mining process.