
Understanding Data warehouses and Data Mining

Student: Hiren L. Patel
Professor: Dr. Jones
Homework Unit 3
Everest University, North Campus
Date: May 26, 2012


Abstract

Over the years, many large organizations have accumulated massive amounts of data about their customers, suppliers, products, and services. Even many new Web-based companies have amassed large databases about people and products as they have grown. The World Wide Web is itself a large distributed data repository with untold potential. With the growing realization that these vast data resources can be tapped for significant commercial gain, interest in data mining and data warehousing has exploded.

Data Warehousing

Data warehousing is a collection of decision support technologies aimed at enabling knowledge workers, such as executives, managers, and analysts, to make better and faster decisions. Data warehousing technologies have been successfully deployed in many industries: manufacturing (for order shipment and customer support), retail (for user profiling and inventory management), financial services (for claims analysis, risk analysis, credit card analysis, and fraud detection), transportation (for fleet management), telecommunications (for call analysis and fraud detection), utilities (for power usage analysis), and healthcare (for outcomes analysis). This paper presents a roadmap of data warehousing technologies, focusing on the special requirements that data warehouses place on database management systems (DBMSs).

1. Database and relation

A data warehouse is a subject-oriented, integrated, time-varying, non-volatile collection of data that is used primarily in organizational decision making. Typically, the data warehouse is maintained separately from the organization's operational databases. There are many reasons for doing this. The data warehouse supports on-line analytical processing (OLAP), the functional and performance requirements of which are quite different from those of the on-line transaction processing (OLTP) applications traditionally supported by the operational databases.

OLTP applications typically automate clerical data processing tasks, such as order entry and banking transactions, that are the essential day-to-day operations of an organization. These tasks are structured and repetitive, and consist of short, atomic, isolated transactions.

Back End Tools and Utilities

Data warehousing systems use a variety of data extraction and cleaning tools, and load and refresh utilities, for populating warehouses. Data extraction from foreign sources is usually implemented via gateways and standard interfaces (such as Information Builders EDA/SQL, ODBC, Oracle Open Connect, Sybase Enterprise Connect, and Informix Enterprise Gateway).

The Need for Data Warehousing

The majority of databases are designed to hold the current data needed by an organization to perform its business activities. In a business organization, current data might include information concerning bills due, inventory levels, and product orders, and would most likely be contained in a billing/inventory/order database. In most cases, the minute that data become outdated, they are deleted from the database. For example, once a bill is paid, data about the bill are removed. Many organizations, however, have realized the value of being able to analyze historical data in order to discover patterns of behavior and predict future trends. For example, analyzing historical data can tell a retailer what items were ordered, in what quantities, and by which customers.
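The retailer example above can be illustrated with a minimal sketch. The order records, customer names, and the function name `summarize_orders` are all hypothetical and serve only to show the idea of totaling historical orders by customer and item; a real warehouse would run a comparable aggregation over far larger data.

```python
from collections import defaultdict

# Hypothetical historical order records: (customer, item, quantity).
ORDERS = [
    ("alice", "widget", 3),
    ("bob", "widget", 1),
    ("alice", "gadget", 2),
    ("alice", "widget", 4),
]


def summarize_orders(orders):
    """Return the total quantity ordered for each (customer, item) pair."""
    totals = defaultdict(int)
    for customer, item, quantity in orders:
        totals[(customer, item)] += quantity
    return dict(totals)


summary = summarize_orders(ORDERS)
for (customer, item), quantity in sorted(summary.items()):
    print(f"{customer} ordered {quantity} x {item}")
```

If paid orders had been deleted from the operational database, a summary like this could not be computed, which is the motivation for keeping historical data in a warehouse.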

One of the keys to understanding the value of databases is to understand how one database, whether it is current or historical, can be related to another. It makes good business sense to relate customer data to inventory data (because customers place orders that affect inventory), and inventory data to supplier data (because suppliers provide inventory items). Many more examples like this could be named. The problem with most databases is that they are not designed to be accessed simultaneously in this fashion.
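As a rough sketch of what relating customer, inventory, and supplier data looks like once the data sit in one place, the example below loads four small tables into an in-memory SQLite database and joins them. The table names, column names, and sample rows are assumptions made for illustration; they do not come from any particular system.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with hypothetical, related tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE suppliers (supplier_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE inventory (item_id INTEGER PRIMARY KEY, name TEXT,
                                supplier_id INTEGER);
        CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                             customer_id INTEGER, item_id INTEGER,
                             quantity INTEGER);
        INSERT INTO suppliers VALUES (1, 'Acme Supply');
        INSERT INTO inventory VALUES (10, 'widget', 1);
        INSERT INTO customers VALUES (100, 'Alice');
        INSERT INTO orders VALUES (1000, 100, 10, 5);
        """
    )
    return conn


def orders_with_suppliers(conn):
    """Relate each order to its customer, inventory item, and supplier."""
    query = """
        SELECT c.name, i.name, o.quantity, s.name
        FROM orders AS o
        JOIN customers AS c ON c.customer_id = o.customer_id
        JOIN inventory AS i ON i.item_id = o.item_id
        JOIN suppliers AS s ON s.supplier_id = i.supplier_id
        ORDER BY o.order_id
    """
    return conn.execute(query).fetchall()


conn = build_demo_db()
rows = orders_with_suppliers(conn)
print(rows)
```

The join works here only because all four tables live in the same database and share keys; when the data are spread across separate operational systems, that integration step has to happen first.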

Data Mining (DM)

Data mining, also known as "knowledge discovery," refers to computer-assisted tools and techniques for sifting through and analyzing these vast data stores in order to find trends, patterns, and correlations that can guide decision making and increase understanding. Data mining covers a wide variety of uses, from analyzing customer purchases to discovering galaxies. In essence, data mining is the equivalent of finding gold nuggets in a mountain of data. The monumental task of finding hidden gold depends heavily upon the power of computers.

Data mining employs techniques from statistics, pattern recognition, and machine learning. Many of these methods are also frequently used in vision, speech recognition, image processing, handwriting recognition, and natural language understanding. However, the issues of scalability and automated business intelligence solutions drive much of data mining and differentiate it from the other applications of machine learning and statistical modeling.

Data mining refers to using a variety of techniques to identify nuggets of information or decision-making knowledge in bodies of data, and extracting these in such a way that they can be put to use in areas such as decision support, prediction, forecasting, and estimation. The data is often voluminous but, as it stands, of low value, since no direct use can be made of it; it is the hidden information in the data that is useful.

Data mining is concerned with the analysis of data and the use of software techniques for finding patterns and regularities in sets of data. It is the computer that is responsible for finding the patterns by identifying the underlying rules and features in the data. The idea is that it is possible to strike gold in unexpected places, as the data mining software extracts patterns that were not previously discernible or were so obvious that no one had noticed them before.

Data Mining Tools

Several commercial packages are available for data mining and data warehousing, including IBM DB2, Informix OnLine XPS, Oracle9i, Clementine, Intelligent Miner, 4Thought, and Sybase System 10.

Applications of Data Mining

Data mining includes a variety of interesting applications. A few examples are listed below:

By recording the activity of shoppers over time, an online store such as Amazon.com can discover purchasing patterns. Retailers can use knowledge of these patterns to improve the placement of items in the layout of a mail-order catalog page or Web page.

Telephone companies mine customer billing data to identify customers who spend considerably more than average on their monthly phone bill. The company can then target these customers to sell additional services.

Marketers can effectively target the wants and needs of specific consumer groups by analyzing data about customer preferences and buying patterns.


Hospitals use data mining to identify groups of people whose healthcare costs are likely to increase in the near future so that preventative steps can be taken.
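The telephone-billing example above can be sketched in a few lines. The customer identifiers, bill amounts, and the choice of 1.5 times the average as the meaning of "considerably more than average" are all assumptions made for illustration; an actual company would choose its own threshold.

```python
from statistics import mean

# Hypothetical monthly bills in dollars, keyed by customer identifier.
BILLS = {"cust_a": 40.0, "cust_b": 55.0, "cust_c": 45.0, "cust_d": 180.0}


def high_spenders(bills, factor=1.5):
    """Return customers whose bill exceeds `factor` times the average bill.

    The default factor of 1.5 is an assumed cutoff, not a standard value.
    """
    average = mean(bills.values())
    return sorted(
        customer for customer, amount in bills.items()
        if amount > factor * average
    )


targets = high_spenders(BILLS)
print(targets)
```

With these sample figures the average bill is 80, so only customers billed more than 120 are selected as candidates for additional services.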

Data Mining Summarized

In summary, the purpose of DM is to analyze and understand past trends and predict future trends. By predicting future trends, business organizations can better position their products and services for financial gain. Nonprofit organizations have also achieved significant benefits from data mining, such as in the area of scientific progress.

The concept of data mining is simple yet powerful. The simplicity of the concept is deceiving, however. Traditional methods of analyzing data, involving query-and-report approaches, cannot handle tasks of such magnitude and complexity.

Conclusion

Data mining and data warehousing are designed to assist individuals and organizations in managing and extracting meaning from enormous amounts of data. Data mining is used to analyze data sets and predict future trends. Data warehouses and data marts are used to store and analyze historical data in order to make better decisions and predictions about the future. The purpose of many of these activities and approaches is to relate data sets to each other, group related data together, and ensure that users can access the data they need. Data is a resource that, in many cases, can be tapped for greater understanding and insight.

References

Haag & Cummings. Management Information Systems, ninth edition.
Han, J. and M. Kamber (2000). Data Mining: Concepts and Techniques. Morgan Kaufmann.
http://www.olapcouncil.org
http://pwp.starnetinc.com/larryg/articles.html
http://www.newagepublishers.com/samplechapter/001329.pdf
