Book : Data Warehousing and data mining by Han and Kamber

Data Mining

What is data mining?

Data mining, also known as knowledge discovery in databases (KDD), is the extraction of interesting (non-trivial, implicit, previously unknown, and potentially useful) information or patterns from data in large databases.

What is not data mining?


Query processing. Expert systems or statistical programs

May 4, 2012

Data Warehousing and Data Mining

Data Mining and DBMS

DBMS: answers queries directly from the data held, e.g. for an insurance company:

Last month's sales for each product
Sales grouped by customer age, etc.
List of customers who lapsed their policy

Data mining: infers knowledge from the data held to answer questions such as:

What characteristics do customers who lapsed their policies share, and how do they differ from those who renewed their policies?
Why is the Child Care Policy so profitable?


Data mining

Data mining: a misnomer?

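Mining gold from rock or sand is called gold mining, not rock or sand mining. By the same logic, "knowledge mining" would be the more accurate name, since knowledge is what is extracted from the data.

Alternative names: knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.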


Applications of Data Mining


Market Analysis and Management

Where are the data sources for analysis?

Credit card transactions, loyalty cards, discount coupons, customer complaint calls, plus (public) lifestyle studies

Target marketing

Find clusters of model customers who share the same characteristics: interest, income level, spending habits, etc.

Determine customer purchasing patterns over time

Cross-market analysis

Associations/correlations between product sales
Prediction based on the association information


Applications of Data Mining

Customer profiling

data mining can tell you what types of customers buy what products (clustering or classification)

Identifying customer requirements


Identify the best products for different customers
Use prediction to find what factors will attract new customers

Provides summary information


Various multidimensional summary reports
Statistical summary information (data central tendency and variation)


Applications of Data Mining


Corporate Analysis and Risk Management

Finance planning and asset evaluation


Cash flow analysis and prediction
Contingent claim analysis to evaluate assets
Cross-sectional and time series analysis (financial-ratio, trend analysis, etc.)

Resource planning

summarize and compare the resources and spending

Competition

Monitor competitors and market directions
Group customers into classes and a class-based pricing procedure
Set pricing strategy in a highly competitive market

Applications of Data Mining


Fraud Detection and Management
Applications
Widely used in health care, retail, credit card services, telecommunications (phone card fraud), etc.

Approach
use historical data to build models of fraudulent behavior and use data mining to help identify similar instances

Examples

Auto insurance: detect a group of people who stage accidents to collect on insurance
Money laundering: detect suspicious money transactions (US Treasury's Financial Crimes Enforcement Network)
Medical insurance: detect professional patients and rings of doctors

Applications of Data Mining


Detecting inappropriate medical treatment
Australian Health Insurance Commission identifies that in many cases blanket screening tests were requested

Detecting telephone fraud

Telephone call model: destination of the call, duration, time of day or week.
Analyze patterns that deviate from an expected norm.
British Telecom identified discrete groups of callers with frequent intra-group calls, especially mobile phones, and broke a multimillion dollar fraud.

Retail

Analysts estimate that 38% of retail shrink is due to dishonest employees.


Applications of Data Mining

Other Applications

Sports

IBM Advanced Scout analyzed NBA game statistics (shots blocked, assists, and fouls) to gain competitive advantage for New York Knicks and Miami Heat

Astronomy

JPL and the Palomar Observatory discovered 22 quasars with the help of data mining

Internet Web Surf-Aid

IBM Surf-Aid applies data mining algorithms to Web access logs for market-related pages to discover customer preference and analyze effectiveness of Web marketing, improving Web site organization, etc.


Data Mining: A KDD Process


Data mining: the core of the knowledge discovery process.

[Figure: the KDD process. Databases -> Data Cleaning and Data Integration -> Data Warehouse -> Selection, Data Reduction and Transformation -> Data Mining -> Pattern Evaluation]


Steps of a KDD Process

Learning the application domain:

Relevant prior knowledge and goals of application

Creating a target database


Data cleaning: removes noise (e.g. age = 3w) and inconsistent data (e.g. age = 175).
Data integration: combines multiple data sources.
Data selection: retrieves from the database the data relevant to the analysis task.
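
The cleaning step can be illustrated with a minimal Python sketch. The record layout, the function name `clean_ages`, and the plausible age range of 0 to 120 are assumptions made for this illustration; they do not come from the slides.

```python
def clean_ages(records, min_age=0, max_age=120):
    """Split records into clean ones and rejected ones, based on the age field."""
    clean, rejected = [], []
    for rec in records:
        raw = str(rec.get("age", "")).strip()
        # Noise: the value is not a whole number at all, e.g. "3w".
        if not raw.isdigit():
            rejected.append((rec, "noise"))
            continue
        age = int(raw)
        # Inconsistent: numeric, but outside the assumed plausible range, e.g. 175.
        if not (min_age <= age <= max_age):
            rejected.append((rec, "inconsistent"))
            continue
        clean.append({**rec, "age": age})
    return clean, rejected


records = [
    {"id": 1, "age": "34"},
    {"id": 2, "age": "3w"},
    {"id": 3, "age": "175"},
]
clean, rejected = clean_ages(records)
print(clean)
print([reason for _, reason in rejected])
```

In practice a rejected record might be repaired or imputed rather than dropped; the sketch only separates the two kinds of problem named above.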


Steps of a KDD Process

Data reduction: finds useful features and provides dimensionality/variable reduction.

Data transformation: transforms the data into forms appropriate for mining, for instance by performing summary or aggregation operations.

Data mining

Choosing functions of data mining: summarization, classification, regression, association, clustering.
Choosing the data mining algorithm(s)
Data mining: search for patterns of interest

Pattern evaluation and knowledge presentation

Visualization, transformation, removing redundant patterns, and techniques used to present the mined knowledge to the user

Use of discovered knowledge

Architecture of a Typical Data Mining System


[Figure: layered architecture. From top to bottom: User Interface; Pattern Evaluation; Data Mining Engine; Database or Data Warehouse Server; data cleaning, integration and selection; and, at the base, Database, Data Warehouse, World Wide Web, and other info repositories. A Knowledge Base is also shown.]


Architecture of a Typical Data Mining System


Although data mining is one step in the knowledge discovery process, in industry, in the media, and in many other settings a data mining system is designed to operate on typical databases, on data warehouses, or on both. The architecture of a typical data mining system is therefore as shown above.

Databases, data warehouses, and other information repositories hold the data necessary for data mining.
The database or data warehouse server fetches the relevant data based on the user's data mining request.
The knowledge base holds the domain knowledge used to guide the search or to evaluate the interestingness of resulting patterns.
The data mining engine ideally consists of a set of functional modules for tasks such as summarization, classification, cluster analysis, regression, association, correlation analysis, outlier analysis, and evolution analysis.


Architecture of a Typical Data Mining System

The pattern evaluation module performs interestingness measurement and filters discovered patterns by using interestingness thresholds. It also interacts with the data mining module so as to focus the search toward interesting patterns.

The user interface module establishes communication between users and the data mining system. This module allows users to browse the database or data warehouse, evaluate mined patterns, and visualize the patterns in different forms.


Data Mining: On What Kind of Data?


Relational databases
Data warehouses
Transactional databases
Advanced DB and information repositories
  Object-oriented and object-relational databases
  Spatial databases
  Time-series data and temporal data
  Text databases and multimedia databases
  Heterogeneous and legacy databases
  WWW

Relational Database and Data Warehouse

A database system, also called a database management system (DBMS), consists of a collection of interrelated data, known as a database, and a set of software programs to manage and access the data. In a relational database, data is stored in tables, each of which is assigned a unique name. Each table consists of a set of attributes (columns or fields) and usually stores a large set of tuples (records or rows). Each tuple represents an object identified by a unique key and described by a set of attribute values.

A data warehouse is a repository of information collected from multiple sources (possibly located at different geographical locations), stored under a unified schema, and usually residing at a single site.


Typical framework of a data warehouse

Data warehouses are constructed via a process of data cleaning, integration, selection, reduction, and transformation, followed by data loading and periodic data refreshing. The data are organized around major subjects such as customer, item, supplier, and activity. They are stored to provide a historical perspective (say, the past 5-10 years) and are typically summarized.


Data warehouse and data mart


A data warehouse is usually modeled by a multidimensional cube, where each dimension corresponds to an attribute or a set of attributes and each cell stores the value of some aggregate measure such as count or sale_amount. A data cube provides a multidimensional view of data and allows the precomputation and fast access of summarized data.

A data warehouse collects information about subjects that span an entire organization. A data mart, on the other hand, is a departmental subset of a data warehouse. It focuses on selected subjects.
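
As a rough illustration of precomputing summarized cells, the sketch below builds a tiny cube in plain Python. The fact records, the dimension names `item`, `city`, and `year`, and the use of `"*"` to mean "all values" are assumptions for this example; only the measure name `sale_amount` comes from the text.

```python
from collections import defaultdict


def build_cube(facts, dimensions, measure):
    """Precompute the sum of `measure` for every combination of dimension
    values, using "*" for a dimension that has been rolled up."""
    cube = defaultdict(float)
    n = len(dimensions)
    for fact in facts:
        # Each fact contributes to 2**n cells: every way of keeping
        # or rolling up each of the n dimensions.
        for mask in range(2 ** n):
            cell = tuple(
                fact[dim] if mask & (1 << i) else "*"
                for i, dim in enumerate(dimensions)
            )
            cube[cell] += fact[measure]
    return cube


# Hypothetical sales facts, invented for the illustration.
facts = [
    {"item": "TV", "city": "Delhi", "year": 2011, "sale_amount": 400.0},
    {"item": "TV", "city": "Mumbai", "year": 2011, "sale_amount": 300.0},
    {"item": "Phone", "city": "Delhi", "year": 2012, "sale_amount": 150.0},
]
cube = build_cube(facts, ["item", "city", "year"], "sale_amount")
print(cube[("TV", "*", "*")])     # TV sales over all cities and years
print(cube[("*", "Delhi", "*")])  # Delhi sales over all items and years
print(cube[("*", "*", "*")])      # grand total
```

Once the cells are stored, a summary query becomes a dictionary lookup, which is the trade of storage for access speed that the cube model describes. Real warehouses materialize only a selected subset of cells.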


Multidimensional Cube used for data warehouse


Database vs. Data Warehouse


The primary difference between a database and a data warehouse is that the former is designed (and optimized) to record online transactions very fast (on-line transaction processing, or OLTP), while the latter is designed (and optimized) to respond to analysis questions that are critical for a specific area of application.

A data warehouse (DW) is designed to facilitate querying and analysis. Often designed as OLAP (On-Line Analytical Processing) systems, these databases contain read-only data that can be queried and analyzed far more efficiently than regular OLTP application databases. In this sense an OLAP system is designed to be read-optimized.


Data Warehouse vs. Database


Data warehouse: designed for analysis of business measures by categories and attributes.
Database: designed for real-time business operations.

Data warehouse: optimized for bulk loads and large, complex, unpredictable queries that access many rows per table.
Database: optimized for a common set of transactions, usually adding or retrieving a single row at a time per table.

Data warehouse: loaded with consistent, valid data; requires no real-time validation.
Database: optimized for validation of incoming data during transactions; uses validation data tables.

Data warehouse: supports few concurrent users relative to OLTP.
Database: supports thousands of concurrent users.


Other databases
Transactional databases: a transactional database consists of a file where each record represents a transaction, specified by a transaction identity number and a list of the items making up the transaction.

Note that most relational database systems do not support nested relational structures such as the list of items in this example. This type of database facilitates market basket analysis, which enables one to bundle groups of items together as a strategy for maximizing sales.


Object-Relational Databases
The object-relational data model inherits the essential concepts of object-oriented databases, where each entity is considered an object. For example, objects can be individual employees, customers, or items. Data and code relating to an object are encapsulated into a single unit. Each object has the following associated with it:

A set of variables that describe the object. These correspond to attributes in the entity-relationship and relational models.
A set of messages that the object can use to communicate with other objects, or with the rest of the database system.
A set of methods, where each method holds the code to implement a message. Upon receiving a message, the method returns a value in response.

Objects that share a common set of properties can be grouped into an object class. Each object is an instance of its class.

Temporal Databases, Sequence Databases, and Time-Series Databases

A temporal database typically stores relational data that include time-related attributes. These attributes may involve several timestamps, each having different semantics.

A sequence database stores sequences of ordered events, with or without a concrete notion of time. Examples include customer shopping sequences, web click streams, and biological sequences.

A time-series database stores sequences of values or events obtained over repeated measurements of time (e.g., hourly, daily, weekly). Examples include data collected from the stock exchange, inventory control, and the observation of natural phenomena (like temperature and wind).

Data mining techniques can be used to find the characteristics of object evolution or the trend of changes for objects in the database. Such information can be useful in decision making and strategy planning.


Spatial Databases and Spatiotemporal Databases


Spatial databases contain space-related information. Examples include geographic (map) databases, very large-scale integration (VLSI) or computer-aided design databases, and medical and satellite image databases.

Spatial data may be represented in raster format, consisting of n-dimensional bit maps or pixel maps. For example, a 2-D satellite image may be represented as raster data, where each pixel registers the rainfall in a given area.

Geographic databases have numerous applications, ranging from forestry and ecology planning to providing public service information, such as how to move from region A to region B during rush hour or the location of restaurants and hospitals.


Spatial Databases and Spatiotemporal Databases


What kind of data mining can be performed on spatial databases?

Data mining may uncover patterns describing the characteristics of houses located near a specified kind of location, such as a park. Other patterns may describe the climate of mountainous areas located at various altitudes, or the change in trend of metropolitan poverty rates based on city distances from major highways.


Text Databases and Multimedia Databases


Text databases are databases that contain word descriptions for objects.

What kind of data mining can be performed on text databases?

By mining text data, one may uncover general and concise descriptions of the text documents, keyword and content associations, as well as the clustering behavior of text objects.

Multimedia databases store image, audio, and video data. They are used in applications such as picture content-based retrieval, voice-mail systems, the World Wide Web, and speech-based user interfaces that recognize spoken commands.


Heterogeneous Databases and Legacy Databases


A heterogeneous database consists of a set of interconnected, autonomous component databases. The components communicate in order to exchange information and answer queries.

A legacy database is a group of heterogeneous databases that combines different kinds of data systems, such as relational or object-oriented databases, hierarchical databases, network databases, spreadsheets, multimedia databases, or file systems. The heterogeneous databases in a legacy database may be connected by intra- or inter-computer networks.

Information exchange across such databases is difficult because it requires precise transformation rules from one representation to another, taking their diverse semantics into account.


Data streams
Many applications involve the generation and analysis of a new kind of data, called stream data, where data flow in and out of an observation platform (or window) dynamically. Such data streams have the following unique features:

Huge or possibly infinite volume
Dynamically changing
Flowing in and out in a fixed order
Allowing only one or a small number of scans
Demanding fast (often real-time) response time

Typical examples of data streams include various kinds of scientific and engineering data, time-series data, and data produced in other dynamic environments, such as power supply, network traffic, stock exchange, telecommunications, web click streams, video surveillance (inspection), and weather or environment monitoring. Because data streams are normally not stored in any kind of data repository, effective and efficient management and analysis of stream data poses great challenges to researchers.
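
To give a feel for the single-scan constraint, here is a small sketch that maintains the mean and variance of a stream without storing past readings. It uses Welford's update rule, which is a choice made for this illustration and is not mentioned in the slides; the class name and the sample readings are likewise hypothetical.

```python
class RunningStats:
    """One-pass mean and population variance; no past readings are kept."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        # Welford's update: adjust the mean, then the squared-deviation sum.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        return self._m2 / self.n if self.n else 0.0


stats = RunningStats()
# Stand-in for an unbounded stream: each reading is seen exactly once.
for reading in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(reading)
print(stats.n, stats.mean, stats.variance)
```

The memory used is constant regardless of how many readings arrive, which is the property a stream algorithm needs when the data cannot be stored and rescanned.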

The World Wide Web


The World Wide Web and its associated distributed information services, such as Yahoo!, Google, America Online, MSN, etc., provide rich, worldwide, on-line information services, where data objects are linked together to facilitate interactive access. Such systems provide ample opportunities and challenges for data mining.

For example, understanding user access patterns will not only help improve system design (by providing efficient access between highly correlated objects), but also lead to better marketing decisions (e.g., by placing advertisements in frequently visited documents, or by providing better customer/user classification and behavior analysis). Capturing user access patterns in such distributed information environments is called Web usage mining (or Weblog mining).


Data Mining: Confluence of Multiple Disciplines


[Figure: data mining at the center, drawing on database technology, statistics, machine learning, pattern recognition, visualization, algorithms, and other disciplines]


Data Mining: Classification Schemes

General functionality

Descriptive data mining: descriptive mining tasks characterize the general properties of the data in the database. They use human-interpretable patterns that describe the data.

Predictive data mining: predictive mining tasks perform inference on the current data in order to make predictions. They use some variables to predict unknown or future values of other variables.


Data Mining: Classification Schemes

Different views lead to different classifications


Data view: kinds of data to be mined
Knowledge view: kinds of knowledge to be discovered
Method view: kinds of techniques utilized
Application view: kinds of applications adapted


Data Mining: Classification Schemes

Data to be mined

Relational, data warehouse, transactional, stream, object-oriented/relational, active, spatial, time-series, text, multimedia, heterogeneous, legacy, WWW

Knowledge to be mined

Descriptive: summarization or characterization, discrimination, association, clustering, sequential pattern discovery, etc.
Predictive: classification and prediction, regression, time series analysis

Techniques utilized

Database-oriented, data warehouse (OLAP), machine learning, statistics, visualization, etc.

Applications adapted

Retail, telecommunication, banking, fraud analysis, bio-data mining, stock market analysis, text mining, Web mining, etc.

Data Mining: Knowledge to be mined

Summarization or characterization: this provides the general characteristics or features of a target class of data. For instance, one may want to find the characteristics (such as 40-50 years old, employed, etc.) of customers who spend more than Rs. 10,000 a year in a shopping mall. The system should allow users to drill down on any dimension, such as occupation, to view those customers according to their type of employment. The output of the characterization process is often presented in the form of pie charts, bar charts, curves, multidimensional data cubes, and multidimensional tables.


Data Mining: Knowledge to be mined


Data discrimination: this compares the general features of target class data objects with the general features of objects from one or a set of contrasting classes. For instance, a data mining system should be able to compare the features of two groups of customers, such as those who shop for computer products more than two times a month versus those who shop for them fewer than three times a year.

The output of such a process may provide a general comparative profile of the customers, such as: 80% of the customers who frequently purchase computer products are between 20 and 40 years old and have a university education, whereas 60% of the customers who infrequently buy such products are either seniors or youths and have no university degree. Drilling down on a dimension such as occupation, or adding a new dimension such as income level, may help in finding even more discriminative features between the two classes.

Data Mining: Association (Descriptive)

Given a set of records, each of which contains some number of items from a given collection, produce dependency rules that predict the occurrence of an item based on occurrences of other items.

TID  Items
1    Bread, Coke, Milk
2    Beer, Bread
3    Beer, Coke, Diaper, Milk
4    Beer, Bread, Diaper, Milk
5    Coke, Diaper, Milk

Rules discovered:
{Milk} --> {Coke}
{Diaper, Milk} --> {Beer}


Data Mining: Association (Descriptive)


Association Rule Discovery: Application

Inventory Management:
Goal: A consumer appliance repair company wants to anticipate the nature of repairs on its consumer products and keep its service vehicles equipped with the right parts, to reduce the number of visits to consumer households.
Approach: Process the data on tools and parts required in previous repairs at different consumer locations and discover the co-occurrence patterns.


Data Mining: Association (Descriptive)


Confidence and Support
A confidence of 50% means that if a customer buys product A, there is a 50% chance that he/she will buy product B as well. A support of 1% means that products A and B were purchased together in 1% of all the transactions under analysis. Association rules are discarded as uninteresting if they do not satisfy both a minimum confidence threshold and a minimum support threshold.
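
The two measures can be computed directly from transaction counts. The sketch below applies them to the five market-basket transactions from the association example (TIDs 1 to 5); the function names and the threshold values are assumptions chosen for the illustration.

```python
transactions = [
    {"Bread", "Coke", "Milk"},
    {"Beer", "Bread"},
    {"Beer", "Coke", "Diaper", "Milk"},
    {"Beer", "Bread", "Diaper", "Milk"},
    {"Coke", "Diaper", "Milk"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(set(itemset) <= t for t in transactions)


def support(antecedent, consequent, transactions):
    """Fraction of all transactions containing antecedent and consequent together."""
    both = set(antecedent) | set(consequent)
    return count(both, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Among transactions containing the antecedent, the fraction that also
    contain the consequent."""
    n_antecedent = count(antecedent, transactions)
    if n_antecedent == 0:
        return 0.0
    both = set(antecedent) | set(consequent)
    return count(both, transactions) / n_antecedent


def is_interesting(antecedent, consequent, transactions, min_support, min_confidence):
    """A rule is kept only if it meets BOTH thresholds."""
    return (support(antecedent, consequent, transactions) >= min_support
            and confidence(antecedent, consequent, transactions) >= min_confidence)


for lhs, rhs in [({"Milk"}, {"Coke"}), ({"Diaper", "Milk"}, {"Beer"})]:
    print(sorted(lhs), "-->", sorted(rhs),
          support(lhs, rhs, transactions),
          confidence(lhs, rhs, transactions))
```

For {Milk} --> {Coke}, three of the five transactions contain both items and four contain Milk, so the support is 3/5 and the confidence is 3/4.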


Data Mining: Clustering (Descriptive)

Given a set of data points, each having a set of attributes, and a similarity measure among them, find clusters such that:

Data points in one cluster are more similar to one another.
Data points in separate clusters are less similar to one another.

Similarity measures:

Euclidean distance if attributes are continuous.
Other problem-specific measures.


Data Mining: Clustering (Descriptive)


Euclidean distance based clustering in 3-D space.

Intracluster distances are minimized

Intercluster distances are maximized
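
One common way to pursue this objective is k-means, shown here as a minimal sketch. The slides do not name a specific algorithm, so k-means is a choice made for this illustration; the sample points and starting centroids are invented.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Basic k-means: assign each point to its nearest centroid by Euclidean
    distance, then move each centroid to the mean of its cluster."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]  # keep an empty cluster's centroid
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # assignments have stabilized
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical 3-D points forming two well-separated groups.
points = [
    (1.0, 1.0, 1.0), (1.0, 2.0, 1.0), (2.0, 1.0, 1.0),
    (8.0, 8.0, 8.0), (9.0, 8.0, 8.0), (8.0, 9.0, 9.0),
]
centroids, clusters = kmeans(points, centroids=[points[0], points[3]])
print(centroids)
print(clusters)
```

The result depends on the starting centroids, and k-means only reduces intracluster distances directly; well-separated clusters are a consequence rather than an explicit term in what it optimizes.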


Data Mining: Clustering (Application)

Market Segmentation:
Goal: Subdivide a market into distinct subsets of customers, where any subset may conceivably be selected as a market target to be reached with a distinct marketing mix.
Approach:

Collect different attributes of customers based on their geographical and lifestyle related information.
Find clusters of similar customers.
Measure the clustering quality by observing buying patterns of customers in the same cluster vs. those from different clusters.


Data Mining: Clustering (Application)

Document Clustering:

Goal: To find groups of documents that are similar to each other based on the important terms appearing in them.
Approach: Identify frequently occurring terms in each document. Form a similarity measure based on the frequencies of different terms. Use it to cluster.
Gain: Information retrieval can utilize the clusters to relate a new document or search term to clustered documents.
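
A similarity measure based on term frequencies might look like the sketch below, which uses cosine similarity over raw word counts. Cosine similarity and whitespace tokenization are assumptions made here, since the slides do not specify the measure; the three sample documents are invented.

```python
import math
from collections import Counter


def term_frequencies(text):
    """Count how often each lowercase, whitespace-separated term occurs."""
    return Counter(text.lower().split())


def cosine_similarity(tf_a, tf_b):
    """Cosine of the angle between two term-frequency vectors (0.0 to 1.0)."""
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(v * v for v in tf_a.values()))
    norm_b = math.sqrt(sum(v * v for v in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


docs = [
    "data mining finds patterns in data",
    "mining patterns from large data",
    "the team won the basketball game",
]
tfs = [term_frequencies(d) for d in docs]
sim_01 = cosine_similarity(tfs[0], tfs[1])
sim_02 = cosine_similarity(tfs[0], tfs[2])
print(sim_01, sim_02)
```

The first two documents share several terms and score well above the third, which shares none with the first. A clustering algorithm can then use this similarity, or one minus it as a distance, in place of Euclidean distance.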


Sequential Pattern Discovery (Descriptive)

Given a set of objects, each associated with its own timeline of events, find rules that predict strong sequential dependencies among different events.

(A B) (C) --> (D E)

Rules are formed by first discovering patterns. Event occurrences in the patterns are governed by timing constraints.

(A B) (C) (D E)
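
The sketch below shows one way to test whether a timeline contains a sequential pattern and to estimate the confidence of a rule of the form (A B) (C) --> (D E). It ignores timing constraints entirely, which is a simplification; the function names and the three sample timelines are hypothetical.

```python
def contains(sequence, pattern):
    """True if the event sets in `pattern` occur, in order, within `sequence`.
    Each pattern element must be a subset of a strictly later event set."""
    pos = 0
    for wanted in pattern:
        while pos < len(sequence) and not set(wanted) <= set(sequence[pos]):
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next pattern element must match a later position
    return True


def rule_confidence(timelines, antecedent, consequent):
    """Among timelines containing the antecedent pattern, the fraction that
    also contain the antecedent followed by the consequent."""
    with_antecedent = [s for s in timelines if contains(s, antecedent)]
    if not with_antecedent:
        return 0.0
    with_rule = [s for s in with_antecedent if contains(s, antecedent + consequent)]
    return len(with_rule) / len(with_antecedent)


# Hypothetical timelines: one list of event sets per object.
timelines = [
    [{"A", "B"}, {"C"}, {"D", "E"}],
    [{"A", "B"}, {"C"}, {"F"}],
    [{"A"}, {"C"}, {"D", "E"}],
]
antecedent = [{"A", "B"}, {"C"}]
consequent = [{"D", "E"}]
print(rule_confidence(timelines, antecedent, consequent))
```

Matching each pattern element at the earliest possible position is sufficient when there are no timing constraints. Adding constraints such as a maximum gap between events would require a search over alternative match positions.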


Sequential Pattern Discovery: Examples

In telecommunications alarm logs:
(Inverter_Problem Excessive_Line_Current) (Rectifier_Alarm) --> (Fire_Alarm)

In point-of-sale transaction sequences:
Computer Bookstore: (Intro_To_Visual_C) (C++_Primer) --> (Perl_for_dummies,Tcl_Tk)
Athletic Apparel Store: (Shoes) (Racket, Racketball) --> (Sports_Jacket)


Classification (Predictive) : Definition

Given a collection of records (training set )

Each record contains a set of attributes, one of the attributes is the class.

Find a model for class attribute as a function of the values of other attributes. Goal: previously unseen records should be assigned a class as accurately as possible.

A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with training set used to build the model and test set used to validate it.


Classification (Predictive) : Implementation


Classification (Predictive) : Example

Training Set

Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

Test Set

Refund  Marital Status  Taxable Income  Cheat
No      Single          75K             ?
Yes     Married         50K             ?
No      Married         150K            ?
Yes     Divorced        90K             ?
No      Single          40K             ?
No      Married         80K             ?

[Figure: the training set is used to learn a classifier (the model), which is then applied to the test set]
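
The train-then-validate workflow can be sketched with the ten labeled records from this example. The classifier used here is a deliberately simple one-attribute rule learner (predict the majority class seen for each marital status), chosen only to keep the sketch short; the slides do not say which classifier is used. The deterministic split, in which every third record is held out, is also an assumption.

```python
from collections import Counter

COLUMNS = ("tid", "refund", "marital", "income_k", "cheat")
ROWS = [
    (1, "Yes", "Single", 125, "No"),
    (2, "No", "Married", 100, "No"),
    (3, "No", "Single", 70, "No"),
    (4, "Yes", "Married", 120, "No"),
    (5, "No", "Divorced", 95, "Yes"),
    (6, "No", "Married", 60, "No"),
    (7, "Yes", "Divorced", 220, "No"),
    (8, "No", "Single", 85, "Yes"),
    (9, "No", "Married", 75, "No"),
    (10, "No", "Single", 90, "Yes"),
]
records = [dict(zip(COLUMNS, row)) for row in ROWS]


def holdout_split(records, test_every=3):
    """Deterministic split: every `test_every`-th record goes to the test set."""
    train, test = [], []
    for i, rec in enumerate(records, start=1):
        (test if i % test_every == 0 else train).append(rec)
    return train, test


def learn_one_rule(train, attr, label="cheat"):
    """Learn a lookup table: attribute value -> majority class in the training set."""
    by_value = {}
    for rec in train:
        by_value.setdefault(rec[attr], Counter())[rec[label]] += 1
    default = Counter(rec[label] for rec in train).most_common(1)[0][0]
    rules = {value: c.most_common(1)[0][0] for value, c in by_value.items()}
    return lambda rec: rules.get(rec[attr], default)


def accuracy(model, test, label="cheat"):
    """Fraction of held-out records whose class the model predicts correctly."""
    return sum(model(rec) == rec[label] for rec in test) / len(test)


train, test = holdout_split(records)
model = learn_one_rule(train, "marital")
print(accuracy(model, test))
# Classify a previously unseen record (last row of the test set table).
print(model({"refund": "No", "marital": "Married", "income_k": 80}))
```

With so few records the accuracy figure is not meaningful in itself; the point is that the model is built only from the training part and judged only on records it has not seen.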


Classification (Predictive) : Example

Direct Marketing
Goal: Reduce the cost of mailing by targeting a set of consumers likely to buy a new cell-phone product.
Approach:

Use the data for a similar product introduced before.
We know which customers decided to buy and which decided otherwise. This {buy, don't buy} decision forms the class attribute.
Collect various demographic, lifestyle, and company-interaction related information about all such customers.
  Type of business, where they stay, how much they earn, etc.
Use this information as input attributes to learn a classifier model.


Classification (Predictive) : Example

Fraud Detection
Goal: Predict fraudulent cases in credit card transactions.
Approach:

Use credit card transactions and the information on the account holder as attributes.
  When does a customer buy, what does he buy, how often does he pay on time, etc.
Label past transactions as fraud or fair transactions. This forms the class attribute.
Learn a model for the class of the transactions.
Use this model to detect fraud by observing credit card transactions on an account.


Classification (Predictive) : Example

Sky Survey Cataloging
Goal: To predict the class (star or galaxy) of sky objects, especially visually faint ones, based on the telescopic survey images (from Palomar Observatory).
  3000 images with 23,040 x 23,040 pixels per image.

Approach:

Segment the image.
Measure image attributes (features): 40 of them per object.
Model the class based on these features.
Success story: could find 16 new high red-shift quasars, some of the farthest objects that are difficult to find!


Classification (Predictive) : Example


Courtesy: http://aps.umn.edu

Class: stages of formation (Early, Intermediate, Late)

Attributes: image features, characteristics of light waves received, etc.

Data size:
72 million stars, 20 million galaxies
Object Catalog: 9 GB
Image Database: 150 GB



Regression (Predictive)

Predict the value of a given continuous-valued variable based on the values of other variables, assuming a linear or nonlinear model of dependency.
Extensively studied in the statistics and neural network fields.
Examples:
Predicting the sales amount of a new product based on advertising expenditure.
Predicting wind velocities as a function of temperature, humidity, air pressure, etc.
Time series prediction of stock market indices.
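
For the linear case with a single predictor, the dependency can be fitted by ordinary least squares, as in this minimal sketch. The advertising and sales figures are made up for the illustration, and the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    return slope * x + intercept


# Invented data: advertising expenditure vs. sales amount (arbitrary units).
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.1, 4.9, 7.2, 8.8, 11.0]

slope, intercept = fit_line(ad_spend, sales)
print(slope, intercept)
print(predict(6.0, slope, intercept))  # forecast for a new spending level
```

Several predictors (temperature, humidity, air pressure) or a nonlinear dependency need multiple regression or another model family, but the idea of fitting parameters on known data and then predicting a continuous value stays the same.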


Are all the Discovered Patterns Interesting?

A data mining system/query may generate thousands of patterns, and not all of them are interesting.

Suggested approach: human-knowledge centered, query-based, focused mining

Interestingness measures: a pattern is interesting if it is easily understood by humans, valid on new or test data with some degree of certainty, potentially useful, novel, or validates some hypothesis that a user seeks to confirm.

Objective vs. subjective interestingness measures:

Objective: based on statistics and structures of patterns, e.g., support, confidence, etc.

Subjective: based on the user's belief in the data, e.g., unexpectedness (contradicting a user's belief), novelty, actionability (strategic information on which the users can act), etc.


Can We Find All and Only Interesting Patterns?

Find all the interesting patterns: completeness

Can a data mining system find all the interesting patterns?
It is useless and inefficient to generate all of the possible patterns. Instead, user-provided constraints and interestingness measures should be used to focus the search.

Search for only interesting patterns: optimization

Can a data mining system find only the interesting patterns?
Approaches:
First generate all the patterns and then filter out the uninteresting ones.
Generate only the interesting patterns: mining query optimization.
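
The two approaches can be contrasted on frequent itemsets, taking "interesting" to mean "meets a minimum support count". That reading, the Apriori-style level-wise search used for the second approach, and the function names are all assumptions made for this sketch; the transactions are the five from the association example.

```python
from itertools import combinations

transactions = [
    {"Bread", "Coke", "Milk"},
    {"Beer", "Bread"},
    {"Beer", "Coke", "Diaper", "Milk"},
    {"Beer", "Bread", "Diaper", "Milk"},
    {"Coke", "Diaper", "Milk"},
]


def support_count(itemset, transactions):
    return sum(itemset <= t for t in transactions)


def generate_then_filter(transactions, min_count):
    """Approach 1: enumerate every possible itemset, then drop infrequent ones."""
    items = sorted(set().union(*transactions))
    result = set()
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            if support_count(candidate, transactions) >= min_count:
                result.add(candidate)
    return result


def levelwise(transactions, min_count):
    """Approach 2: grow itemsets level by level, building candidates only from
    frequent itemsets, so most infrequent candidates are never generated."""
    items = sorted(set().union(*transactions))
    frequent = set()
    level = {frozenset([i]) for i in items
             if support_count(frozenset([i]), transactions) >= min_count}
    while level:
        frequent |= level
        # Join two frequent k-itemsets that differ in exactly one item.
        candidates = {a | b for a in level for b in level
                      if len(a | b) == len(a) + 1}
        level = {c for c in candidates
                 if support_count(c, transactions) >= min_count}
    return frequent


all_then_filter = generate_then_filter(transactions, min_count=3)
pruned_search = levelwise(transactions, min_count=3)
print(sorted(sorted(s) for s in pruned_search))
```

Both functions return the same itemsets, but the first examines every subset of the item collection while the second never extends an itemset already known to be infrequent. That difference grows quickly with the number of items.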
