
Usama M. Fayyad, Microsoft Research

Current computing and storage technology is rapidly outstripping society's ability to make meaningful use of the torrent of available data. Without a concerted effort to develop knowledge discovery techniques, organizations stand to forfeit much of the value from the data they currently collect and store.

The rapidly emerging field of knowledge discovery in databases (KDD) has grown significantly in the past few years. This growth is driven by a mix of daunting practical needs and strong research interest. The technology for computing and storage has enabled people to collect and store information from a wide range of sources at rates that were, only a few years ago, considered unimaginable. Although modern database technology enables economical storage of these large streams of data, we do not yet have the technology to help us analyze, understand, or even visualize this stored data.

Examples of this phenomenon abound in a wide spectrum of fields: finance, banking, retail sales, manufacturing, monitoring and diagnosis (be it of humans or machines), health care, marketing, and science data acquisition, among others. In science, modern instruments can easily measure and collect terabytes (10^12 bytes) of data. For example, NASA's Earth Observing System is expected to return data at rates of several gigabytes per hour by the end of the century.[1] Quite appropriately, the problem of how to put the torrent of data to use in analysis is often called "drinking from the fire hose."

What we mean by analysis is not well-defined because it is highly context- and goal-dependent. However, as I argue, it typically transcends by far anything achievable via simple queries, simple string matching, or mechanisms for displaying the data.

Prolific sources of data are not restricted to esoteric endeavors involving spacecraft or sophisticated scientific instruments. Imagine a database receiving transactions from common daily activities such as supermarket or department store checkout-register sales, or credit card charges. Or think of the information reaching your home television set as a stream of signals that, to be properly managed, need to be cataloged and indexed, and perhaps searched for interesting content at a higher level: channels, programs, genre, or mood, for example. The explosion in the number of resources available on the global computer network, the World Wide Web, is another challenge for indexing and searching through a continually changing and growing "database."

Getting a handle on the problem

Why are today's database and automated match and retrieval technologies not adequate for addressing the analysis needs? The answer lies in the fact that the patterns to be searched for, and the models to be extracted, are typically subtle and require significant specific domain knowledge. For example, consider a credit card company wishing to analyze its recent transactions to detect fraudulent use or to use the individual history of customers to decide on-line whether an incoming new charge is likely to be from an unauthorized user. This is clearly not an easy classification problem to solve.

One can imagine constructing a set of selection filters that trigger a set of queries to

20 0885-9000/96/$4.000 1996 IEEE IEEE EXPERT


check if a particular customer has made similar purchases in the past, or if the amount or the purchase location is unusual, for example. However, such a mechanism must account for changing tastes, shifting trends, and perhaps travel or change of residence. Such a problem is inherently probabilistic and would require a reasoning-with-uncertainty scheme to properly handle the trade-offs between disallowing a charge and risking a false alarm, which might result in the loss of a sale (or even a customer).

In the past, we could rely on human analysts to perform the necessary analysis. Essentially, this meant transforming the problem into one of simply retrieving data, displaying it to an analyst, and relying on expert knowledge to reach a decision. However, with large databases, a simple query can easily return hundreds or thousands (or even more) matches. Presenting the data, letting the analyst digest it, and enabling a quick (and correct) decision becomes infeasible. Data visualization techniques can significantly assist this process, but ultimately the reliance on the human in the loop becomes a major bottleneck. (Visualization works only for small sets and a small number of variables. Hence, the problem becomes one of finding the appropriate transformations and reductions, typically just as difficult as the original problem.)

Finally, there are situations where one would like to search for patterns that humans are not well-suited to find. Typically, this involves statistical modeling, followed by "outlier" detection, pattern recognition over large data sets, classification, or clustering. (Outliers are data points that do not fit within a hypothesis's probabilistic mode and hence are likely the result of interference from another process.) Most database management systems (DBMSs) do not allow the type of access and data manipulation that these tasks require; there are also serious computational and theoretical problems attached to performing data modeling in high-dimensional spaces and with large amounts of data.

These challenges are central to KDD and need urgent attention. Without heavily emphasizing KDD development and research, we run the risk of forfeiting the value of most of the data that we collect and store. We would eventually drown in an ocean of massive (but valuable) data sets that are rendered useless because we cannot distill the essence from the bulk. To draw on the data-mining analogy: the precious nuggets of knowledge need to be extracted and the massive raw material needs to be managed appropriately (and preferably recycled effectively).

Before proceeding further, let's define what we mean by KDD and data mining.

KDD and data mining: background

I use the term KDD to denote the overall process of extracting high-level knowledge from low-level data. Others in this special issue of IEEE Expert and elsewhere might use the terms data mining and KDD interchangeably. The multitude of names used for KDD includes data or information harvesting, data archeology, functional dependency analysis, knowledge extraction, and data pattern analysis. Historically, in statistics especially, the term data mining or fishing refers (often derogatorily) to sloppy exploratory data analysis with no a priori hypotheses to verify.

A simple definition. A simple high-level definition of KDD is as follows:

Knowledge discovery in databases is the nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data.[2]

Given the scope of this short article, I will not go into the definitions of each term in this high-level statement. Note, however, that the term knowledge is (and has a long history of being) difficult to define in the abstract. We adopt the view that knowledge is in the eye of the beholder, so one person's knowledge could easily be another person's junk.[2] We

OCTOBER 1996 21
Figure 1. An overview of the KDD process.* (For simplicity, the illustration omits arrows indicating the multitude of potential loops and iterations.)

define knowledge in domain-dependent terms relating strongly to measures of utility, validity, novelty, and understandability. The term patterns in this definition loosely denotes either models or patterns. In general, it designates some abstract representation of a subset of the data. A significant term in the definition is process, which indicates that knowledge discovery often involves experimentation, iteration, user interaction, and many design decisions and customizations. Extracting knowledge from data can easily turn into a complicated and sometimes arduous process. But the payoffs for success can be dramatic and rewarding, sometimes enabling people and organizations to achieve tasks that would not otherwise be possible.[3]

We adopt the convention that data mining refers to the act of extracting patterns or models from data (be it automated or human-assisted). However, many steps precede the data-mining step: retrieving the data from a large warehouse (or some other source); selecting the appropriate subset to work with; deciding on the appropriate sampling strategy; cleaning the data and dealing with missing fields; and applying the appropriate transformations, dimensionality reduction, and projections. The data-mining step then fits models to, or extracts patterns from, the preprocessed data. However, to decide whether this extracted information does represent knowledge, one needs to evaluate this information, perhaps visualize it, and finally consolidate it with existing (and possibly contradictory) knowledge. Obviously, these steps are all on the critical path from data to knowledge. Furthermore, any one step can result in changes in preceding or succeeding steps, often requiring starting from scratch with new choices and settings. Hence, in the definition I adopt, data mining is just a step in the overall KDD process.

Figure 1 outlines the KDD process; it is perhaps deceptive because it gives the impression that the steps are well-defined. In fact, the interactions between the choices of techniques used in the various steps, the parameters used for those techniques, and the choice of problem representation are extremely complex. Small changes in one part can dramatically affect the rest, and consequently can make the difference between success and failure of a KDD enterprise.

The KDD process. Table 1 expands on the steps outlined in Figure 1. A few items in Table 1 warrant further comment. Step 4 is critical and can be quite involved. Indeed, in many cases, some sophisticated searching and cataloging problem must be solved before the actual subsequent analysis is performed. This transformation could require solving a significant problem in its own right. In classical pattern-recognition work, this is called the feature extraction problem. In general, its solution requires a good bit of domain knowledge and strong intuition about the problem. It typically makes the difference between success and failure of the data mining (Step 7).

However, not to discourage the reader, feature definition and extraction in many applications is not terribly difficult (especially for a motivated domain expert who is involved in the process). Generally speaking, humans find it easier to define features than to solve the data-mining problem. For instance, an expert can observe a set of low-level variables and reach an intuitive decision. For example, the low-level data may consist of a stream of readings of voltages, currents, capacitances, loads, and so forth from a power plant; a set of pixels in multispectral images from a remote sensing instrument; or a set of transactions for a given group of bank accounts. However, the expert might not at all be capable of elucidating the reasons for reaching some decision about the state of the system being observed. This is typically a reasonable setting to use classification (supervised learning) techniques to derive classifiers from examples (the data) directly. Hence, the expert presents the system with training data consisting of classified examples.

For a nonexpert (especially a machine), using the raw observed data to classify events is likely to result in failure: knowledge of time sequence, of properties of instruments, of noise, of what is an important quantity, and so forth is simply a prerequisite. Experts can be asked to define features derived from the lower-level data. In effect, feature definition by experts lets them decompose the problem into small parts and encode significant prior knowledge implicitly in their choice of representation. This can easily result in a large number of features. Typically, the expert would not know how to use these features to solve the classification (discrimination) or modeling problem. Data-mining techniques provide a way to get to the solution in this feature space.
Table 1. Steps involved in the KDD process.

Step 1. Developing an understanding of the application domain, the relevant prior knowledge, and the goals of the end user.
  Explanation: With today's technology, this step requires a fair bit of reliance on the user/analyst. Factors to consider include:
  - What are the bottlenecks in the domain? What is worth automating and what is best left for processing by humans?
  - What are the goals? What performance criteria are important?
  - Will the final product of the process be used for classification, visualization, exploration, summarization, or something else?
  - Is understandability an issue? What is the trade-off between simplicity and accuracy of the extracted knowledge? Is a black box model appropriate for the performance element of the system?

Step 2. Creating a target data set, selecting a data set, or focusing on a subset of variables or data samples, on which discovery is to be performed.
  Explanation: This involves considerations of homogeneity of data, any dynamics and change over time, sampling strategy (such as uniform random versus stratified), sufficiency of sample, degrees of freedom, and so forth.

Step 3. Data cleaning and preprocessing.
  Explanation: Involved here are basic operations such as the removal of noise or "outliers," if appropriate; collecting the necessary information to model or account for noise; deciding on strategies for handling missing data fields; accounting for time sequence information, known changes, and appropriate normalization; and so forth.

Step 4. Data reduction and transformation.
  Explanation: This involves finding useful features to represent the data, depending on the goal of the task; using dimensionality reduction or transformation methods to reduce the effective number of variables under consideration or to find invariant representations for the data; and projecting the data onto spaces in which a solution is likely to be easier to find.

Step 5. Choosing the data-mining task.
  Explanation: This involves deciding whether the goal of the KDD process is classification, regression, clustering, summarization, dependency modeling, or change and deviation detection. (See Advances in Knowledge Discovery and Data Mining for more details and a tutorial exposition on data-mining tasks and methods.)

Step 6. Choosing the data-mining algorithm(s).
  Explanation: Here we select the methods to be used for searching for patterns in or fitting models to the data. The choice of which models and parameters may be appropriate is often critical. In addition, the data-mining method must be compatible with the goals: the end user may be more interested in understanding the model than its predictive capabilities.

Step 7. Data mining.
  Explanation: This involves searching for patterns of interest in a particular form or a set of such representations: classification rules or trees, regression, clustering, and so on. The user can significantly aid the data-mining method by correctly performing the preceding steps.

Step 8. Evaluating output of Step 7.
  Explanation: Here we decide what is to be deemed knowledge, which can be a fairly difficult task. Achieving acceptable results may involve using several options (possibly in combination):
  - Defining an automated scheme using measures of "interestingness" and others to filter knowledge from other outputs. Such measures might be statistical measures, goodness of fit, or simplicity, among others.
  - Relying on visualization techniques to help the analyst decide the utility of extracted knowledge or reach conclusions about the underlying data/phenomena.
  - Relying entirely on the user to sift through derived patterns in the hope of coming across items of interest.
  The outcome of this step might result in changes to any of the preceding steps and a restart of the entire process.

Step 9. Consolidating discovered knowledge: incorporating this knowledge into the performance system, or simply documenting it and reporting it to users.
  Explanation: This also includes checking for and resolving potential conflicts with previously believed (or extracted) knowledge.
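The sampling choice named in Step 2 (uniform random versus stratified) matters most when one class is rare, as in the credit card fraud setting. The following is a minimal sketch under assumed conditions: the record layout `(label, id)`, the 990-to-10 class split, the 10% sampling fraction, and the function names are all hypothetical and chosen only to make the contrast visible.

```python
import random
from collections import Counter, defaultdict


def uniform_sample(records, n, seed=0):
    """Draw n records uniformly at random, ignoring class membership."""
    rng = random.Random(seed)
    return rng.sample(records, n)


def stratified_sample(records, key, fraction, seed=0):
    """Draw the same fraction from every stratum, keeping at least one record each."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for record in records:
        strata[key(record)].append(record)
    sample = []
    for label in sorted(strata):  # fixed order keeps runs reproducible
        group = strata[label]
        k = max(1, round(len(group) * fraction))
        sample.extend(rng.sample(group, k))
    return sample


# Toy data set: 990 legitimate transactions and 10 fraudulent ones.
transactions = [("legit", i) for i in range(990)] + [("fraud", i) for i in range(10)]

uniform = uniform_sample(transactions, 100)
stratified = stratified_sample(transactions, key=lambda r: r[0], fraction=0.1)

print("uniform:   ", Counter(label for label, _ in uniform))
print("stratified:", Counter(label for label, _ in stratified))
```

The stratified draw always preserves the class proportions of the full data set, whereas a uniform draw of the same size may by chance contain no fraudulent records at all. Whether preserving proportions is the right goal depends on the task; some analyses deliberately oversample the rare class instead.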

In Step 8, the reliance on visualization is simply a work-around of the fact that we find it difficult to emulate human intuition and decision-making on a machine. The idea is to transform the derived knowledge into a format that is easy for humans to digest (such as images or graphs) and then rely on the speed and capability of the highly evolved human visual system to spot what is interesting. Of course, this only works in low-dimensional spaces, so the choice of what to show the user to facilitate the discovery is still critical and typically not an easy problem to circumvent.

By definition, KDD is an interdisciplinary field that brings together researchers and practitioners from a wide variety of fields. The major related fields include statistics, machine learning, artificial intelligence and reasoning with uncertainty, databases, knowledge acquisition, pattern recognition, information retrieval, visualization, intelligent agents for distributed and multimedia environments, digital libraries, and management information systems.

The remainder of this article briefly outlines how some of these relate to the various parts of the KDD process. I focus on the main fields and hope to clarify to the reader the role of each of the fields and how they fit together naturally when unified under the goals and applications of the overall KDD process. A detailed or comprehensive coverage of how they relate to the KDD process would be too lengthy and not very useful because ultimately one can find relations to every step from each of the fields. The article aims to give a general review and paint with a broad brush. By no means is this intended to be a guide to the literature, nor do I aim at being comprehensive in any sense of the word.

Statistics. Statistics plays an important role primarily in the data selection and sampling, data mining, and evaluation of extracted knowledge steps. Historically, most statistics work has focused on evaluation of model fit to data and on hypothesis testing. These are clearly relevant to evaluating the results of data mining to filter the good from the bad, as well as within the data-mining step itself in searching for, parametrizing, and fitting models to data. On the front end, sampling schemes play an important role in selecting which data to feed to the data-mining step. For the data-cleaning step, statistics offers techniques for detecting "outliers," smoothing data when necessary, and estimating noise parameters. To a lesser degree, estimation techniques for dealing with missing data are also available. Finally, for exploratory data analysis, some techniques in clustering and design of experiments come into play. However, the focus of research has dealt primarily with small data sets and addressing small sample problems.

On the limitations front, work in statistics has focused mostly on theoretical aspects of techniques and models. Thus, most work focuses on linear models, additive Gaussian noise models, parameter estimation, and parametric methods for a fairly restricted class of models. Search has received little emphasis, with emphasis on closed-form analytical solutions whenever possible. While the latter is very desirable both computationally and theoretically, in many practical situations a user might not have the necessary background statistics knowledge (which can often be substantial) to appropriately use and apply the methods. Furthermore, the typical approaches require an a priori model and significant domain knowledge of the data as well as of the underlying mathematics for proper use and interpretation. In addition, issues having to do with interfaces to databases, dealing with massive data sets, and techniques for efficient data management have only recently begun to receive attention in statistics.[4] John Elder and Darryl Pregibon provide an excellent exposition of statistical perspectives on KDD.[5]

Pattern recognition, machine learning, and artificial intelligence. In pattern recognition, work has historically focused on practical techniques with an appropriate mix of rigor and formalism. The major applicable techniques fall under the category of classification learning and clustering. There are several texts on the topic; Pattern Classification and Scene Analysis provides a good start.[6] Hence, most pattern-recognition work contributes to the data-mining step in the process. Significant work in dimensionality reduction, transformations, and projections has relevance to the corresponding step in the KDD process.

Within the data-mining step, pattern-recognition contributions are distinguished from statistics by their emphasis on computational algorithms, more sophisticated data structures, and more search, both parametric and nonparametric. Given its strong ties to image analysis and problems in 2D signal processing, work in pattern recognition did not emphasize algorithms for dealing with symbolic and categorical data. Classification techniques applied to categorical data typically take the approach of mapping the data to a metric space (such as nearest-neighbor norms[7]). Such a mapping is often not easy to formulate meaningfully: Is the distance between the values "square" and "circle" for the variable shape greater than the distance between "male" and "female" for the variable sex?

Techniques originating in AI have focused almost exclusively on dealing with data at the symbolic (categorical) level, with little attention paid to continuous variables. In machine learning and case-based reasoning, algorithms for classification and clustering have focused heavily on heuristic search and nonparametric models. Emphasis on mathematical rigor and analysis of results has not been as strong as in statistics or pattern recognition, with the exception of computational learning theory, which has focused on formal general worst-case bounds for a wide class of representations (a good starting point here is Computational Learning Theory[8]). Machine learning work contributes mainly to the data-mining step of the process, with some contributions in the area of representation and selection of variables through significant search. In addition, the machine discovery community has focused on techniques for discovering structure in data as well as empirical laws to describe observations (as in scientific discovery of laws; see Discovering Causal Structure[9]).

AI techniques for reasoning, especially techniques from the Uncertainty in AI community[10] and graphical models for Bayesian modeling and reasoning,[11] provide a powerful alternative to classical density estimation in statistics. These techniques have the advantage of allowing prior knowledge about the domain and data to be included in a relatively easy and natural framework. Other areas of AI, including knowledge-acquisition techniques, knowledge representation, and search, are relevant to the various steps in the process, including data mining, data transformation, data selection, and preprocessing.

Databases and data warehouses. The relevance of the field of databases to KDD is obvious from the name. Databases provide the necessary infrastructure to store, access, and manipulate the raw data. With parallel and distributed database management systems, they provide the essential layers to insulate the analysis from the extensive details of how the data is stored and retrieved. I focus here only on the aspects of database research relevant to the data-mining step. A strongly related term is on-line analytical processing,[12] which mainly concerns providing new ways of manipulating and analyzing data using multidimensional methods. This has been primarily driven by the need to overcome limitations posed by SQL and relational DBMS schemes for storing and accessing data. The efficiencies achieved via relational structure and normalizations can pose significant challenges to algorithms that require special access to the data: in data mining, one would need to collect statistics and counts based on various partitionings of the data, which would require excessive joins and new tables to be generated. Supporting operations from the data-mining perspective is an emerging research area in the database community. In the data-mining step itself, new approaches for functional dependency analysis and efficient methods for finding association rules directly from databases have emerged and are starting to appear as products.[13] In addition, classical database techniques for query optimization and new object-oriented databases make the task of searching for patterns in databases much more tenable.

An emerging area in databases is data warehousing, which is concerned with schemes and methods of integrating legacy databases, on-line transaction databases, and various nonhomogeneous RDBMSs so that they can be accessed in a uniform and easily managed framework. Data warehousing primarily involves storage, data selection, data cleaning, and infrastructure for updating databases once new knowledge or representations are developed.

[...] significant successes in this new field. Advances in Knowledge Discovery and Data Mining presents some illustrative examples (see also the August 1995 issue of IEEE Expert, pp. 10-13). However, work is just beginning in this challenging and exciting field. KDD spans a spectrum of problems. Some are practical challenges and await proper implementation, while others are fundamentally difficult research problems that are at the heart of many fields such as statistics, optimization, search, pattern recognition, and mathematical modeling.

While the current state of the art still relies on fairly simple approaches with limited representation schemes and greedy heuristic search, powerful results are being achieved, and the efforts and community are rapidly growing in size and resources. (See the "KDD community" sidebar on page 21 for a description of some of these efforts.) The main driver for this healthy growth has been, and is likely to continue to be, the major social, economic, and scientific need for techniques that will enable us to make use of all the data we gather. The coupling of the two factors, the immediate needs and the difficulty of the problems to be solved (requiring significant research and theoretical advances), promises to keep the field on a healthy development path. Several new issues and dilemmas are sure to arise. For example, issues of privacy and legal consideration of what data needs to be protected from KDD-type access are already becoming a major concern, especially in Europe.[14] On the other hand, to avoid the dramatic let-down that follows false expectations, researchers, practitioners, and vendors must carefully avoid the hype and exaggerated claims about what can be achieved. However, slowly and methodically, humanity needs to address the deluge of data and databases.

References

1. J. Way and E.A. Smith, "The Evolution of Synthetic Aperture Radar Systems and Their Progression to the EOS SAR," IEEE Trans. Geoscience and Remote Sensing, Vol. 29, No. 6, 1991, pp. 962-985.

2. U. Fayyad, G. Piatetsky-Shapiro, and P. Smyth, "From Data Mining to Knowledge Discovery: An Overview," in Advances in Knowledge Discovery and Data Mining, U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, R. Uthurusamy, eds., MIT Press, Cambridge, Mass., 1996, pp. 1-36.

3. U. Fayyad, S.G. Djorgovski, and N. Weir, "Automating the Analysis and Cataloging of Sky Surveys," in Advances in Knowledge Discovery and Data Mining, U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, R. Uthurusamy, eds., MIT Press, 1996, pp. 471-494.

4. J. Kettenring and D. Pregibon, eds., Proc. Committee on Applied and Theoretical Statistics: Workshop on Massive Data Sets, National Research Council, Washington, D.C., to be published in 1996.

5. J. Elder and D. Pregibon, "A Statistical Perspective on Knowledge Discovery in Databases," in Advances in Knowledge Discovery and Data Mining, U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, R. Uthurusamy, eds., MIT Press, 1996, pp. 83-116.

6. R.O. Duda and P.E. Hart, Pattern Classification and Scene Analysis, John Wiley & Sons, New York, 1973.

7. B.V. Dasarathy, Nearest Neighbor (NN) Norms: NN Pattern Classification Techniques, IEEE Computer Society Press, Los Alamitos, Calif., 1991.

8. M. Kearns, Computational Learning Theory, MIT Press, 1994.

9. C. Glymour et al., Discovering Causal Structure, Academic Press, New York, 1987.

10. P. Besnard and S. Hanks, eds., Proc. 11th Conf. on Uncertainty in Artificial Intelligence (UAI-95), Morgan Kaufmann, San Francisco, 1995.

11. D. Heckerman, "Bayesian Networks for Knowledge Discovery," in Advances in Knowledge Discovery and Data Mining, 1996, pp. 273-306.

12. E.F. Codd, "Providing OLAP (On-line Analytical Processing) to User-Analysts: An IT Mandate," E.F. Codd and Assoc., 1993.

13. R. Agrawal et al., "Fast Discovery of Association Rules," in Advances in Knowledge Discovery and Data Mining, 1996, pp. 307-328.

14. G. Piatetsky-Shapiro, "KDD vs. Privacy: A Minisymposium," IEEE Expert, Vol. 10, No. 2, Apr. 1995, pp. 46-59.

Usama Fayyad is a senior researcher at Microsoft Research. His research interests include knowledge discovery in large databases, data mining, machine-learning theory and applications, statistical pattern recognition, and clustering. After receiving the PhD degree in 1991 from the University of Michigan, he joined the Jet Propulsion Laboratory at the California Institute of Technology. At JPL, he headed the Machine Learning Systems Group, where he developed data-mining systems for automated science data analysis. He remains affiliated with JPL as a Distinguished Visiting Scientist. He was program cochair of KDD-94 and KDD-95 (the First International Conference on Knowledge Discovery and Data Mining). He is general chair of KDD-96, an editor-in-chief of Data Mining and Knowledge Discovery, and co-editor of Advances in Knowledge Discovery and Data Mining (MIT Press). Contact him at Microsoft Research, One Microsoft Way 9/S, Redmond, WA 98052-6399; fayyad@microsoft.com.
