
Data Mining: Concepts and Techniques, by Jiawei Han, Micheline Kamber, Jian Pei
http://books.google.com.mx/books?hl=en&lr=&id=pQws07tdpjoC&oi=fnd&pg=PP2&dq=data+mining&ots=txFzYUpy-Z&sig=gnsQYkKmr6XHZu0m9eZWkQUWsw&redir_esc=y#v=onepage&q=data%20mining&f=false

From Data Mining to Knowledge Discovery in Databases (From Data Mining to 01)
Usama Fayyad, Gregory Piatetsky-Shapiro, and Padhraic Smyth
http://www.aaai.org/ojs/index.php/aimagazine/article/view/1230/1131

Weka
http://weka.wikispaces.com/

Open source analytics: an introduction to the special issue
Robert L. Grossman
ACM SIGKDD Explorations Newsletter, Volume 11, Issue 1, June 2009. SPECIAL ISSUE: Open source analytics. Pages 3-4. doi:10.1145/1656274.1656276

This special issue contains six articles on open source analytics. It includes an article describing the Weka data mining system, two articles on infrastructure to support analytics, an article on the PMML standard for statistical and data mining models, an article on how clouds are being used in analytics, and an article about an open source tool for cleaning data.

http://dl.acm.org/citation.cfm?id=1656278
Data mining: an overview from a database perspective
Ming-Syan Chen; Jiawei Han; Yu, P.S.; Dept. of Electr. Eng., Nat. Taiwan Univ., Taipei
IEEE Transactions on Knowledge and Data Engineering, Volume 8, Issue 6, Dec 1996, pages 866-883. ISSN: 1041-4347. References cited: 87. Cited by: 211. INSPEC Accession Number: 5476684. DOI: 10.1109/69.553155. Date of current version: 06 August 2002. Sponsored by: IEEE Computer Society.

Principles of Data Mining, by D. J. Hand, Heikki Mannila, Padhraic Smyth
http://books.google.com.mx/books?hl=en&lr=&id=SdZbhVhZGYC&oi=fnd&pg=PR17&dq=data+mining&ots=ywMauloll&sig=Jc4BGoC228ZRW3Zc8PV6ozv6FRk&redir_esc=y#v=onepage&q=data%20mining&f=false

The growing interest in data mining is motivated by a common problem across disciplines: how does one store, access, model, and ultimately describe and understand very large data sets? Historically, different aspects of data mining have been addressed independently by different disciplines. This is the first truly interdisciplinary text on data mining, blending the contributions of information science, computer science, and statistics.

The book consists of three sections. The first, foundations, provides a tutorial overview of the principles underlying data mining algorithms and their application. The presentation emphasizes intuition rather than rigor. The second section, data mining algorithms, shows how algorithms are constructed to solve specific problems in a principled manner. The algorithms covered include trees and rules for classification and regression, association rules, belief networks, classical statistical models, nonlinear models such as neural networks, and local "memory-based" models. The third section shows how all of the preceding analysis fits together when applied to real-world data mining problems. Topics include the role of metadata, how to handle missing data, and data preprocessing.

http://dl.acm.org/citation.cfm?id=551917
Machine Learning and Data Mining: Methods and Applications
Editors: Ryszard S. Michalski, Ivan Bratko, Miroslav Kubat
John Wiley & Sons, Inc., New York, NY, USA, 1998. ISBN: 0471971995
From the Publisher: Master the new computational tools to get the most out of your information system. This practical guide, the first to clearly outline the situation for the benefit of engineers and scientists, provides a straightforward introduction to basic machine learning and data mining methods, covering the analysis of numerical, text, and sound data.

http://dl.acm.org/citation.cfm?id=231007
Data mining with neural networks: solving business problems from application development to decision support

Author: Joseph P. Bigus, IBM, Rochester, MN
Publication: book, McGraw-Hill, Inc., Hightstown, NJ, USA, 1996. ISBN: 0-07-005779-6

http://dl.acm.org/citation.cfm?id=1150531
YALE: rapid prototyping for complex data mining tasks
Authors: Ingo Mierswa, Michael Wurst, Ralf Klinkenberg, Martin Scholz, Timm Euler (all University of Dortmund)
Published in: KDD '06, Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, New York, NY, USA, 2006. ISBN: 1-59593-339-5. doi:10.1145/1150402.1150531

KDD is a complex and demanding task. While a large number of methods have been established for numerous problems, many challenges remain to be solved. New tasks emerge requiring the development of new methods or processing schemes. As in software development, the development of such solutions demands careful analysis, specification, implementation, and testing. Rapid prototyping is an approach which allows crucial design decisions to be made as early as possible. A rapid prototyping system should support maximal re-use and innovative combinations of existing methods, as well as simple and quick integration of new ones. This paper describes Yale, a free open-source environment for KDD and machine learning. Yale provides a rich variety of methods which allows rapid prototyping for new applications and makes costly re-implementations unnecessary. Additionally, Yale offers extensive functionality for process evaluation and optimization, which is a crucial property for any KDD rapid prototyping tool. Following the paradigm of visual programming eases the design of processing schemes. While the graphical user interface supports interactive design, the underlying XML representation enables automated applications after the prototyping phase. After a discussion of the key concepts of Yale, we illustrate the advantages of rapid prototyping for KDD on case studies ranging from data preprocessing to result visualization. These case studies cover tasks like feature engineering, text mining, data stream mining and tracking drifting concepts, ensemble methods and distributed data mining. This variety of applications is also reflected in a broad user base: we counted more than 40,000 downloads during the last twelve months.

http://books.google.com.mx/books?hl=en&lr=&id=wufB6fwaYH4C&oi=fnd&pg=PR17&dq=data+mining&ots=k34QsQGf75&sig=LVc9cgC37HknysfO_0fxfyRCj3c&redir_esc=y#v=onepage&q=data%20mining&f=false
Data Preparation for Data Mining, Volume 1, by Dorian Pyle

http://bioinformatics.oxfordjournals.org/content/20/15/2479.short
Data mining in bioinformatics using Weka
Eibe Frank (1,*), Mark Hall (1), Len Trigg (2), Geoffrey Holmes (1) and Ian H. Witten (1)

(1) Department of Computer Science, University of Waikato, Private Bag 3105, Hamilton, New Zealand; (2) Reel Two, PO Box 1538, Hamilton, New Zealand
Contact: eibe@cs.waikato.ac.nz
Received December 3, 2003. Revision received February 3, 2004. Accepted February 26, 2004.

Abstract
Summary: The Weka machine learning workbench provides a general-purpose environment for automatic classification, regression, clustering and feature selection, which are common data mining problems in bioinformatics research. It contains an extensive collection of machine learning algorithms and data pre-processing methods complemented by graphical user interfaces for data exploration and the experimental comparison of different machine learning techniques on the same problem. Weka can process data given in the form of a single relational table. Its main objectives are to (a) assist users in extracting useful information from data and (b) enable them to easily identify a suitable algorithm for generating an accurate predictive model from it.

http://ir.iit.edu/~dagr/DataMiningCourse/Spring2001/ReadingsForClass/dmql.pdf
DMQL: A Data Mining Query Language for Relational Databases

https://springerlink3.metapress.com/content/g58613yv08bx48qj/resourcesecured/?target=fulltext.pdf&sid=prbm30bojvkxyxlzzvouboif&sh=www.springerlink.com
Orange: From Experimental Machine Learning to Interactive Data Mining
Janez Demšar, Blaž Zupan, Gregor Leban and Tomaž Curk
Knowledge Discovery in Databases: PKDD 2004, Lecture Notes in Computer Science, Volume 3202, 2004, pages 537-539. DOI: 10.1007/978-3-540-30116-5_58

Benchmarking attribute selection techniques for discrete class data mining
Hall, M.A.; Holmes, G.; Dept. of Comput. Sci., Waikato Univ., Hamilton, New Zealand

IEEE Transactions on Knowledge and Data Engineering, Volume 15, Issue 6, Nov.-Dec. 2003, pages 1437-1447. ISSN: 1041-4347. References cited: 16. Cited by: 65. INSPEC Accession Number: 7959848. DOI: 10.1109/TKDE.2003.1245283. Date of current version: 17 November 2003. Sponsored by: IEEE Computer Society.

Abstract: Data engineering is generally considered to be a central issue in the development of data mining applications. The success of many learning schemes, in their attempts to construct models of data, hinges on the reliable identification of a small set of highly predictive attributes. The inclusion of irrelevant, redundant, and noisy attributes in the model building process phase can result in poor predictive performance and increased computation. Attribute selection generally involves a combination of search and attribute utility estimation plus evaluation with respect to specific learning schemes. This leads to a large number of possible permutations and has led to a situation where very few benchmark studies have been conducted. This paper presents a benchmark comparison of several attribute selection methods for supervised classification. All the methods produce an attribute ranking, a useful device for isolating the individual merit of an attribute. Attribute selection is achieved by cross-validating the attribute rankings with respect to a classification learner to find the best attributes. Results are reported for a selection of standard data sets and two diverse learning schemes: C4.5 and naive Bayes.

Effective data mining using neural networks
Hongjun Lu; Setiono, R.; Huan Liu; Dept. of Inf. Syst. & Comput. Sci., Nat. Univ. of Singapore
IEEE Transactions on Knowledge and Data Engineering, Volume 8, Issue 6, Dec 1996, pages 957-961. ISSN: 1041-4347. References cited: 10. Cited by: 42. INSPEC Accession Number: 5476692. DOI: 10.1109/69.553163. Date of current version: 06 August 2002. Sponsored by: IEEE Computer Society.

Abstract: Classification is one of the data mining problems receiving great attention recently in the database community. The paper presents an approach to discover symbolic classification rules using neural networks. Neural networks have not been thought suited for data mining because the way the classifications are made is not explicitly stated as symbolic rules that are suitable for verification or interpretation by humans. With the proposed approach, concise symbolic rules with high accuracy can be extracted from a neural network. The network is first trained to achieve the required accuracy rate. Redundant connections of the network are then removed by a network pruning algorithm. The activation values of the hidden units in the network are analyzed, and classification rules are generated using the result of this analysis. The effectiveness of the proposed approach is clearly demonstrated by the experimental results on a set of standard data mining test problems.

DM01
http://150.214.190.154/gfs/pdf/2004_Ishibuchi-H._Fuzzy-Sets-Syst.pdf
http://www.sciencedirect.com/science/article/pii/S0165011403001143

DM02
http://student.bus.olemiss.edu/files/Conlon/Others/Temp/Temp_old/MachineLearning_DataMining/Knowledge%20management%20and%20data%20mining%20for%20marketing.pdf
Knowledge management and data mining for marketing
Michael J. Shaw, Chandrasekar Subramaniam, Gek Woo Tan, Michael E. Welge

DM03
http://www.sigkdd.org/explorations/issues/1-1-1999-06/survey.pdf
A Survey of Data Mining and Knowledge Discovery Software Tools
Michael Goebel, Le Gruenwald

DMMM larose
Data Mining Methods and Models, by Daniel T. Larose, Ph.D.

10Algorithms-08
http://www.cs.umd.edu/~samir/498/10Algorithms-08.pdf
Top 10 algorithms in data mining
Xindong Wu, Vipin Kumar, J. Ross Quinlan, Joydeep Ghosh, Qiang Yang, Hiroshi Motoda, Geoffrey J. McLachlan, Angus Ng, Bing Liu, Philip S. Yu, Zhi-Hua Zhou, Michael Steinbach, David J. Hand, Dan Steinberg

10.1.1.27.9003
Data mining using MLC++: a machine learning library in C++
Kohavi, R.; Sommerfield, D.; Dougherty, J.; Silicon Graphics Comput. Syst., Mountain View, CA, USA (weka)
Proceedings of the Eighth IEEE International Conference on Tools with Artificial Intelligence, 16-19 Nov. 1996, pages 234-245. ISSN: 1082-3409. Print ISBN: 0-8186-7686-7. Cited by: 6. INSPEC Accession Number: 5437399. DOI: 10.1109/TAI.1996.560457. Date of current version: 06 August 2002.

Abstract: Data mining algorithms including machine learning, statistical analysis, and pattern recognition techniques can greatly improve our understanding of data warehouses that are now becoming more widespread. In this paper, we focus on classification algorithms and review the need for multiple classification algorithms. We describe a system called MLC++ which was designed to help choose the appropriate classification algorithm for a given dataset by making it easy to compare the utility of different algorithms on a specific dataset of interest. MLC++ not only provides a workbench for such comparisons, but also provides a library of C++ classes to aid in the development of new algorithms, especially hybrid algorithms and multi-strategy algorithms. Such algorithms are generally hard to code from scratch. We discuss design issues, interfaces to other programs, and visualization of the resulting classifiers.

NnGPcomparison
http://www.cpdee.ufmg.br/~joao/CE/ArtigosProgGen/NnGPcomparison.pdf
A Comparison of Linear Genetic Programming and Neural Networks in Medical Data Mining
Markus Brameier and Wolfgang Banzhaf
IEEE Transactions on Evolutionary Computation, Volume 5, Issue 1, Feb 2001, pages 17-26. ISSN: 1089-778X. References cited: 29. Cited by: 70. INSPEC Accession Number: 6876106. DOI: 10.1109/4235.910462. Date of current version: 07 August 2002. Sponsored by: IEEE Computational Intelligence Society.

Abstract: We introduce a new form of linear genetic programming (GP). Two methods of acceleration of our GP approach are discussed: 1) an efficient algorithm that eliminates intron code and 2) a demetic approach to virtually parallelize the system on a single processor. Acceleration of runtime is especially important when operating with complex data sets, as they occur in real-world applications. We compare GP performance on medical classification problems from a benchmark database with results obtained by neural networks. Our results show that GP performs comparably in classification and generalization.

Methods and problems in data mining
Heikki Mannila
Database Theory: ICDT '97, Lecture Notes in Computer Science, Volume 1186, 1997, pages 41-55. DOI: 10.1007/3-540-62222-5_35

Abstract: Knowledge discovery in databases and data mining aim at semiautomatic tools for the analysis of large data sets. We consider some methods used in data mining, concentrating on levelwise search for all frequently occurring patterns. We show how this technique can be used in various applications. We also discuss possibilities for compiling data mining queries into algorithms, and look at the use of sampling in data mining. We conclude by listing several open research problems in data mining and knowledge discovery.

Part of this work was done while the author was visiting the Max Planck Institut für Informatik in Saarbrücken, Germany. Work supported by the Academy of Finland and by the Alexander von Humboldt Stiftung.

DM05
http://media.wiley.com/product_data/excerpt/47/04712538/0471253847.pdf

DM06
Using Neural Networks for Data Mining
Mark W. Craven, Jude W. Shavlik
http://www.cs.iastate.edu/~honavar/nn7.pdf
http://www.sciencedirect.com/science/article/pii/S0167739X97000228

DM07
http://angsila.cs.buu.ac.th/~50036429/Datamining/paper2/b-1.pdf
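Mannila's abstract above (Methods and problems in data mining) concentrates on levelwise search for all frequently occurring patterns. As a minimal sketch of that idea, assuming the simplest pattern class (sets of items that occur together in transactions) and an absolute support count, the Python below searches level by level in the Apriori style: it counts the candidates of size k, keeps the frequent ones, and builds size k+1 candidates only from those, discarding any candidate that has an infrequent k-subset. It is not code from any work in this list; the function name `levelwise_frequent_itemsets` and the toy transactions are hypothetical.

```python
from itertools import combinations


def levelwise_frequent_itemsets(transactions, min_support):
    """Levelwise (Apriori-style) search for all frequent itemsets.

    transactions: iterable of item collections (e.g. sets of strings).
    min_support: minimum absolute number of transactions that must contain
        an itemset for it to count as frequent (an assumption; support is
        often expressed as a fraction instead).
    Returns a dict mapping each frequent itemset (a frozenset) to its count.
    """
    transactions = [frozenset(t) for t in transactions]

    def count(candidate):
        # One pass over the data per candidate; adequate for a sketch.
        return sum(1 for t in transactions if candidate <= t)

    # Level 1: every distinct item is a candidate.
    level = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    size = 1
    while level:
        # Keep only the candidates of this level that meet the threshold.
        current = {}
        for candidate in level:
            n = count(candidate)
            if n >= min_support:
                current[candidate] = n
        frequent.update(current)

        # Build the next level by joining pairs of frequent itemsets, and
        # prune any candidate with an infrequent subset of the previous size:
        # if a subset is infrequent, no superset of it can be frequent.
        size += 1
        level = set()
        for a, b in combinations(current, 2):
            union = a | b
            if len(union) == size and all(
                frozenset(sub) in current for sub in combinations(union, size - 1)
            ):
                level.add(union)
    return frequent


if __name__ == "__main__":
    # Hypothetical toy transactions, invented for illustration.
    data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    result = levelwise_frequent_itemsets(data, min_support=3)
    for itemset, n in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), n)
```

The pruning step is what makes the search levelwise: a candidate is counted against the data only after all of its subsets have survived the previous level. On the toy data with min_support=3, each single item (count 4) and each pair (count 3) is frequent, while {a, b, c} occurs in only two transactions and is not reported.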
