Data Mining: Practical Machine Learning Tools and Techniques (Second Edition)
Comments
"If you have data that you want to analyze and understand, this book and the associated Weka toolkit are an excellent way to start."
-Jim Gray, Microsoft Research
Features
- Explains how data mining algorithms work.
- Helps you select appropriate approaches to particular problems
and compare and evaluate the results of different techniques.
- Covers performance improvement techniques, including input
preprocessing and combining output from different methods.
- Shows you how to use the Weka machine learning workbench.
Translations
The book has been translated into German (first edition) and
Chinese (second edition).
Errata
Click here to get to a list of errata.
Teaching material
A .zip file with slides for the 2nd edition is located here.
It contains .pdf files as well as .odp files in Open Document Format
that were generated using OpenOffice 2.0. Several free office programs
can now read .odp files, and there is also a plug-in for Word, made by
Sun, that reads this format. More information is on this Wikipedia page.
Reviews of the first edition
Review by J. Geller (SIGMOD Record, Vol. 32:2, March 2002).
Review by E. Davis (AI Journal, Vol. 131:1-2, September 2001).
Review by P.A. Flach (AI Journal, Vol. 131:1-2, September 2001).
Table of Contents for the 2nd Edition:
Sections and chapters with new material are marked in red.
Preface
Part I: Practical Machine Learning Tools and Techniques
1. What’s it all about?
1.1 Data mining and machine learning
1.2 Simple examples: the weather problem and others
1.3 Fielded applications
1.4 Machine learning and statistics
1.5 Generalization as search
1.6 Data mining and ethics
1.7 Further reading
2. Input: Concepts, instances, attributes
2.1 What’s a concept?
2.2 What’s in an example?
2.3 What’s in an attribute?
2.4 Preparing the input
2.5 Further reading
3. Output: Knowledge representation
3.1 Decision tables
3.2 Decision trees
3.3 Classification rules
3.4 Association rules
3.5 Rules with exceptions
3.6 Rules involving relations
3.7 Trees for numeric prediction
3.8 Instance-based representation
3.9 Clusters
3.10 Further reading
4. Algorithms: The basic methods
4.1 Inferring rudimentary rules
4.2 Statistical modeling
4.3 Divide-and-conquer: constructing decision trees
4.4 Covering algorithms: constructing rules
4.5 Mining association rules
4.6 Linear models
4.7 Instance-based learning
4.8 Clustering
4.9 Further reading
5. Credibility: Evaluating what’s been learned
5.1 Training and testing
5.2 Predicting performance
5.3 Cross-validation
5.4 Other estimates
5.5 Comparing data mining schemes
5.6 Predicting probabilities
5.7 Counting the cost
5.8 Evaluating numeric prediction
5.9 The minimum description length principle
5.10 Applying MDL to clustering
5.11 Further reading
6. Implementations: Real machine learning schemes
6.1 Decision trees
6.2 Classification rules
6.3 Extending linear models
6.4 Instance-based learning
6.5 Numeric prediction
6.6 Clustering
6.7 Bayesian networks
7. Transformations: Engineering the input and output
7.1 Attribute selection
7.2 Discretizing numeric attributes
7.3 Some useful transformations
7.4 Automatic data cleansing
7.5 Combining multiple models
7.6 Using unlabeled data
7.7 Further reading
8. Moving on: Extensions and applications
8.1 Learning from massive datasets
8.2 Incorporating domain knowledge
8.3 Text and Web mining
8.4 Adversarial situations
8.5 Ubiquitous data mining
8.6 Further reading
Part II: The Weka machine learning workbench
9. Introduction to Weka
9.1 What’s in Weka?
9.2 How do you use it?
9.3 What else can you do?
10. The Explorer
10.1 Getting started
10.2 Exploring the Explorer
10.3 Filtering algorithms
10.4 Learning algorithms
10.5 Meta-learning algorithms
10.6 Clustering algorithms
10.7 Association-rule learners
10.8 Attribute selection
11. The Knowledge Flow interface
11.1 Getting started
11.2 Knowledge Flow components
11.3 Configuring and connecting the components
11.4 Incremental learning
12. The Experimenter
12.1 Getting started
12.2 Simple setup
12.3 Advanced setup
12.4 The Analyze panel
12.5 Distributing processing over several machines
13. The command-line interface
13.1 Getting started
13.2 The structure of Weka
13.3 Command-line options
14. Embedded machine learning
15. Writing new learning schemes
References
Index