milk.classifiers
Class MIEvaluation

java.lang.Object
  |
  +--milk.classifiers.MIEvaluation
All Implemented Interfaces:
weka.core.Summarizable

public class MIEvaluation
extends java.lang.Object
implements weka.core.Summarizable

Class for evaluating multi-instance machine learning models (MIClassifier classifiers) on sets of exemplars.

General options when evaluating a learning scheme from the command line:

-t filename
Name of the file with the training data. (required)

-T filename
Name of the file with the test data. If missing, a cross-validation is performed.

-c index
Index of the class attribute (1, 2, ...; default: last).

-I index
Index of the ID attribute (0, 1, 2, ...; default: first).

-x number
The number of folds for the cross-validation (default: 10).

-s seed
Random number seed for the cross-validation (default: 1).

-m filename
The name of a file containing a cost matrix.

-l filename
Loads classifier from the given file.

-g
Only for classifiers that implement "Graphable." Outputs the graph representation of the classifier (and nothing else).

-L
Whether to use "Leave-One-Out" cross-validation.

-d filename
Saves classifier built from the training data into the given file.

-v
Outputs no statistics for the training data.

-o
Outputs statistics only, not the classifier.
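The defaults listed above can be illustrated with a small, self-contained sketch. It does not call MIEvaluation: the class OptionDefaultsSketch, its helper getOption and the file name train.arff are all hypothetical. The helper only mimics the documented behaviour that options consumed by a parser are removed (here, blanked) from the array.

```java
import java.util.Arrays;

// Hypothetical sketch: resolving the -x and -s options with their documented defaults.
final class OptionDefaultsSketch {

    /** Returns the value following flag (e.g. "-x"), or "" if the flag is absent. */
    static String getOption(String flag, String[] options) {
        for (int i = 0; i < options.length - 1; i++) {
            if (flag.equals(options[i])) {
                String value = options[i + 1];
                options[i] = "";     // mark the flag as consumed
                options[i + 1] = ""; // and its value
                return value;
            }
        }
        return "";
    }

    /** Number of cross-validation folds: -x, default 10. */
    static int numFolds(String[] options) {
        String v = getOption("-x", options);
        return v.isEmpty() ? 10 : Integer.parseInt(v);
    }

    /** Random seed for the cross-validation: -s, default 1. */
    static long seed(String[] options) {
        String v = getOption("-s", options);
        return v.isEmpty() ? 1L : Long.parseLong(v);
    }

    public static void main(String[] args) {
        String[] opts = {"-t", "train.arff", "-x", "5"};
        System.out.println(numFolds(opts));        // 5
        System.out.println(seed(opts));            // 1 (default)
        System.out.println(Arrays.toString(opts)); // [-t, train.arff, , ]
    }
}
```

Flag-only options such as -L, -v and -o would need a separate boolean lookup; they are left out to keep the sketch short.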


Constructor Summary
MIEvaluation(Exemplars data)
          Initializes all the counters for the evaluation.
MIEvaluation(Exemplars data, weka.classifiers.CostMatrix costMatrix)
          Initializes all the counters for the evaluation and also takes a cost matrix as parameter.
 
Method Summary
 double avgCost()
          Gets the average cost, that is, total cost of misclassifications (incorrect plus unclassified) over the total number of instances.
 double[][] confusionMatrix()
          Returns a copy of the confusion matrix.
 double correct()
          Gets the number of instances correctly classified (that is, for which a correct prediction was made).
 void crossValidateModel(MIClassifier classifier, Exemplars data, int numFolds)
          Performs a (stratified if class is nominal) cross-validation for a classifier on a set of exemplars.
 void crossValidateModel(java.lang.String classifierString, Exemplars data, int numFolds, java.lang.String[] options)
          Performs a (stratified if class is nominal) cross-validation for a classifier on a set of exemplars.
 double errorRate()
          Returns the estimated error rate or the root mean squared error (if the class is numeric).
 void evaluateModel(MIClassifier classifier, Exemplars data)
          Evaluates the classifier on a given set of exemplars.
static java.lang.String evaluateModel(MIClassifier classifier, java.lang.String[] options)
          Evaluates a classifier with the options given in an array of strings.
static java.lang.String evaluateModel(java.lang.String classifierString, java.lang.String[] options)
          Evaluates a classifier with the options given in an array of strings.
 double evaluateModelOnce(MIClassifier classifier, Exemplar test)
          Evaluates the classifier on a single exemplar.
 double falseNegativeRate(int classIndex)
          Calculate the false negative rate with respect to a particular class.
 double falsePositiveRate(int classIndex)
          Calculate the false positive rate with respect to a particular class.
 double fMeasure(int classIndex)
          Calculate the F-Measure with respect to a particular class.
 double incorrect()
          Gets the number of instances incorrectly classified (that is, for which an incorrect prediction was made).
 double kappa()
          Returns value of kappa statistic if class is nominal.
static void main(java.lang.String[] args)
          A test method for this class.
 double meanAbsoluteError()
          Returns the mean absolute error.
 double meanPriorAbsoluteError()
          Returns the mean absolute error of the prior.
 double numExemplars()
          Gets the number of test exemplars that had a known class value (actually the sum of the weights of test exemplars with known class value).
 double numFalseNegatives(int classIndex)
          Calculate number of false negatives with respect to a particular class.
 double numFalsePositives(int classIndex)
          Calculate number of false positives with respect to a particular class.
 double numTrueNegatives(int classIndex)
          Calculate the number of true negatives with respect to a particular class.
 double numTruePositives(int classIndex)
          Calculate the number of true positives with respect to a particular class.
 double pctCorrect()
          Gets the percentage of instances correctly classified (that is, for which a correct prediction was made).
 double pctIncorrect()
          Gets the percentage of instances incorrectly classified (that is, for which an incorrect prediction was made).
 double pctUnclassified()
          Gets the percentage of instances not classified (that is, for which no prediction was made by the classifier).
 double precision(int classIndex)
          Calculate the precision with respect to a particular class.
 double recall(int classIndex)
          Calculate the recall with respect to a particular class.
 double relativeAbsoluteError()
          Returns the relative absolute error.
 double rootMeanPriorSquaredError()
          Returns the root mean prior squared error.
 double rootMeanSquaredError()
          Returns the root mean squared error.
 double rootRelativeSquaredError()
          Returns the root relative squared error if the class is numeric.
 void setPriors(Exemplars train)
          Sets the class prior probabilities.
 java.lang.String toClassDetailsString()
          Calls toClassDetailsString(String) with a default title.
 java.lang.String toClassDetailsString(java.lang.String title)
          Generates a breakdown of the accuracy for each class, incorporating various information-retrieval statistics, such as true/false positive rate, precision/recall/F-Measure.
 java.lang.String toMatrixString()
          Calls toMatrixString(String) with a default title.
 java.lang.String toMatrixString(java.lang.String title)
          Outputs the performance statistics as a classification confusion matrix.
 java.lang.String toSummaryString()
          Calls toSummaryString(String, boolean) with no title and no complexity statistics.
 java.lang.String toSummaryString(boolean printComplexityStatistics)
          Calls toSummaryString(String, boolean) with a default title.
 java.lang.String toSummaryString(java.lang.String title, boolean printComplexityStatistics)
          Outputs the performance statistics in summary form.
 double totalCost()
          Gets the total cost, that is, the cost of each prediction times the weight of the instance, summed over all instances.
 double trueNegativeRate(int classIndex)
          Calculate the true negative rate with respect to a particular class.
 double truePositiveRate(int classIndex)
          Calculate the true positive rate with respect to a particular class.
 double unclassified()
          Gets the number of instances not classified (that is, for which no prediction was made by the classifier).
 void updatePriors(Exemplar example)
          Updates the class prior probabilities (when training incrementally).
protected static java.lang.String wekaStaticWrapper(weka.classifiers.Sourcable classifier, java.lang.String className)
          Wraps a static classifier in enough source to test using the weka class libraries.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

MIEvaluation

public MIEvaluation(Exemplars data)
             throws java.lang.Exception
Initializes all the counters for the evaluation.

Parameters:
data - set of training exemplars, to get some header information and prior class distribution information
Throws:
java.lang.Exception - if the class is not defined

MIEvaluation

public MIEvaluation(Exemplars data,
                    weka.classifiers.CostMatrix costMatrix)
             throws java.lang.Exception
Initializes all the counters for the evaluation and also takes a cost matrix as parameter.

Parameters:
data - set of exemplars, to get some header information
costMatrix - the cost matrix; if null, default costs are used
Throws:
java.lang.Exception - if the cost matrix is not compatible with the data, the class is not defined, or the class is numeric

Method Detail

confusionMatrix

public double[][] confusionMatrix()
Returns a copy of the confusion matrix.

Returns:
a copy of the confusion matrix as a two-dimensional array

crossValidateModel

public void crossValidateModel(MIClassifier classifier,
                               Exemplars data,
                               int numFolds)
                        throws java.lang.Exception
Performs a (stratified if class is nominal) cross-validation for a classifier on a set of exemplars.

Parameters:
classifier - the classifier with any options set.
data - the data on which the cross-validation is to be performed
numFolds - the number of folds for the cross-validation
Throws:
java.lang.Exception - if a classifier could not be generated successfully or the class is not defined
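A stratified cross-validation keeps the class proportions roughly equal across folds. The sketch below shows one common way to achieve this for a nominal class: shuffle with the seed, group exemplars by class, then deal them round-robin into the folds. It is an illustration under those assumptions, not MILK's actual fold-splitting code, and StratifiedFoldsSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of stratified fold assignment for a nominal class.
final class StratifiedFoldsSketch {

    /**
     * Returns the fold (0 .. numFolds-1) of each exemplar.
     *
     * @param classLabels nominal class index of each exemplar
     * @param numFolds    number of folds, as with the -x option
     * @param seed        random seed, as with the -s option
     */
    static int[] assignFolds(int[] classLabels, int numFolds, long seed) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < classLabels.length; i++) {
            order.add(i);
        }
        // Shuffle so the order within each class is random but reproducible.
        Collections.shuffle(order, new Random(seed));
        // List.sort is stable: same-class exemplars become contiguous
        // while keeping their shuffled relative order.
        order.sort((a, b) -> Integer.compare(classLabels[a], classLabels[b]));

        int[] fold = new int[classLabels.length];
        for (int pos = 0; pos < order.size(); pos++) {
            // Dealing round-robin spreads each class evenly over the folds.
            fold[order.get(pos)] = pos % numFolds;
        }
        return fold;
    }

    public static void main(String[] args) {
        int[] labels = {0, 0, 0, 0, 0, 0, 1, 1, 1, 1};
        System.out.println(Arrays.toString(assignFolds(labels, 2, 1L)));
    }
}
```

With six exemplars of class 0 and four of class 1 split into two folds, each fold receives three of class 0 and two of class 1, whatever the seed.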

crossValidateModel

public void crossValidateModel(java.lang.String classifierString,
                               Exemplars data,
                               int numFolds,
                               java.lang.String[] options)
                        throws java.lang.Exception
Performs a (stratified if class is nominal) cross-validation for a classifier on a set of exemplars.

Parameters:
classifierString - class of machine learning classifier as a string
data - the data on which the cross-validation is to be performed
numFolds - the number of folds for the cross-validation
options - the options to the classifier. Any options accepted by the classifier will be removed from this array.
Throws:
java.lang.Exception - if a classifier could not be generated successfully or the class is not defined

evaluateModel

public static java.lang.String evaluateModel(java.lang.String classifierString,
                                             java.lang.String[] options)
                                      throws java.lang.Exception
Evaluates a classifier with the options given in an array of strings.

Valid options are:

-t filename
Name of the file with the training data. (required)

-T filename
Name of the file with the test data. If missing, a cross-validation is performed.

-c index
Index of the class attribute (1, 2, ...; default: last).

-I index
Index of the ID attribute (0, 1, 2, ...; default: first).

-x number
The number of folds for the cross-validation (default: 10).

-s seed
Random number seed for the cross-validation (default: 1).

-m filename
The name of a file containing a cost matrix.

-l filename
Loads classifier from the given file.

-g
Only for classifiers that implement "Graphable." Outputs the graph representation of the classifier (and nothing else).

-L
Whether to use "Leave-One-Out" cross-validation.

-d filename
Saves classifier built from the training data into the given file.

-v
Outputs no statistics for the training data.

-o
Outputs statistics only, not the classifier.

Parameters:
classifierString - class of machine learning classifier as a string
options - the array of string containing the options
Returns:
a string describing the results
Throws:
java.lang.Exception - if model could not be evaluated successfully

main

public static void main(java.lang.String[] args)
A test method for this class. It extracts the first command-line argument as the classifier's class name and calls evaluateModel.

Parameters:
args - an array of command line arguments, the first of which must be the class name of a classifier.

evaluateModel

public static java.lang.String evaluateModel(MIClassifier classifier,
                                             java.lang.String[] options)
                                      throws java.lang.Exception
Evaluates a classifier with the options given in an array of strings.

Valid options are:

-t filename
Name of the file with the training data. (required)

-T filename
Name of the file with the test data. If missing, a cross-validation is performed.

-c index
Index of the class attribute (1, 2, ...; default: last).

-I index
Index of the ID attribute (0, 1, 2, ...; default: first).

-x number
The number of folds for the cross-validation (default: 10).

-s seed
Random number seed for the cross-validation (default: 1).

-m filename
The name of a file containing a cost matrix.

-l filename
Loads classifier from the given file.

-g
Only for classifiers that implement "Graphable." Outputs the graph representation of the classifier (and nothing else).

-L
Whether to use "Leave-One-Out" cross-validation.

-d filename
Saves classifier built from the training data into the given file.

-v
Outputs no statistics for the training data.

-o
Outputs statistics only, not the classifier.

Parameters:
classifier - machine learning classifier
options - the array of string containing the options
Returns:
a string describing the results
Throws:
java.lang.Exception - if model could not be evaluated successfully

evaluateModel

public void evaluateModel(MIClassifier classifier,
                          Exemplars data)
                   throws java.lang.Exception
Evaluates the classifier on a given set of exemplars.

Parameters:
classifier - machine learning classifier
data - set of test exemplars for evaluation
Throws:
java.lang.Exception - if model could not be evaluated successfully

evaluateModelOnce

public double evaluateModelOnce(MIClassifier classifier,
                                Exemplar test)
                         throws java.lang.Exception
Evaluates the classifier on a single exemplar.

Parameters:
classifier - machine learning classifier
test - the test exemplar to be classified
Returns:
the prediction made by the classifier
Throws:
java.lang.Exception - if model could not be evaluated successfully or the data contains string attributes

wekaStaticWrapper

protected static java.lang.String wekaStaticWrapper(weka.classifiers.Sourcable classifier,
                                                    java.lang.String className)
                                             throws java.lang.Exception
Wraps a static classifier in enough source to test using the weka class libraries.

Parameters:
classifier - a Sourcable Classifier
className - the name to give to the source code class
Returns:
the source for a static classifier that can be tested with weka libraries.
Throws:
java.lang.Exception

numExemplars

public final double numExemplars()
Gets the number of test exemplars that had a known class value (actually the sum of the weights of test exemplars with known class value).

Returns:
the number of test exemplars with known class

incorrect

public final double incorrect()
Gets the number of instances incorrectly classified (that is, for which an incorrect prediction was made). (Actually, this is the sum of the weights of these instances.)

Returns:
the number of incorrectly classified instances

pctIncorrect

public final double pctIncorrect()
Gets the percentage of instances incorrectly classified (that is, for which an incorrect prediction was made).

Returns:
the percent of incorrectly classified instances (between 0 and 100)

totalCost

public final double totalCost()
Gets the total cost, that is, the cost of each prediction times the weight of the instance, summed over all instances.

Returns:
the total cost

avgCost

public final double avgCost()
Gets the average cost, that is, total cost of misclassifications (incorrect plus unclassified) over the total number of instances.

Returns:
the average cost.
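The two cost measures can be reproduced by hand. The sketch below assumes the cost matrix is indexed cost[actual][predicted] and uses plain arrays instead of weka.classifiers.CostMatrix; for simplicity it assumes every exemplar received a prediction, so unclassified exemplars are ignored. CostSketch is a hypothetical name.

```java
// Hypothetical sketch of totalCost() and avgCost() over weighted exemplars.
final class CostSketch {

    /** Sum over exemplars of (weight * cost of the prediction made). */
    static double totalCost(double[][] cost, int[] actual, int[] predicted, double[] weight) {
        double total = 0;
        for (int i = 0; i < actual.length; i++) {
            total += weight[i] * cost[actual[i]][predicted[i]];
        }
        return total;
    }

    /** Total cost divided by the total weight of the exemplars. */
    static double avgCost(double[][] cost, int[] actual, int[] predicted, double[] weight) {
        double sumWeights = 0;
        for (double w : weight) {
            sumWeights += w;
        }
        return totalCost(cost, actual, predicted, weight) / sumWeights;
    }

    public static void main(String[] args) {
        double[][] cost = {{0, 1}, {5, 0}}; // missing an actual class 1 costs 5
        int[] actual = {0, 1, 1, 0};
        int[] predicted = {0, 0, 1, 1};
        double[] weight = {1, 1, 1, 1};
        System.out.println(totalCost(cost, actual, predicted, weight)); // 6.0
        System.out.println(avgCost(cost, actual, predicted, weight));   // 1.5
    }
}
```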

correct

public final double correct()
Gets the number of instances correctly classified (that is, for which a correct prediction was made). (Actually, this is the sum of the weights of these instances.)

Returns:
the number of correctly classified instances

pctCorrect

public final double pctCorrect()
Gets the percentage of instances correctly classified (that is, for which a correct prediction was made).

Returns:
the percent of correctly classified instances (between 0 and 100)

unclassified

public final double unclassified()
Gets the number of instances not classified (that is, for which no prediction was made by the classifier). (Actually, this is the sum of the weights of these instances.)

Returns:
the number of unclassified instances

pctUnclassified

public final double pctUnclassified()
Gets the percentage of instances not classified (that is, for which no prediction was made by the classifier).

Returns:
the percent of unclassified instances (between 0 and 100)
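Because correct(), incorrect() and unclassified() are sums of weights, the percentages are weight-based rather than plain counts. A minimal sketch, assuming a predicted value of -1 stands for "no prediction" and every exemplar has a known class (so the denominator is the total weight, as with numExemplars()); AccuracySketch is a hypothetical name.

```java
import java.util.Arrays;

// Hypothetical sketch of the weighted correct / incorrect / unclassified statistics.
final class AccuracySketch {

    /** Returns {correct, incorrect, unclassified} as sums of weights. */
    static double[] weightedCounts(int[] actual, int[] predicted, double[] weight) {
        double[] c = new double[3];
        for (int i = 0; i < actual.length; i++) {
            if (predicted[i] < 0) {
                c[2] += weight[i]; // unclassified: no prediction made
            } else if (predicted[i] == actual[i]) {
                c[0] += weight[i]; // correct
            } else {
                c[1] += weight[i]; // incorrect
            }
        }
        return c;
    }

    /** Converts the weighted counts to percentages (0 to 100) of the total weight. */
    static double[] percentages(double[] counts) {
        double total = counts[0] + counts[1] + counts[2];
        return new double[] {
            100.0 * counts[0] / total,
            100.0 * counts[1] / total,
            100.0 * counts[2] / total
        };
    }

    public static void main(String[] args) {
        int[] actual = {0, 1, 1, 0};
        int[] predicted = {0, 1, 0, -1};
        double[] weight = {2, 1, 1, 1};
        double[] pct = percentages(weightedCounts(actual, predicted, weight));
        System.out.println(Arrays.toString(pct)); // [60.0, 20.0, 20.0]
    }
}
```

The first exemplar has weight 2, so it alone contributes 40 percentage points to the correct share.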

errorRate

public final double errorRate()
Returns the estimated error rate or the root mean squared error (if the class is numeric). If a cost matrix was given, this error rate gives the average cost.

Returns:
the estimated error rate (between 0 and 1, or between 0 and maximum cost)

kappa

public final double kappa()
Returns the value of the kappa statistic if the class is nominal.

Returns:
the value of the kappa statistic
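The kappa statistic compares the observed agreement between actual and predicted classes with the agreement expected by chance: kappa = (observed - chance) / (1 - chance). The sketch below computes Cohen's kappa from a confusion matrix, assuming the layout matrix[actual][predicted]; KappaSketch is a hypothetical name.

```java
// Hypothetical sketch of Cohen's kappa from a weighted confusion matrix.
final class KappaSketch {

    static double kappa(double[][] m) {
        int k = m.length;
        double[] rowSum = new double[k]; // totals per actual class
        double[] colSum = new double[k]; // totals per predicted class
        double total = 0;
        double diagonal = 0;
        for (int i = 0; i < k; i++) {
            for (int j = 0; j < k; j++) {
                rowSum[i] += m[i][j];
                colSum[j] += m[i][j];
                total += m[i][j];
            }
            diagonal += m[i][i];
        }
        double observed = diagonal / total; // fraction of agreeing predictions
        double chance = 0;                  // agreement expected from the marginals alone
        for (int i = 0; i < k; i++) {
            chance += (rowSum[i] / total) * (colSum[i] / total);
        }
        return (observed - chance) / (1 - chance);
    }

    public static void main(String[] args) {
        double[][] m = {{20, 5}, {10, 15}};
        System.out.println(kappa(m)); // approximately 0.4
    }
}
```

In the example, observed agreement is 35/50 = 0.7 and chance agreement is 0.5 * 0.6 + 0.5 * 0.4 = 0.5, giving kappa = 0.2 / 0.5 = 0.4.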

meanAbsoluteError

public final double meanAbsoluteError()
Returns the mean absolute error. Refers to the error of the predicted values for numeric classes, and the error of the predicted probability distribution for nominal classes.

Returns:
the mean absolute error
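For a nominal class, the error is measured on the predicted probability distribution: each exemplar's distribution is compared with the 0/1 indicator of its actual class. The sketch below assumes, following Weka's convention as we understand it, that the per-class differences are averaged over the number of classes; it also treats all exemplars as having weight 1. DistributionMaeSketch is a hypothetical name.

```java
// Hypothetical sketch of the mean absolute error of predicted class distributions.
final class DistributionMaeSketch {

    static double meanAbsoluteError(double[][] dist, int[] actual) {
        double sum = 0;
        for (int i = 0; i < dist.length; i++) {
            double perExemplar = 0;
            for (int c = 0; c < dist[i].length; c++) {
                double target = (c == actual[i]) ? 1.0 : 0.0; // one-hot actual class
                perExemplar += Math.abs(dist[i][c] - target);
            }
            sum += perExemplar / dist[i].length; // average over classes
        }
        return sum / dist.length;                // average over exemplars
    }

    public static void main(String[] args) {
        double[][] dist = {{0.8, 0.2}, {0.4, 0.6}};
        int[] actual = {0, 0};
        System.out.println(meanAbsoluteError(dist, actual)); // approximately 0.4
    }
}
```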

meanPriorAbsoluteError

public final double meanPriorAbsoluteError()
Returns the mean absolute error of the prior.

Returns:
the mean absolute error of the prior

relativeAbsoluteError

public final double relativeAbsoluteError()
                                   throws java.lang.Exception
Returns the relative absolute error.

Returns:
the relative absolute error
Throws:
java.lang.Exception - if it can't be computed

rootMeanSquaredError

public final double rootMeanSquaredError()
Returns the root mean squared error.

Returns:
the root mean squared error

rootMeanPriorSquaredError

public final double rootMeanPriorSquaredError()
Returns the root mean prior squared error.

Returns:
the root mean prior squared error

rootRelativeSquaredError

public final double rootRelativeSquaredError()
Returns the root relative squared error if the class is numeric.

Returns:
the root relative squared error
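For a numeric class, the relative measures compare the model's error with that of a prior predictor. The sketch below assumes the prior always predicts the mean of the training targets (taken to be 3.0 in the example) and, as is usual in Weka-style summaries, expresses the relative errors as percentages; both are assumptions, and RelativeErrorSketch is a hypothetical name.

```java
import java.util.Arrays;

// Hypothetical sketch of RMSE and the two relative error measures for a numeric class.
final class RelativeErrorSketch {

    static double mae(double[] predicted, double[] actual) {
        double sum = 0;
        for (int i = 0; i < actual.length; i++) {
            sum += Math.abs(predicted[i] - actual[i]);
        }
        return sum / actual.length;
    }

    static double rmse(double[] predicted, double[] actual) {
        double sum = 0;
        for (int i = 0; i < actual.length; i++) {
            double diff = predicted[i] - actual[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum / actual.length);
    }

    /** The prior predictor: the same value (e.g. the training mean) for every exemplar. */
    static double[] constant(double value, int n) {
        double[] p = new double[n];
        Arrays.fill(p, value);
        return p;
    }

    /** Model MAE as a percentage of the prior's MAE. */
    static double relativeAbsoluteError(double[] model, double[] prior, double[] actual) {
        return 100.0 * mae(model, actual) / mae(prior, actual);
    }

    /** Model RMSE as a percentage of the prior's RMSE. */
    static double rootRelativeSquaredError(double[] model, double[] prior, double[] actual) {
        return 100.0 * rmse(model, actual) / rmse(prior, actual);
    }

    public static void main(String[] args) {
        double[] actual = {1, 2, 3, 6};
        double[] model = {1, 2, 4, 5};
        double[] prior = constant(3.0, actual.length); // assumed training mean of 3.0
        System.out.println(rmse(model, actual));                            // about 0.707
        System.out.println(relativeAbsoluteError(model, prior, actual));    // about 33.3
        System.out.println(rootRelativeSquaredError(model, prior, actual)); // about 37.8
    }
}
```

A value below 100 means the model beats the prior; here the model's absolute error is a third of the prior's.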

toSummaryString

public java.lang.String toSummaryString()
Calls toSummaryString(String, boolean) with no title and no complexity statistics.

Specified by:
toSummaryString in interface weka.core.Summarizable
Returns:
a summary description of the classifier evaluation

toSummaryString

public java.lang.String toSummaryString(boolean printComplexityStatistics)
Calls toSummaryString(String, boolean) with a default title.

Parameters:
printComplexityStatistics - if true, complexity statistics are returned as well
Returns:
the summary as a String

toSummaryString

public java.lang.String toSummaryString(java.lang.String title,
                                        boolean printComplexityStatistics)
Outputs the performance statistics in summary form. Lists the number (and percentage) of instances classified correctly, incorrectly and unclassified. Outputs the total number of instances classified, and the number of instances (if any) that had no class value provided.

Parameters:
title - the title for the statistics
printComplexityStatistics - if true, complexity statistics are returned as well
Returns:
the summary as a String

toMatrixString

public java.lang.String toMatrixString()
                                throws java.lang.Exception
Calls toMatrixString(String) with a default title.

Returns:
the confusion matrix as a string
Throws:
java.lang.Exception - if the class is numeric

toMatrixString

public java.lang.String toMatrixString(java.lang.String title)
                                throws java.lang.Exception
Outputs the performance statistics as a classification confusion matrix. For each class value, shows the distribution of predicted class values.

Parameters:
title - the title for the confusion matrix
Returns:
the confusion matrix as a String
Throws:
java.lang.Exception - if the class is numeric

toClassDetailsString

public java.lang.String toClassDetailsString()
                                      throws java.lang.Exception
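Calls toClassDetailsString(String) with a default title.

Returns:
the statistics presented as a string
Throws:
java.lang.Exception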

toClassDetailsString

public java.lang.String toClassDetailsString(java.lang.String title)
                                      throws java.lang.Exception
Generates a breakdown of the accuracy for each class, incorporating various information-retrieval statistics, such as true/false positive rate, precision/recall/F-Measure. This should be useful for ROC curves and recall/precision curves.

Parameters:
title - the title to prepend the stats string with
Returns:
the statistics presented as a string
Throws:
java.lang.Exception

numTruePositives

public double numTruePositives(int classIndex)
Calculate the number of true positives with respect to a particular class. This is defined as

 correctly classified positives
 

Parameters:
classIndex - the index of the class to consider as "positive"
Returns:
the number of true positives
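The four per-class counts (true positives, false negatives, false positives, true negatives) can all be read off a confusion matrix once one class is treated as "positive". A minimal sketch, assuming the layout matrix[actual][predicted]; ConfusionCountsSketch is a hypothetical name, not part of MIEvaluation.

```java
// Hypothetical sketch: per-class counts from a confusion matrix indexed [actual][predicted].
final class ConfusionCountsSketch {

    /** Positives predicted as positive. */
    static double truePositives(double[][] m, int c) {
        return m[c][c];
    }

    /** Positives predicted as some other class. */
    static double falseNegatives(double[][] m, int c) {
        double s = 0;
        for (int j = 0; j < m.length; j++) {
            if (j != c) {
                s += m[c][j];
            }
        }
        return s;
    }

    /** Negatives predicted as the positive class. */
    static double falsePositives(double[][] m, int c) {
        double s = 0;
        for (int i = 0; i < m.length; i++) {
            if (i != c) {
                s += m[i][c];
            }
        }
        return s;
    }

    /** Negatives predicted as any negative class (not necessarily their own). */
    static double trueNegatives(double[][] m, int c) {
        double s = 0;
        for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < m.length; j++) {
                if (i != c && j != c) {
                    s += m[i][j];
                }
            }
        }
        return s;
    }

    public static void main(String[] args) {
        double[][] m = {{5, 1, 0}, {2, 6, 1}, {0, 1, 4}};
        // Treat class 1 as positive: TP 6, FN 3, FP 2, TN 9.
        System.out.println(truePositives(m, 1) + " " + falseNegatives(m, 1) + " "
                + falsePositives(m, 1) + " " + trueNegatives(m, 1));
    }
}
```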

truePositiveRate

public double truePositiveRate(int classIndex)
Calculate the true positive rate with respect to a particular class. This is defined as

 correctly classified positives
 ------------------------------
       total positives
 

Parameters:
classIndex - the index of the class to consider as "positive"
Returns:
the true positive rate

numTrueNegatives

public double numTrueNegatives(int classIndex)
Calculate the number of true negatives with respect to a particular class. This is defined as

 correctly classified negatives
 

Parameters:
classIndex - the index of the class to consider as "positive"
Returns:
the number of true negatives

trueNegativeRate

public double trueNegativeRate(int classIndex)
Calculate the true negative rate with respect to a particular class. This is defined as

 correctly classified negatives
 ------------------------------
       total negatives
 

Parameters:
classIndex - the index of the class to consider as "positive"
Returns:
the true negative rate

numFalsePositives

public double numFalsePositives(int classIndex)
Calculate number of false positives with respect to a particular class. This is defined as

 incorrectly classified negatives
 

Parameters:
classIndex - the index of the class to consider as "positive"
Returns:
the number of false positives

falsePositiveRate

public double falsePositiveRate(int classIndex)
Calculate the false positive rate with respect to a particular class. This is defined as

 incorrectly classified negatives
 --------------------------------
        total negatives
 

Parameters:
classIndex - the index of the class to consider as "positive"
Returns:
the false positive rate

numFalseNegatives

public double numFalseNegatives(int classIndex)
Calculate number of false negatives with respect to a particular class. This is defined as

 incorrectly classified positives
 

Parameters:
classIndex - the index of the class to consider as "positive"
Returns:
the number of false negatives

falseNegativeRate

public double falseNegativeRate(int classIndex)
Calculate the false negative rate with respect to a particular class. This is defined as

 incorrectly classified positives
 --------------------------------
        total positives
 

Parameters:
classIndex - the index of the class to consider as "positive"
Returns:
the false negative rate
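Each of the four rates defined above divides one count by the total number of actual positives (TP + FN) or actual negatives (FP + TN). A minimal sketch working directly from those counts; RatesSketch is a hypothetical name, and returning 0 when a denominator is 0 is a convention assumed here, not documented behaviour.

```java
// Hypothetical sketch of the per-class rates computed from TP, FN, FP and TN.
final class RatesSketch {

    /** Divides, returning 0 when the denominator is 0 (an assumed convention). */
    private static double safeDivide(double num, double den) {
        return den == 0 ? 0 : num / den;
    }

    static double truePositiveRate(double tp, double fn) {
        return safeDivide(tp, tp + fn);  // correct positives / total positives
    }

    static double falseNegativeRate(double tp, double fn) {
        return safeDivide(fn, tp + fn);  // incorrect positives / total positives
    }

    static double falsePositiveRate(double fp, double tn) {
        return safeDivide(fp, fp + tn);  // incorrect negatives / total negatives
    }

    static double trueNegativeRate(double fp, double tn) {
        return safeDivide(tn, fp + tn);  // correct negatives / total negatives
    }

    public static void main(String[] args) {
        double tp = 6, fn = 3, fp = 2, tn = 9;
        System.out.println(truePositiveRate(tp, fn) + " " + falseNegativeRate(tp, fn));
        System.out.println(falsePositiveRate(fp, tn) + " " + trueNegativeRate(fp, tn));
    }
}
```

By construction, the true positive and false negative rates sum to 1, as do the false positive and true negative rates.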

recall

public double recall(int classIndex)
Calculate the recall with respect to a particular class. This is defined as

 correctly classified positives
 ------------------------------
       total positives
 

(This is the same as the true positive rate.)

Parameters:
classIndex - the index of the class to consider as "positive"
Returns:
the recall

precision

public double precision(int classIndex)
Calculate the precision with respect to a particular class. This is defined as

 correctly classified positives
 ------------------------------
  total predicted as positive
 

Parameters:
classIndex - the index of the class to consider as "positive"
Returns:
the precision

fMeasure

public double fMeasure(int classIndex)
Calculate the F-Measure with respect to a particular class. This is defined as

 2 * recall * precision
 ----------------------
   recall + precision
 

Parameters:
classIndex - the index of the class to consider as "positive"
Returns:
the F-Measure
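Precision, recall and the F-Measure follow directly from the true positive, false positive and false negative counts of the positive class. A minimal sketch; PrecisionRecallSketch is a hypothetical name, and returning 0 for an empty denominator is an assumed convention.

```java
// Hypothetical sketch of precision, recall and F-Measure for one class.
final class PrecisionRecallSketch {

    /** Divides, returning 0 when the denominator is 0 (an assumed convention). */
    private static double safeDivide(double num, double den) {
        return den == 0 ? 0 : num / den;
    }

    /** Correct positives / total predicted as positive. */
    static double precision(double tp, double fp) {
        return safeDivide(tp, tp + fp);
    }

    /** Correct positives / total positives. */
    static double recall(double tp, double fn) {
        return safeDivide(tp, tp + fn);
    }

    /** Harmonic mean: 2 * recall * precision / (recall + precision). */
    static double fMeasure(double precision, double recall) {
        return safeDivide(2 * recall * precision, recall + precision);
    }

    public static void main(String[] args) {
        double p = precision(6, 2); // 0.75
        double r = recall(6, 3);    // 2/3
        System.out.println(fMeasure(p, r)); // 12/17, about 0.706
    }
}
```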

setPriors

public void setPriors(Exemplars train)
               throws java.lang.Exception
Sets the class prior probabilities.

Parameters:
train - the training exemplars used to determine the prior probabilities
Throws:
java.lang.Exception - if the class attribute of the exemplars is not set

updatePriors

public void updatePriors(Exemplar example)
                  throws java.lang.Exception
Updates the class prior probabilities (when training incrementally).

Parameters:
example - the new training example seen
Throws:
java.lang.Exception - if the class of the example is not set
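For a nominal class, setPriors and updatePriors can be pictured as maintaining weighted class counts: setPriors rebuilds them from a whole training set, while updatePriors adds one exemplar at a time. The sketch below is an analogue under stated assumptions, not MIEvaluation's code. Each count starts at 1 (add-one smoothing, assumed so that no class gets probability zero), class values are plain ints with -1 meaning "not set", and ClassPriorsSketch is a hypothetical name.

```java
import java.util.Arrays;

// Hypothetical sketch of prior estimation for a nominal class, with add-one smoothing.
final class ClassPriorsSketch {

    private final double[] counts;
    private double sum;

    ClassPriorsSketch(int numClasses) {
        counts = new double[numClasses];
        reset();
    }

    /** Every class starts with a count of 1 (the assumed add-one smoothing). */
    private void reset() {
        Arrays.fill(counts, 1.0);
        sum = counts.length;
    }

    /** Analogue of setPriors: start over and add all training exemplars. */
    void setPriors(int[] classValues, double[] weights) {
        reset();
        for (int i = 0; i < classValues.length; i++) {
            updatePriors(classValues[i], weights[i]);
        }
    }

    /** Analogue of updatePriors: add one newly seen training exemplar. */
    void updatePriors(int classValue, double weight) {
        if (classValue < 0) {
            // Mirrors the documented failure when the class is not set.
            throw new IllegalArgumentException("class value is not set");
        }
        counts[classValue] += weight;
        sum += weight;
    }

    double prior(int classValue) {
        return counts[classValue] / sum;
    }

    public static void main(String[] args) {
        ClassPriorsSketch priors = new ClassPriorsSketch(2);
        priors.setPriors(new int[] {0, 0, 1}, new double[] {1, 1, 1});
        System.out.println(priors.prior(0)); // counts {3, 2}: 0.6
        priors.updatePriors(1, 1.0);
        System.out.println(priors.prior(0)); // counts {3, 3}: 0.5
    }
}
```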