MOA 12.03
Real Time Analytics for Data Streams
moa.classifiers.trees.HoeffdingTree Class Reference

Hoeffding Tree or VFDT. More...

Inheritance diagram for moa.classifiers.trees.HoeffdingTree:
Collaboration diagram for moa.classifiers.trees.HoeffdingTree:

List of all members.

Classes

class  ActiveLearningNode
class  FoundNode
class  InactiveLearningNode
class  LearningNode
class  LearningNodeNB
class  LearningNodeNBAdaptive
class  Node
class  SplitNode

Public Member Functions

String getPurposeString ()
 Gets the purpose of this object.
int calcByteSize ()
int measureByteSize ()
 Gets the memory size of this object.
void resetLearningImpl ()
 Resets this classifier.
void trainOnInstanceImpl (Instance inst)
 Trains this classifier incrementally using the given instance.
double[] getVotesForInstance (Instance inst)
 Predicts the class memberships for a given instance.
int measureTreeDepth ()
void getModelDescription (StringBuilder out, int indent)
 Returns a string representation of the model.
boolean isRandomizable ()
 Gets whether this classifier needs a random seed.
void enforceTrackerLimit ()
void estimateModelByteSizes ()
void deactivateAllLeaves ()

Static Public Member Functions

static double computeHoeffdingBound (double range, double confidence, double n)

Public Attributes

IntOption maxByteSizeOption
ClassOption numericEstimatorOption
ClassOption nominalEstimatorOption
IntOption memoryEstimatePeriodOption
IntOption gracePeriodOption
ClassOption splitCriterionOption
FloatOption splitConfidenceOption
FloatOption tieThresholdOption
FlagOption binarySplitsOption
FlagOption stopMemManagementOption
FlagOption removePoorAttsOption
FlagOption noPrePruneOption
MultiChoiceOption leafpredictionOption
IntOption nbThresholdOption

Protected Member Functions

Measurement[] getModelMeasurementsImpl ()
 Gets the current measurements of this classifier.
SplitNode newSplitNode (InstanceConditionalTest splitTest, double[] classObservations)
AttributeClassObserver newNominalClassObserver ()
AttributeClassObserver newNumericClassObserver ()
void attemptToSplit (ActiveLearningNode node, SplitNode parent, int parentIndex)
void deactivateLearningNode (ActiveLearningNode toDeactivate, SplitNode parent, int parentBranch)
void activateLearningNode (InactiveLearningNode toActivate, SplitNode parent, int parentBranch)
FoundNode[] findLearningNodes ()
void findLearningNodes (Node node, SplitNode parent, int parentBranch, List< FoundNode > found)
LearningNode newLearningNode ()
LearningNode newLearningNode (double[] initialClassObservations)

Protected Attributes

Node treeRoot
int decisionNodeCount
int activeLeafNodeCount
int inactiveLeafNodeCount
double inactiveLeafByteSizeEstimate
double activeLeafByteSizeEstimate
double byteSizeEstimateOverheadFraction
boolean growthAllowed

Detailed Description

Hoeffding Tree or VFDT.

A Hoeffding tree is an incremental, anytime decision tree induction algorithm that is capable of learning from massive data streams, assuming that the distribution generating examples does not change over time. Hoeffding trees exploit the fact that a small sample can often be enough to choose an optimal splitting attribute. This idea is supported mathematically by the Hoeffding bound, which quantifies the number of observations (in our case, examples) needed to estimate some statistics within a prescribed precision (in our case, the goodness of an attribute).

A theoretically appealing feature of Hoeffding Trees not shared by other incremental decision tree learners is that it has sound guarantees of performance. Using the Hoeffding bound one can show that its output is asymptotically nearly identical to that of a non-incremental learner using infinitely many examples. See for details:

G. Hulten, L. Spencer, and P. Domingos. Mining time-changing data streams. In KDD’01, pages 97–106, San Francisco, CA, 2001. ACM Press.

Parameters:

  • -m : Maximum memory consumed by the tree
  • -n : Numeric estimator to use :
    • Gaussian approximation evaluating 10 splitpoints
    • Gaussian approximation evaluating 100 splitpoints
    • Greenwald-Khanna quantile summary with 10 tuples
    • Greenwald-Khanna quantile summary with 100 tuples
    • Greenwald-Khanna quantile summary with 1000 tuples
    • VFML method with 10 bins
    • VFML method with 100 bins
    • VFML method with 1000 bins
    • Exhaustive binary tree
  • -e : How many instances between memory consumption checks
  • -g : The number of instances a leaf should observe between split attempts
  • -s : Split criterion to use. Example : InfoGainSplitCriterion
  • -c : The allowable error in split decision, values closer to 0 will take longer to decide
  • -t : Threshold below which a split will be forced to break ties
  • -b : Only allow binary splits
  • -z : Stop growing as soon as memory limit is hit
  • -r : Disable poor attributes
  • -p : Disable pre-pruning
  • -l : Leaf prediction to use: MajorityClass (MC), Naive Bayes (NB) or NaiveBayes adaptive (NBAdaptive).
  • -q : The number of instances a leaf should observe before permitting Naive Bayes
Author:
Richard Kirkby (rkirkby@cs.waikato.ac.nz)
Version:
Revision:
7

Definition at line 96 of file HoeffdingTree.java.


Member Function Documentation

void moa.classifiers.trees.HoeffdingTree.activateLearningNode ( InactiveLearningNode  toActivate,
SplitNode  parent,
int  parentBranch 
) [protected]

Definition at line 778 of file HoeffdingTree.java.

void moa.classifiers.trees.HoeffdingTree.attemptToSplit ( ActiveLearningNode  node,
SplitNode  parent,
int  parentIndex 
) [protected]

Definition at line 600 of file HoeffdingTree.java.

Referenced by moa.classifiers.trees.ASHoeffdingTree.trainOnInstanceImpl().

Here is the call graph for this function:

Here is the caller graph for this function:

int moa.classifiers.trees.HoeffdingTree.calcByteSize ( )

Definition at line 467 of file HoeffdingTree.java.

Here is the call graph for this function:

static double moa.classifiers.trees.HoeffdingTree.computeHoeffdingBound ( double  range,
double  confidence,
double  n 
) [static]

Definition at line 578 of file HoeffdingTree.java.

void moa.classifiers.trees.HoeffdingTree.deactivateAllLeaves ( )

Definition at line 755 of file HoeffdingTree.java.

void moa.classifiers.trees.HoeffdingTree.deactivateLearningNode ( ActiveLearningNode  toDeactivate,
SplitNode  parent,
int  parentBranch 
) [protected]

Definition at line 766 of file HoeffdingTree.java.

void moa.classifiers.trees.HoeffdingTree.enforceTrackerLimit ( )

Definition at line 678 of file HoeffdingTree.java.

void moa.classifiers.trees.HoeffdingTree.estimateModelByteSizes ( )

Definition at line 725 of file HoeffdingTree.java.

Referenced by moa.classifiers.trees.ASHoeffdingTree.trainOnInstanceImpl().

Here is the call graph for this function:

Here is the caller graph for this function:

FoundNode [] moa.classifiers.trees.HoeffdingTree.findLearningNodes ( ) [protected]

Definition at line 790 of file HoeffdingTree.java.

void moa.classifiers.trees.HoeffdingTree.findLearningNodes ( Node  node,
SplitNode  parent,
int  parentBranch,
List< FoundNode >  found 
) [protected]

Definition at line 796 of file HoeffdingTree.java.

void moa.classifiers.trees.HoeffdingTree.getModelDescription ( StringBuilder  out,
int  indent 
) [virtual]

Returns a string representation of the model.

Parameters:
outthe stringbuilder to add the description
indentthe number of characters to indent

Implements moa.classifiers.AbstractClassifier.

Definition at line 569 of file HoeffdingTree.java.

Measurement [] moa.classifiers.trees.HoeffdingTree.getModelMeasurementsImpl ( ) [protected, virtual]

Gets the current measurements of this classifier.



The reason for ...Impl methods: ease programmer burden by not requiring them to remember calls to super in overridden methods. Note that this will produce compiler errors if not overridden.

Returns:
an array of measurements to be used in evaluation tasks

Implements moa.classifiers.AbstractClassifier.

Definition at line 544 of file HoeffdingTree.java.

String moa.classifiers.trees.HoeffdingTree.getPurposeString ( )

Gets the purpose of this object.

Returns:
the string with the purpose of this object

Reimplemented from moa.classifiers.AbstractClassifier.

Reimplemented in moa.classifiers.trees.ASHoeffdingTree, moa.classifiers.trees.HoeffdingAdaptiveTree, moa.classifiers.trees.LimAttHoeffdingTree, and moa.classifiers.trees.RandomHoeffdingTree.

Definition at line 101 of file HoeffdingTree.java.

double [] moa.classifiers.trees.HoeffdingTree.getVotesForInstance ( Instance  inst)

Predicts the class memberships for a given instance.

If an instance is unclassified, the returned array elements must be all zero.

Parameters:
instthe instance to be classified
Returns:
an array containing the estimated membership probabilities of the test instance in each class

Implements moa.classifiers.Classifier.

Reimplemented in moa.classifiers.trees.HoeffdingAdaptiveTree.

Definition at line 530 of file HoeffdingTree.java.

boolean moa.classifiers.trees.HoeffdingTree.isRandomizable ( )

Gets whether this classifier needs a random seed.

Examples of methods that needs a random seed are bagging and boosting.

Returns:
true if the classifier needs a random seed.

Implements moa.classifiers.Classifier.

Reimplemented in moa.classifiers.trees.LimAttHoeffdingTree, and moa.classifiers.trees.RandomHoeffdingTree.

Definition at line 574 of file HoeffdingTree.java.

int moa.classifiers.trees.HoeffdingTree.measureByteSize ( )

Gets the memory size of this object.

Returns:
the memory size of this object

Reimplemented from moa.AbstractMOAObject.

Definition at line 476 of file HoeffdingTree.java.

int moa.classifiers.trees.HoeffdingTree.measureTreeDepth ( )

Definition at line 561 of file HoeffdingTree.java.

LearningNode moa.classifiers.trees.HoeffdingTree.newLearningNode ( ) [protected]

Definition at line 884 of file HoeffdingTree.java.

Referenced by moa.classifiers.trees.HoeffdingAdaptiveTree.trainOnInstanceImpl(), and moa.classifiers.trees.ASHoeffdingTree.trainOnInstanceImpl().

Here is the caller graph for this function:

LearningNode moa.classifiers.trees.HoeffdingTree.newLearningNode ( double[]  initialClassObservations) [protected]
AttributeClassObserver moa.classifiers.trees.HoeffdingTree.newNominalClassObserver ( ) [protected]

Definition at line 590 of file HoeffdingTree.java.

Here is the call graph for this function:

AttributeClassObserver moa.classifiers.trees.HoeffdingTree.newNumericClassObserver ( ) [protected]

Definition at line 595 of file HoeffdingTree.java.

Here is the call graph for this function:

SplitNode moa.classifiers.trees.HoeffdingTree.newSplitNode ( InstanceConditionalTest  splitTest,
double[]  classObservations 
) [protected]

Reimplemented in moa.classifiers.trees.HoeffdingAdaptiveTree.

Definition at line 585 of file HoeffdingTree.java.

void moa.classifiers.trees.HoeffdingTree.resetLearningImpl ( ) [virtual]

Resets this classifier.

It must be similar to starting a new classifier from scratch.

The reason for ...Impl methods: ease programmer burden by not requiring them to remember calls to super in overridden methods. Note that this will produce compiler errors if not overridden.

Implements moa.classifiers.AbstractClassifier.

Reimplemented in moa.classifiers.trees.ASHoeffdingTree.

Definition at line 481 of file HoeffdingTree.java.

void moa.classifiers.trees.HoeffdingTree.trainOnInstanceImpl ( Instance  inst) [virtual]

Trains this classifier incrementally using the given instance.



The reason for ...Impl methods: ease programmer burden by not requiring them to remember calls to super in overridden methods. Note that this will produce compiler errors if not overridden.

Parameters:
instthe instance to be used for training

Implements moa.classifiers.AbstractClassifier.

Reimplemented in moa.classifiers.trees.ASHoeffdingTree, and moa.classifiers.trees.HoeffdingAdaptiveTree.

Definition at line 496 of file HoeffdingTree.java.


Member Data Documentation

Initial value:
 new FlagOption("binarySplits", 'b',
            "Only allow binary splits.")

Definition at line 153 of file HoeffdingTree.java.

Initial value:
 new IntOption(
            "gracePeriod",
            'g',
            "The number of instances a leaf should observe between split attempts.",
            200, 0, Integer.MAX_VALUE)

Definition at line 133 of file HoeffdingTree.java.

Referenced by moa.classifiers.trees.ASHoeffdingTree.trainOnInstanceImpl().

Initial value:
 new MultiChoiceOption(
            "leafprediction", 'l', "Leaf prediction to use.", new String[]{
                "MC", "NB", "NBAdaptive"}, new String[]{
                "Majority class",
                "Naive Bayes",
                "Naive Bayes Adaptive"}, 2)

Definition at line 812 of file HoeffdingTree.java.

Referenced by moa.classifiers.trees.RandomHoeffdingTree.newLearningNode(), and moa.classifiers.trees.LimAttHoeffdingTree.newLearningNode().

Initial value:
 new IntOption("maxByteSize", 'm',
            "Maximum memory consumed by the tree.", 33554432, 0,
            Integer.MAX_VALUE)

Definition at line 105 of file HoeffdingTree.java.

Initial value:
 new IntOption(
            "memoryEstimatePeriod", 'e',
            "How many instances between memory consumption checks.", 1000000,
            0, Integer.MAX_VALUE)

Definition at line 128 of file HoeffdingTree.java.

Referenced by moa.classifiers.trees.ASHoeffdingTree.trainOnInstanceImpl().

Initial value:
 new IntOption(
            "nbThreshold",
            'q',
            "The number of instances a leaf should observe before permitting Naive Bayes.",
            0, 0, Integer.MAX_VALUE)

Definition at line 819 of file HoeffdingTree.java.

Initial value:
 new ClassOption("nominalEstimator",
            'd', "Nominal estimator to use.", DiscreteAttributeClassObserver.class,
            "NominalAttributeClassObserver")

Definition at line 124 of file HoeffdingTree.java.

Initial value:
 new FlagOption("noPrePrune", 'p',
            "Disable pre-pruning.")

Definition at line 163 of file HoeffdingTree.java.

Initial value:
 new ClassOption("numericEstimator",
            'n', "Numeric estimator to use.", NumericAttributeClassObserver.class,
            "GaussianNumericAttributeClassObserver")

Definition at line 120 of file HoeffdingTree.java.

Initial value:
 new FloatOption(
            "splitConfidence",
            'c',
            "The allowable error in split decision, values closer to 0 will take longer to decide.",
            0.0000001, 0.0, 1.0)

Definition at line 143 of file HoeffdingTree.java.

Initial value:
 new ClassOption("splitCriterion",
            's', "Split criterion to use.", SplitCriterion.class,
            "InfoGainSplitCriterion")

Definition at line 139 of file HoeffdingTree.java.

Initial value:
 new FlagOption(
            "stopMemManagement", 'z',
            "Stop growing as soon as memory limit is hit.")

Definition at line 156 of file HoeffdingTree.java.

Initial value:
 new FloatOption("tieThreshold",
            't', "Threshold below which a split will be forced to break ties.",
            0.05, 0.0, 1.0)

Definition at line 149 of file HoeffdingTree.java.


The documentation for this class was generated from the following file:
 All Classes Namespaces Files Functions Variables Enumerations