milk.core
Class Exemplars

java.lang.Object
  |
  +--milk.core.Exemplars
All Implemented Interfaces:
java.io.Serializable

public class Exemplars
extends java.lang.Object
implements java.io.Serializable

The class of a set of exemplars

See Also:
Serialized Form

Constructor Summary
Exemplars(Exemplars exemplars)
          Constructor to form an Exemplars by deep copying from another Exemplars
Exemplars(Exemplars exemplars, int size)
          Constructor creating an empty Exemplars with the same structure of the given Exemplars and the given size (i.e. the number of exemplars in the set)
Exemplars(Exemplars source, int first, int toCopy)
          Creates a new set of instances by copying a subset of another set.
Exemplars(weka.core.Instances dataset)
          Constructor using the given dataset, with the ID index set to 0
Exemplars(weka.core.Instances dataset, int idIndex)
          Constructor using the given dataset and set ID index to the given ID index.
 
Method Summary
 void add(Exemplar exemplar)
          Adds one exemplar to the exemplars
 void add(weka.core.Instance instance)
          Adds one instance to one of the exemplars
 weka.core.Attribute attribute(int index)
          Returns an attribute.
 weka.core.Attribute attribute(java.lang.String name)
          Returns an attribute given its name.
 boolean checkForStringAttributes()
          Checks for string attributes in the Exemplars
 weka.core.Attribute classAttribute()
          Returns the class attribute.
 int classIndex()
          Returns the class attribute's index.
 void compactify()
          Compactifies each exemplar in this Exemplars
 void delete()
          Removes all Exemplars from the set.
 void delete(int index)
          Removes an exemplar at the given position from the set.
 void deleteAttributeAt(int position)
          Deletes an attribute at the given position (0 to numAttributes() - 1).
 void deleteStringAttributes()
          Deletes all string attributes in the dataset.
 void deleteWithMissing(weka.core.Attribute att)
          Removes all instances with missing values for a particular attribute from the dataset.
 void deleteWithMissing(int attIndex)
          Removes all instances with missing values for a particular attribute from the dataset.
 java.util.Enumeration enumerateAttributes()
          Returns an enumeration of all the attributes.
 Exemplar exemplar(int index)
          Returns the exemplar at the given position.
 Exemplar firstExemplar()
          Returns the first exemplar in the set.
 java.util.Vector getExemplars()
          Returns a vector of exemplars in this Exemplars.
 weka.core.Attribute idAttribute()
          Returns the ID attribute.
 int idIndex()
          Returns the ID attribute's index.
 void insertAttributeAt(weka.core.Attribute att, int position)
          Inserts an attribute at the given position (0 to numAttributes()) and sets all values to be missing.
 Exemplar lastExemplar()
          Returns the last exemplar in the set.
static void main(java.lang.String[] args)
          Main method for this class -- just performs one run of 10-fold CV and prints out the set.
 int numAttributes()
          Returns the number of attributes.
 int numClasses()
          Returns the number of class labels.
 int numExemplars()
          Returns the number of exemplars in the set.
 int[] numsInstances()
          Returns the number of instances in the dataset.
 void randomize(java.util.Random random)
          Shuffles the exemplars in the set so that they are ordered randomly.
 java.lang.String relationName()
          Returns the relation's name.
 void renameAttribute(weka.core.Attribute att, java.lang.String name)
          Renames an attribute.
 void renameAttribute(int att, java.lang.String name)
          Renames an attribute.
 void renameAttributeValue(weka.core.Attribute att, java.lang.String val, java.lang.String name)
          Renames the value of a nominal (or string) attribute value.
 void renameAttributeValue(int att, int val, java.lang.String name)
          Renames the value of a nominal (or string) attribute value.
 Exemplars resample(java.util.Random random)
          Creates a new Exemplars of the same size using random sampling with replacement.
 Exemplars resampleWithWeights(java.util.Random random)
          Creates a new Exemplars of the same size using random sampling with replacement according to the current exemplar weights.
 Exemplars resampleWithWeights(java.util.Random random, double[] weights)
          Creates a new dataset of the same size using random sampling with replacement according to the given weight vector.
 void setRelationName(java.lang.String newName)
          Sets the relation's name.
 void sort()
          Sorts the instances based on the ID attribute.
 void stratify(int numFolds)
          Stratifies a set of exemplars according to its class values if the class attribute is nominal (so that afterwards a stratified cross-validation can be performed).
 double[] sumsOfWeights()
          Computes the sum of all the exemplars' weights.
 Exemplars testCV(int numFolds, int numFold)
          Creates the test set for one fold of a cross-validation on the dataset.
 java.lang.String toString()
          Returns the exemplars as a string.
 Exemplars trainCV(int numFolds, int numFold)
          Creates the training set for one fold of a cross-validation on the exemplar set.
 Exemplars trainCV(int numFolds, int numFold, java.util.Random random)
          Creates the training set for one fold of a cross-validation on the dataset.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

Exemplars

public Exemplars(Exemplars exemplars)
Constructor to form an Exemplars by deep copying from another Exemplars

Parameters:
exemplars - the copied Exemplars

Exemplars

public Exemplars(Exemplars exemplars,
                 int size)
Constructor creating an empty Exemplars with the same structure of the given Exemplars and the given size (i.e. the number of exemplars in the set)

Parameters:
exemplars - the given Exemplars
size - the given size

Exemplars

public Exemplars(weka.core.Instances dataset)
          throws java.lang.Exception
Constructor using the given dataset, with the ID index set to 0

Parameters:
dataset - the set to be copied
Throws:
java.lang.Exception - if the class index of the dataset is not set (i.e. -1)

Exemplars

public Exemplars(Exemplars source,
                 int first,
                 int toCopy)
Creates a new set of instances by copying a subset of another set.

Parameters:
source - the set of instances from which a subset is to be created
first - the index of the first instance to be copied
toCopy - the number of instances to be copied
Throws:
java.lang.IllegalArgumentException - if first and toCopy are out of range
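
The range check described above can be sketched in plain Java. This is only an illustration under stated assumptions: SubsetCopySketch and copyRange are hypothetical names, a java.util.List stands in for the exemplar set, and the copy is shallow. How Exemplars itself validates and copies is not shown in this documentation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in; not part of milk.core.
final class SubsetCopySketch {

    /** Copies toCopy elements of source, starting at index first. */
    static <T> List<T> copyRange(List<T> source, int first, int toCopy) {
        // Reject negative arguments and ranges that run past the end of the source.
        if (first < 0 || toCopy < 0 || toCopy > source.size() - first) {
            throw new IllegalArgumentException("first and/or toCopy out of range");
        }
        // Shallow copy of the selected range (an assumption of this sketch).
        return new ArrayList<>(source.subList(first, first + toCopy));
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList("a", "b", "c", "d");
        System.out.println(copyRange(source, 1, 2));
    }
}
```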

Exemplars

public Exemplars(weka.core.Instances dataset,
                 int idIndex)
          throws java.lang.Exception
Constructor using the given dataset and set ID index to the given ID index. Any instances with class value or ID value missing will be dropped.

Parameters:
dataset - the instances from which the header information is to be taken
idIndex - the ID attribute's index
Throws:
java.lang.Exception - if the class index of the dataset is not set (i.e. -1) or the data is not multi-instance data
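
To illustrate how a flat dataset might be grouped into exemplars by ID, here is a minimal sketch that does not use weka or milk classes. It assumes each row is a double[] and that a missing value is encoded as NaN; BagGroupingSketch and groupById are hypothetical names, and the real constructor may work differently.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in; not part of milk.core or weka.core.
final class BagGroupingSketch {

    /**
     * Groups rows into bags keyed by the value at idIndex.
     * Rows whose ID or class value is NaN (treated as missing here) are dropped.
     */
    static Map<Double, List<double[]>> groupById(double[][] rows, int idIndex, int classIndex) {
        Map<Double, List<double[]>> bags = new LinkedHashMap<>();
        for (double[] row : rows) {
            if (Double.isNaN(row[idIndex]) || Double.isNaN(row[classIndex])) {
                continue; // skip rows with a missing ID or class value
            }
            bags.computeIfAbsent(row[idIndex], k -> new ArrayList<>()).add(row);
        }
        return bags;
    }

    public static void main(String[] args) {
        double[][] rows = {
            {0, 1.5, 1},
            {0, 2.5, 1},
            {1, 0.3, 0},
            {Double.NaN, 9.9, 0}, // missing ID: dropped
            {1, 0.7, Double.NaN}  // missing class: dropped
        };
        Map<Double, List<double[]>> bags = groupById(rows, 0, 2);
        for (Map.Entry<Double, List<double[]>> e : bags.entrySet()) {
            System.out.println("bag " + e.getKey() + ": " + e.getValue().size() + " instance(s)");
        }
    }
}
```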
Method Detail

add

public final void add(weka.core.Instance instance)
Adds one instance to one of the exemplars

Parameters:
instance - the instance to be added
Throws:
java.lang.Exception - if the instance cannot be added properly

add

public final void add(Exemplar exemplar)
Adds one exemplar to the exemplars

Parameters:
exemplar - the exemplar to be added
Throws:
java.lang.Exception - if the exemplar already exists

attribute

public final weka.core.Attribute attribute(int index)
Returns an attribute.

Parameters:
index - the attribute's index
Returns:
the attribute at the given position

attribute

public final weka.core.Attribute attribute(java.lang.String name)
Returns an attribute given its name. If there is more than one attribute with the same name, it returns the first one. Returns null if the attribute can't be found.

Parameters:
name - the attribute's name
Returns:
the attribute with the given name, null if the attribute can't be found

checkForStringAttributes

public boolean checkForStringAttributes()
Checks for string attributes in the Exemplars

Returns:
true if string attributes are present, false otherwise

classAttribute

public final weka.core.Attribute classAttribute()
Returns the class attribute.

Returns:
the class attribute
Throws:
weka.core.UnassignedClassException - if the class is not set

classIndex

public final int classIndex()
Returns the class attribute's index. Returns a negative number if it is undefined.

Returns:
the class index as an integer

compactify

public final void compactify()
Compactifies each exemplar in this Exemplars


delete

public final void delete()
Removes all Exemplars from the set.


delete

public final void delete(int index)
Removes an exemplar at the given position from the set.

Parameters:
index - the exemplar's position

deleteAttributeAt

public void deleteAttributeAt(int position)
                       throws java.lang.Exception
Deletes an attribute at the given position (0 to numAttributes() - 1).

Throws:
java.lang.Exception - if the given index is out of range or the class attribute is being deleted

deleteStringAttributes

public void deleteStringAttributes()
                            throws java.lang.Exception
Deletes all string attributes in the dataset. A deep copy of the attribute information is performed before an attribute is deleted.

Throws:
java.lang.IllegalArgumentException - if string attribute couldn't be successfully deleted (probably because it is the class attribute).
java.lang.Exception

deleteWithMissing

public final void deleteWithMissing(int attIndex)
Removes all instances with missing values for a particular attribute from the dataset.

Parameters:
attIndex - the attribute's index

deleteWithMissing

public final void deleteWithMissing(weka.core.Attribute att)
Removes all instances with missing values for a particular attribute from the dataset.

Parameters:
att - the attribute

enumerateAttributes

public java.util.Enumeration enumerateAttributes()
Returns an enumeration of all the attributes.

Returns:
enumeration of all the attributes.

getExemplars

public final java.util.Vector getExemplars()
Returns a vector of exemplars in this Exemplars.

Returns:
a vector of all exemplars

firstExemplar

public final Exemplar firstExemplar()
Returns the first exemplar in the set.

Returns:
the first exemplar in the set

idAttribute

public final weka.core.Attribute idAttribute()
Returns the ID attribute.

Returns:
the ID attribute

idIndex

public final int idIndex()
Returns the ID attribute's index.

Returns:
the ID index as an integer

insertAttributeAt

public void insertAttributeAt(weka.core.Attribute att,
                              int position)
Inserts an attribute at the given position (0 to numAttributes()) and sets all values to be missing.

Parameters:
att - the attribute to be inserted
position - the position at which the attribute is inserted (0 to numAttributes())
Throws:
java.lang.IllegalArgumentException - if the given index is out of range

exemplar

public final Exemplar exemplar(int index)
Returns the exemplar at the given position.

Parameters:
index - the exemplar's index
Returns:
the exemplar at the given position

lastExemplar

public final Exemplar lastExemplar()
Returns the last exemplar in the set.

Returns:
the last exemplar in the set

numAttributes

public final int numAttributes()
Returns the number of attributes.

Returns:
the number of attributes as an integer

numClasses

public final int numClasses()
Returns the number of class labels.

Returns:
the number of class labels as an integer if the class attribute is nominal, 1 otherwise.
Throws:
weka.core.UnassignedClassException - if the class is not set

numExemplars

public final int numExemplars()
Returns the number of exemplars in the set.

Returns:
the number of exemplars in the set

numsInstances

public final int[] numsInstances()
Returns the number of instances in the dataset.

Returns:
the number of instances in the dataset as an integer array

randomize

public final void randomize(java.util.Random random)
Shuffles the exemplars in the set so that they are ordered randomly.

Parameters:
random - a random number generator
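
A random reordering of this kind is commonly implemented as a Fisher-Yates shuffle. The sketch below shows that technique on an int array; ShuffleSketch and shuffleInPlace are hypothetical names, and whether randomize uses exactly this loop is an assumption.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical stand-in; not part of milk.core.
final class ShuffleSketch {

    /** Fisher-Yates shuffle: every permutation is equally likely for an ideal generator. */
    static void shuffleInPlace(int[] items, Random random) {
        for (int j = items.length - 1; j > 0; j--) {
            int k = random.nextInt(j + 1); // pick a position in [0, j]
            int tmp = items[j];
            items[j] = items[k];
            items[k] = tmp;
        }
    }

    public static void main(String[] args) {
        int[] ids = {0, 1, 2, 3, 4};
        shuffleInPlace(ids, new Random(1));
        System.out.println(Arrays.toString(ids));
    }
}
```

Passing a seeded java.util.Random makes the resulting order reproducible across runs.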

relationName

public final java.lang.String relationName()
Returns the relation's name.

Returns:
the relation's name as a string

renameAttribute

public final void renameAttribute(int att,
                                  java.lang.String name)
Renames an attribute.

Parameters:
att - the attribute's index
name - the new name

renameAttribute

public final void renameAttribute(weka.core.Attribute att,
                                  java.lang.String name)
Renames an attribute.

Parameters:
att - the attribute
name - the new name

renameAttributeValue

public final void renameAttributeValue(int att,
                                       int val,
                                       java.lang.String name)
Renames the value of a nominal (or string) attribute value.

Parameters:
att - the attribute's index
val - the value's index
name - the new name

renameAttributeValue

public final void renameAttributeValue(weka.core.Attribute att,
                                       java.lang.String val,
                                       java.lang.String name)
Renames the value of a nominal (or string) attribute value.

Parameters:
att - the attribute
val - the value
name - the new name

resample

public final Exemplars resample(java.util.Random random)
Creates a new Exemplars of the same size using random sampling with replacement.

Parameters:
random - a random number generator
Returns:
the new Exemplars
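
Sampling with replacement (a bootstrap sample) can be sketched as drawing n indices uniformly from the n available positions. BootstrapSketch and resampleIndices are hypothetical names used for illustration only; the sketch returns indices rather than an Exemplars object.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical stand-in; not part of milk.core.
final class BootstrapSketch {

    /** Draws n indices uniformly at random from [0, n), with replacement. */
    static int[] resampleIndices(int n, Random random) {
        int[] picked = new int[n];
        for (int i = 0; i < n; i++) {
            picked[i] = random.nextInt(n); // the same index may be drawn more than once
        }
        return picked;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(resampleIndices(5, new Random(1))));
    }
}
```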

resampleWithWeights

public final Exemplars resampleWithWeights(java.util.Random random)
                                    throws java.lang.Exception
Creates a new Exemplars of the same size using random sampling with replacement according to the current exemplar weights. The weights of the exemplars in the new set are set to one.

Parameters:
random - a random number generator
Returns:
the new dataset
Throws:
java.lang.Exception - if the weights array is of the wrong length or contains negative weights or any other errors related to exemplars.

resampleWithWeights

public final Exemplars resampleWithWeights(java.util.Random random,
                                           double[] weights)
                                    throws java.lang.Exception
Creates a new dataset of the same size using random sampling with replacement according to the given weight vector. The weights of the exemplars in the new dataset are set to one. The length of the weight vector has to be the same as the number of exemplars in the dataset, and all weights have to be positive.

Parameters:
random - a random number generator
weights - the weight vector
Returns:
the new dataset
Throws:
java.lang.Exception - if the weights array is of the wrong length or contains negative weights or any other errors related to exemplars.
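
One simple way to sample in proportion to a weight vector is to build cumulative sums and pick the first position whose cumulative weight exceeds a uniform draw. The sketch below uses that technique, which is not necessarily the one this class implements. WeightedResampleSketch is a hypothetical name, and unlike the description above the sketch tolerates zero weights as long as at least one weight is positive.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical stand-in; not part of milk.core.
final class WeightedResampleSketch {

    /** Draws weights.length indices with replacement, proportionally to the weights. */
    static int[] resampleIndices(double[] weights, Random random) {
        int n = weights.length;
        double[] cumulative = new double[n];
        double total = 0.0;
        for (int i = 0; i < n; i++) {
            if (weights[i] < 0 || Double.isNaN(weights[i])) {
                throw new IllegalArgumentException("Weights must be non-negative");
            }
            total += weights[i];
            cumulative[i] = total;
        }
        if (n > 0 && total <= 0.0) {
            throw new IllegalArgumentException("At least one weight must be positive");
        }
        int[] picked = new int[n];
        for (int i = 0; i < n; i++) {
            double u = random.nextDouble() * total;
            int j = 0;
            // Advance to the first position whose cumulative weight exceeds u;
            // stopping at n - 1 guards against floating-point rounding.
            while (j < n - 1 && u >= cumulative[j]) {
                j++;
            }
            picked[i] = j;
        }
        return picked;
    }

    public static void main(String[] args) {
        double[] weights = {0.1, 0.7, 0.2};
        System.out.println(Arrays.toString(resampleIndices(weights, new Random(1))));
    }
}
```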

setRelationName

public final void setRelationName(java.lang.String newName)
Sets the relation's name.

Parameters:
newName - the new relation name.

sort

public final void sort()
Sorts the instances based on the ID attribute. For numeric attributes, instances are sorted in ascending order. For nominal attributes, instances are sorted based on the attribute label ordering specified in the header. The instances inside an exemplar are not sorted.


stratify

public final void stratify(int numFolds)
Stratifies a set of exemplars according to its class values if the class attribute is nominal (so that afterwards a stratified cross-validation can be performed).

Parameters:
numFolds - the number of folds in the cross-validation
Throws:
weka.core.UnassignedClassException - if the class is not set
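
A common way to prepare data for stratified cross-validation is to group items by class and then deal them out in steps of numFolds, so that contiguous folds taken afterwards receive a similar class mix. The sketch below shows that idea on an array of class labels; StratifySketch and stratifiedOrder are hypothetical names, and it is an assumption that stratify follows this scheme.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical stand-in; not part of milk.core.
final class StratifySketch {

    /** Returns a new ordering of the item indices 0..n-1, given each item's class label. */
    static int[] stratifiedOrder(int[] classLabels, int numFolds) {
        if (numFolds < 1) {
            throw new IllegalArgumentException("numFolds must be at least 1");
        }
        int n = classLabels.length;
        // Step 1: group indices by class label (the object sort is stable).
        Integer[] byClass = new Integer[n];
        for (int i = 0; i < n; i++) {
            byClass[i] = i;
        }
        Arrays.sort(byClass, Comparator.comparingInt(i -> classLabels[i]));
        // Step 2: take every numFolds-th item, starting at offsets 0, 1, ..., numFolds - 1.
        int[] order = new int[n];
        int pos = 0;
        for (int start = 0; start < numFolds; start++) {
            for (int j = start; j < n; j += numFolds) {
                order[pos++] = byClass[j];
            }
        }
        return order;
    }

    public static void main(String[] args) {
        int[] labels = {0, 0, 0, 0, 1, 1};
        System.out.println(Arrays.toString(stratifiedOrder(labels, 2)));
    }
}
```

With four items of class 0, two of class 1, and two folds, each contiguous half of the new ordering holds two items of class 0 and one of class 1.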

sumsOfWeights

public final double[] sumsOfWeights()
Computes the sum of all the exemplars' weights.

Returns:
the sum of all the exemplars' weights as a double array

testCV

public Exemplars testCV(int numFolds,
                        int numFold)
                 throws java.lang.Exception
Creates the test set for one fold of a cross-validation on the dataset.

Parameters:
numFolds - the number of folds in the cross-validation. Must be greater than 1.
numFold - 0 for the first fold, 1 for the second, ...
Returns:
the test set as a set of weighted instances
Throws:
java.lang.Exception - if the number of folds is less than 2 or greater than the number of exemplars, or if any other error related to exemplars occurs
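
The test set for fold numFold can be pictured as one contiguous block of the exemplar sequence. The sketch below computes such a block, giving the first (n mod numFolds) folds one extra item each. FoldRangeSketch and testRange are hypothetical names, and it is an assumption that testCV partitions the set this way.

```java
import java.util.Arrays;

// Hypothetical stand-in; not part of milk.core.
final class FoldRangeSketch {

    /** Returns {first, count} for the contiguous test block of fold numFold among n items. */
    static int[] testRange(int n, int numFolds, int numFold) {
        if (numFolds < 2) {
            throw new IllegalArgumentException("Number of folds must be at least 2");
        }
        if (numFolds > n) {
            throw new IllegalArgumentException("More folds than items");
        }
        if (numFold < 0 || numFold >= numFolds) {
            throw new IllegalArgumentException("Fold index out of range");
        }
        int base = n / numFolds;      // minimum size of every fold
        int remainder = n % numFolds; // this many leading folds get one extra item
        int count = base + (numFold < remainder ? 1 : 0);
        int first = numFold * base + Math.min(numFold, remainder);
        return new int[] {first, count};
    }

    public static void main(String[] args) {
        for (int fold = 0; fold < 3; fold++) {
            System.out.println("fold " + fold + ": " + Arrays.toString(testRange(10, 3, fold)));
        }
    }
}
```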

toString

public final java.lang.String toString()
Returns the exemplars as a string. It only shows each exemplar's ID value, class value and weight as well as the ARFF header of the dataset

Overrides:
toString in class java.lang.Object
Returns:
the set of exemplars as a string

trainCV

public Exemplars trainCV(int numFolds,
                         int numFold)
                  throws java.lang.Exception
Creates the training set for one fold of a cross-validation on the exemplar set.

Parameters:
numFolds - the number of folds in the cross-validation. Must be greater than 1.
numFold - 0 for the first fold, 1 for the second, ...
Returns:
the training set as a set of weighted instances
Throws:
java.lang.Exception - if the number of folds is less than 2 or greater than the number of exemplars, or if any other error related to exemplars occurs

trainCV

public Exemplars trainCV(int numFolds,
                         int numFold,
                         java.util.Random random)
                  throws java.lang.Exception
Creates the training set for one fold of a cross-validation on the dataset. The data is subsequently randomized based on the given random number generator.

Parameters:
numFolds - the number of folds in the cross-validation. Must be greater than 1.
numFold - 0 for the first fold, 1 for the second, ...
random - the random number generator
Returns:
the training set
Throws:
java.lang.IllegalArgumentException - if the number of folds is less than 2 or greater than the number of instances.
java.lang.Exception
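
The training set for a fold is the complement of that fold's test block, shuffled afterwards with the supplied generator. The sketch below works on indices and assumes contiguous test blocks in which the first (n mod numFolds) folds hold one extra item. TrainFoldSketch and trainIndices are hypothetical names and not the implementation of trainCV.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical stand-in; not part of milk.core.
final class TrainFoldSketch {

    /** Returns the shuffled indices of all items outside the test block of fold numFold. */
    static List<Integer> trainIndices(int n, int numFolds, int numFold, Random random) {
        if (numFolds < 2 || numFolds > n) {
            throw new IllegalArgumentException("Number of folds must be between 2 and the number of items");
        }
        if (numFold < 0 || numFold >= numFolds) {
            throw new IllegalArgumentException("Fold index out of range");
        }
        int base = n / numFolds;
        int remainder = n % numFolds;
        int count = base + (numFold < remainder ? 1 : 0);           // size of the test block
        int first = numFold * base + Math.min(numFold, remainder);  // start of the test block
        List<Integer> train = new ArrayList<>(n - count);
        for (int i = 0; i < n; i++) {
            if (i < first || i >= first + count) {
                train.add(i); // keep everything outside the test block
            }
        }
        Collections.shuffle(train, random); // randomize after the split
        return train;
    }

    public static void main(String[] args) {
        System.out.println(trainIndices(10, 3, 1, new Random(1)));
    }
}
```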

main

public static void main(java.lang.String[] args)
Main method for this class -- just performs one run of 10-fold CV and prints out the set. Assumes the ID is the first attribute and the class is the last one.