COMP416A/516A 2006: Topics in Artificial Intelligence
- Assignment 3: Automatic Data Cleansing -
UPDATED Section 4 (01/05/06)
UPDATED submission instructions (10/05/06) (see bottom of this page)
- Write a meta classifier that takes another classifier (i.e. a
"base" classifier) as an argument and implements the following
procedure:
  1. Build the base classifier on the training data.
  2. Classify the training data using the base classifier.
  3. Collect the training instances that have been classified correctly and make this the new training set.
  4. Go to Step 1 if the training set has changed in Step 3.
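The loop in Steps 1-4 can be sketched in plain Java. The BaseClassifier interface below is only a stand-in for the Weka classifier API (your real implementation will build on Weka's classes instead), but the control flow is the same: rebuild, classify, keep only the correctly classified instances, and stop once a pass discards nothing.

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-in for the Weka classifier API; names here are illustrative only. */
interface BaseClassifier {
    void buildClassifier(List<int[]> instances, List<Integer> labels);
    int classifyInstance(int[] instance);
}

/** Steps 1-4: build, classify, keep the correctly classified
 *  instances, and repeat while the training set keeps changing. */
class IterativeCleanser {
    static List<Integer> cleanse(BaseClassifier base,
                                 List<int[]> instances, List<Integer> labels) {
        List<int[]> data = new ArrayList<>(instances);
        List<Integer> y = new ArrayList<>(labels);
        boolean changed = true;
        while (changed) {
            base.buildClassifier(data, y);            // Step 1
            List<int[]> keptX = new ArrayList<>();
            List<Integer> keptY = new ArrayList<>();
            for (int i = 0; i < data.size(); i++) {   // Steps 2-3
                if (base.classifyInstance(data.get(i)) == y.get(i)) {
                    keptX.add(data.get(i));
                    keptY.add(y.get(i));
                }
            }
            changed = keptY.size() != y.size();       // Step 4
            data = keptX;
            y = keptY;
        }
        return y;  // labels of the surviving (cleansed) training set
    }
}
```

Note that the loop always terminates: the training set can only shrink, and the iteration stops the first time it stays the same size.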
Your new meta classifier should extend the class
- Run your new meta classifier with the decision tree learner
weka.classifiers.trees.J48 as the base learner on the
soybean datasets in
/home/ml/datasets/UCI. Compare the size of the final tree
generated by the meta classifier and the tree generated by plain J48.
- Perform a more extensive experiment to evaluate the effect of the
meta classifier on accuracy. Use Weka's Experimenter to
run a 10 times 10-fold cross-validation on all datasets in
/home/ml/datasets/UCI. In your first experiment, compare
the accuracy of plain
J48 to the accuracy of the meta
classifier applied in conjunction with
J48. Also record
how much data (in percent) is discarded by the meta classifier. (You
can record that information in the
Experimenter by making
your meta classifier implement the
weka.core.AdditionalMeasureProducer interface and
implementing appropriate methods. The
J48 class is an
example of a class that implements this interface.)
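For reference, the AdditionalMeasureProducer interface asks for two methods: one enumerating the names of the extra measures (which by Weka convention begin with "measure"), and one returning the value for a given name. Below is a library-free sketch of that shape; the interface is re-declared locally so the example is self-contained, and the measure name measurePercentDiscarded is made up for illustration.

```java
import java.util.Enumeration;
import java.util.Vector;

/** Local re-declaration so the sketch compiles on its own; in the
 *  assignment, implement the weka.core version instead. */
interface AdditionalMeasureProducer {
    Enumeration<String> enumerateMeasures();
    double getMeasure(String measureName);
}

/** Sketch: expose the percentage of discarded training data as an
 *  additional measure ("measurePercentDiscarded" is a made-up name). */
class CleansingMeasures implements AdditionalMeasureProducer {
    private double percentDiscarded;  // to be set by the cleansing loop

    void setPercentDiscarded(double p) { percentDiscarded = p; }

    public Enumeration<String> enumerateMeasures() {
        Vector<String> names = new Vector<>();
        names.add("measurePercentDiscarded");  // "measure..." prefix convention
        return names.elements();
    }

    public double getMeasure(String measureName) {
        if (measureName.equals("measurePercentDiscarded")) return percentDiscarded;
        throw new IllegalArgumentException(measureName + " not supported");
    }
}
```

As the assignment notes, the J48 source is the authoritative example of how Weka itself implements this interface.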
- Repeat the same experiment with
weka.classifiers.functions.SMO. Make sure to turn
on the "-M" option of SMO. Because SMO is not very efficient for
large datasets, use only the datasets of Section 2 (soybean) for
any SMO-based experiments. But do use the same procedure as in
Section 3 (10 times 10-fold CV, etc.).
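The Experimenter does the cross-validation bookkeeping for you, but it may help to see what "10 times 10-fold CV" means mechanically: for each of the 10 runs, the data is reshuffled with a different seed and dealt into 10 folds, so every instance is tested exactly once per run. A minimal sketch (class and method names are illustrative, not Weka's):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Sketch of repeated cross-validation bookkeeping: for each run,
 *  shuffle with a run-specific seed and deal instances into folds. */
class RepeatedCV {
    static int[][] foldAssignments(int n, int runs, int folds) {
        int[][] assign = new int[runs][n];
        for (int r = 0; r < runs; r++) {
            List<Integer> idx = new ArrayList<>();
            for (int i = 0; i < n; i++) idx.add(i);
            Collections.shuffle(idx, new Random(r));   // new order each run
            for (int pos = 0; pos < n; pos++)
                assign[r][idx.get(pos)] = pos % folds; // deal round-robin
        }
        return assign;
    }
}
```

Averaging accuracy over all 100 train/test pairs gives the estimate the Experimenter reports, which is what you compare between plain J48 (or SMO) and the meta classifier.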
- Change your meta classifier so that it only discards an instance if
it is misclassified and the classifier is very confident in its
prediction (i.e. if the classifier gets it "badly wrong"). To this
end, introduce a new parameter
X to your classifier and
discard an instance if it is misclassified and the base classifier's
probability for the predicted class is greater than
the cutoff value X.
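The modified discard rule can be sketched as a small predicate over the base classifier's class-probability distribution (in Weka this distribution comes from the classifier's probability estimates; the class and method names below are illustrative):

```java
/** Discard an instance only when it is misclassified AND the predicted
 *  class's probability exceeds the cutoff X ("badly wrong"). */
class ConfidentDiscard {
    static boolean shouldDiscard(double[] classProbs, int trueClass, double cutoff) {
        int predicted = 0;
        for (int k = 1; k < classProbs.length; k++)  // argmax = predicted class
            if (classProbs[k] > classProbs[predicted]) predicted = k;
        boolean misclassified = predicted != trueClass;
        boolean confident = classProbs[predicted] > cutoff;
        return misclassified && confident;
    }
}
```

With the cutoff at 0.9, an instance predicted with probability 0.95 for the wrong class is discarded, while a narrowly misclassified instance (say 0.6 for the wrong class) is kept.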
- Re-run all the above experiments with this new version of the
meta classifier, setting the cutoff value
X to 0.9.
- Write a report that records how your investigations proceeded and
the results you obtained. Make sure you comment on the results and
structure your report appropriately. Also comment on the method's
ability to identify actual errors in the data.
Value: 25% of the total marks for all four assignments
Due date: Friday, 12 May, 5:00 PM
No extensions will be granted except for sound, documented, medical reasons.
Complete your assignment early: computers tend to go wrong at the last minute.
WHAT and HOW to submit: email me (firstname.lastname@example.org) the Java code for the version of
your meta-classifier that was needed in Section 5. The report
can be handed in on paper or electronically. Acceptable formats
for electronic report submission, in order of preference, are:
pdf, ps, txt, (OpenOffice or Word if you have to).