COMP423/523A, Project and Presentation specification
Possible Projects:
  1. Parameterless Outlier Detection in Data Streams
  2. Keith [On-line Random Forests]
  3. Boris [Online Mining of Temporal Maximal Utility Itemsets from Data Streams]
  4. SeqStream: Mining Closed Sequential Patterns over Stream Sliding Windows
  5. Ben [OcVFDT: One-class Very Fast Decision Tree for One-class Classification of Data Streams]
  6. Michael [Addressing Concept-Evolution in Concept-Drifting Data Streams]
  7. Learning Recurring Concepts from Data Streams with a Context-aware Ensemble
  8. Keisuke [Classification Model for Data Streams Based on Similarity]
  9. A Fast Approximation Strategy for Summarizing a Set of Streaming Time Series
  10. Yang [One-Class Learning and Concept Summarization for Vaguely Labeled Data Streams]
  11. Random Rules from Data Streams
  12. Discovering Global and Local Bursts in a Stream of News
  13. Smoothing Techniques for Adaptive Online Language Models: Topic Tracking in Tweet Streams
  14. A partial port of Vowpal Wabbit to MOA

For your chosen paper you will have to implement the algorithm as described in the paper. You must use Java and the MOA framework. If the algorithm depends on infrastructure currently missing from MOA, I might give you permission to do a stand-alone version outside MOA; in that case your report must explain which features MOA lacks.

You will have to submit: source code; any information necessary for compiling, installing, and running your code; a step-by-step guide to one example run of your algorithm, together with the expected output for that run; and a brief 4-page report explaining what is interesting about your implementation, e.g. smart data structures used, unclear parts of the original paper or even errors in it, and your interpretation or resolution of them. The report should also contain the results of some sample runs.

Marks:
15 source code
10 report
5 example run step-by-step guide; install instructions
10 presentation

Presentations: Friday June 14th, 9.30-11

Hand-in date: Monday June 17th, 23.55

Hand-in items are:

  1. the (short) report (as pdf)
  2. your presentation slides (as pdf)
  3. source code, best packaged up as a tar or a jar file
  4. install instructions plus example run step-by-step guide (pdf or plain txt)

On the Monday *before* the presentation you will have to send me your draft slides, so that I can give you feedback before the actual presentation.

Generally, if anything is unclear, just ask me.

Also, papers are often incomplete and omit details, which makes exact reconstruction impossible. You can email the authors asking for clarification, but they might be busy and not answer in time. In that case you (or I) will have to make some reasonable decisions.