COMP423/523A, Project and Presentation specification
Assigned Projects:
  1. Henry [Smoothing Techniques for Adaptive Online Language Models: Topic Tracking in Tweet Streams]
  2. Keith [On-line Random Forests]
  3. Rosjier [OcVFDT: One-class Very Fast Decision Tree for One-class Classification of Data Streams]
  4. Duncan [Random Rules from Data Streams]
  5. Nikhil [Online Mining of Temporal Maximal Utility Itemsets from Data Streams]
  6. Vignesh [SeqStream: Mining Closed Sequential Patterns over Stream Sliding Windows]
  7. Ran [Classification Model for Data Streams Based on Similarity]
  8. Jonathan [Parameterless Outlier Detection in Data Streams]
Available Projects:
  1. Addressing Concept-Evolution in Concept-Drifting Data Streams
  2. Learning Recurring Concepts from Data Streams with a Context-aware Ensemble
  3. A Fast Approximation Strategy for Summarizing a Set of Streaming Time Series
  4. One-Class Learning and Concept Summarization for Vaguely Labeled Data Streams
  5. Discovering Global and Local Bursts in a Stream of News
  6. A (partial) port of Vowpal Wabbit to MOA
  7. Your very own project, but you need to get my approval upfront

You will have to implement the algorithm as described in your chosen paper. You have to use Java and the MOA framework. If the algorithm depends on infrastructure currently missing from MOA, I might give you permission to write a stand-alone version outside MOA [but in this case your report must explain which features MOA currently lacks].

You will have to submit: source code; any information necessary for compiling, installing, and running your code; a step-by-step guide to one example run of your algorithm, together with the expected output for that run; and a brief 4-page report explaining what is interesting about your implementation, e.g. smart data structures used, unclear bits or even errors in the original paper, and your interpretation or resolution of them. The report should also contain the results of some sample runs.

Marks:
10 source code
5 report
5 example run step-by-step guide; install instructions
10 presentation (5-10 minutes: the algorithm, your implementation, results, issues, summary)

Presentations: Friday June 20th, 10am-noon

Hand-in date: Monday June 23rd, 23:55

Hand-in items are:

  1. the (short) report (as pdf)
  2. your presentation slides (as pdf)
  3. source code, best packaged up as a tar or a jar file
  4. install instructions plus example run step-by-step guide (pdf or plain text)

On the Tuesday *before* the presentation (June 17th) you will have to send me your draft slides, so that I can give you feedback before the actual presentation.

Generally, if anything is unclear, just ask me.

Also, papers are often incomplete: they omit details, which can make exact reconstruction impossible. You can email the authors asking for clarification, but they might be busy and not answer in time. In that case you (or I) will have to make some reasonable decisions.