COMP423/52315B, Project and Presentation specification
  1. Chris [a (partial) port of Vowpal Wabbit to MOA]
  2. Rory [Online Mining of Temporal Maximal Utility Itemsets from Data Streams]
  3. Severin & Christian [On-line Random Forests]
  4. Jackson & Divya [OcVFDT: One-class Very Fast Decision Tree for One-class Classification of Data Streams]
  5. Patrick & Nathan [Smoothing Techniques for Adaptive Online Language Models: Topic Tracking in Tweet Streams]
  6. Owen [A Fast Approximation Strategy for Summarizing a Set of Streaming Time Series]
  7. Luke & Tom [Random Rules from Data Streams]
  8. Rewi [Classification Model for Data Streams Based on Similarity]
  9. David [Parameterless Outlier Detection in Data Streams]
  10. Matt [SeqStream: Mining Closed Sequential Patterns over Stream Sliding Windows]

Available Projects:

  1. Addressing Concept-Evolution in Concept-Drifting Data Streams
  2. Learning Recurring Concepts from Data Streams with a Context-aware Ensemble
  3. One-Class Learning and Concept Summarization for Vaguely Labeled Data Streams
  4. Discovering Global and Local Bursts in a Stream of News
  5. Your very own project, but you need to get my approval up front

Choose one of the papers listed under Available Projects above. Alternatively, you can suggest another data stream paper to me; I will check whether your suggestion is feasible, and will most likely approve it. Either way, please email me your selection so that I can update this page: first come, first served. You can work on this project either by yourself or as a team of two. I will expect a more substantial report from a team of two than from a single student. [If you work as a team, please submit your work twice, once for each team member, to ensure that it is marked properly.]

For your chosen paper, you will have to implement the algorithm as described in that paper. You have to use Java and the MOA framework. If the algorithm depends on infrastructure currently missing from MOA, I might give you permission to write a stand-alone version outside MOA [but in this case your report must also explain which features MOA currently lacks].

You will have to submit: source code; any information necessary for compiling, installing, and running your code; a step-by-step guide to one example run of your algorithm, together with the expected output for that run; and a brief 4-page report (8 pages for teams of two). The report should explain what is interesting about your implementation, e.g. smart data structures used, or unclear bits or even errors in the original paper and your interpretation or resolution of them. It should also contain the results of some sample runs.

10 source code
5 report
5 example run step-by-step guide; install instructions
10 presentation (5-10 minutes: the algorithm, your implementation, results, issues, summary)

Presentations: Friday Oct 23rd, 10am-...

Hand-in date: Friday Oct 23rd, 11.30pm

Hand-in items are:

  1. the (short) report (as pdf)
  2. your presentation slides (as pdf)
  3. source code, best packaged up as a tar or a jar file
  4. install instructions plus example run step-by-step guide (pdf or plain txt)

Generally, if anything is unclear, just ask me. If you want feedback on your draft slides, we can arrange a meeting on Friday Oct 16th.

Also, papers are often incomplete: they omit details, which can make exact reconstruction impossible. You can email the authors asking for clarification, but they might be busy and not answer in time. In that case you (or I) will have to make some reasonable decisions.