COMP423/523-16B, Project and Presentation specification

Assigned Projects:
  1. Thomas [Learning Recurring Concepts from Data Streams with a Context-aware Ensemble]
  2. Guangru + Gulnar [Discovering Global and Local Bursts in a Stream of News]
  3. Tianyang Liu [Smoothing Techniques for Adaptive Online Language Models: Topic Tracking in Tweet Streams]
  4. Jeff [Addressing Concept-Evolution in Concept-Drifting Data Streams]
  5. Joshua [OcVFDT: One-class Very Fast Decision Tree for One-class Classification of Data Streams]
  6. Frankie | JP + Steven [Random Rules from Data Streams]
  7. Andre [Fast Hoeffding Drift Detection Method for Evolving Data Streams]
Available Projects:
  1. Online Mining of Temporal Maximal Utility Itemsets from Data Streams
  2. A Fast Approximation Strategy for Summarizing a Set of Streaming Time Series
  3. Classification Model for Data Streams Based on Similarity
  4. Parameterless Outlier Detection in Data Streams
  5. One-Class Learning and Concept Summarization for Vaguely Labeled Data Streams
  6. Irrevocable-choice algorithms for sampling from a stream
  7. Your own project (you need my approval before you start)

Choose one of the papers listed above. Alternatively, you can suggest another data stream paper to me; I will check whether your suggestion is feasible and will most likely approve it. Either way, please email me your selection so that I can update this page: first come, first served. You can work on the project either by yourself or as a team of two. Naturally, I will expect a more substantial report from a team of two than from a single student. [If you work as a team, please submit your work twice, once for each team member, to ensure that both of you are marked.]

You must implement the algorithm exactly as described in your chosen paper, in Java and using the MOA framework. If the algorithm depends on infrastructure currently missing from MOA, I may give you permission to write a stand-alone version outside MOA [but in that case your report must also explain which features MOA is currently lacking].

You must submit the source code; any information necessary for compiling, installing, and running it; a step-by-step guide to one example run of your algorithm, together with the expected output of that run; and a brief report (4 pages, or 8 pages for teams of two) explaining what is interesting about your implementation, e.g. smart data structures you used, unclear parts of the original paper or even errors in it, and how you interpreted or resolved them. The report should also contain results from some sample runs.

Marking breakdown (total 30):

  15  source code
  10  report, including the step-by-step example-run guide and install instructions
   5  presentation (5-10 minutes: the algorithm, your implementation, results, issues, summary)

Presentations: Wed Oct 19th, 10am-...

Hand-in date: Fri Oct 21st, 23:30

Hand-in items are:

  1. the (short) report (as PDF)
  2. your presentation slides (as PDF)
  3. source code, preferably packaged as a tar or jar file
  4. install instructions plus the step-by-step example-run guide (PDF or plain text)

Generally, if anything is unclear, just ask me. If you want, I can give you feedback on your draft slides.

Also note that papers are often incomplete: they omit details, which can make an exact reconstruction impossible. You can email the authors to ask for clarification, but they may be busy and not reply in time. In that case you (or I) will have to make reasonable decisions.