
Department of Computer Science
Tari Rorohiko

Computing and Mathematical Sciences

COMP477, COMP520, COMP591, ENGG482 and ENGG492
Projects offered in 2013



COMP477 Projects

Please print out and complete the COMP477 BCMS Project Selection Form PDF to select your COMP477 project for 2013.


JUDY BOWEN

Redeveloping a modelling tool in Scala using test-driven development
A modelling tool called PIMed has been developed in Java, along with a set of unit tests used for regression testing. The aim of this project is to redevelop the test suite in ScalaTest as the first step in converting PIMed from Java to Scala.

Developing the front-end of a combined modelling tool using Scala
The safety-critical interactive systems modelling project here at Waikato relies on a number of different tools which have been developed over the years, primarily in Haskell or Java. We are currently migrating these tools to Scala with the aim of combining them into a single modelling tool. This project is to design and develop the user interface which will enable a user to interact with these various tools once they are combined. You will need knowledge of programming in Java (Haskell would be an advantage but is not necessary), an interest in learning to program in Scala and the ability to use user-centred design techniques to develop the new interface.

 

ANNIKA HINZE

Text categorization and analysis based on document history
People create documents daily, but do not typically organize and store them in a sensible manner. There are duplicates with the same or a different name in various locations, and slightly modified versions created for different audiences. Some folders are clearly subdivided by document type or task, while others are simply large bins of everything. A document’s history involves its creation, modification and ownership history, as well as whether it was included as an attachment to an email and who received that email. The date stamps on the documents, together with the emails and wiki pages created during the same period, constitute and influence the document’s history timeline. This information is important for understanding a document’s purpose, its significance and its relation to other documents. This research project will investigate ways of deriving this information from documents and email archives, and of using it to detect duplicates and to automatically categorize documents into groups by topic and significance.

This project will be done in collaboration with Pingar (Auckland Software Company). They will provide an email archive, and the documents associated with it. Pingar can also provide text analytics on the document text, which can be used in combination with approaches based purely on document history.
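One plausible baseline for the duplicate-detection component (a standard technique shown for illustration, not Pingar's own analytics) is w-shingling with Jaccard similarity: near-duplicate documents share most of their overlapping word n-grams even when a few sentences have been edited.

```python
def shingles(text, w=3):
    """Split text into overlapping word w-grams (shingles)."""
    words = text.lower().split()
    return {tuple(words[i:i + w]) for i in range(max(1, len(words) - w + 1))}

def jaccard(a, b):
    """Jaccard similarity of two shingle sets: |A & B| / |A | B|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Two lightly edited versions of the same sentence score highly.
doc1 = "The quarterly report was prepared for the board of directors"
doc2 = "The quarterly report was prepared for the executive board"
print(round(jaccard(shingles(doc1), shingles(doc2)), 2))
```

Scores near 1.0 flag likely duplicates; in practice the threshold could also be informed by the document-history signals described above (shared ownership, overlapping email threads).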

1000 springs
This project is a collaboration with a company in New Zealand. We aim to classify 1000 springs and record the results in a large database (which has already been created). The project has several components, each of which may form a separate thesis.

Funding and scholarships are available for theses related to this project.

 

STEVE JONES

Interactive DOM visualization
The Document Object Model (DOM) is a language- and platform-independent representation of an HTML document. A DOM representation of a web page is constructed by a web browser when the page is loaded, and can then be manipulated in a programming language such as JavaScript. It is often useful for developers to examine the DOM structure created from the HTML that they wrote, yet current tools offer limited support for doing so. This project will focus on the design and implementation of a web-browser add-on that provides an interactive visualization of a page's DOM and its relationship to the underlying HTML, and allows the user to manipulate the DOM to effect changes in the web page. The project requires a solid background in JavaScript, HTML and CSS.
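The core idea of the visualization, recovering a tree from markup and rendering it with increasing depth, can be sketched with Python's standard-library HTMLParser (the add-on itself would walk the browser's live DOM in JavaScript instead):

```python
from html.parser import HTMLParser

# HTML void elements never get a closing tag, so they must not change depth.
VOID = {"br", "img", "meta", "link", "input", "hr"}

class DomOutliner(HTMLParser):
    """Collect an indented outline of a page's element tree."""
    def __init__(self):
        super().__init__()
        self.depth = 0
        self.lines = []

    def handle_starttag(self, tag, attrs):
        self.lines.append("  " * self.depth + tag)
        if tag not in VOID:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID:
            self.depth -= 1

outliner = DomOutliner()
outliner.feed("<html><body><h1>Hi</h1><p>Text <b>bold</b></p></body></html>")
print("\n".join(outliner.lines))
```

The output is the nesting structure the browser would build from that markup, which is exactly the tree an interactive visualization would draw.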

 

DORIS JUNG

Fine-tuning alerting systems filter
Alerting systems inform the user of information relevant to their information needs. Unlike databases, they work with transient data that constantly enters the alerting system and is filtered against the information needs users have registered with the system. The filtering algorithm is controlled by several parameters, which can be hard-coded or set by a user.

This project takes an existing paper prototype for setting these parameters and implements it. The implementation should take into account the results of an evaluation carried out by the supervisor and Annika Hinze. Ideally it will integrate into an existing wider alerting system implemented in C#; other technical solutions may, however, be negotiated.

Digitally prototyping a controller for multiple medical infusion pumps
The Formal Methods group - in collaboration with Waikato Hospital - has modelled a controller for medical infusion pumps. Some of the aims of this controller are to reduce patients’ fear by eliminating beeping of pumps and to enable nurses to monitor all pumps simultaneously. Paper prototypes have been developed along with corresponding formal models of both functional behaviour and the prototypes.

This project will take these existing paper prototypes and realize their implementation. The aim is to create a testable digital prototype. The project will be realized for an iPad or similar device.

 

TE TAKA KEEGAN

Improving Facebook in Māori?
Recently a 'skin' for Facebook was provided in the Māori language (see: http://news.tangatawhenua.com/archives/19182). The skin uses Greasemonkey to switch interface strings into Māori. The approach has some flaws: not all strings are translated into Māori, and the translation is not available on small-screen devices. This project seeks to overcome those flaws by investigating how a more complete interface can be offered and how a similar method could be used for small-screen devices.

 

RYAN KO

Kandinsky Graph Engine – Identification of real-life events from large linked-data sets
Kandinsky Graphs were first proposed in 2011 as a way to visualize and infer important real-life events from linked-data sets such as web server logs. When users access the Web, they leave a trail, as Web servers typically maintain a history of requests. Web usage mining approaches, which try to determine what users are interested in, have been studied since the beginning of the Web, given the logs' huge potential for purposes such as resource annotation, personalization and forecasting. However, the impact of such efforts has not really gone beyond generating statistics detailing who visited the Web pages maintained by a server, and when and how. This project focuses on building the front-end and engine for the analysis of linked-data sets for possible detection and identification of real-life events. The project also addresses the limitations of the current Kandinsky Graphs. The student will redevelop the prototype, and will be required to code in Python and deploy platform-independent visualisation toolkits. This project will be co-supervised with a researcher from Visa Research. Interested students may contact the supervisor at ryanko@acm.org. (For more info: URL)

 

RICHARD NELSON

Idle Port Monitor
Idle ports on Ethernet networks receive broadcast traffic such as ARP. Monitoring the rate and source of such traffic can help assess the health of the Ethernet segment. This project is to build a monitor for idle ports on Ethernet segments that measures the broadcast traffic by protocol and can report the numbers via SNMP.
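The measurement core can be sketched in a few lines of Python over raw frame bytes (a real monitor would read frames from a capture socket or via libtrace, and export the counters over SNMP): classify each broadcast frame by its EtherType and source MAC.

```python
import struct
from collections import Counter

ETHERTYPES = {0x0806: "ARP", 0x0800: "IPv4", 0x86DD: "IPv6"}
BROADCAST = b"\xff" * 6

def classify_broadcasts(frames):
    """Count broadcast frames on an idle port, keyed by (source MAC, protocol)."""
    counts = Counter()
    for frame in frames:
        dst, src = frame[0:6], frame[6:12]
        ethertype = struct.unpack("!H", frame[12:14])[0]
        if dst == BROADCAST:
            proto = ETHERTYPES.get(ethertype, hex(ethertype))
            counts[(src.hex(":"), proto)] += 1
    return counts

# Two synthetic ARP broadcasts from one host, plus one unicast IPv4 frame.
src = bytes.fromhex("00aabbccddee")
arp = BROADCAST + src + struct.pack("!H", 0x0806) + b"\x00" * 28
unicast = bytes.fromhex("001122334455") + src + struct.pack("!H", 0x0800) + b"\x00" * 46
print(classify_broadcasts([arp, arp, unicast]))
```

A sudden jump in ARP broadcasts from one source, for example, is the kind of segment-health signal the monitor would report.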

SCADA-protocols
Control systems (SCADA) are moving from proprietary industrial protocols to protocols based on IP. This project is to investigate the major IP-based SCADA protocols and to add decode support for them to libtrace. Working with industrial partners, you would then use the decoder to develop simple tools to profile the traffic on a real SCADA network.

High speed capture using commodity NICs
When capturing traffic for measurement purposes it is desirable to capture as many of the packets as possible, preferably all. It is possible to do this using special high-performance hardware, but this is very expensive. The capture performance of commodity network interface cards can be improved using a specialised driver and interface software. This project is to investigate the PACKET_MMAP interface provided by Linux: to understand how it works, measure its performance against the standard socket interface, and integrate it into libtrace.

 

DAVE NICHOLS

Automating security testing for Greenstone
Web applications, such as the Greenstone digital library software, are complex pieces of software that may have hidden security weaknesses. Manually testing for these weaknesses is time-consuming and prone to error. This project will take existing automated security testing software (such as W3AF and Google's Skipfish) and create a practical security test system for Greenstone. The project is likely to involve a variety of programming languages, scripting and Web technologies. Extending these existing systems to add new security tests is also an option.

 

STEVE REEVES

Further development of the interface for a reengineered modelling tool
A modelling tool called AMuZed exists for creating, editing and exporting (as graphics) microchart models. The tool is currently written in Haskell and uses an outdated graphics library which is no longer supported. A project last year started the task of reengineering AMuZed in Scala. The aim of this project is to complete the tool. You'll learn Scala and GUI programming, and get practice developing a large program that is used "for real" by people building models of systems.

 

SIMON SPACEY

Characterising Java programs with 3S
3S is a program characterisation framework that works at the assembly level to analyse the internal workings of programs. By working at the assembly level 3S should be able to work with any compilable language. However, there has been very little work using 3S to analyse natively compiled Java programs.

The aim of this project is to get 3S instrumenting Java programs compiled to native code using gcc on Linux. In doing this you will strengthen (and have evidence of) your skills in: the 3S characterisation framework, Unix tools, x86 assembly, gcc and gas compiler options, Python, C/C++ and the innermost workings of natively optimised Java programs.

The project report should provide details of any changes required to the 3S framework and whether these changes can be made in a Universal Patch to keep a single 3S code base. You should also spend some time analysing Java kernels and real Java programs with 3S, and use 3S to prove that gcc-compiled Java is really optimised and not just, say, byte codes packaged with an interpreter in an ELF file. Additionally, if time permits, you may consider the process of using compiled JARs with natively compiled Java and comment on the options from a characterisation point of view.

 

MARK UTTING

Monkey testing for JStar
JStar is a declarative Java-based parallel programming language that is being developed at Waikato University.

Each JStar rule has a clearly defined set of input tuples and output tuples, which means that we can invent new ways of writing unit tests for rules. This project will explore ways of expressing sets of input tuples, then running a rule using an interpreter, allowing the user to say yes or no to each output tuple, and recording their decisions for future regression testing. The second stage of the project will be to experiment with generating the input tuples automatically (e.g. randomly), similar to 'monkey testing'. This project will require good Java skills, the ability to write or modify Eclipse plugins, and an interest in compilers and interpreters.

Using gaming algorithms to minimize JUnit test suites
JUnit test suites are great, but can become too large, slow, and cumbersome over time. Companies would like to be able to remove tests that are redundant, and just keep a small set of tests that are as powerful as the original test suite. Jumble is a tool that can analyse tests and see which bugs each test can detect. This project will apply the A* (A-star) path-finding algorithm, which is commonly used in computer games, to walk through the maze of JUnit tests and choose a minimal set that still has good coverage of bugs.
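Choosing a minimal suite that still kills every detectable bug is an instance of set cover. Before tackling A*, the classic greedy baseline can be sketched over a hypothetical test-to-killed-mutant mapping (the kind of data Jumble produces); names and data here are invented for illustration:

```python
def minimize_suite(kills):
    """Greedy set cover: repeatedly pick the test killing the most
    not-yet-covered mutants until every detectable mutant is covered.

    `kills` maps test name -> set of mutants (bugs) that test detects.
    """
    remaining = set().union(*kills.values())
    chosen = []
    while remaining:
        best = max(kills, key=lambda t: len(kills[t] & remaining))
        chosen.append(best)
        remaining -= kills[best]
    return chosen

kills = {
    "testA": {1, 2, 3},
    "testB": {3, 4},
    "testC": {4, 5},
    "testD": {1, 5},
}
print(minimize_suite(kills))  # ['testA', 'testC'] covers all five mutants
```

An A* formulation would search the same space of test subsets, using an admissible heuristic (e.g. remaining mutants divided by the largest per-test kill count) to guarantee a truly minimal answer where greedy may not.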


 

COMP520, ENGG482, ENGG492 Projects

Please print out and complete the COMP520 BCMS Honours Project Selection Form PDF to select your COMP520 project for 2013. Some or all of these projects are also suitable for ENGG492/482—please talk to the supervisor to see whether this is the case for any particular project.


ANTHONY BLAKE

Reconfigurable computing
The recently released Zynq-7020 system-on-chip (SoC) combines a dual-core ARM Cortex-A9 with a fabric of reconfigurable logic. The fabric contains a large amount of general purpose reconfigurable logic, as well as 140 independent block RAMs and 220 hardware multipliers, making this a very powerful device. Applications such as facial recognition are difficult to compute in real-time on embedded devices, but the Zynq can meet these difficult performance constraints. A high-performance parallel memory FFT core with the fastest throughput per area has been developed here at Waikato, and in this project the student will use this core to accelerate an application where the FFT is a critical component (e.g., facial recognition or video stabilisation). 

Run-time specialization
In applications where performance is a concern, it may be possible to realize a speedup by transforming a general function into a specialized function at run-time, once some parameters are fixed. For example, in image processing, the size of the image or the filters that will be applied may not be known at compile time, but at run-time, these parameters may be determined during initialization. At this point, the calculations that depend on the filter or the size of the image can be computed, and specialized machine-code for computing the remainder of the calculations is generated. The resulting code, which has been hard-coded for a specific filter, can then be run any number of times on different image data. In this area, a student could apply run-time specialization to a suitable application, and/or extend 'asmjit' to support ARM code generation.
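A minimal sketch of the idea in Python, using a closure in place of generated machine code (an illustrative stand-in only; the real project would emit native code with a tool such as asmjit):

```python
def specialize_filter(kernel):
    """Return a 1-D convolution specialized for a fixed kernel.

    A run-time specializer would emit machine code with the kernel
    constants baked in; a closure just illustrates fixing parameters early.
    """
    k = list(kernel)
    n = len(k)
    def convolve(signal):
        return [sum(k[j] * signal[i + j] for j in range(n))
                for i in range(len(signal) - n + 1)]
    return convolve

smooth = specialize_filter([0.25, 0.5, 0.25])  # built once at "run time"
print(smooth([0, 4, 8, 4, 0]))                 # reused on any number of inputs
```

The payoff in real machine-code specialization comes from constant folding, unrolling, and removing the inner loop's indirection once the kernel is known.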

 

JUDY BOWEN

Deriving formal models of interactive systems from requirements (also available to SE students in ENGG492)
Modelling safety-critical interactive devices before implementing them allows us to ensure that all aspects of functionality and interactivity are correct. Requirements, which form the basis of the initial design process as well as the models, may consist of a number of different documents and design artefacts at varying levels of formality. This project will take the actual requirements developed by one of our industrial partners and investigate ways of extracting information from these documents which enable the creation of specifications and models which are guaranteed to be consistent with the requirements.

Context modelling of interactive medical systems (also available to SE students in ENGG492)
Interactive medical devices which dispense medication are safety-critical devices and modelling them is important because we want to ensure that all aspects (functionality, interactivity, documentation etc.) are correct and that they will behave as expected at all times. Sometimes it is important to understand not only how a device works and how users can interact with it, but also under what circumstances it will be used and what happens if those circumstances change. This project will involve developing models of the context of a medical device and examining ways of incorporating such models into an existing formal development process.

Using models of interactive medical systems to derive training materials (also available to SE students in ENGG492)
Interactive medical devices which dispense medication are safety-critical devices and modelling them is important because we want to ensure that all aspects (functionality, interactivity, documentation etc.) are correct and that they will behave as expected at all times. This project will look at ways of using models of such devices as the basis for developing user manuals and documentation which can be used to train practitioners.

 

SALLY JO CUNNINGHAM

An improved recipe system
The searching and browsing interfaces to recipe collections such as Epicurious and AllRecipes are surprisingly limited--it is difficult, for example, to identify recipes that do NOT contain a specific ingredient, to specify a set of ingredients that the desired recipe will contain all or most of, and to limit searches by cooking technique.  Once an interesting recipe is identified, it is difficult to come to an understanding of the (sometimes extensive) comments and suggestions of previous cooks. 

This project entails building on the existing body of research (conducted at Waikato University and internationally) into the information needs and behaviours of hobbyist cooks, to design and develop an application that enhances the search and browsing facilities of an existing recipe website.
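The missing search facilities can be illustrated with a small sketch (hypothetical data; a real system would query the site's recipe index): rank recipes by how many wanted ingredients they contain while hard-excluding any recipe that uses a forbidden one.

```python
def search(recipes, want=(), avoid=()):
    """Rank recipes by wanted-ingredient matches; drop any recipe
    containing an ingredient to avoid (the NOT query current sites lack)."""
    want, avoid = set(want), set(avoid)
    hits = []
    for name, ingredients in recipes.items():
        ing = set(ingredients)
        if ing & avoid:
            continue  # hard exclusion, e.g. for allergies
        hits.append((len(want & ing), name))
    return [name for score, name in sorted(hits, reverse=True) if score > 0]

recipes = {
    "pesto pasta": ["pasta", "basil", "pine nuts", "garlic"],
    "satay noodles": ["noodles", "peanuts", "garlic", "chilli"],
    "tomato soup": ["tomato", "basil", "garlic"],
}
# Wants basil and garlic, but must avoid peanuts.
print(search(recipes, want=["basil", "garlic"], avoid=["peanuts"]))
```

Faceting by cooking technique and summarising previous cooks' comments are the harder parts the project would add on top of a filter like this.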

A better personal ebook collection manager
Current personal ebook managers (e.g., the Kindle and Nook apps) offer limited ways for the individual to organise her collection, and relatively impoverished metadata for individual ebooks. This project will involve first adding to the existing research literature on how people organise their personal book collections (both physical and digital), and then using these insights to design and prototype an ebook manager.

 

EIBE FRANK

Alternating regression and model trees
Alternating decision trees are a well-known method for constructing highly accurate tree-based classifiers that predict a categorical attribute. They differ from simple decision trees in that they contain option nodes as well as decision nodes: at option nodes, all paths are taken, not just one. They are generally more accurate than decision trees and competitive with so-called "ensemble methods", where several classifiers (e.g. trees) are combined to form a prediction.

The goal of this project is to develop an algorithm for learning alternating regression and model trees, where the attribute to be predicted is a numeric one, and implement it in the Java-based WEKA machine learning software. The basic approach to be employed is additive regression, which has been successfully used to build alternating decision trees.

There are large collections of benchmark datasets that can be used to evaluate the accuracy of the method by comparing it to ensembles of regression and model trees.
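A bare-bones sketch of additive regression with stumps, the basic approach mentioned above (this is not WEKA's implementation and has no option nodes; it only shows how each stage fits the previous stages' residuals):

```python
def fit_stump(x, r):
    """Best single-split stump minimizing squared error on residuals r."""
    best = None
    for s in sorted(set(x)):
        left = [r[i] for i in range(len(x)) if x[i] <= s]
        right = [r[i] for i in range(len(x)) if x[i] > s]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = (sum((v - lm) ** 2 for v in left)
               + sum((v - rm) ** 2 for v in right))
        if best is None or err < best[0]:
            best = (err, s, lm, rm)
    _, s, lm, rm = best
    return lambda v: lm if v <= s else rm

def additive_regression(x, y, rounds=5, shrinkage=0.5):
    """Fit a sum of stumps, each trained on the current residuals."""
    base = sum(y) / len(y)
    stumps, pred = [], [base] * len(y)
    for _ in range(rounds):
        residuals = [y[i] - pred[i] for i in range(len(y))]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        pred = [pred[i] + shrinkage * stump(x[i]) for i in range(len(y))]
    return lambda v: base + shrinkage * sum(st(v) for st in stumps)

x = [1, 2, 3, 4, 5, 6]
y = [1, 1, 1, 5, 5, 5]
model = additive_regression(x, y)
print(round(model(2), 2), round(model(5), 2))
```

An alternating regression tree would grow this ensemble into a single tree with option nodes, and would use model-tree leaves (linear models) rather than constant leaf means.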

This project is only available to students who have passed COMP316 or COMP321.

A tool for semi-automatically constructing FAQs from mailing list archives
FAQs are often a very helpful source of information on how to solve problems with a particular piece of software. In this project, the goal is to develop a tool that can be used to semi-automatically construct an FAQ based on a mailing list archive. The idea is to use natural language processing and machine learning to automatically identify frequently asked questions in the archive, as well as candidate answers that can be presented to the user for selection. The particular case study to be used is the WEKA software, which has a mailing list that contains thousands of messages from more than 10 years of use.

This project is only available to students who have passed COMP316 or COMP321.

 

ANNIKA HINZE

Text categorization and analysis based on document history
People create documents daily, but do not typically organize and store them in a sensible manner. There are duplicates with the same or a different name in various locations, and slightly modified versions created for different audiences. Some folders are clearly subdivided by document type or task, while others are simply large bins of everything. A document’s history involves its creation, modification and ownership history, as well as whether it was included as an attachment to an email and who received that email. The date stamps on the documents, together with the emails and wiki pages created during the same period, constitute and influence the document’s history timeline. This information is important for understanding a document’s purpose, its significance and its relation to other documents. This research project will investigate ways of deriving this information from documents and email archives, and of using it to detect duplicates and to automatically categorize documents into groups by topic and significance.

This project will be done in collaboration with Pingar (Auckland Software Company). They will provide an email archive, and the documents associated with it. Pingar can also provide text analytics on the document text, which can be used in combination with approaches based purely on document history.

1000 springs
This project is a collaboration with a company in New Zealand. We aim to classify 1000 springs and record the results in a large database (which has already been created). The project has several components, each of which may form a separate thesis.

Funding and scholarships are available for theses related to this project.

 

STEVE JONES

Improved QR Code detection and decoding
Quick Response (QR) Codes are becoming widely adopted as a way for smartphone users to capture snippets of data (such as a URL) with minimal effort. For this to work correctly, a QR Code has to be located in an image provided by a device camera, and then the QR pattern has to be decoded. The accuracy and speed with which this happens depend upon several factors, such as image resolution, lighting conditions, the density of the QR pattern and the amount of error-correction data in the pattern. This project will investigate how to improve upon the performance of state-of-the-art techniques and widely used libraries (such as ZXing) for both detection and decoding. Implementation will be in native code on the Android platform using devices such as the Samsung Galaxy S3, so familiarity with Java is desirable and you will need to be a strong C/C++ programmer.
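One detection detail can be sketched: the three large corner squares of a QR Code (finder patterns) produce dark/light run lengths in a 1:1:3:1:1 ratio along any scanline through their centre, the cue that ratio-based detectors such as ZXing's look for. A toy check in Python (illustrative only; the project itself would work in native code on camera frames):

```python
def looks_like_finder_pattern(runs, tolerance=0.5):
    """Check five alternating dark/light run lengths against the 1:1:3:1:1
    ratio of a QR finder pattern. `runs` are pixel counts along a scanline."""
    if len(runs) != 5 or 0 in runs:
        return False
    module = sum(runs) / 7.0        # the pattern is seven modules wide
    expected = [1, 1, 3, 1, 1]
    return all(abs(run - e * module) <= tolerance * module
               for run, e in zip(runs, expected))

print(looks_like_finder_pattern([2, 2, 6, 2, 2]))  # clean 1:1:3:1:1
print(looks_like_finder_pattern([2, 2, 2, 2, 2]))  # centre run far too short
```

Performance work in the project would revolve around running checks like this quickly and robustly under blur, skew and poor lighting.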

What do people look at in mobile map applications?
Map applications (such as Google Maps) are commonly available on mobile devices. They provide a tiny viewport onto a massive information space (i.e. the Earth) that can be altered by panning, scrolling and zooming operations. This project will investigate patterns in the visual focus of map application users and whether these patterns relate to their pan/scroll/zoom actions - perhaps we can predict where a user might navigate to based on what they look at. It will also consider how different visual cues for the position of off-screen locations affect users' visual focus. The project will use an eye-tracking system and a mobile device, but does not require software development skills. It does require an interest in usability studies, experimental design and data analysis.

Generating interactive simulations of medical devices
Infusion pumps automatically dispense medication to patients at predefined dosages and intervals and so must not only function correctly but also ensure error-free use by the medical staff that interact with them. Interactive software simulations of such devices can be developed to investigate the usability of alternative user interface designs. This project will consider how simulations that run on a tablet computer can be automatically generated from formal models of device functionality and interaction. One output will be a tool that transforms models into functional interactive Android applications that simulate a device such as an infusion pump. Given that you will be targeting an Android device (such as a Galaxy Nexus tablet) you'll need to be confident with Java development and XML.

 

DORIS JUNG

Fine-tuning alerting systems filter
Alerting systems inform the user of information relevant to their information needs. Unlike databases, they work with transient data that constantly enters the alerting system and is filtered against the information needs users have registered with the system. The filtering algorithm is controlled by several parameters, which can be hard-coded or set by a user.

This project takes an existing paper prototype for setting these parameters and implements it. The implementation should take into account the results of an evaluation carried out by the supervisor and Annika Hinze. Ideally it will integrate into an existing wider alerting system implemented in C#; other technical solutions may, however, be negotiated.

The student will undertake an evaluation appropriate to assess the usability of the solution implemented.

Digitally prototyping a controller for multiple medical infusion pumps
The Formal Methods group - in collaboration with Waikato Hospital - has modelled a controller for medical infusion pumps. Some of the aims of this controller are to reduce patients’ fear by eliminating beeping of pumps and to enable nurses to monitor all pumps simultaneously. Paper prototypes have been developed along with corresponding formal models of both functional behaviour and the prototypes.

This project will take these existing paper prototypes and realize their implementation. The aim is to create a testable digital prototype. The project will be realized for an iPad or similar device.

The student will undertake appropriate user studies to assess the usability of the solution implemented.

 

TE TAKA KEEGAN

Referee reporter usability study
A recent COMP591 project developed an Android app that acts as a rugby referee reporter. While the main part of the mobile app was completed, it still requires considerable (on-field) user testing, and the reporting component needs to be developed. This project would suit someone with experience (or a keen interest) in building mobile apps and in usability.

Using eye tracking to determine multilingual usability
How do eye movements relate to usability in multilingual websites? This project seeks to find correlations between eye-tracking properties, such as fixations and eye movement, and the usability of multilingual websites. The project involves user studies of multilingual participants, to whom the candidate would need to have access. The project is dependent on the department purchasing eye-tracking technology.

 

RYAN KO

Tracking data provenance in cloud computing environments
Data is the main asset in cloud computing environments. However, current technologies are not built for tracking data provenance across large distributed virtualised environments. As such, most industry solutions are currently unable to track-and-trace the end-to-end life cycles of data in an effective manner. This project will focus on building a kernel-based data-tracking tool that will enable the effective logging, collection and analysis of data-centric logs. Students with a strong personal mission in cyber security, and interest in virtualisation technologies and Linux kernel module/ Windows device driver programming are encouraged to apply. The student will be co-supervised by security research professionals from Hewlett-Packard Labs. Interested students may contact the supervisor at ryanko@acm.org. More info: URL.

A platform for data provenance analysis
This project looks at setting up a flexible platform that allows custom scripts to be used for crawling datasets from target web storage services and pages for provenance research. The platform will also interface with the backend storage and distributed processing engine for the data processing stage. The student is required to have a good understanding of web scripting and programming. Students can expect to pick up skills in databases and distributed processing frameworks such as Hadoop. Interested students may contact the supervisor at ryanko@acm.org.

Discovery of security vulnerabilities in cloud storage systems
This project investigates common causes of security breaches and vulnerabilities in cloud storage systems (e.g. Dropbox, Box.net, etc.) and reports breaches in these systems. The student is expected to have a strong interest in cyber security.

Discovery of security vulnerabilities in mobile applications
This project investigates common causes of security breaches and vulnerabilities in mobile Web applications and reports breaches in these systems. The student is expected to have a strong interest in cyber security.

 

ROBI MALIK

Extended finite-state machines
The formal methods group at Waikato, in collaboration with Chalmers University of Technology in Göteborg, Sweden, is developing WATERS, the Waikato Analysis Toolkit for Events in Reactive Systems. The software includes a user-friendly graphical editor for finite-state automata models and several tools for the analysis of large finite-state systems.

Extended finite-state machines (EFA) are like ordinary automata, but with the addition of variables and assignments to facilitate the modelling of systems with data. While WATERS allows the editing of EFA, their analysis is only rudimentarily supported by translation into ordinary automata. In this project, we will design and implement improved data structures to allow EFA to be translated and analysed more directly and more efficiently.

Compositional verification
Building a model checker such as WATERS also requires the design of algorithms that can cope with large finite-state machine models, often consisting of millions or billions of states. One approach currently being investigated to cope with this complexity is compositional verification, and there are several possible projects to improve on compositional verification algorithms.

As one example, all compositional verification algorithms presently available in WATERS are based on so-called local actions. If some action is used in only one automaton of a large system, it is local and can be removed from the model, which allows the model to be simplified. However, it is also possible to perform simplification based on actions that are not local, which leads to an interesting project in compositional verification.

Such a project involves the modification and extension of the existing compositional verification algorithm in WATERS, programming in Java. Furthermore, it involves the evaluation of the performance of the modified algorithms using several large benchmark models that are also available in WATERS.

Supervisor synthesis
Synthesis is a technology that allows the automatic generation of a control program from a description of the problem that it is to solve. For example, a control specification for a reactor may consist of a description in the form of finite-state automata of the available sensors, heaters, and valves, plus the requirement to keep the reactor temperature and pressure within certain bounds. From this input, we can automatically synthesise a so-called supervisor that controls the heaters and valves safely such that all requirements are satisfied.

The present implementation of the supervisor synthesis algorithm in WATERS is written in Java and can compute supervisor automata with up to one million states. In this project, we will replace it by a C++ implementation that represents the output more compactly and hopefully can compute much larger supervisors.
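The safety part of synthesis can be sketched as a fixed-point pruning (an illustrative toy with invented state and event names; it ignores the nonblocking requirement that WATERS also handles): repeatedly remove any state from which an uncontrollable event escapes the safe set, since a supervisor cannot disable uncontrollable events.

```python
def synthesise_safe_states(states, transitions, bad, uncontrollable):
    """Largest set of states a supervisor can keep the plant inside.

    `transitions` is a set of (source, event, target) triples. A state is
    pruned if it is bad, or if an uncontrollable event leads out of the
    current safe set.
    """
    safe = set(states) - set(bad)
    changed = True
    while changed:
        changed = False
        for (src, event, dst) in transitions:
            if src in safe and dst not in safe and event in uncontrollable:
                safe.discard(src)
                changed = True
    return safe

states = {"idle", "heating", "overheat", "cooling"}
transitions = {
    ("idle", "start", "heating"),        # controllable
    ("heating", "too_hot", "overheat"),  # uncontrollable!
    ("heating", "stop", "cooling"),
    ("cooling", "done", "idle"),
}
print(sorted(synthesise_safe_states(states, transitions,
                                    bad={"overheat"},
                                    uncontrollable={"too_hot"})))
```

Note that "heating" itself is pruned: once there, the plant may overheat regardless of what the supervisor disables. Representing such state sets compactly for millions of states is exactly where the planned C++ implementation would earn its keep.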

 

MIKE MAYO

Detection of support/resistance lines in financial data
There is plenty of anecdotal evidence that market price series are not entirely random but respect certain levels called "support" and "resistance" levels. This project will explore whether it is possible to detect such levels in intraday data using machine learning. A new method for detecting levels (based on evolutionary algorithms) has been developed recently, and whoever takes this project could either extend that method or come up with an entirely new method. Students interested in this project should also be enrolled in COMP556 and have preferably also passed COMP316.
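As a point of comparison for the evolutionary method, a naive baseline (on invented price data) clusters local price extrema and keeps only levels the price has touched repeatedly:

```python
def detect_levels(prices, tolerance=0.5):
    """Cluster local minima/maxima into candidate support/resistance levels."""
    extrema = [p for i, p in enumerate(prices[1:-1], 1)
               if prices[i - 1] < p > prices[i + 1]
               or prices[i - 1] > p < prices[i + 1]]
    levels, current = [], []
    for p in sorted(extrema):
        if current and p - current[-1] > tolerance:
            levels.append(current)
            current = []
        current.append(p)
    if current:
        levels.append(current)
    # A level only counts if the price touched it more than once.
    return [round(sum(c) / len(c), 2) for c in levels if len(c) >= 2]

prices = [10, 12, 10.2, 12.1, 10.1, 13, 11, 12.9, 11.2]
print(detect_levels(prices))
```

The evolutionary approach mentioned above would instead search over candidate levels directly, scoring them by how often and how cleanly price respects them, which is far more robust than extrema clustering on noisy intraday data.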

Object detection using a Boosted Cascade of simple features
There is a well-known and powerful algorithm for face detection called a boosted cascade. It essentially calculates thousands of features from an image, picks a handful of the best ones, and then uses those features to build a detector that can scan images in search of objects. Its original application was face detection. This project is to implement this detector in Java, re-using Weka classes wherever possible, and to test the algorithm on types of object other than faces.
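
The speed of the cascade rests on the integral image, which makes any of those thousands of rectangle features computable in constant time. A minimal Java sketch (our illustration, not Weka code) is:

```java
public class IntegralImage {
    // Integral image: ii[y][x] holds the sum of all pixels above and to the
    // left of (x, y), so any rectangle sum needs only four array lookups.
    static long[][] integral(int[][] img) {
        int h = img.length, w = img[0].length;
        long[][] ii = new long[h + 1][w + 1];
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                ii[y + 1][x + 1] = img[y][x] + ii[y][x + 1] + ii[y + 1][x] - ii[y][x];
        return ii;
    }

    // Sum of pixels in the rectangle [x, x+w) x [y, y+h).
    static long rectSum(long[][] ii, int x, int y, int w, int h) {
        return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x];
    }

    // A two-rectangle ("edge") Haar-like feature: left half minus right half.
    static long haarEdge(long[][] ii, int x, int y, int w, int h) {
        return rectSum(ii, x, y, w / 2, h) - rectSum(ii, x + w / 2, y, w / 2, h);
    }
}
```

Boosting then selects the handful of such features that best separate object windows from background windows.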

Entity detection in semi-structured documents
Semi-structured documents are documents in which the graphics (e.g. fonts, text sizes, logos, layout, and colour) provide clues as to the meaning and significance of the content of the document. For example, a typical invoice will likely have sections for the company, the recipient of the services/goods, and the items being invoiced for. To understand a semi-structured document like this, the graphical structure of the document can be used to give clues as to the meaning of the text (which can be scanned via OCR). The main problem is that document layouts, even for documents as common as invoices, vary greatly.

This COMP520 project will use image processing and machine intelligence to try to recognise significant and meaningful parts of documents from their graphical features as well as their OCR'ed text. Interested students should have an interest in image processing and AI. This project is a collaboration with a Hamilton start-up company called Pingu.

 

RICHARD NELSON

High resolution active testing
Active tests such as ping and traceroute have normally relied on kernel timestamping of packets and have been limited to a resolution of around a millisecond. Recently, some network interface cards have started providing hardware timestamping of packets in order to support the Precision Time Protocol (PTP).

This project is to use hardware timestamping to develop more precise active network measurement tests and to investigate the accuracy limits and use of such tests.

Openflow
OpenFlow, and SDN in general, potentially allow new ways of thinking about network architectures. A range of potential topics is possible; we have a particular interest in service-provider networks. This project is to improve upon existing switch implementations, developing features that improve efficiency and security when supporting Ethernet/IP clients.

Annotating network intrusion systems
This project is part of a larger piece of work to analyse network anomaly detection systems. The idea is to use the output of network intrusion systems such as Snort and Bro as a baseline for assessing the anomaly detection systems. One problem is that Bro does not output all the details required to guarantee a correct comparison, so the first aim of the project is to improve Bro's detailed output so that it can be used correctly in a common annotation system. The second aim is to use this output for comparison with Snort and a range of other anomaly detection systems, using tools already developed.

 

DAVE NICHOLS

A repository dashboard for Research Commons
Research Commons, the University's institutional research repository, distributes research works to web users. Currently the usage statistics are not user-centred and it is difficult for authors to get a clear picture of activity around their documents. This project will investigate what lecturers would like to see in terms of usage reports and then implement a solution. Usage data will be drawn from the logs of Research Commons. The project will involve Web technologies including JavaScript and XML-based APIs; it would also suit a student with an interest in human-computer interaction and information visualisation. http://researchcommons.waikato.ac.nz/

 

STEVE REEVES

Multi-path development for interactive systems (also available to SE students in ENGG492)
Interactive systems are increasingly required to run on a variety of different hardware platforms and operating systems. This project involves finding ways of using refinement (structured transition from a model of a system to an implementation) to support the development of a software system for multiple devices and ensuring correctness and consistency between the different versions.

 

SAM SARJANT (co-supervised with Cathy Legg)

Massive ontology interface
Wikipedia is a great resource for information, but everything is stored as plain text (or wiki markup) and may include contradictory information. Recent (and ongoing) research has been able to extract information from Wikipedia and store it within a logically structured ontology of relational facts, but what good is data if people cannot interact with it? This COMP520 project involves developing an interface for viewing, querying, and annotating the ontology, so that a user may pose queries and receive the answers they require directly, in a user-friendly manner, with the option to look 'behind the scenes' and view explanations for the answers obtained.

Students will be working with the first-order logical language of the ontology, and the interface developed will need to be viewable within a web browser. User input will need to be converted into an equivalent query, and the results should be presented to the user in a comprehensible and interactive format. The student is also encouraged to suggest and test additional knowledge-mining heuristics for improving the ontology's quality.

 

SIMON SPACEY

Using Linux software to characterise OSX and Windows programs
3S is a program characterisation framework that works at the assembly level to analyse the internal workings of programs written in compilable languages. 3S is currently available only for Linux; however, x86 assembly is largely OS-independent, so porting 3S to OSX and Windows for commercial use should be relatively easy.

The aim of this project is to get 3S instrumenting software on OSX and Windows machines. There are three stages, depending on your progress: (1) get 3S working on OSX with gcc; (2) get 3S working on Windows with gcc in the Cygwin environment; and (3) get 3S working with Microsoft's own compiler in the standard Windows environment. In doing this you will strengthen (and have evidence of) your skills in: the 3S characterisation framework, Unix tools, x86 assembly, gcc and gas compiler options, Python, C/C++, software benchmarks, Cygwin, and the inner workings of Linux, OSX and Windows executables. Depending on your approach/progress, you may also demonstrate skills in commercial software written in Objective C, C++, MFC, ATL and/or WPF, as well as alternative characterisation frameworks such as VTune.

The report should explain any changes required to the 3S framework, list the dependencies required for each environment, and consider whether a universal patch can be applied to 3S to make the same code base work on Linux, OSX and the two Windows environments. Additionally, you should show that the OS migrations are sound by providing 3S characterisation results for the same benchmarks on all systems, commenting on any differences. If time is available, you can also compare 3S tool results against OSX and Windows alternatives and comment on the advantages/disadvantages, limits and constraints of the different frameworks.

A GPU simulator for guiding co-design effort
3S is a program characterisation framework that works at the assembly level to analyse programs written in compilable languages. 3S has unique tools that analyse the internal control and data flows of programs which allow the estimation of program performance on different computational architectures.

The aim of this project is to create a 3S tool that provides characterisation figures estimating how standard software code sections would perform if they were executed on a GPU. To achieve this goal you will have to: (1) understand the inner workings of GPUs so as to abstract out critical hardware and software characteristics; (2) develop an execution timing equation similar to the ones proposed in the Write-Only Architecture paper for ILP; and (3) create a 3S tool to measure the software characteristics required to generate timing estimates. In doing this you will strengthen (and have evidence of) your skills in: the 3S characterisation framework, Unix tools, x86 assembly, gcc and gas compiler options, Python, C/C++, software benchmarks and the inner workings of heterogeneous architectures, GPUs and, depending on your approach/progress, GPU programming languages such as CUDA and OpenCL.
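
One standard shape such a timing estimate could take is a roofline-style bound (our illustration only, not the equation from the paper cited above): execution time is limited either by arithmetic throughput or by memory bandwidth, plus a fixed launch/transfer overhead.

```java
public class GpuTimingSketch {
    // Roofline-style estimate: a code section needing `flops` arithmetic
    // operations and moving `bytes` of data is bound by whichever of the two
    // hardware limits it saturates first, plus a constant overhead for kernel
    // launch and host-to-device transfer. All parameters are abstract
    // characteristics of the kind stage (1) of the project would identify.
    static double estimateSeconds(double flops, double bytes,
                                  double peakFlopsPerSec, double peakBytesPerSec,
                                  double overheadSec) {
        double computeTime = flops / peakFlopsPerSec;
        double memoryTime = bytes / peakBytesPerSec;
        return overheadSec + Math.max(computeTime, memoryTime);
    }
}
```

For example, a section with 10^9 operations and 10^9 bytes of traffic on a device with 1 TFLOP/s compute and 100 GB/s bandwidth is memory-bound, and the estimate is dominated by the memory term.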

The report should detail GPU chips and communication models from different vendors, provide abstract characteristics and models, and explain the 3S tool and any framework changes required to gather the software information the model needs. Additionally, you should provide GPU timing measurements for the MiBench benchmarks and compare them against the FPU timings provided in previous work (maxima, averages and minima will be sufficient), explaining any obvious anomalies and trends. If time is available, you could compare your performance estimates against actual implementations on a real GPU.

 

CRAIG TAUBE-SCHOCK

Natural speech synthesizer
A speech synthesizer that models the physical characteristics of the human vocal tract has been developed. This synthesizer has the potential to produce speech that sounds much more natural than existing synthesizers, but is limited by its current control system. Because the synthesizer models the human articulatory system, we hypothesize that the control parameters for the synthesizer can be computed using a model of how humans move their jaw, tongue and the muscles in their mouth and neck. In this project, the student will formulate a model of the human vocal apparatus and use it to control the articulatory speech synthesizer.

 

MARK UTTING

Monkey testing for JStar
JStar is a declarative Java-based parallel programming language that is being developed at Waikato University.

Each JStar rule has a clearly defined set of input tuples and output tuples, which means that we can invent new ways of writing unit tests for rules. This project will explore ways of expressing sets of input tuples, running a rule using an interpreter, allowing the user to say yes or no to each output tuple, and recording their decisions for future regression testing. The second stage of the project will be to experiment with generating the input tuples automatically (e.g. randomly), similar to 'monkey testing'. This project will require good Java skills, the ability to write or modify Eclipse plugins, and an interest in compilers and interpreters.
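
The random-generation stage can be sketched very simply (the integer tuple shape here is hypothetical; real JStar tuples are typed): draw each field of each tuple from a value domain, seeding the generator so that failing runs are reproducible.

```java
import java.util.*;

public class TupleMonkey {
    // Generate `count` random input tuples of the given arity, each field
    // drawn uniformly from [0, maxValue). A fixed seed makes a monkey-testing
    // run repeatable, which matters when a generated tuple exposes a bug.
    static List<int[]> randomTuples(int count, int arity, int maxValue, long seed) {
        Random rnd = new Random(seed);
        List<int[]> tuples = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            int[] t = new int[arity];
            for (int j = 0; j < arity; j++)
                t[j] = rnd.nextInt(maxValue);
            tuples.add(t);
        }
        return tuples;
    }
}
```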

Using gaming algorithms to minimize JUnit test suites
JUnit test suites are great, but can become too large, slow, and cumbersome over time. Companies would like to be able to remove tests that are redundant, and just keep a small set of tests that are as powerful as the original test suite. Jumble is a tool that can analyse tests and see which bugs each test can detect. This project will apply the A* (A-star) path-finding algorithm, which is commonly used in computer games, to walk through the maze of JUnit tests and choose a minimal set that still has good coverage of bugs.
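
A useful baseline for comparison with the A* search is greedy set cover (the data shapes below are our illustration, not the Jumble API): map each test to the set of mutants (bugs) it kills, then repeatedly pick the test that kills the most not-yet-covered mutants.

```java
import java.util.*;

public class SuiteMinimizer {
    // Greedy set-cover: `kills` maps a test name to the IDs of the mutants it
    // detects. We repeatedly choose the test with the largest marginal gain
    // until every killable mutant is covered.
    static List<String> minimize(Map<String, Set<Integer>> kills) {
        Set<Integer> uncovered = new HashSet<>();
        kills.values().forEach(uncovered::addAll);
        List<String> chosen = new ArrayList<>();
        while (!uncovered.isEmpty()) {
            String best = null;
            int bestGain = 0;
            for (Map.Entry<String, Set<Integer>> e : kills.entrySet()) {
                Set<Integer> gain = new HashSet<>(e.getValue());
                gain.retainAll(uncovered);
                if (gain.size() > bestGain) {
                    bestGain = gain.size();
                    best = e.getKey();
                }
            }
            if (best == null)
                break; // no test covers the remaining mutants
            chosen.add(best);
            uncovered.removeAll(kills.get(best));
        }
        return chosen;
    }
}
```

Greedy gives good but not always minimal suites; the point of the project is to see whether an A* search over subsets can do better while remaining tractable.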

 

IAN WITTEN

Jigsaw puzzle helper app
This project will produce a software helper for physical jigsaw puzzles. Many people find jigsaw puzzles an interesting, rewarding, and social activity that frequently extends, on and off, over several days during a holiday period. Some parts can be difficult and frustrating, however -- typically sky and other uniformly coloured areas. This project will produce an automated helper that analyzes pictures of a region of a partly-completed jigsaw puzzle, a set of miscellaneous left-over pieces, and perhaps the scene on the top of the box, and makes suggestions as to which pieces should go where to fill in some of the gaps.

The project poses interesting problems in image recognition, shape matching of interlocking pieces against gaps, and colour/texture/pattern matching of picture fragments with neighbouring tiles and the box-top scene. Ideally the system would be a smartphone app that analyzes pictures taken by the user with the phone's camera.

Learning French and German (co-supervised with Shaoqun Wu)
FLAX (Flexible language acquisition) is an open source system that we have developed that uses digital library and other web content to automate the production and delivery of practice exercises for overseas students who are learning English. The exercises involve students in a virtually endless supply of collaborative and competitive language activities that are interesting, compelling, and rewarding. URL: Flexible Language Acquisition project.

This project will extend FLAX for teaching French and German. Currently, it only incorporates a part-of-speech tagger for English, which means that although it will work with other languages to some extent, some of the facilities it offers will be degraded. This project will locate open source taggers for French and German and add them to FLAX, and make any other adjustments that are necessary for it to work well on these languages. It will also identify special problems that students encounter when learning these languages, and design exercises targeted at these issues. (This follows on from a successful 2012 project that extended FLAX to teach the Spanish language.)

 

SHAOQUN WU

Learning Chinese with FLAX
FLAX (Flexible language acquisition) is an open source system that we have developed for teaching and learning a second language such as English or Spanish (and soon German and French). URL: Flexible Language Acquisition project.

This project will build and incorporate a component to support Chinese language learning. It will be built upon the FLAX infrastructure and will comprise a parser that analyses Chinese text, along with a set of language activities. The Fudan NLP tool will be used to segment Chinese text and extract syntactic structures. The project will make existing FLAX activities work with the Chinese language, and implement new ones that are particularly pertinent to Chinese.

Text-to-speech tool for FLAX
FLAX (Flexible language acquisition) is an open source system that we have developed for teaching and learning a second language such as English or Spanish (and soon German and French). URL: Flexible Language Acquisition project.

This project involves integrating into FLAX an open-source English text-to-speech tool that can read text out naturally. The tool needs to support different voices, e.g. male and female, and American and British accents. The student will investigate existing text-to-speech tools on the Web and identify a suitable one for FLAX. The project will use Java, XML, CSS, JavaScript, Ajax, and other Web programming technologies.


 

COMP591 Projects

COMP520 projects are also available for COMP591 students - choose from the list of projects above. Please print out and complete the COMP591 BSc Honours Project Selection Form PDF to select your COMP591 project for 2013.

NOTE: If you are enrolled in a Postgraduate Diploma in Computer Science (PGDipCompSci) you also need to make sure the PGDipCompSci Coordinator has completed a PGDipCompSci Form outlining the start and end dates of your PGDipCompSci before selecting a COMP591 project.



Last updated 25 February 2013