COMPX241 Emerging Project Ideas

Hey You! Interact with me!!

Meeting time: 4-5pm

Project Manager: Lindsen Cruz

Team Members: Blake Akapita; Daniel Bartley; Rex Pan; Michael Young

Key Idea: Develop an information display that provides convenient but novel ways for people to interact with it.

With a suitable app installed on a person's phone, you could for example:

For options that don't even need any phone app to be installed, there is, for example, the option of:

  • Using a Kinect with the display to determine a user's distance from the display and their basic skeletal posture. Moving hands and legs, or opening and closing hands, can be used to interact with the display. Even more fun: two or more people have to co-ordinate to effect the change they want.
  • Something similar (but perhaps more modest) could be developed using a web camera combined with image-processing (see the sketch below).
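
To give a feel for the web-camera option, here is a minimal sketch, assuming the OpenCV library is available (the thresholds and window handling are placeholder choices): frame differencing works out which half of the camera's view contains movement, which could then drive a simple left/right interaction on the display.

    import cv2

    cap = cv2.VideoCapture(0)
    ok, prev = cap.read()
    prev = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # pixels that changed since the last frame
        diff = cv2.threshold(cv2.absdiff(prev, gray), 25, 255, cv2.THRESH_BINARY)[1]
        prev = gray

        h, w = diff.shape
        left, right = diff[:, : w // 2].sum(), diff[:, w // 2:].sum()
        if left + right > 0:
            print("movement on the", "left" if left > right else "right")

        cv2.imshow("motion", diff)
        if cv2.waitKey(30) == 27:    # Esc to quit
            break

    cap.release()
    cv2.destroyAllWindows()

A Kinect would give much richer information (distance and skeletal posture), but even this level of processing is enough to let a passer-by wave at one side of the display to change what it shows.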

Note possible link-up with University App below.

Framed

Meeting time: 4-5pm

Project Manager: Jordan Schroder

Team Members: Isaac Higgins; Michaela Kerr; Cameron Paul; Niranjana Sethu

... or as a longer title, In the Frame.

Key Idea: Use photogrammetry to stitch together CCTV footage into a 3D model of, say, a city centre. Different 3D models for different moments in time. Interface lets you move around within the space and adjust time to assess what happened.

  • Keywords: Video Streaming; Video and Image manipulation; Web Technologies

The situation that this project is looking to help with is the reviewing of CCTV surveillance footage taken from a public space (for instance a city square, monitored by the local council). When an event has occurred (such as a bag snatch) that the council (or the police, for that matter) wants to review, Framed is a software tool designed to help them do this in an intuitive manner, by letting the operator of the CCTV feeds virtually place themselves at a location in the city and review what was captured by the cameras, in situ.

To expand upon the idea: rather than present the traditional tabulated 'wall' (2D grid) of the video feeds the organization has, and expect the user to figure out how they relate to each other, the core idea of this project is to provide a 3D space and allow the operator to virtually choose where they want to stand in the environment being captured by CCTV, and then run the recorded video over the time-period in question. In the first iteration of the project, the different views of the captured video are projected into the 3D environment, appearing a bit like movie screens positioned and oriented in the 3D environment in such a way that the 2D movies shown on them make spatial sense.

With the concept of the project established, there are numerous ways to evolve this project. The background environment could perhaps be provided by using a mash-up with StreetView, for example; or it could be a full 3D model (drawing upon Google SketchUp?); or some entirely different approach, such as using the SIFT algorithm to stitch image (and, by association, video) content together. One thing is for sure: however you approach the project, you'll want a keen grasp of 3D geometry and 3D-to-2D projections.
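
As a taste of what those projections involve, here is a small sketch using NumPy (the camera parameters are made-up values): a pinhole-camera model maps a 3D point in the scene, such as the corner of one of the virtual "movie screens", onto 2D pixel coordinates.

    import numpy as np

    # assumed intrinsics: focal length 800 px, principal point at (640, 360)
    K = np.array([[800.0,   0.0, 640.0],
                  [  0.0, 800.0, 360.0],
                  [  0.0,   0.0,   1.0]])

    def project(point_world, R, t):
        """Project a 3D world point given camera rotation R (3x3) and translation t (3,)."""
        p_cam = R @ point_world + t    # world -> camera coordinates
        u, v, w = K @ p_cam            # camera -> homogeneous image coordinates
        return u / w, v / w            # divide through to get pixel coordinates

    corner = np.array([2.0, 1.0, 10.0])           # a screen corner 10 m in front of the camera
    print(project(corner, np.eye(3), np.zeros(3)))

The same mathematics, run in reverse, is what photogrammetry and SIFT-based stitching rely on to recover where the cameras are in the first place.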

Potentially useful Links:

Haven't I been here before?

Meeting time: 4-5pm

Project Manager: Tristan Anderson

Team Members: Fajer Alblooshy; Liam Hennessy; Michael Irvine; Lysa Phan

Key Idea: Develop a system that cuts out repetitive typing and clicking around a desktop environment, such as the 'same old tasks' you find yourself doing when logging in.

  • Keywords: Graphical Environment Scripting; Multi-platform OS

Computers are meant to be good at repetitive tasks, right? So why is it that, numerous times a week, I find myself at the start of a lecture doing the same thing: after logging in I start up Panopto for video recording (providing log-in details); then I start up a browser (providing log-in details again, this time to the web proxy-server); and head to the Moodle site for the course (more log-in credentials needed); finally I access Google Drive and navigate my way to where the PowerPoint slides for the lecture are located.

Another repetitive situation is after security updates on my Windows laptop (with its obligatory reboot). There is a fairly set sequence of things I go through after this, to get things the way I find most useful. Not always the same though. If I'm at home what I do differs from what I might do at work. In particular at work, there is a finer level of granularity to what I do, as there are various research projects I'm involved with, each with a particular set of applications to launch, command-line windows to open to particular places, and environment variables to set. Some even involve being remotely logged into other computers. When I switch to working on one of these projects, the first thing I need to do is go through the "same old routine" to get things going. There's even the catch that if I haven't done this for a while, for a given project, I might not even remember all the steps necessary. In the case of an experimental branch of the spatial hypermedia project, Expeditee, I need to remember to launch Eclipse from the command-line, having first set some environment variables. The environment variables to set are all sitting nicely in a script file to run, but if I don't remember to do this, I have to quit Eclipse and begin the start-up procedure again. For the music digital library work I do with Illinois University, the work involves spinning up web services on a couple of servers located in the US. And so on ...

The aim of this project is to develop a script solution that can be used to capture such setup sequences: both graphical and keyboard input. Exploring different approaches, and assessing their various strengths and weaknesses, would be the starting point. The ideal end-result would be a solution that can operate across all the main operating systems (Windows, MacOS, Linux). This could be achieved by some sleight of hand (separate software solutions developed for each OS), or else solved at a more fundamental level, for example by "fooling" the computer into thinking the software solution is actually a keyboard and mouse plugged into the computer, generating a stream of events for it to follow.
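
As one concrete starting point, here is a minimal record-and-replay sketch using the pynput library (just one of several possible input libraries, and only covering mouse clicks and key presses): events are captured with timestamps until Esc is pressed, then played back with their original timing.

    import time
    from pynput import mouse, keyboard

    events = []                      # recorded (seconds-from-start, kind, payload) tuples
    start = time.time()

    def on_click(x, y, button, pressed):
        if pressed:
            events.append((time.time() - start, "click", (x, y, button)))

    def on_press(key):
        events.append((time.time() - start, "key", key))
        if key == keyboard.Key.esc:  # Esc ends the recording session
            return False

    # Record until Esc is pressed.
    with mouse.Listener(on_click=on_click) as ml, keyboard.Listener(on_press=on_press) as kl:
        kl.join()
        ml.stop()

    # Replay the captured events, preserving their original timing.
    mouse_ctl, key_ctl = mouse.Controller(), keyboard.Controller()
    replay_start = time.time()
    for when, kind, payload in events:
        time.sleep(max(0, when - (time.time() - replay_start)))
        if kind == "click":
            x, y, button = payload
            mouse_ctl.position = (x, y)
            mouse_ctl.click(button, 1)
        elif kind == "key" and payload != keyboard.Key.esc:
            key_ctl.press(payload)
            key_ctl.release(payload)

Something this naive quickly runs into the robustness issues discussed below (timing, screen resolution, windows not yet open), which is exactly where the interesting work in the project lies.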

For something that is highly likely to have sensitive information, such as passwords, stored in it, security and/or encryption of data will be an important aspect of the project. I quite like the idea of having the solution in some sort of handy, portable, tangible form: say my phone. That way I'll have the graphical scriptable ability wherever I go. Just plug it in, and have it show me a list of scriptable options sorted into a ranked order, based on location and time of day.
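
As a minimal illustration of the encryption side (assuming the Python cryptography package; in practice the key would be derived from a master passphrase or the phone's secure storage), a recorded script can be encrypted before it is saved:

    from cryptography.fernet import Fernet

    key = Fernet.generate_key()            # in practice: derived from a master passphrase
    fernet = Fernet(key)

    script_bytes = b'{"name": "lecture setup", "events": ["..."]}'   # made-up serialized script
    token = fernet.encrypt(script_bytes)   # this is what gets stored on the device
    print(fernet.decrypt(token))           # only recoverable with the key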

It's unlikely that a person will get the sequence of mouse moves and keyboard input exactly right when in recording mode, and so some sort of editor capability should be included. Maybe model a script as a series of segments that can be individually tweaked, overwritten, etc. Robustness to things like a script being played back in an environment with a different screen resolution should be factored in. As should labelling the scripts (and segments) to make it easy for the user to access them.
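
One possible (purely hypothetical) way to model this is a script made up of labelled segments, each of which can be re-recorded, relabelled or switched off independently:

    from dataclasses import dataclass, field

    @dataclass
    class Segment:
        label: str                                   # e.g. "log in to Moodle"
        events: list = field(default_factory=list)   # recorded input events for this step
        enabled: bool = True                         # segments can be skipped on playback

    @dataclass
    class Script:
        name: str
        segments: list = field(default_factory=list)

        def playback_order(self):
            return [s for s in self.segments if s.enabled]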

Some starting places for you to consider for potential solutions are:

University App: It's like being on holiday?

Meeting time: 4-5pm

Project Manager: Catherine Siriett

Team Members: Callum Herbert; Jane Tian; Daniel Wheeler; Benjamin Wheeler

Key Idea: Develop a bespoke mobile-phone travel app for families with kids travelling abroad.

  • Keywords: Mobile App Development; GPS; Social/Crowd-sourcing;

Note: The version of this project that is actually going to run is an app specifically targeted at new students attending Waikato University. Some ideas from When in Roma might carry across, but I'm not expecting all the ideas in the description to remain relevant.

In more detail: the university had an app for Orientation Week this year, but as noted in class its location ability only worked when outside. Can this be improved upon? What other student-centred features could the next generation of this sort of app have? I would argue you're the best people to work this out. Only your imagination is the limit!!

Given that the Hey You! Interact with me!! project is going ahead, with the intention of making our display screens around campus more interactive, there could very well be a benefit in the two teams co-operating, so the University App could be tied in with what's happening with interactive displays.

For this project I imagine two strong phases to the software app developed: Explorer mode and Wisdom of the Crowd. To be honest, when in Explorer mode, the app isn't that supportive, but that's OK, as you're the intrepid explorer! What it's doing, though, is running GPS the whole time, and paying attention to when you seem to spend a lot of time in one location. There's usually a reason, either good or bad, for such a "hotspot". Maybe you were figuring out how best to get into the city centre from the airport. Or perhaps you stopped at a cafe (was it any good?), or were viewing one of the sights to see.

Having run your app in Explorer mode, at the end of the day, when you plug in to your laptop (say), it shows you these hotspots on a map and asks you to enter some information to explain what was happening, which it stores centrally. The enriched information that is built up by the explorers feeds the Wisdom of the Crowd side of the app. In this latter mode, when you find yourself at the airport, the app vibrates to let you know there is information potentially relevant to where you are that it can show you.
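
A back-of-the-envelope version of the hotspot detection might look like the following sketch (plain Python, with made-up thresholds): a "hotspot" is flagged whenever consecutive GPS fixes stay within a small radius of an anchor point for long enough.

    import math

    RADIUS_M, MIN_DWELL_S = 75, 600        # assumed thresholds: 75 m, 10 minutes

    def haversine_m(p, q):
        lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
        a = (math.sin((lat2 - lat1) / 2) ** 2
             + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
        return 6371000 * 2 * math.asin(math.sqrt(a))

    def hotspots(track):
        """track: time-ordered list of (unix_time, lat, lon) fixes."""
        found, anchor, t_start, t_last = [], None, None, None
        for t, lat, lon in track:
            if anchor and haversine_m(anchor, (lat, lon)) <= RADIUS_M:
                t_last = t                 # still dwelling near the current anchor
            else:
                if anchor and t_last - t_start >= MIN_DWELL_S:
                    found.append((anchor, t_start, t_last))
                anchor, t_start, t_last = (lat, lon), t, t
        if anchor and t_last - t_start >= MIN_DWELL_S:
            found.append((anchor, t_start, t_last))
        return found

A real app would need to be smarter about GPS noise and battery use; this is only meant to show the shape of the computation behind the end-of-day summary.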

An added twist to the app in Explorer mode is that it lets you take photos, and/or is integrated with the GPS locations of photos you have been taking during the day. These might be useful to show someone using the Wisdom of the Crowd side of the app, to help that user orientate themselves.

Unfortunately we don't have the budget to send you to Rome (see project title) to trial the software you develop; however, the ideas expressed in this project work equally well when applied to someone new to our university's campus (but that wouldn't have given us such a catchy title!)

This last point aligns quite nicely with an idea expressed in class to develop a mobile app for our campus, inspired by the O-week app, for getting around campus that also works inside buildings (suggesting the use of iBeacons).

DIY On Yer Bike!

Meeting time: 5-6pm

Project Manager: George Hewlett

Team Members: Jardine Chapman; Corbyn Noble-May; Daniel Martin; Denzel Belbin

Key Idea: Liven up your time on an exercise bike. Use tablet and/or TV to show you interesting locations (real or imaginary) tied to how you are cycling, or else gamify things.

  • Keywords: Sensors; Android; Chromecast; Visualization

If enough interest is expressed in this project then I will locally source an inexpensive exercycle from the Hamilton area. It's up to you to then figure out how to make the time spent pedalling away more interesting. Equipment-wise, into the mix I would also be prepared to throw in a LifeBeam SmartHat (again with the proviso that enough interest is shown in using it in the project). It should be possible to rustle up a Fitbit as well. Android phones and tablets can be checked out from the faculty's mobile device pool.

A key challenge to the project is getting all these different pieces to work together. You might like to use the phone to monitor how fast the spinning front wheel is going (make this easier by attaching a small bright rectangular piece of paper as a fiducial marker), or how about strapping the phone on to the leg of the person cycling? (We have found a jogger's armband for a phone provides a simple way to achieve this.) If it's easy to safely fix the tablet to the handlebars then that would make a good display area. Alternatively, you could explore the option of Chromecasting to a TV: then you could perhaps do away with the tablet altogether, and make the phone the central point for logistics.
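
To make the fiducial-marker suggestion a little more concrete, here is a rough sketch (assuming OpenCV, a camera pointed at the wheel, and made-up region-of-interest and threshold values): each time the bright paper marker sweeps through a small region of the image, the mean brightness spikes, and the time between spikes gives the wheel's revolution period.

    import time
    import cv2

    cap = cv2.VideoCapture(0)
    last_rev, marker_visible = None, False
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        roi = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)[200:240, 300:340]   # assumed ROI
        bright = roi.mean() > 180                  # assumed brightness threshold
        if bright and not marker_visible:          # rising edge: marker has just entered the ROI
            now = time.time()
            if last_rev:
                rpm = 60.0 / (now - last_rev)
                print(f"wheel speed ~ {rpm:.0f} rpm")
            last_rev = now
        marker_visible = bright
        cv2.imshow("marker", roi)
        if cv2.waitKey(1) == 27:                   # Esc to quit
            break
    cap.release()
    cv2.destroyAllWindows()

Potentially useful links: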

Musical Poetry in Motion

Meeting time: 5-6pm

Project Manager: Alex Merz

Team Members: Braxton Ah Chee; Taran Kern; Esther Ngamata; Enej Ranzinger

Key Idea: Develop a tool that makes it easy to produce the sort of high-quality kinetic typography video increasingly associated with pop songs, such as Perfect (Ed Sheeran, Official Lyrics Video), and fan-based ones such as Happy (Pharrell Williams) and Roar (Katy Perry).

On this occasion, the key idea description pretty much nails what this project is about! My vision for this project is the development of a tool that greatly simplifies the task of converting the starting point (an audio recording of a song, accompanied by a text file with the lyrics in it) into a kinetic typography video.

To support the editing and tweaking of the video being produced, I would advocate for the development of an HTML5-based environment that utilizes features such as canvas and SVG elements and exploits animation rendering. This would be developed in conjunction with an export feature to produce an MP4 video for downloading.
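
Whatever front-end is used, the lyrics need per-word timings before any animation can run. A hypothetical preprocessing sketch (assuming per-line start/end times are available from somewhere, such as an existing subtitle file) might spread each line's words evenly and emit JSON for the HTML5 side to consume:

    import json

    def word_timings(lines):
        """lines: list of (text, start_seconds, end_seconds) tuples."""
        timed = []
        for text, start, end in lines:
            words = text.split()
            step = (end - start) / max(len(words), 1)
            for i, word in enumerate(words):
                timed.append({"word": word, "at": round(start + i * step, 2)})
        return timed

    lines = [("I found a love for me", 12.0, 16.5)]      # made-up timing, for illustration
    print(json.dumps(word_timings(lines), indent=2))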

Some very primitive examples that play with these concepts are as follows:

Cell Block HTML

Meeting time: 5-6pm

Project Manager: Vladimir Ilic

Team Members: Salim Al Farsi; Lloyd Molina; Noelle Kyla Rubio; Zak Temperton

Key Idea: Fully support the display of HTML in a spreadsheet, this being an enabler for providing an easily accessible platform for performing text analysis (without resorting to programming directly in a language such as Java or Python).

  • Keywords: Text Analysis; Page Scraping; Seamless Web Editing

Sure, spreadsheet applications such as Excel and Google Sheets allow you to enter text into cells, in addition to numbers, but the dirty secret of the spreadsheet world is that text's role is relegated to being little more than immutable labels. When you stop to think about it, all that power of computation people associate with these applications is just for the numbers! Text, go take a running jump!!

This is not to say that spreadsheet applications are not useful. Far from it. They provide a low-cost entry-point for many people to perform a generically configurable set of computational tasks without the need to learn how to code in a general purpose programming language—as long as your starting data happens to be numeric.

But computations don't always start with numbers. What about a journalist interested in working out how many times Donald Trump has tweeted using the term Crooked Hillary? There's clearly a calculation there to be done, and they can easily copy a series of tweets, paste them into a spreadsheet, and get them to appear one per line. But what then? It's hard in a spreadsheet to easily go any further than this.

Are the makers of spreadsheet applications missing a trick? It feels like there should be easy ways to achieve this sort of computation within a spreadsheet environment. And by easy, I don't mean something like:

	=(LEN(B2)-LEN(SUBSTITUTE(B2,"Crooked Hillary","")))/LEN("Crooked Hillary")

where B2 is a cell containing one tweet, which is the approach Microsoft suggests.
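
For contrast, the same count takes one line in a general-purpose language, which is precisely the kind of programming this project wants to spare spreadsheet users from:

    tweets = ["...pasted tweet text...", "..."]      # imagine one tweet per spreadsheet row
    mentions = sum(t.count("Crooked Hillary") for t in tweets)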

This observation motivates this project, whose aim is to extend the tabulated-cell approach of spreadsheets to better support text analysis.

To develop spreadsheets in this way, a core issue to figure out is how to go beyond the atomic treatment these applications make of a value in a cell. Notably, representing HTML directly is poorly supported in Microsoft Excel, and perhaps even more surprisingly (ironically!) in web-based spreadsheet products such as Google Sheets.

As a starting point there are plenty of JavaScript libraries that provide open-source web-based spreadsheet applications, such as those discussed on Quora. They embody the usual thinking about numeric computation. See, for example, EtherCalc.

Current best practice for text analysis within Excel is very primitive, for example:

Other things to consider:

  • Being able to enter and edit HTML directly within a cell (giving rise to the name of the project).
  • Being able to import/page-scrape from semi-structured sites (see the scraping sketch below).
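
As a hint of what the import/page-scraping bullet could involve, here is a minimal sketch assuming the requests and BeautifulSoup libraries (the URL is a placeholder): the cells of a table on a semi-structured page are pulled out in a form that could be dropped straight into spreadsheet cells.

    import requests
    from bs4 import BeautifulSoup

    html = requests.get("https://example.com/some-table-page").text   # placeholder URL
    soup = BeautifulSoup(html, "html.parser")

    rows = []
    for tr in soup.select("table tr"):
        rows.append([cell.get_text(strip=True) for cell in tr.select("td, th")])

    for row in rows:
        print("\t".join(row))      # tab-separated, ready to paste into a spreadsheet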

For some home-grown work on Seamless Web Editing see our Seaweed project, the ideas of which can now be seen in HTML editors such as ckeditor.

The title for this project is a pun on the Australian TV series Prisoner: Cell Block H, but is in no other way connected to it!

Something You Can Count On

Meeting time: 5-6pm

Project Manager: Ward Beehre

Team Members: Morgan Dally; Isaac Mackenzie; Cameron Simmonds; Daniel Stokes

Key Idea: Develop a web site that lets a user locate objects of interest to them in a set of photos.

  • Keywords: Deep Learning; Convolutional Neural Networks; Image Processing

To expand upon the key idea: imagine a web site that allows a user to upload a set of images, enables them to tag some examples of the things that they are interested in (e.g., kiwifruit amongst the vine's leaves, a real-world example), and then they press the process button.

Now take a look at the following video:

Impressive, right? This is an example of Deep Learning, and the idea in the proposed project is to utilize a technique from this field called Transfer Learning to achieve this sort of level of object-location detection, customized to what the user is interested in. The approach works by utilising a large-scale deep learning network that has already been pre-trained (on a large array of images, in the case we're interested in). To this, a final layer is added through training on a more specific set of objects (those of a type that is of particular interest to our user, such as kiwifruit), to produce the actual network used to detect objects in hitherto unseen images.

Existing expertise in the department lies in utilizing Microsoft Research's Cognitive Toolkit and applying transfer learning to Oxford University's VGG deep learning model.
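
To make the transfer-learning recipe concrete, here is a minimal sketch written with Keras/TensorFlow purely to keep the example short (the department's existing work, as noted above, uses Microsoft's Cognitive Toolkit with the VGG model): a pre-trained VGG16 network is frozen, and only a small new classification layer is trained on the user's own images, such as kiwifruit.

    from tensorflow.keras import layers, models
    from tensorflow.keras.applications import VGG16

    base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
    base.trainable = False                       # keep the pre-trained weights fixed

    model = models.Sequential([
        base,
        layers.GlobalAveragePooling2D(),
        layers.Dense(2, activation="softmax"),   # e.g. "kiwifruit" vs "background"
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

    # train_ds would be a dataset of (image, label) pairs tagged by the user
    # model.fit(train_ds, epochs=5)

Classifying whole images is only the first step; locating and counting the objects within an image (as in the video) needs a detection architecture on top, but the same transfer-learning idea applies.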

Potentially useful links: