COMPX241 Project Ideas

For Later, Dude (Let's talk about this at another time)

Project Manager: William Burton

Team Members: Cymone Jacob, Finn Welham, Zhibo Xu

Weekly Update Time: Wed 1-2pm

Equipment on loan: USB dictaphone pen, watch, and waterproof microphone; also a standard-issue dictaphone device

Key Idea: Make it as easy as possible to record yourself speaking (ideas, instructions, etc.), wherever you are, with the recording later interpreted by an online Siri-type application.

  • Keywords: Automatic Speech Recognition (ASR); Deep Learning; Knowledge Representation

Imagine the scenario: I'm driving to work when a song comes on the radio. A few bars in, I remember that this is a song I've been meaning to find out the name of; trouble is, the DJ already said upfront what the song was and I wasn't listening that closely. In theory I could get out my phone, enter its PIN, scroll through the apps to find Spotify, and then get it to sample the song that is playing. That's enough of a pain to do if you're a passenger in the car, and with all that faffing around the song might well have finished in the meantime ... but remember, I'm the driver! Yikes!!

Think how much more convenient it would be if I had, say, a pen clipped into my shirt pocket that's also a dictaphone: I could slide it out, click the record button, say "For Later, Dude, find out what song this is", and then let the pen/dictaphone record some of the audio that's playing. Subsequently, when I'm at my desk at work, I plug in the dictaphone pen (it's also a USB thumbdrive, don't ya know!), and the Later Dude software I have installed on my PC scans the device for new content, runs Speech-to-Text software over it, and actions the result.

I now know the name of that song playing on the radio ... have changed the time of a scheduled meeting in my calendar ... and added into my Project Ideas doc for COMPX241 this new nifty idea I had, while making breakfast, about making it easy to record audio wherever you happen to be, and then later on have it processed!

As an additional thought, to round out the idea: given that the recording device acts as a USB thumbdrive, the "For Later, Dude" code that you write for processing the audio can be stored on that same USB drive. With a bit of care, you should even be able to set things up so the processing software can run on whatever host machine it is plugged into, be it Windows, MacOS or Linux. Maybe you could even get Android in your sights: a phone or tablet that supports OTG (On-The-Go), which is fairly common these days, will let a USB device be plugged in and appear as a disk.

Potential frameworks, tools and API to build upon:

  • Mozilla's Deep Speech for Speech-to-Text recognition
  • The Open Source voice assistant platform: Mycroft
  • If working with Mycroft, then its skills plugin component looks like a promising way to support niche tasks such as my "what is this song that is playing" example.
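To make the "scan and transcribe" step concrete, here is a minimal sketch in Node.js. It assumes the pen mounts as an ordinary drive; the mount point, file naming and trigger phrase are all illustrative, and the transcription call uses DeepSpeech's Node binding (npm install deepspeech):

    // Sketch only: scan a mounted dictaphone drive for new recordings and
    // transcribe each one with Mozilla DeepSpeech's Node binding.
    const fs = require('fs');
    const path = require('path');
    const DeepSpeech = require('deepspeech');

    const model = new DeepSpeech.Model('deepspeech-0.9.3-models.pbmm');

    function transcribe(wavFile) {
      // DeepSpeech expects 16-bit mono PCM at the model's sample rate (16 kHz).
      // Naively skip the 44-byte header of a standard PCM WAV file here;
      // a real version would parse the header properly.
      const pcm = fs.readFileSync(wavFile).slice(44);
      return model.stt(pcm);
    }

    const DRIVE = '/media/dictaphone';  // assumed mount point of the pen drive
    for (const name of fs.readdirSync(DRIVE)) {
      if (!name.toLowerCase().endsWith('.wav')) continue;
      const text = transcribe(path.join(DRIVE, name));
      if (text.startsWith('for later dude')) {
        console.log('Action requested: ' + text);
        // ... hand the remaining words to a skill/plugin (e.g. a Mycroft skill)
      }
    }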

Kiwi Kluedo: The Yeah, Nah Edition

Project Manager: Reef Proctor

Team Members: Max Bean, Nic Maultsaid, Abbie Reid, Melissa Williams

Weekly Update Time: Wed 1-2pm

Key Idea: Create a New Zealand themed version of the board game Cluedo, where emphasis is placed on accentuating dynamic aspects of the game: the number of characters and who the characters are; similarly for locations and murder weapons.

  • Keywords: 2D Graphics; Client-Server Software Architecture

Classic board games such as Cluedo are enjoyed by countless people: their game-play is what has helped make them ... well ... classic! But they can get a bit "samey" after a while. Static. When was the last time you actually played a game of Cluedo (or Monopoly, or Battleships)?

Moving things into a digital realm can really open such games up by shifting their elements into a more dynamic space. Based around a digital board, what is on the board can be changed or adapted: either creating something new when the game is started (which then stays in that configuration for the duration of that game), or, more adventurously, changing the board as the game proceeds. Similarly, the cards in play and the number of players allowed can go beyond what the physical counterpart supports.
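To give a flavour of the "configured fresh each game" option, here is a minimal JavaScript sketch; the pools are cut-down placeholders standing in for full lists like the ones further down:

    // Sketch: deal a fresh game configuration at startup by sampling from
    // larger pools of cards, then secreting one of each as the solution.
    const WEAPON_POOL    = ['Steel-capped Gumboot', 'Broken bottle of Tui',
                            'Large Chilly Bin', 'Buzzy Bee'];
    const LOCATION_POOL  = ['Bach', 'Sleepout', 'The Longdrop',
                            'Sheep paddock', 'Shed'];
    const CHARACTER_POOL = ['Farmer Brown', 'Sam the Shearer',
                            'Kev the Tradie', 'Krystal the Lifeguard'];

    function sample(pool, n) {
      const a = [...pool];
      for (let i = a.length - 1; i > 0; i--) {       // Fisher-Yates shuffle
        const j = Math.floor(Math.random() * (i + 1));
        [a[i], a[j]] = [a[j], a[i]];
      }
      return a.slice(0, n);
    }

    const weapons    = sample(WEAPON_POOL, 3);       // this game's card set
    const locations  = sample(LOCATION_POOL, 4);
    const characters = sample(CHARACTER_POOL, 3);

    const pick = (cards) => cards[Math.floor(Math.random() * cards.length)];
    const solution = {                               // the hidden answer
      weapon: pick(weapons),
      location: pick(locations),
      character: pick(characters),
    };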

This is the idea at the core of this project. How you use these, and other observations you come up with, is up to you. The key thing is to think about how they can drive useful/innovative/fun features that make the game less samey. For example, asking the players at the start of the game how long they are looking to play for could lead to some decisions over how the board and pieces are configured. Think about the classic version of the game: are there any features that are a bit irksome? Or make for rather stilted play? Everyone I know who plays Cluedo maxes out on the strategy of naming a character, a location and a murder weapon where two of those three items are in their own hand. Recasting the game in the digital realm is an opportunity to address, and go beyond, aspects of the game that are likely more a result of its physical form than an element the game designer actually wanted.

So, with Kiwi Kluedo: The Yeah, Nah Edition setting the scene, the floor ... or perhaps that should be the board? ... is all yours!

Ideas for Weapons:

  • Steel-capped Gumboot
  • Broken bottle of Tui
  • Buzzy Bee (slipped on)
  • Large Chilly Bin
  • Poisoned Sauvignon Blanc from Marlborough
  • Smothered/Suffocated by Pavlova
  • Electrocuted by ... an Electric Fence
  • Strangled by No. 8 Fencing Wire or a Bungee Cord
  • Crushed by a box-set of LOTR DVDs (the Director's Cut)
  • Sat on by an All Black

Ideas for Locations:

  • Bach
  • Swimming pool
  • Sleepout
  • The Longdrop
  • By the trampoline
  • Tent/Campsite
  • Sheep paddock
  • Cow paddock
  • Shed
  • Farm House
  • Barn
  • Milk Shed
  • Beach front
  • Placings 10 (a popular hardware store)

Ideas for Characters:

  • Farmer Brown
  • Daisy the Cow
  • Sam the Shearer
  • Brad the Surfer Dude
  • Sharon the Campsite Owner
  • Krystal the Lifeguard
  • Kev the Tradie
  • Hemi the Tour Operator
  • Peter the over-enthusiastic Tolkien fan

A client-server software architecture is a strong contender for developing the functionality needed in a multiplayer board game where some cards are for a single player's eyes only. An even stronger contender, with regard to versatility, would be to implement the project in a web-based environment using WebSockets.
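As a hedged sketch of that architecture, using the widely used ws package for Node.js (the message shapes are invented for illustration, and dealHand is an assumed helper), the server can broadcast public state to everyone while sending each player their cards privately:

    // Sketch: a game server (npm install ws) that keeps cards private by
    // sending them only to the owning player's socket.
    const { WebSocketServer } = require('ws');

    const wss = new WebSocketServer({ port: 8080 });
    const players = new Map();                 // socket -> player state

    wss.on('connection', (ws) => {
      const hand = dealHand();                 // assumed helper: draw this player's cards
      players.set(ws, { hand });

      // Private: only this player learns their own hand
      ws.send(JSON.stringify({ type: 'your-hand', hand }));

      // Public: everyone learns a player has joined
      for (const client of wss.clients) {
        client.send(JSON.stringify({ type: 'player-joined', count: players.size }));
      }

      ws.on('close', () => players.delete(ws));
    });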

Useful Links:

Zooming Through the Fourth Wall

Project Manager: Kyle Cree

Team Members: Caleb Archer, Hannah Carino, Luke Finlayson, Isaac Boielle

Weekly Update Time: Wed 1-2pm

Key Idea: Have more control over your perspective on a Zoom/Zoom-like call.

  • Keywords: Web Development; Video/Image/Audio processing

We've all been using video conferencing platforms, such as Zoom, a lot more than we did a couple of years ago. I'll confess that I've not always been 100% engaged with all the calls I have been on in that time. Sometimes my mind has drifted to features that could be in the video conferencing software, but aren't ... which is where this project idea originated.

It's not about you, it's all about me! The driving force for this project is thinking about what features you would benefit from having, not what some multinational company behind the product has decided for you. I mean, why is it that in Zoom, messages posted in chat before you join a call aren't available to people who join later? Is it really that big a privacy issue? My preference would be to leave them in by default, with a clear button you can press if you like. That way you could still support the scenario of, say, a job interview, where people at the start of the call share details in chat, such as who is going to ask which question (those users are free to copy the details out into a text editor if they like), then clear those chat messages before letting in the interviewee.

But let's step it up a notch, and really push on what can be done: in particular around what it means to view a video feed of someone else, be it their webcam shot or else their share-screen feed. Move things from you passively watching what is being displayed (which really can feel like you're watching some kind of movie), to something that lets you interact with what's there (to continue the analogy, breaking the fourth wall).

Here are some ideas of what that could look like:

  • When watching a share-screen of someone's presentation, and they have moved on to the next slide, you can press a "back" button to see what was being displayed before (or use a controller that lets you rewind through that video feed).
  • Focusing in on a slide that is being shown on a share-screen, if it is showing text or hyperlink information, then you can rubber-band around that area of the screen and have the text there copied into your clipboard. If it was a hyperlink, then your browser opens that web-page.
  • If you're frustrated that someone else's webcam shot is a bit off-kilter, say placing them to one side or framing them too low in the shot, then you have some controls at your end that can change this!
    Sounds a bit far-fetched, I know, but it can actually be done fairly easily with a bit of forethought in the video conferencing app that you're developing. Say the app, when showing the other user their own video shot, highlights a zone that is 80% of the full camera frame: then there are pixels to spare (as it were) outside that central viewing area that are still part of the broadcast video, but don't have to appear in the framed shot shown to you initially. Then, if you decide the person is a bit too far to the left, or too low, you use controls at your end to change which part of the full rectangular shot is being shown to you (a minimal sketch of this cropping follows the list).
    Of course this goes both ways, so the person seeing your shot could be doing much the same thing in reverse! One way to be upfront about what is going on is, when a user is preparing to join a call, to show them a video image that accentuates the 80% region transmitted and framed by default, with the remaining 20% of the image visible around it, slightly greyed out.
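In a browser-based app, that reframing is just a crop when painting the incoming video onto a canvas. A minimal sketch, where the 80% figure comes from the description above and the element IDs are assumptions:

    // Sketch: show a cropped 80% window of the full incoming video frame,
    // letting the viewer pan that window with offsets they control.
    const video = document.querySelector('#remote-video');   // full incoming feed
    const canvas = document.querySelector('#framed-view');
    const ctx = canvas.getContext('2d');

    let offsetX = 0, offsetY = 0;   // updated by the viewer's pan controls

    function drawFrame() {
      if (video.videoWidth) {       // wait until the feed has dimensions
        const cropW = video.videoWidth * 0.8;
        const cropH = video.videoHeight * 0.8;
        // Clamp so the window never strays outside the broadcast frame
        const x = Math.min(Math.max(offsetX, 0), video.videoWidth - cropW);
        const y = Math.min(Math.max(offsetY, 0), video.videoHeight - cropH);
        ctx.drawImage(video, x, y, cropW, cropH, 0, 0, canvas.width, canvas.height);
      }
      requestAnimationFrame(drawFrame);
    }
    requestAnimationFrame(drawFrame);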

Useful Links:

Hyperspace TV

Project Manager: Harrison Whiting

Team Members: Kyle Ananayo, Shean Danes Aton, Caleb Norrish, John Cocks

Weekly Update Time: Wed 2-3pm

Key Idea: Make use of a tablet as extra "real estate" to enrich your TV viewing experience. Different ideas emerge when different genres of shows are considered, such as the news, drama (say vintage movies), comedy, or (dare I suggest it) reality TV shows (watch out for spoilers).

The core of this project is providing an information-rich, interactive user interface on the tablet. The challenge is how to access the disparate range of information sources used to drive this interactive experience. Linked Data is a computer science technique developed precisely to address this sort of need.

  • Contrast trying to figure out in software what's useful to display to a user interested in the Harry Potter movies using a Google query like Harry Potter films, with a Linked Data SPARQL (pronounced "sparkle"!) query that returns crisp, clean information about the movies (a hedged example is sketched after this list). You don't even have to worry about whether you should be using the term 'film' or 'movie', as you would when trying to glean pertinent information via the Google query approach.
  • Talking of cleaner, crisper, machine-readable pages, here's an example of a linked-data-friendly page for Lord of the Rings: The Fellowship of the Ring
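To give a flavour of what such a query looks like in practice, here is a hedged sketch that asks DBpedia's public SPARQL endpoint for films whose English label mentions "Harry Potter"; treat the query itself as illustrative, since a production version would use the film series' own property rather than a label filter:

    // Sketch: query DBpedia's public SPARQL endpoint from JavaScript.
    const query = `
      SELECT ?film ?label WHERE {
        ?film a dbo:Film ;
              rdfs:label ?label .
        FILTER (lang(?label) = "en" && CONTAINS(?label, "Harry Potter"))
      } LIMIT 20`;

    const url = 'https://dbpedia.org/sparql?format=json&query=' +
                encodeURIComponent(query);

    fetch(url)
      .then((res) => res.json())
      .then((data) => {
        for (const row of data.results.bindings) {
          console.log(row.label.value, '->', row.film.value);
        }
      });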

Depending on the team size, it might be worth splitting into sub-groups that specialise in particular forms of television content, such as news, documentaries, and movies. Also, in the case of news, I rather like the idea of some speech recognition software running in the background, analysing what the news show is saying and bringing up potentially useful related information (maps for places mentioned, etc.) ready for the user to select.

Developing this using a Google Chromecast would seem like an excellent approach to take. Through last year's Hey You, Interact with Me project, I can provide guidance on setting things up so your phone/tablet can control what is displayed on the TV via Chromecast. It essentially comes down to creating a web page (HTML, CSS and JavaScript) that is displayed on the TV, and a web page that is displayed on the phone/tablet. Messages can then be passed back and forth between the tablet and TV using WebSockets, resulting in changes to what is being displayed. See the Kiwi Kluedo project above for resources related to WebSockets.
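On the web-page side, the tablet-to-TV messaging is only a few lines. A hedged sketch, with the message type and helper names invented for illustration:

    // Sketch: the tablet page sends UI events to the TV page via a shared
    // WebSocket server (see the Kiwi Kluedo sketch above for the server side).
    const socket = new WebSocket('ws://your-server:8080');

    // Tablet side: the user taps a place-name mentioned in the news
    document.querySelector('#map-button').addEventListener('click', () => {
      socket.send(JSON.stringify({ type: 'show-map', place: 'Hamilton, NZ' }));
    });

    // TV side: react to incoming messages by updating what is displayed
    socket.addEventListener('message', (event) => {
      const msg = JSON.parse(event.data);
      if (msg.type === 'show-map') showMapFor(msg.place);   // assumed helper
    });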

Linked Data Resources:

Other useful links:

Te 0AD

Project Manager: Thevinu Mathusinghe

Team Members: Michael Peddie, Jade Thomas, Ethan Thomson

Weekly Update Time: Wed 2-3pm

Key idea: A Māori centric extension to 0AD.

  • Keywords: 2D Graphics, 3D Graphics, (optionally) Web Technologies

A year or so back I attended a think-tank meeting in Christchurch focused on Māori aspirations in the digital realm. At the meeting, it was noted by some participants that you rarely see Māori people represented in video games, which must be having a profound effect on identity within Māoridom. In considering the promotion of Māori culture specifically to their own tamariki (never mind further afield), such an omission is particularly damaging: it is not just that the games lack characters their kids would more readily identify with, but that different role models are actively promoted instead. A similar observation was made about the values represented in a wide range of video games.

These observations have remained with me, and when I came across the Open Source 3D Real-time Strategy (RTS) game 0AD, modelled after the genre-defining game Age of Empires by Ensemble Studios, I saw the opportunity for a project that could do something about this. In 0AD, as in the original Age of Empires, there are many races a user can play in the game, but not the Māori people. There are many types of geographical regions you can play in, but not specifically Aotearoa. Different types of flora and fauna are available, but not forms indigenous to these shores.

The aim of this project, then, is to develop a NZ-centred version of the game to play: not just in terms of the graphics provided, but also in terms of the value system embodied in gameplay. In times of conflict the Māori people have a well-earned reputation as fierce warriors. Less well known, by and large, is their history as a trading nation and their innovation in agriculture. The good news is that 0AD has a plugin architecture so game mods can be made. This is the starting point for this Smoke and Mirrors project.
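As a pointer to where the modding work starts: a 0AD mod lives in its own folder under the game's mods directory and declares itself with a small JSON manifest, along the lines of the sketch below (the values are placeholders, and the exact field set should be checked against the 0AD modding documentation). From there, new maps, units and buildings are data files plus art assets, and 0AD's gameplay scripts are written in JavaScript.

    {
      "name": "te-0ad",
      "version": "0.0.1",
      "label": "Te 0AD: Māori Civilisation Mod",
      "description": "Adds a Māori civilisation, Aotearoa maps, and indigenous flora and fauna.",
      "dependencies": ["0ad=0.0.26"]
    }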

Useful Links:

Ye Olde Google Maps

Project Manager: Kenan Grant

Team Members: Aibel Antony, Ethan MacLeod, Hannah Murphy, Aryan Thanki

Weekly Update Time: Wed 2-3pm

Key idea: Develop a 2D geographical map app (akin to Google Maps or OpenStreetMap) that displays historical geographical data, such as region and city boundaries; further, take film footage (ideally archival footage of historic locations of interest) and utilise photogrammetry to develop 3D models.

  • Keywords: Interactive 2D Graphics, User-centred Design, Human Computer Interactions (HCI); 2D Image Processing, 3D model reconstruction

We've all become accustomed to how useful web-based mapping systems such as Google Maps and OpenStreetMap (OSM) are for looking up information about a place before we go there. And then, when we are in the place itself, these web-based mapping tools can assist again, giving us in-context information that helps us relate where we are to the map. The aim of Ye Olde OSM/Google Maps is to take a bit of a lateral sidestep (or, more accurately, a lateral step back in historical time) to view historical geographical information.

Your mission, should you choose to accept it, is to develop an environment that enhances a history scholar's ability to work with historical data, primarily through the overlaying of historical maps and building plans, but also by assisting with how place-names and districts have changed over time. If available, this could even be extended to include demographic data, sourced, for example, from historic census data.

Some existing activity in this space is detailed in Historical Map Overlays for Google Maps/Earth. A key thing to establish early on in this project is an interesting, publicly available set of historical data that can be used. You don't have to take on the world: showing how things could work for a selected region of interest (within NZ, or elsewhere, such as the centre of London in the UK) would be more than sufficient.
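On the implementation side, overlaying a georeferenced scan on a modern base map is well supported by mapping libraries. A minimal sketch using Leaflet (assuming its script and CSS are included in the page; the coordinates, image file and slider element are placeholders):

    // Sketch: fade a scanned historical map in and out over a modern base map.
    const map = L.map('map').setView([-37.787, 175.279], 14);   // Hamilton, NZ

    L.tileLayer('https://tile.openstreetmap.org/{z}/{x}/{y}.png', {
      attribution: '&copy; OpenStreetMap contributors'
    }).addTo(map);

    // Geographic bounds the scan was georeferenced to (illustrative values)
    const bounds = [[-37.80, 175.26], [-37.77, 175.30]];
    const overlay = L.imageOverlay('1904-city-plan.png', bounds, { opacity: 0.7 })
      .addTo(map);

    // An opacity slider lets the scholar fade between eras
    document.querySelector('#era-slider').addEventListener('input', (e) => {
      overlay.setOpacity(e.target.value);
    });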

In addition to the Ye Olde Google Map city plan layering idea above, there is the option in this project to explore what was initially pitched as a separate project: History 3D, where (archival) film footage is taken as source material to generate a sequence of photos, which is then fed into a photogrammetry system: software that takes a series of photos capturing the same scene, works out the relationships between them, and from that forms a 3D model. With this capability in the mix, Ye Olde Google Maps could then be augmented with Ye Olde Google Streetview.
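The footage-to-photos step is mechanical; for example, a tool such as ffmpeg can sample frames at a fixed rate (the filenames and the two-frames-per-second rate here are placeholders), with the resulting images then handed to the photogrammetry system:

    ffmpeg -i archival-footage.mp4 -vf fps=2 frames/frame_%04d.png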

For 2D mapping APIs, a non-exhaustive list includes: