
Department of Computer Science (Tari Rorohiko)
Computing and Mathematical Sciences

2012 Seminars


How difficult is a foreign-language document?

Michalis Vlachos
IBM Research, Zurich
Tuesday 11 December 2012

The web nowadays consists of large amounts of multilingual text with overlapping content (news portals, reviews, blogs, RSS feeds, etc.). A typical web search may return documents in a language non-native to the user. How can one rank these documents by their perceived comprehensibility (i.e., from easiest to most difficult)? Our work addresses this question by providing metrics that estimate how difficult or common the words that make up a document are. We take special consideration of language cognates, that is, words that are similar in two languages and can significantly affect the understanding of a foreign-language text.

The proposed technique can be applied to:

  1. Language-aware personalization of the Web, or deciding when to translate a foreign document (i.e., only when it is deemed very difficult).
  2. Education and online bookstores, for recommending foreign books that match one's reading level in a foreign language.
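The flavour of such a metric can be sketched in a few lines. This is an illustration of the general idea only, not the authors' actual measure: score each word by how rare it is in the foreign language, but count words that closely resemble a native-language word (cognates) as easy.

```python
# Toy difficulty metric (illustrative, not the talk's method): rare foreign
# words are hard, but cognates -- words similar to a native-language word --
# count as easy. All vocabularies and frequencies below are made up.
from difflib import SequenceMatcher

def is_cognate(word, native_vocab, threshold=0.8):
    """Treat a word as a cognate if it closely resembles any native word."""
    return any(SequenceMatcher(None, word, n).ratio() >= threshold
               for n in native_vocab)

def difficulty(document, foreign_freq, native_vocab):
    """Mean per-word difficulty in [0, 1]."""
    scores = []
    for word in document.lower().split():
        if is_cognate(word, native_vocab):
            scores.append(0.0)          # recognisable from the native language
        else:
            # rarer words (low relative frequency) are harder
            scores.append(1.0 - foreign_freq.get(word, 0.0))
    return sum(scores) / len(scores) if scores else 0.0

# Rank documents from easiest to most difficult
docs = ["el presidente visita la universidad", "la zanahoria estaba mustia"]
freq = {"el": 0.9, "la": 0.9, "visita": 0.4, "universidad": 0.3,
        "presidente": 0.3, "zanahoria": 0.01, "estaba": 0.3, "mustia": 0.005}
native = {"president", "university", "visit"}
ranked = sorted(docs, key=lambda d: difficulty(d, freq, native))
```

Here the first document ranks as easier because "presidente" and "visita" are cognates of English words and its remaining words are frequent.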


A research-driven start-up: A story by an ex-Waikato PhD student

Alyona Medelyan
Chief Research Officer, Pingar
Tuesday 27 November 2012
Three years ago I graduated with a PhD from Waikato University. Three years ago I also joined Pingar as its employee number one. In the meantime, Pingar has grown from a New Zealand start-up with an interesting idea into an international company of 50 employees with powerful software. What drives this growth is our passion for bringing state-of-the-art academic research to end users, and some of this research was born right here, in the CS labs of the University of Waikato.

In this talk I will give an overview of our research projects covering areas such as text summarization, search user interfaces, entity and keyword extraction, taxonomy generation and semantic search. While some of these projects have already turned into successful products, others have never left the research lab, and I will explain why. Still others are just budding ideas that require more research, but the preliminary outcomes are promising. The talk will provide insight into how research can be sustained in a small company and where it may take us in the future.


FaceBook and the Welsh language – 3 brief views

Daniel Cunliffe
University of Glamorgan, Wales
Wednesday 19 September 2012
The Welsh language is a vulnerable minority language within Wales. The current situation regarding the language has many similarities to that of the Māori language in New Zealand. This talk will first consider what role information and communications technology may be playing in the maintenance or decline of the Welsh language. It will then briefly present three different studies of the Welsh language on Facebook, discussing both the methods used and the results. It will conclude with a couple of thoughts on how language policy makers and parents of Welsh-speaking children should respond to Facebook.


Opportunities in good, old-fashioned, AI

Michael Witbrock
Cycorp, Austin, Texas
Wednesday 5 September 2012
We probably don't yet have computers capable of supporting human-level AI, but we're getting quite close. And we're developing powerful algorithms for machine learning, language understanding, and various kinds of inference (probabilistic, classification-based, inductive, abductive, analogical, deductive) as we do so. So where is "Good Old Fashioned AI" in all this? Especially in the Turing year, this question has two parts: what can we do now in making computers that are a bit like people? And, now that computers are really pretty fast, what can we do with techniques, like first-order representations and deduction, that characterised the early days of AI? The answer to the latter question is: quite a lot, including making some interesting steps towards meaningful human-computer collaboration.

In this talk, I will focus on elements of this progress at Cycorp, where a very broad set of pre-existing, inferentially productive representations, extensive use of deductive inference, and a partial ability to map between logical and textual representations (sometimes interactively) are beginning to significantly enhance our ability to build broad-coverage, reasoning-based applications. I hope that some of that building will happen in NZ, and I am very keen to discuss how that can happen as extensively as possible.


Metrics for openness

David Nichols
Department of Computer Science, The University of Waikato
Tuesday 14 August 2012
Metrics in information science have largely been based on publications and citations. The altmetrics proposal has highlighted that citations alone are inadequate for a holistic description of the impact of scholarly communication. This talk will present some further metrics for characterizing research publications, emphasizing open access and open science.


Computing the fast Fourier transform on SIMD microprocessors

Anthony Blake
Department of Computer Science, The University of Waikato
Tuesday 19 June 2012
The problem of efficiently computing the discrete Fourier transform (DFT) is one of the most significant in all of digital signal processing, and perhaps in applied numerical analysis. In 1990, it was estimated that Cray Research's installed base of approximately 200 machines, each worth about US$25 million, spent 40% of all CPU cycles computing the fast Fourier transform (FFT). Today the FFT is a critical component of a huge number of sound, image and video processing applications running on workstations and on mobile devices such as phones and tablets.

In this seminar I will describe how to compute the FFT as fast as, and in many cases faster than, state-of-the-art libraries such as Intel IPP, Apple Accelerate, FFTW and SPIRAL, using the once-discredited conjugate-pair algorithm, meta-programming, memory locality and latency optimizations, and novel auto-vectorization techniques.
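For readers unfamiliar with the algorithm family, the divide-and-conquer structure that all of these optimizations build on can be seen in the textbook radix-2 Cooley-Tukey FFT. This is a minimal sketch for orientation, not the conjugate-pair variant or the optimized implementations the talk describes:

```python
# Minimal recursive radix-2 Cooley-Tukey FFT (textbook form, for illustration
# only; the talk concerns the conjugate-pair variant and heavily optimized
# vectorized implementations).
import cmath

def fft(x):
    """DFT of x; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])           # DFT of even-indexed samples
    odd = fft(x[1::2])            # DFT of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]   # twiddle factor
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out
```

Splitting into even- and odd-indexed halves reduces the O(n^2) DFT to O(n log n); the engineering effort discussed in the seminar lies in mapping this recursion efficiently onto SIMD registers and cache hierarchies.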


Patterns of change: Can modifiable software have high coupling?

Craig Taube-Schock
Department of Computer Science, The University of Waikato
Tuesday 5 June 2012
It is considered good software design practice to organize source code into modules and to favour within-module connections (cohesion) over between-module connections (coupling), leading to the oft-repeated maxim of 'high cohesion/low coupling'. But what really happens in the wild? Are developers able to avoid high coupling? Part 1 of this talk presents an empirical investigation of coupling in 97 open source systems written in Java. The results show that while developers generally avoid high coupling, it is not eliminated in all cases, and these results hold across all examined systems. Part 2 of the talk discusses the evolutionary processes that lead to the observed structures, and examines whether the presence of high coupling leads to a "ripple effect" of change propagation through software systems as they evolve.
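The cohesion/coupling distinction the talk examines can be made concrete with a toy measurement over a class-level dependency graph. This sketch is my own illustration, not the study's methodology:

```python
# Toy cohesion-vs-coupling measurement (illustrative only): given class-level
# dependency edges and a class-to-module map, count within-module edges
# (cohesion) against cross-module edges (coupling) for each module.
from collections import defaultdict

def coupling_profile(edges, module_of):
    """edges: (from_class, to_class) pairs; module_of: class -> module name.
    Returns {module: (internal_edges, outgoing_external_edges)}."""
    profile = defaultdict(lambda: [0, 0])
    for src, dst in edges:
        if module_of[src] == module_of[dst]:
            profile[module_of[src]][0] += 1   # within-module: cohesion
        else:
            profile[module_of[src]][1] += 1   # between-module: coupling
    return {m: tuple(c) for m, c in profile.items()}

edges = [("A1", "A2"), ("A2", "A1"), ("A1", "B1"), ("B1", "B2")]
modules = {"A1": "a", "A2": "a", "B1": "b", "B2": "b"}
profile = coupling_profile(edges, modules)
# module "a": 2 internal edges, 1 outgoing cross-module edge
```

The 'high cohesion/low coupling' maxim says the first count should dominate the second; the empirical question of the talk is how often real systems violate this.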


Improved grid integration of intermittent electricity generation using electric vehicles for storage: A simulation study

Paul Monigatti
Department of Computer Science, The University of Waikato
Tuesday 29 May 2012
This talk describes a simulation to establish the extent to which reliance on non-dispatchable energy sources, particularly wind generation, could in the future be extended beyond accepted norms by utilizing the distributed battery capacity of an electric vehicle fleet for storage. The notion of exploiting the distributed battery capacity of an electric vehicle fleet as grid storage is not new. However, this simulation study specifically examines the potential impact of the idea in the New Zealand context. The simulation makes use of real and projected data on vehicle usage, on full potential wind generation capacity and availability (taking weather variation into account), and on typical daily and seasonal patterns of electricity usage. It differs from previous studies in that it is based on individual vehicles rather than a bulk battery model. At this stage, the simulation does not take into account local or regional flows. A more detailed analysis of these localized effects will follow in subsequent stages of the simulation work.


Dynamic component composition - vision vs reality

Jens Dietrich
School of Engineering and Advanced Technology (SEAT), Massey University, Palmerston North, NZ
Tuesday 22 May 2012
The vision of component-based software engineering is often described using the Lego block metaphor: complex applications are built by stacking together simple, re-usable and inexpensive parts. It turns out that it is not that easy: after 40 years, component-based software engineering is only slowly being adopted. The latest trend is a new generation of dynamic component models supporting a service-oriented programming model. This includes OSGi and its extensions. In these systems, components are not assembled manually by software engineers at design time but automatically by component containers at runtime. Automated assembly is based on rich component meta data and constraint resolution. The initial success of these technologies is impressive, and some of the largest and most complex systems, such as IBM WebSphere, Oracle WebLogic and the Java Development Kit (JDK), either have been or are currently being refactored to take advantage of these new technologies.

We are interested in two questions related to these new generation component models: firstly, can existing composition techniques ensure the correctness of assemblies? Secondly, is it possible to automate the modularisation of monolithic legacy systems? We present several studies investigating these two questions.

In the first experiment, we investigated component contracts in the (OSGi-based) Eclipse ecosystem. It turns out that verification fails for a significant number of contracts, violating some of the Eclipse (social) community rules. In the second experiment, we analysed a large set of real-world programs for occurrences of certain antipatterns that present barriers to modularisation, and developed a novel algorithm based on edge scoring to detect dependencies that compromise modularity. The preliminary results are promising: the algorithm we have developed can detect a small number of basic refactorings that remove the majority of antipattern instances, confirming that the Pareto principle (aka the "80-20 rule") applies here. This result is obtained by running an experiment on the Qualitas Corpus data set.

We will also briefly discuss some tools and libraries we have developed for this purpose, including the Massey Architectural Explorer.


Modeling deviations from expected behaviour

Luis Torgo
Department of Computer Science, University of Porto, Portugal
Tuesday 8 May 2012
In this talk I will present two concrete real-world applications where the goal of data mining is to model deviations from expected behavior. The first application has to do with fraud detection. Data mining methods highlight probable cases of fraud for manual inspection. These inspections are usually constrained by limited human and/or financial resources. Our solution uses utility theories borrowed from economics, together with outlier ranking methods, to incorporate these constraints by integrating cost/benefit estimates with the probability of a case being fraudulent. The second application examines the task of monitoring and forecasting water quality parameters on a large distribution network. It involves anticipating the evolution of a time series and deciding when alarms should be issued due to unexpected behavior of the series.
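The cost/benefit idea for inspections can be sketched as follows. The names, numbers and greedy policy here are illustrative stand-ins, not the talk's actual model: rank candidate fraud cases by expected utility, p(fraud) times the recoverable amount minus the inspection cost, and inspect greedily within a budget.

```python
# Illustrative cost/benefit ranking for fraud inspection (not the talk's
# method): expected utility = p_fraud * amount - inspection cost; inspect the
# best cases that fit within a limited budget.
def rank_inspections(cases, budget):
    """cases: list of (case_id, p_fraud, amount, inspect_cost).
    Returns case ids chosen greedily by expected utility within budget."""
    scored = sorted(cases,
                    key=lambda c: c[1] * c[2] - c[3],  # expected net benefit
                    reverse=True)
    chosen, spent = [], 0.0
    for cid, p, amount, cost in scored:
        if p * amount - cost <= 0:       # not worth inspecting at all
            break
        if spent + cost <= budget:
            chosen.append(cid)
            spent += cost
    return chosen

cases = [("c1", 0.9, 1000, 50), ("c2", 0.1, 200, 50), ("c3", 0.6, 500, 50)]
chosen = rank_inspections(cases, budget=100)
```

With a budget of 100, only the two highest-utility cases are inspected; the low-probability case is skipped even though resources remain, because its expected benefit does not cover the inspection cost.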


Alan Turing and the Computing Engine: Turing's achievements in practical computing

Bob Doran
Computer Science Department, Auckland University
Tuesday 1 May 2012
In mid-1945 Turing was recruited by the British National Physical Laboratory to work on the design of what was intended to be Britain's first computer, the ACE. Turing produced a written proposal that was approved to proceed in early 1946. For two years Turing worked on the design of the ACE and its software. However, the ACE project did not progress as expected, so, after a year at Cambridge University, Turing moved to Manchester University, where he was responsible for the initial development of software for the world's first operating computer. The NPL Pilot ACE, the initial implementation of Turing's designs, was finally operational in 1950 and led to the very successful English Electric DEUCE computers. Although we now acknowledge that Turing made great contributions to practical computing, this was not widely acknowledged from the time of his early death (1954) until the 1980s. As well as summarizing Turing's practical accomplishments, we will look into the circumstances of how and why he was so long marginalized in the accepted history of practical computing.


Resource-centred storage of RDF graphs: clustering triples for efficient retrieval

Ralf Heese
Humboldt University and Freie Universität Berlin
Monday 23 April 2012
This presentation introduces an efficient storage and query model for semantic data stored as RDF triples. We discuss details of our clustering method for efficient RDF data access, its implementation, and the performance gains achieved. The RDF data model is a semantic model for managing and exchanging data in applications. Data is represented as triples (resource, property, value). These triples form a graph structure, and relational databases are therefore not well suited for efficient management and retrieval of RDF graphs. Alternative native repositories mostly use B-trees to index triples and therefore have to execute expensive join calculations for each query. We use a storage model that clusters triples according to their resources, i.e., manages them as a collection of star-like subgraphs. Since queries can be decomposed into the same star-like patterns, we argue that our storage model has advantages over existing approaches, e.g., the query engine has to perform fewer join operations. Initial results of our performance evaluations confirm that our cluster model provides efficient access to RDF triples.
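The resource-centred clustering idea can be illustrated in a few lines. This is a sketch of the concept only, not the presented storage engine: triples sharing a subject resource are stored together as one star-shaped subgraph, so subject-centred query patterns need no joins.

```python
# Illustrative resource-centred clustering of RDF triples (concept sketch,
# not the talk's implementation): group triples by subject into star-shaped
# subgraphs so all facts about a resource are co-located.
from collections import defaultdict

def cluster_by_subject(triples):
    """Group (subject, property, value) triples into star subgraphs."""
    stars = defaultdict(list)
    for s, p, v in triples:
        stars[s].append((p, v))
    return dict(stars)

triples = [
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:name", "Bob"),
]
stars = cluster_by_subject(triples)
# all facts about ex:alice are now retrieved with one lookup, no joins
```

A triple-indexed store would answer "everything about ex:alice" by probing an index per property and joining; here the star is returned in a single access, which is the advantage the talk argues for.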


Adaptive scheduling on power-aware managed data-centers using machine learning

Ricard Gavaldà
Universitat Politècnica de Catalunya (UPC BarcelonaTech)
Tuesday 20 March 2012
Energy-related costs have become one of the major economic factors in data-centre management, and companies and the research community are currently working on efficient power-aware resource management strategies as part of the collective effort called "Green IT". We propose a framework for autonomic scheduling of tasks and web services in cloud environments, optimizing profit by balancing revenue from task execution against penalties for service-level agreement violations and power consumption costs. The main contribution is the combination of consolidation and virtualization technologies, mathematical optimization methods, and machine learning techniques. We focus on webservice-like tasks, because their variability over time makes them particularly challenging. Our system uses machine learning techniques to predict the performance and consumption of hosts on tasks, and uses these predictions to set up an optimization program describing the allocation of tasks to the hosts with the highest utilities. This process is performed periodically, as tasks change their behavior or as new tasks arrive. At a higher level, the talk aims to highlight the potential of machine learning and data mining in the fields of Autonomic and Green Computing. This is joint work with Josep Ll. Berral and J. Torres (UPC and Barcelona Supercomputing Center) and was presented at the Grid 2011 conference.
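The allocation step can be sketched with a stand-in for the learned models. This is an illustration of the prediction-driven scheduling idea, not the authors' system; the task names, hosts and cost figures are invented:

```python
# Illustrative prediction-driven scheduling (not the talk's framework):
# place each task on the host with the highest predicted utility, where
# utility = revenue - predicted power cost. The predictor below is a
# hard-coded stand-in for a learned performance/consumption model.
def allocate(tasks, hosts, predict_utility):
    """Greedy assignment of each task to its best-scoring host."""
    placement = {}
    for task in tasks:
        placement[task] = max(hosts, key=lambda h: predict_utility(task, h))
    return placement

revenue = {"web1": 10.0, "web2": 6.0}
power_cost = {("web1", "h1"): 2.0, ("web1", "h2"): 5.0,
              ("web2", "h1"): 4.0, ("web2", "h2"): 1.0}

placement = allocate(["web1", "web2"], ["h1", "h2"],
                     lambda t, h: revenue[t] - power_cost[(t, h)])
```

In the real system this greedy step would be replaced by the mathematical optimization program mentioned in the abstract, re-run periodically as task behaviour changes.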


At the edge of optimization and learning for resource allocation

Beatriz López
Department of Electrical, Electronics and Automatic Engineering, University of Girona, Catalonia, Spain
Friday 16 March 2012
Optimization and learning are two complementary research disciplines that need each other to solve real-world problems. In this talk, I will review the research done in the eXiT group on solving resource allocation problems in three different domains: communication bandwidth, workflow management systems (health care), and public bike transport. Auctions are used in decentralized environments, while classical heuristic approaches are followed for determining the winning bid. Clustering, sequence learning and reinforcement learning are the mainstays for optimization feasibility. Finally, I will introduce the current challenges we are facing, including the hybridization of learning techniques with case-based reasoning.


Fit-for-purpose complex systems simulation

Fiona Polack
Department of Computer Science, University of York, U.K.
Tuesday 13 March 2012
How do you create an agent-based simulation that a cancer researcher or immunologist can understand and use? What does simulation engineering have in common with the study of large-scale complex IT systems? At York (UK), research into principled approaches to modelling and simulating complex systems (CoSMoS) has addressed natural (e.g. biological and biomedical systems) and man-made (e.g. flocking algorithms, swarm robots) domains in which multiple agents interact to form complex systems. Working in collaboration with laboratory scientists, we have shown that the CoSMoS process can be used to develop simulations capable of guiding hypothesis generation and laboratory experimentation. In the talk, I will summarise some of the challenges that arise in the engineering of simulations and that are also challenges in the wider scope of model-driven engineering. I will consider how conventional software engineering techniques, including some from critical systems engineering, can be used to create a simulation that is demonstrably fit for its scientific purpose. The talk will focus on a feasibility study that has modelled prostate cell division and differentiation, the first phase of a simulation project that will be used in the study of cancer neogenesis and benign prostatic hyperplasia.


New algorithms for graphs and small molecules exploiting local structural graph neighborhoods and target label dependencies

Stefan Kramer
Johannes Gutenberg University Mainz
Tuesday 6 March 2012
In the talk, I will present recently developed algorithms for predicting properties of graphs and small molecules: In the first part of the talk, I will present several methods exploiting local structural graph (similarity) neighborhoods: local models based on structural graph clusters, locally weighted learning, and the structural cluster kernel. In the second part, I will discuss methods that exploit label dependencies to improve the prediction of a large number of target labels, where the labels can be just binary (multi-label classification) or can again have a feature vector attached. The methods make use of Boolean matrix factorization (BMF) and can be used to predict the effect of small molecules on biological systems.
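The role of Boolean matrix factorization in the multi-label setting can be illustrated with a tiny example. This sketch shows only the reconstruction step, with invented factors, not the talk's learning algorithm: a binary label matrix Y is approximated as the Boolean product of a molecule-factor matrix W and a factor-label matrix H, so each molecule's label row is an OR-combination of a few shared label patterns.

```python
# Illustrative Boolean matrix factorization reconstruction (invented factors,
# not the talk's method): Y[i][j] = OR_k (W[i][k] AND H[k][j]), so label
# dependencies are captured as shared latent label patterns.
def boolean_product(W, H):
    """Boolean matrix product of binary matrices W (m x r) and H (r x n)."""
    rows, inner, cols = len(W), len(H), len(H[0])
    return [[int(any(W[i][k] and H[k][j] for k in range(inner)))
             for j in range(cols)] for i in range(rows)]

# two latent label patterns shared across three molecules
H = [[1, 1, 0, 0],    # pattern 1 activates labels 0 and 1 together
     [0, 0, 1, 1]]    # pattern 2 activates labels 2 and 3 together
W = [[1, 0],          # molecule 0 exhibits pattern 1
     [0, 1],          # molecule 1 exhibits pattern 2
     [1, 1]]          # molecule 2 exhibits both
Y = boolean_product(W, H)
```

Because labels co-occur through shared patterns, predicting a molecule's small factor row is enough to recover a large label row, which is how factorization helps with many correlated targets.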


Crowdsourcing of OCR using Duolingo

Puakea Nogelmeier
University of Hawai'i at Mānoa
Tuesday 7 February 2012
Puakea will discuss the current newspaper initiative, 'Ike Ku'oko'a (www.awaiaulu.org), which proposes, through an entirely volunteer effort, to transcribe 60,000 pages of newspaper in 8 months. The first 15,000 pages took 10 years, but they 'rocked the world' in Hawai'i, both for their content and for opening up a new historical vision. This next step brings it all to the table and engages a small army of collaborators. Ambitious, with lots of potential. The work was kicked off a month ago and has over 2,000 volunteers in Hawai'i and around the world, including N.Z.

A discussion will follow about research undertaken by a team led by Luis von Ahn, whose work involves using crowdsourcing for OCR and language learning. Their latest project, called Duolingo, has the potential to assist projects like 'Ike Ku'oko'a, but how does engagement occur and what potential could be realised?


Image Data Analysis in H. sapiens and C. elegans

Alexander K. Seewald
Independent Researcher/Consultant for Data Mining
Thursday 12 January 2012
In the past three years, we have collaborated with the University of Colorado, Boulder, USA, and the Institute for Medical Pathology at the Medical University of Vienna. The focus of our research was on developing image analysis systems using state-of-the-art techniques, including machine learning techniques that have previously been used to recognize faces. We have developed robust preprocessing methods (stitching, illumination correction, erythrocyte removal), segmentation algorithms, and task-specific evaluation algorithms based on ground-truth data provided by our biological research partners. We have analyzed three different types of tissue: human osteoclasts in culture, human placental tissue, and living transgenic C. elegans specimens.

We will close with thoughts on the potential uses of image data analysis, what is and is not yet feasible, and how biological researchers should be expected to help in the building of state-of-the-art systems.

