Mā te hangarau rorohiko te reo Māori e whakaora ai? Does computer technology have a role in a Māori language regeneration strategy?

Paora Mato
The University of Waikato, Hamilton
Monday 12 December 2011
Te reo Māori is one of many at-risk indigenous languages. Various initiatives over the past 40 years have sought to halt the decline of te reo Māori and increase the number of fluent speakers. More recently these initiatives have included translated interfaces for a selection of computer applications. This immediately locates access to aspects of the language on a far broader scale. Global availability becomes even more significant where those translated applications are continuously accessible via the internet. In terms of language survival then, one would expect that the ability to use software interfaces, available in at-risk languages, would be advantageous in some manner. Intuitively, the existence of such tools in a global, unbridled environment would offer a variety of options to a language strategy aimed at survival and regeneration. This research proposes to identify whether or not translated interfaces provide such options and what role, if any, computer technology should have within language regeneration strategies. Additional analysis will describe how that role fits within language regeneration and the necessary shifts in perception, awareness, and engagement that are required in order to normalise the use of translated interfaces and, consequently, the language itself.


The Challenge of using Telemedicine (Human-Computer Interaction) to Face Emerging Infectious Diseases in the Tri-National South-Western Amazon Region

Manuel Cesario
University of Franca, Brazil
Tuesday 12 July 2011
This seminar presents the interactions between socio-environmental changes and the (re)emergence of infectious diseases in South-western Amazonia. In this region, burning is used to convert large swathes of forest into pastures and plantations, emitting greenhouse gases. Climate change models forecast a regional decrease in humidity and increase in temperature - conditions that will favour a perverse circle, with more fires. The on-going building of hydroelectric dams and hydro-ways, as well as recent road-paving, will also change the regional eco-epidemiology. These changes will alter the distribution of vectors (mosquitoes and other arthropods), and there will likely be increased vector/human contact, by increasing the density of people and vectors and also the rate of migration. Changes in the epidemiology of vector-borne diseases are already being observed, including two re-emerging diseases transmitted by sand flies: Cutaneous Leishmaniasis and Bartonellosis (Carrión's disease). The first is endemic on all three sides of the South-western Amazon frontier, while the latter is expanding its transmission in Peru and reaching the borders with Bolivia and Brazil, countries where medical professionals do not have the necessary expertise to diagnose and treat this lethal disease.

The challenge is to develop telemedicine tools that improve the early diagnosis of diseases which are not familiar to health professionals in Brazil and Bolivia, such as Bartonellosis. These Human-Computer Interaction (HCI) tools may form part of the Early Warning Systems for (Re)Emerging Infectious Diseases being developed to improve the capacity of health workers to anticipate and respond to the regional socio-environmental threats. Better understanding of the actual and potential adverse effects of regional infrastructure development may also improve policy decisions, minimising human impacts and vulnerability (concerns of the Human Dimensions community), and can ultimately be considered as an adaptation strategy to reduce the negative impacts of global environmental change on human health.


Concept-based text clustering

Anna Huang
University of Waikato
Tuesday 28 June 2011
Many processes in information retrieval and text mining represent documents in terms of the words they contain and the frequencies of these words: this is the traditional "bag-of-words" model. However, it is semantically ambiguous, because many words have multiple meanings, and undesirably orthogonal, because the semantic associations between words are ignored. Consequently, recent research has investigated how to use concepts instead of words as the document representation, with application to tasks such as information retrieval, text categorization and clustering. This seminar will show how to employ two different concept systems, WordNet and Wikipedia, to improve document representation for the purpose of clustering documents. We first discuss how to represent texts by concepts rather than words, then we enrich the clustering process by taking account of semantic relations between concepts, and finally we show how a more accurate document similarity measure can be learned. The result is a significant improvement in the performance of two existing standard clustering algorithms, and a document similarity measure that is more consistent with human judgment than people are themselves (yes!).
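As a rough illustration of the "undesirably orthogonal" problem described above, the sketch below compares two tiny documents first as bags of words and then as bags of concepts. The `CONCEPTS` mapping, the function names and the example documents are hypothetical stand-ins invented for this illustration; a real system would look concepts up in WordNet or Wikipedia, and this is not the speaker's implementation.

```python
import math
from collections import Counter

# Hypothetical word-to-concept table standing in for a WordNet/Wikipedia lookup.
CONCEPTS = {
    "car": "vehicle",
    "automobile": "vehicle",
    "engine": "motor",
    "motor": "motor",
}


def bag_of_words(text):
    """Count raw word occurrences."""
    return Counter(text.lower().split())


def bag_of_concepts(text):
    """Count concepts, falling back to the word itself when no concept is known."""
    return Counter(CONCEPTS.get(word, word) for word in text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


doc1 = "car engine"
doc2 = "automobile motor"

# No shared words, so the word vectors are orthogonal.
word_sim = cosine(bag_of_words(doc1), bag_of_words(doc2))
# Both documents map to the same concepts, so the concept vectors coincide.
concept_sim = cosine(bag_of_concepts(doc1), bag_of_concepts(doc2))
```

Under these assumptions the two documents share no words, so their word-based similarity is zero even though they describe the same thing, while the concept-based vectors are identical.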


Machine learning in the wild: some successes and failures

Brent Martin
University of Canterbury, Christchurch
Monday 20 June 2011
The machine learning community has a reputation for pronouncing the problem of learning from data "solved", only to emerge red-faced some years later. A relatively recent "silver bullet" is boosted trees, and in 1996 AdaBoost was pronounced "the best off-the-shelf classifier in the world". Boosted trees are powerful and easily applied, but like all ML methods they need to be used with care. In this seminar we examine the application of boosted trees and other methods to real-world data problems and try to learn some lessons about their use. We will also revisit the role of human intuition in data mining, using the 2009 KDD cup winners as a case study.
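For readers unfamiliar with boosting, here is a minimal sketch of AdaBoost using one-level decision trees (stumps) on a single numeric feature. The data set, function names and number of rounds are assumptions chosen for illustration; this is not the code or data discussed in the seminar, and a production system would use an established library.

```python
import math


def stump_predict(x, threshold, polarity):
    """A one-level tree: predict `polarity` at or below the threshold, else its negation."""
    return polarity if x <= threshold else -polarity


def train_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) for the best stump."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            error = sum(
                w for x, y, w in zip(xs, ys, weights)
                if stump_predict(x, threshold, polarity) != y
            )
            if best is None or error < best[0]:
                best = (error, threshold, polarity)
    return best


def adaboost(xs, ys, rounds):
    """Train a weighted ensemble of stumps; labels must be +1 or -1."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        error, threshold, polarity = train_stump(xs, ys, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)  # avoid division by zero
        alpha = 0.5 * math.log((1 - error) / error)
        ensemble.append((alpha, threshold, polarity))
        # Increase the weight of misclassified examples, decrease the rest.
        weights = [
            w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
            for x, y, w in zip(xs, ys, weights)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * stump_predict(x, t, p) for a, t, p in ensemble)
    return 1 if score >= 0 else -1


# Toy data: the negative class sits in the middle, so no single stump fits it.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]

single_error, _, _ = train_stump(xs, ys, [1.0 / len(xs)] * len(xs))
ensemble = adaboost(xs, ys, rounds=3)
predictions = [predict(ensemble, x) for x in xs]
```

On this toy data the best single stump still misclassifies a third of the examples, whereas the three-stump ensemble fits all six. Fitting training data perfectly says nothing about generalisation, which is part of why such methods need to be used with care.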


Localization Provision in NZ: Arabic Speakers Preference on Different Paralingual Website Layouts

Fouad Shiblaq
Department of Computer Science, University of Waikato
Tuesday 14 June 2011

The aim of this research is to investigate Arabic speakers’ preferences for paralingual webpage layouts in the Arabic and English languages. This research will contribute to knowledge by classifying the preferences of Arabic speakers for paralingual webpage layouts; consequently, the research framework can be applied to any particular ethnic group in order to identify their preferences for paralingual webpage layouts in their own language. Thus, this research seeks to answer the following research questions:

Q1. What obstacles prevent Arabic-speaking migrants from being online and having access to e-government in NZ?

Q2. What paralingual layout do the majority of Arabic-speaking migrants prefer when browsing e-government in NZ?

Q3. What are the benefits of a paralingual e-government website in NZ for Arabic speakers?


Knowledge-based weak supervision for scalable information extraction

Dan Weld
University of Washington, USA
Tuesday 17 May 2011
We dream of using information extraction (IE) to create large-scale knowledge bases from natural language text on the Web. However, the primary approach (supervised learning of relation-specific extractors) requires manually labeled training data for each relation and doesn’t scale to the thousands of relations encoded in Web text. In this talk I'll summarize five years of experience using distant supervision (on sources ranging from Wikipedia and Freebase to the Web and the New York Times) to build robust extraction systems. I'll talk about heuristic matching, ontology construction, smoothing, inference, and a fast learning algorithm for a novel graphical model which allows partial matches on overlapping relations.
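The heuristic matching step of distant supervision can be sketched as follows: any sentence that mentions both entities of a known fact is treated as a training example for that fact's relation. The tiny knowledge base, the sentences and the function name below are toy assumptions for illustration only, not the systems described in the talk.

```python
# Toy knowledge base: (entity1, entity2) -> relation. Hypothetical example data.
KNOWLEDGE_BASE = {
    ("Barack Obama", "Honolulu"): "born_in",
    ("Microsoft", "Redmond"): "headquartered_in",
}

SENTENCES = [
    "Barack Obama was born in Honolulu.",
    "Barack Obama visited Honolulu last year.",  # matches, but does not express born_in
    "Microsoft is based in Redmond.",
    "Seattle is rainy.",
]


def distant_labels(sentences, knowledge_base):
    """Label each sentence with every relation whose two entities it mentions."""
    examples = []
    for sentence in sentences:
        for (entity1, entity2), relation in knowledge_base.items():
            if entity1 in sentence and entity2 in sentence:
                examples.append((sentence, entity1, entity2, relation))
    return examples


examples = distant_labels(SENTENCES, KNOWLEDGE_BASE)
for sentence, entity1, entity2, relation in examples:
    print(f"{relation}({entity1}, {entity2}) <- {sentence}")
```

The second sentence shows the weakness of naive matching: it mentions both entities and so receives the `born_in` label without expressing that relation. Noise of this kind is one reason a plain string match is only a starting point.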


The Bicultural Digitisation and Service Priorities (BDSP) project

Sharon Jensen
National Library of New Zealand, Wellington
Wednesday 11 May 2011
The National Library’s Digitisation Strategy 2010-2015 identifies material that is of specific importance to Māori, or that supports Te Tiriti o Waitangi responsiveness, as being priority areas for digitisation. The Bicultural Digitisation and Service Priorities (BDSP) project is part of a programme of consultation with Māori. This project aims to help identify and prioritise material of special importance to Māori for digitisation, and to guide the development of new-generation services for Māori.

To implement a programme of digitisation of material of importance to Māori, the Library first needs to understand the needs and priorities of Māori customers. These and other customers will assist in guiding the development of a robust set of principles that incorporate important considerations such as the classification, identification and prioritisation of taonga to be digitised. This will allow the project to prepare a report on known material in the National Library and Alexander Turnbull Library collections that is a high priority for digitisation, and a set of guidelines for prioritising Māori material due for completion by the end of June.

Waikato University has developed digitisation programmes, and for this reason the BDSP project team would appreciate any assistance you can offer for our kaupapa to help us determine how to prioritise Māori taonga.


Towards a systemic and multiview approach for data analysis and data mining

Jean-Charles Lamirel
University of Strasbourg, INRIA Talaris Project, Loria, Nancy France
Tuesday 19 April 2011
The main topic of our talk is the extension of the systemic approach, initially established in the information retrieval system NOMAD (the subject of our PhD work), to set up a new general paradigm of data analysis based on multiple viewpoints. This paradigm, named Multi-Viewpoint Data Analysis (MVDA), covers both the field of data analysis and that of data mining. According to this paradigm, each data analysis can be regarded as a different view on the data. A communication process between the views can take place via a Bayesian network built in an unsupervised way using the data or the features that are shared between those views. The MVDA paradigm also relies on the exploitation of specific visualization methods, such as topographic visualization or hyperbolic visualization. The setting up of new unsupervised Recall/Precision quality indexes, based on the analysis of the feature distribution of the data associated with each cluster and independent both of the clustering methods and of changes in their operating mode (initialization, type of distances, ...), allowed us to objectively demonstrate the superiority of the MVDA paradigm over the usual global data analysis approach. It has also allowed us to compare and integrate into this paradigm different unsupervised neural clustering methods, which are particularly well suited to the management of ultra-sparse and highly multidimensional data, such as documentary data. In addition, our approach led us to combine numerical reasoning and symbolic reasoning, so as to cover the full range of data analysis and data mining functions and to reduce the defects inherent in each type of reasoning.
Through many applications, in particular in the fields of scientometrics and webometrics, we will show how the exploitation of the MVDA paradigm makes it possible to solve very complex problems of data analysis, such as those related to the diachronic analysis of large datasets of polythematic textual data. We will also show how the full set of tools developed within the framework of this paradigm has enabled us to set up new robust and powerful methods of supervised classification and of incremental clustering. We will finally show how we plan to extend its application to very challenging fields, such as bioinformatics.


Tools for the Semantic Web

Ralf Heese and Adrian Paschke
Freie Universität Berlin, Germany
Monday 21 March 2011
Corporate Semantic Web addresses the application of semantic technologies in a corporate context. In this talk we give a short overview of tools and applications developed by the research group on Corporate Semantic Web at the Freie Universität Berlin. The group developed an eXtreme Tagging System to collaboratively develop initial semantic networks (ontologies) by extracting knowledge paths in tag clouds created by domain experts. Leonè is a light-weight ontology editor for non-experts, and loomp provides an environment in which non-expert users can create semantically annotated texts.

The group works with German companies to provide semantic enrichment of data. They developed a semantic search engine for the Berliner Museumsportal, the web portal of all museums in Berlin. In another project, structured information is extracted from Wikipedia (Germany) to make it available for machine access on the Linked Data Web. To reuse an (unknown) ontology, we need to understand its purpose and to find its key concepts. Understanding the structure of an ontology may also be useful to break it apart into smaller modules which can be managed separately.


Digital, physical, interactive, human: Tales from an academic library

Dana McKay
Swinburne University of Technology Library, Melbourne
Monday 14 March 2011
Libraries, and academic libraries in particular, are complex sites of human interaction with information. The information these libraries provide includes books, video, audio and articles, and it is provided through a variety of media and systems. In the end, though, if library users can't find and access information, be it physical or digital, the information isn't useful. In this talk I will discuss three studies of information seeking and use done in an academic library: the first about how researchers manage their publication identities and search for work by specific people, the second about how library users get lost looking for material on the shelves, and the final one about how users search from the library homepage. With these three examples I hope to demonstrate some of the information and interaction problems facing libraries, and provide scope for library-based research.


How to abuse a decision tree

Hendrik Blockeel
Katholieke Universiteit Leuven, Belgium
Tuesday 1 March 2011
Decision trees have been around (and very popular) since the eighties, mostly in the guise of classification and regression trees. In this talk, I will present a number of less obvious uses of decision trees. I will discuss how decision tree learners can be adapted towards learning hierarchical clusterings, learning trees that predict multiple targets at the same time, learning multi-label classifiers, learning multi-instance classifiers, and learning phylogenetic trees. Most of these alternative uses actually require only small changes to the decision tree learning method, and empirical results are often obtained that are surprisingly close to those of more specialized methods. This underlines the versatility of decision tree learning as a machine learning approach, and shows that taking a non-typical view on classic methods sometimes pays off.
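One way to picture the "small changes" needed for predicting multiple targets at once is to keep the usual split search and change only the impurity measure, summing the variance over all target columns. The sketch below does this for a single numeric feature; the function names and toy data are assumptions made for illustration and are not taken from the speaker's systems.

```python
def total_squared_deviation(rows):
    """Sum, over every target column, of squared deviations from that column's mean."""
    if not rows:
        return 0.0
    total = 0.0
    for column in range(len(rows[0])):
        values = [row[column] for row in rows]
        mean = sum(values) / len(values)
        total += sum((value - mean) ** 2 for value in values)
    return total


def best_split(xs, ys):
    """Find the threshold on one feature that minimises the summed deviation.

    xs: list of feature values; ys: list of target tuples (one tuple per example).
    Returns (threshold, score); threshold is None if no split improves on no split.
    """
    best_threshold, best_score = None, total_squared_deviation(ys)
    for threshold in sorted(set(xs))[:-1]:  # the largest value cannot split the data
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        score = total_squared_deviation(left) + total_squared_deviation(right)
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


# Toy data: two targets per example, both jumping between x = 3 and x = 10.
xs = [1, 2, 3, 10, 11, 12]
ys = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]

threshold, score = best_split(xs, ys)
```

With a single target column this reduces to the ordinary regression-tree criterion, so the multi-target variant differs only in the impurity function. On the toy data the chosen threshold separates the two groups of examples.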


Experiment Databases: A new way to share, organize and learn from experiments

Joaquin Vanschoren
Leiden Institute of Advanced Computer Science, Leiden University, The Netherlands
Thursday 24 February 2011
Thousands of machine learning research papers contain extensive experimental comparisons. However, the details of those experiments are often lost after publication, making it impossible to reuse these experiments in further research, or reproduce them to verify the claims made. In this paper, we present a collaboration framework designed to easily share machine learning experiments with the community, and automatically organize them in public databases. This enables immediate reuse of experiments for subsequent, possibly much broader investigation and offers faster and more thorough analysis based on a large set of varied results. We describe how we designed such an experiment database, currently holding over 650,000 classification experiments, and demonstrate its use by answering a wide range of interesting research questions and by verifying a number of recent studies.


Machine Learning Challenges in Ecological Science and Ecosystem Management

Tom Dietterich
School of Electrical Engineering and Computer Science, Oregon State University, USA
Wednesday 9 February 2011
Just as machine learning has played a huge role in genomics, there are many problems in ecological science and ecosystem management that could be transformed by machine learning. This talk will give an overview of several research projects at Oregon State University in this area and discuss the novel machine learning problems that arise. These include (a) automated data cleaning and anomaly detection in sensor data streams, (b) automated classification of images of arthropod specimens, (c) species distribution modeling including modeling of bird migration from citizen science data, and (d) design of optimal policies for managing wildfires in forest ecosystems. The machine learning challenges include flexible anomaly detection for multiple data streams, trainable high-precision object recognition systems, explicit models of sampling bias and measurement processes, and optimization of complex spatio-temporal Markov processes.


Language Planning in Wales: Lessons for New Zealand?

Jeremy Evas
Welsh Language Board, Cardiff, Wales
Friday 4 February 2011
This talk describes the situation of the Welsh language in Wales, and the structures that are in place for promotion of the language. Welsh, until 2001, followed a pattern similar to many smaller languages of the world, i.e. a sharp decline in the number and proportion of speakers throughout the 20th century. However, due to civic pressure, the education system, and government intervention, the number and percentage of Welsh speakers is now rising, although top-level figures can hide many internal changes in the composition of a speech community. The structures for the promotion of Māori are currently being examined and there is an unparalleled opportunity to contribute to a strategy that will drive future language policy in New Zealand. This talk will therefore examine some possible suggestions for future promotion of Māori, based on practical experience in Wales.


