|
Department of
|
|
Modeling deviations from expected behaviour
Luis Torgo
Department of Computer Science, University of Porto, Portugal |
Tuesday 8 May 2012 |
| In this talk I will present two concrete real-world applications where data mining is used to model deviations from expected behavior. The first application is fraud detection. Data mining methods highlight probable cases of fraud for manual inspection. These inspections are usually constrained by limited human and/or financial resources. Our solution uses utility theory borrowed from economics together with outlier ranking methods to incorporate these constraints by integrating cost/benefit estimates with the probability of a case being fraudulent. The second application examines the task of monitoring and forecasting water quality parameters on a large distribution network. It involves anticipating the evolution of a time series and deciding when alarms should be issued due to unexpected behavior of the series.
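As a rough illustration of the fraud-detection idea in this abstract (not the speaker's actual method), the Python sketch below ranks cases by a hypothetical expected utility, `p_fraud * benefit - cost`, and selects cases for inspection until an assumed budget is used up. The field names, the utility formula, and the greedy selection rule are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Case:
    case_id: str
    p_fraud: float   # estimated probability that the case is fraudulent
    benefit: float   # assumed payoff if a fraud is confirmed
    cost: float      # assumed cost of inspecting the case


def expected_utility(case: Case) -> float:
    """Hypothetical utility: expected payoff minus inspection cost."""
    return case.p_fraud * case.benefit - case.cost


def select_for_inspection(cases, budget):
    """Greedily pick the highest-utility cases whose cost fits the budget."""
    chosen = []
    remaining = budget
    for case in sorted(cases, key=expected_utility, reverse=True):
        if expected_utility(case) <= 0:
            break  # the remaining cases are not worth inspecting
        if case.cost <= remaining:
            chosen.append(case.case_id)
            remaining -= case.cost
    return chosen


# Illustrative (made-up) cases
cases = [
    Case("a", p_fraud=0.9, benefit=100.0, cost=10.0),
    Case("b", p_fraud=0.2, benefit=1000.0, cost=50.0),
    Case("c", p_fraud=0.5, benefit=10.0, cost=10.0),
]
print(select_for_inspection(cases, budget=60.0))
```

Note that case "b" is ranked ahead of case "a" despite its lower fraud probability, because its assumed payoff is much larger; this is the kind of trade-off a utility-based ranking is meant to capture.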
|
|
Alan Turing and the Computing Engine: Turing's achievements in practical computing
Bob Doran
Computer Science Department, Auckland University |
Tuesday 1 May 2012 |
| In mid 1945 Turing was recruited by the British National Physical Laboratory to work on the design of what was intended to be Britain's first computer, the ACE. Turing produced a written proposal that was approved to proceed in early 1946. For two years Turing worked on the design of ACE and its software. However, the ACE project did not progress as expected so, after a year at Cambridge University, Turing moved to Manchester University where he was responsible for the initial development of software for the world's first operating computer. The NPL Pilot ACE, the initial implementation of Turing's designs, was finally operational in 1950 and led to the very successful English Electric DEUCE computers. Although we now acknowledge that Turing made great contributions to practical computing, this was not widely acknowledged from the time of his early death (1954) until the 1980s. As well as summarizing Turing's practical accomplishments we will look into the circumstances of how and why he was so long marginalized in the accepted history of practical computing.
|
|
Resource-centred storage of RDF graphs: clustering triples for efficient retrieval
Ralf Heese
Humboldt University and Freie Universitaet Berlin |
Monday 23 April 2012 |
| This presentation introduces an efficient storage and query model for semantic data stored as RDF triples. We discuss details of our cluster method for efficient RDF data access, its implementation, and the performance gains achieved. The RDF data model is a semantic model for managing and exchanging data in applications. Data is represented as triples (each consisting of a resource, a property, and a value). These triples form a graph structure and, therefore, relational databases are not suitable for efficient management and retrieval of RDF graphs. Alternative native repositories mostly use B-trees to index triples and therefore have to execute expensive join calculations for each query. We use a storage model that clusters triples according to their resources, i.e., manages them as a collection of star-like subgraphs. Since queries can be decomposed into the same star-like patterns, we argue that our storage model has advantages over existing approaches, e.g., the query engine has to perform fewer join operations. Initial results of our performance evaluations confirm that our cluster model provides efficient access to RDF triples.
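To make the idea of resource-centred clustering concrete, here is a minimal in-memory Python sketch; it is an illustration only, not the storage engine described in the talk. It groups triples into one "star" per resource and answers a star-shaped query by inspecting each star on its own, without joining separate triple lookups. The function names, the example data, and the use of `None` as a wildcard are assumptions made for this example.

```python
from collections import defaultdict


def cluster_by_resource(triples):
    """Group (resource, property, value) triples into one star per resource."""
    stars = defaultdict(list)
    for resource, prop, value in triples:
        stars[resource].append((prop, value))
    return stars


def match_star(stars, pattern):
    """Return resources whose star contains every (property, value) in pattern.

    A value of None in the pattern acts as a wildcard.
    """
    matches = []
    for resource, edges in stars.items():
        if all(
            any(p == prop and (value is None or v == value) for p, v in edges)
            for prop, value in pattern
        ):
            matches.append(resource)
    return matches


# Illustrative (made-up) triples
triples = [
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:alice", "ex:worksAt", "ex:hu"),
    ("ex:bob", "rdf:type", "ex:Person"),
    ("ex:hu", "rdf:type", "ex:University"),
]
stars = cluster_by_resource(triples)

# Star pattern: a person who works somewhere
print(match_star(stars, [("rdf:type", "ex:Person"), ("ex:worksAt", None)]))
```

Because all triples of a resource sit together, both conditions of the pattern are checked against a single cluster; a triple-indexed store would typically look up each condition separately and join the results.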
|
|
Adaptive scheduling on power-aware managed data-centers using machine learning
Ricard Gavaldà
Universitat Politècnica de Catalunya (UPC BarcelonaTech) |
Tuesday 20 March 2012 |
| Energy-related costs have become one of the major economic factors in data-center management, and companies and the research community are currently working on efficient power-aware resource management strategies as part of the collective effort called "Green IT". We propose a framework for autonomic scheduling of tasks and web services on cloud environments, optimizing profit, defined as revenue from task execution minus penalties for service-level agreement violations and minus power consumption costs. The main contribution is the combination of consolidation and virtualization technologies, mathematical optimization methods, and machine learning techniques. We focus on web-service-like tasks, because their variability over time makes them particularly challenging. Our system uses machine learning techniques to predict the performance and consumption of hosts on tasks, and uses these predictions to set up an optimization program describing the allocation of tasks to the hosts with the highest utilities. This process is performed periodically as tasks change their behavior or as new tasks arrive. At a higher level, the talk aims to highlight the potential for machine learning and data mining in the fields of Autonomic and Green Computing. This is joint work with Josep Ll. Berral and J. Torres (UPC and Barcelona Supercomputing Center) and was presented at the Grid 2011 conference.
|
|
In the edge of optimization and learning for resource allocation
Beatriz López
Department of Electrical, Electronics and Automatic Engineering, University of Girona, Catalonia, Spain |
Friday 16 March 2012 |
| Optimization and learning are two complementary research disciplines that need each other to solve real-world problems. In this talk, I will review the research done in the eXiT group on solving resource allocation problems in three different domains: bandwidth communication, workflow management systems (health care), and public bike transport. Auctions are used in decentralized environments, while classical heuristic approaches are followed for determining the winning bid. Clustering, sequence learning and reinforcement learning are the mainstays for making optimization feasible. Finally, I will introduce the current challenges we are facing, including the hybridization of learning techniques with case-based reasoning.
|
|
Fit-for-purpose complex systems simulation
Fiona Polack
Department of Computer Science, University of York, U.K. |
Tuesday 13 March 2012 |
| How do you create an agent-based simulation that a cancer researcher or immunologist can understand and use? What does simulation engineering have in common with the study of large-scale complex IT systems? At York (UK), research into principled approaches to modelling and simulating complex systems (CoSMoS) has addressed natural (e.g. biological and biomedical systems) and man-made (e.g. flocking algorithms, swarm robots) domains in which multiple agents interact to form complex systems. Working in collaboration with laboratory scientists, we have shown that the CoSMoS process can be used to develop simulations usable in guiding hypothesis generation and laboratory experimentation. In the talk, I will summarise some of the challenges that arise in the engineering of simulations, and which are also challenges in the wider scope of model-driven engineering. I will consider how conventional software engineering techniques, including some from critical systems engineering, can be used to create a simulation that is demonstrably fit for its scientific purpose. The talk will focus on a feasibility study that has modelled prostate cell division and differentiation, the first phase of a simulation project that will be used in the study of cancer neogenesis and benign prostatic hyperplasia.
|
|
New algorithms for graphs and small molecules exploiting local structural graph neighborhoods and target label dependencies
Stefan Kramer
Johannes Gutenberg University Mainz |
Tuesday 6 March 2012 |
| In the talk, I will present recently developed algorithms for predicting properties of graphs and small molecules. In the first part of the talk, I will present several methods exploiting local structural graph (similarity) neighborhoods: local models based on structural graph clusters, locally weighted learning, and the structural cluster kernel. In the second part, I will discuss methods that exploit label dependencies to improve the prediction of a large number of target labels, where the labels can be just binary (multi-label classification) or can themselves have a feature vector attached. The methods make use of Boolean matrix factorization (BMF) and can be used to predict the effect of small molecules on biological systems.
|
|
Crowd Sourcing of OCR using Duolingual
Puakea Noglemeier
University of Hawaii in Mānoa |
Tuesday 7 February 2012 |
| Puakea will discuss their current newspaper initiative, 'Ike Ku'oko'a (www.awaiaulu.org). They propose, through totally volunteer effort, to typescript 60,000 pages of newspaper in 8 months. The first 15,000 pages took 10 years, but it 'rocked the world' in Hawaii, for content and for opening up a new historical vision. This next step brings it all to the table, and engages a small army of collaborators. Ambitious, with lots of potential. The work was kicked off a month ago and has over 2,000 volunteers in Hawai'i and around the world, including N.Z.
A discussion will follow about research undertaken by a team led by Luis von Ahn, whose work involves using crowd sourcing for OCR and language learning. Their latest project, Duolingo, has the potential to assist projects like 'Ike Ku'oko'a, but how does engagement occur and what potential could be realised?
|
|
Image Data Analysis in H. sapiens and C. elegans
Alexander K. Seewald
Independent Researcher/Consultant for Data Mining |
Thursday 12 January 2012 |
| In the past three years, we have collaborated with the University of Colorado, Boulder, USA, and the Institute for Medical Pathology at the Medical University, Vienna. The focus of our research was on developing image analysis systems using state-of-the-art techniques, including machine learning techniques that have previously been used to recognize faces. We have developed robust preprocessing methods (stitching, illumination correction, erythrocyte removal), segmentation algorithms and task-specific evaluation algorithms based on ground-truth data provided by our biological research partners. We have analyzed three different types of tissue: human osteoclasts in culture, human placental tissue, and living transgenic C. elegans specimens.
We will close with thoughts on the potential uses of image data analysis, what is and is not yet feasible, and how biological researchers should be expected to help in the building of state-of-the-art systems.
|
|