Talks, presentations, videos

In June 2009 I gave a series of five non-technical lectures at the University of Siena, aimed at PhD students in the Humanities. Each talk below is presented as an electronic book. (Some pages include animations that are revealed by the down-arrow key.)

The codex or book form is one of humankind's most wonderful inventions. But in our obsession with technology we are in danger of losing it. Web pages are like early papyrus scrolls. Scrolling page-by-page PDF readers resemble the "concertina" format that superseded scrolls -- until the codex came along. What about electronic "realistic books"?

This talk surveys the development of the book format and its electronic parallels, and explains how realistic books work and what they can do. Realistic books can embrace many advantages of electronic documents: electronic tables of contents and chapter tabs, internal and external hyperlinks, and multimedia.

Search engines -- "web dragons" -- are the portals through which we access society's treasure trove of information. How do they work? How can web visibility be exploited by those who want to sell us their wares? How do commercial interests play against society's need for neutral and thoughtful evaluation of knowledge?

What could be more important than how our society deals with recorded knowledge? This talk takes a critical look at how the dragons work and the role they play in today's society. We touch on social issues such as web spam, privacy, and the difference between the dragons' motivation and ours.
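The talk itself is non-technical, but readers curious about the mechanics may like to see one well-known ingredient of search-engine ranking: link analysis. Below is a minimal sketch of PageRank computed by power iteration. It assumes a damping factor of 0.85 and uses a tiny made-up link graph. Real engines combine many more signals, and this is not meant to describe how any particular engine works.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Compute PageRank by power iteration.

    links maps each page to the list of pages it links to (a toy graph).
    Returns a dict of page -> score; the scores sum to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        # Every page gets a baseline share from the "random jump".
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # Split this page's rank evenly among the pages it links to.
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling page (no out-links): spread its rank over all pages.
                for t in pages:
                    new[t] += damping * rank[p] / n
        rank = new
    return rank


# Hypothetical four-page web: C is linked to by A, B and D.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
scores = pagerank(toy_web)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

A page's score depends on how many pages link to it and how highly those pages are themselves ranked. This is exactly what link farms and other web spam try to exploit.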

Wikipedia represents a vast investment of manual effort and judgment: a huge, constantly evolving tapestry of concepts and relations.

This talk focuses on the process of "wikification"; that is, automatically and judiciously augmenting a plain-text document with pertinent hyperlinks to Wikipedia articles -- as though the document were itself a Wikipedia article. It first describes how Wikipedia can be used to determine semantic relatedness between concepts. Then it explains how to wikify documents by exploiting Wikipedia's internal hyperlinks for relational information and their anchor texts as lexical information.
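To give a flavor of how hyperlinks can measure relatedness, here is a minimal sketch. It assumes a formulation that applies the Normalized Google Distance idea to the sets of Wikipedia articles linking *to* each concept: two articles are related if many of the same pages link to both. The function name, link sets, and article count are invented for illustration. The anchor-text (lexical) side of wikification is not shown.

```python
import math


def link_relatedness(in_links_a, in_links_b, total_articles):
    """Relatedness in [0, 1] of two articles, from the sets of articles linking to them.

    0 means no shared in-links; values near 1 mean heavily overlapping in-links.
    total_articles is the size of the whole collection (must exceed both set sizes).
    """
    a, b = set(in_links_a), set(in_links_b)
    common = a & b
    if not common:
        return 0.0
    larger, smaller = max(len(a), len(b)), min(len(a), len(b))
    if total_articles <= smaller:
        raise ValueError("total_articles must exceed the in-link set sizes")
    # Normalized-Google-Distance-style distance, computed over in-links.
    distance = (math.log(larger) - math.log(len(common))) / (
        math.log(total_articles) - math.log(smaller)
    )
    return max(0.0, 1.0 - distance)


# Hypothetical in-link sets in a hypothetical 1000-article collection.
in_links = {
    "Cat": {"Pet", "Mammal", "Dog", "Lion"},
    "Dog": {"Pet", "Mammal", "Cat", "Wolf"},
    "Piano": {"Music", "Instrument"},
}
print(link_relatedness(in_links["Cat"], in_links["Dog"], 1000))
print(link_relatedness(in_links["Cat"], in_links["Piano"], 1000))
```

Because the measure uses only link structure, it needs no text processing. The link structure encodes the manual judgment of Wikipedia's many editors.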

Traditional publishing and distribution mechanisms have tragically failed the developing world. By decoupling production and distribution costs from intellectual property charges, digital libraries offer a sorely needed lifeline.

The Greenstone software has helped spread the practical impact of digital libraries, particularly in developing countries. Many lessons have been learned in developing and deploying a comprehensive open-source system for an international user base. The most difficult challenges have been political, educational, and sociological, echoing that old programmers' blessing "may all your problems be technical ones."

Data mining is the extraction of implicit, previously unknown, and potentially useful information from data. The idea is to build computer programs that sift through databases using so-called "machine learning" algorithms to seek regularities or patterns. Strong patterns, if found, will likely generalize to make accurate predictions on future data.

As with any burgeoning technology that enjoys commercial attention, the use of data mining is surrounded by a great deal of hype. But there is no magic in machine learning, no hidden power, no alchemy. Instead there is an identifiable body of practical techniques that can extract useful information from raw data.
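To show how unmagical a learning algorithm can be, here is a sketch of 1R ("one rule"), one of the simplest schemes. For each attribute, it builds a rule that maps each attribute value to the most frequent class among the training rows with that value. It then keeps the attribute whose rule makes the fewest errors on the training data. The function name and the toy dataset are hypothetical, made up purely for illustration.

```python
from collections import Counter, defaultdict


def one_r(rows, attributes, target):
    """1R classifier: return (best_attribute, rules, training_errors).

    rows is a list of dicts; rules maps each value of best_attribute to a class.
    """
    best = None
    for attr in attributes:
        # Tally the class labels seen with each value of this attribute.
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[attr]][row[target]] += 1
        # Each value predicts its most frequent class.
        rules = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        # Errors = rows whose class differs from the predicted one.
        errors = sum(sum(c.values()) - c[rules[value]] for value, c in counts.items())
        if best is None or errors < best[2]:
            best = (attr, rules, errors)
    return best


# Hypothetical toy data: does a game get played, given the weather?
data = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "yes", "play": "yes"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
]
attr, rules, errors = one_r(data, ["outlook", "windy"], "play")
print(attr, rules, errors)
```

Whether such a pattern will generalize to future data is a separate question. It should be tested on data the rule has not seen.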

The goal of the FLAX (Flexible Language Acquisition) project is to produce software to automate the production and delivery of practice exercises for overseas students who are learning English.

Here's a tutorial on how to use FLAX. The pace is slow and deliberate because it's designed for people who are not native English speakers.

FLAX tutorial (40 min video, 99 MB)

Here's a tutorial, based on the book Data Mining, that I recorded in June 2008.

Data Mining Algorithms
    Part 1
    Part 2

In March 2006 Tim Bell of the University of Canterbury interviewed me about digital libraries and real books.

Digital libraries and real books (25 min podcast)

Here's a lecture I gave at the University of Lethbridge, Canada, in September 2003.

Browsing around a digital library
    [high bandwidth version]

I give a course at Waikato entitled Web search: Technical and Social Issues, based on the book Web Dragons. As an assignment, students write a screenplay for a short movie that illustrates a related theme. In a separate course, some Computer Graphics Design students made these screenplays into videos. The results can be seen on YouTube's TheWebDragons channel.

The Almighty Google
A Fistful Of Data
The Internet Mafia
Invasions from the Net
The Last Library **
The University of the Future
When Zombies Attack
Without AdRoof, can I be Picasso? **
The World Wide Web