|
Introduction
Design Goals
Current Status
Example Programs...
People
Publications
Overview
Front-End Projects
Back-End Projects
Applications
|
JStar Research Group
Introduction
"Since real world applications are naturally parallel and
hardware is naturally parallel, what we need is a programming model,
system software, and a supporting architecture that are naturally
parallel. Researchers have the rare opportunity to re-invent these
cornerstones of computing, provided they simplify the efficient
programming of highly parallel systems."
From The
Landscape of Parallel Computing Research: A View from Berkeley,
Technical Report UCB/EECS-2006-183, University of California, Berkeley, Dec
2006.
Since 2005, computers are no longer getting faster, instead they
are becoming exponentially more parallel.
But parallel programming is currently difficult and costly,
because the widely-used programming languages of
today were not designed for parallel computers,
We aim to make it easy to write high-performance parallel programs
that can be retargeted to a wide variety of computer
architectures. We are developing a new style of parallel programming,
which separates the program logic from the parallelism details, and
enables a program to be moved easily from one computer architecture
to another.
In an earlier Marsden-funded project, we developed the semantics
of this programming style, implemented a prototype language called
Starlog,
and demonstrated the automatic selection of efficient data
structures, with performance within a factor of two of hand-optimized
programs in most cases. (See PhD thesis
by Roger Clayton). Now we are developing a new language called
JStar, which applies these ideas to parallel programming.
Design Goals for JStar
Our overall goal is to change the way the world programs!
Instead of writing sequential programs, with thousands of
semicolons that tell the compiler to do things more
sequentially than is really necessary, JStar programs contain
hundreds of rules that can all execute in parallel.
The causality relationships between these rules (when
you need to ensure that one rule finishes before another
one starts) are expressed using 'abstract timestamps'.
The fewer causality constraints you add, the more parallel
your program is. This approach makes it much easier for
compilers to generate aggressively parallel code that
is tailored to a particular machine architecture.
Instead of using complex data structures, we will use
relations, and let the compiler choose a data structure
for each relation that will give good performance.
This means that when we change our program to use a
relation in a slightly different way, the compiler can
automatically (or with some hints from the user) choose
a new data structure that efficiently supports that
kind of query or update.
Here is a brief summary of our design goals for JStar:
- As Declarative as Possible (to make transformation/proof easy)
- As Much Implicit Concurrency as Possible (retargetable to many
architectures)
- Keep the parallelism hints/commands separate from the original
JStar program, so that one program can have several alternative
parallel implementations without changing its source code.
- As Java/Scala/C#-like as Possible (to reduce takeup barriers)
Some similar languages to JStar/Starlog
are the OverLog and
Dedalus languages from the Declarative Networking group at
Berkeley (these are distributed implementations of Datalog with
explicit time), and John McCarthy's Elephant
2000 (this 1992 language proposal had the ideas of a
time-stamped history, data-structure-free programming, and a
compiler that chooses data structures).
Current Status
We have defined the semantics of JStar, have designed a
Java/Scala-like
syntax, and have implemented several prototype interpeters
for it. These come directly from the semantics paper, and offer
three alternative evaluation orders (sequential, medium parallelism
and maximum parallelism). See our examples page for some small
example programs and their execution traces.
Our basic architecture is to translate JStar source programs into
an intermediate language (IL), perform lots of transformations
on the IL to optimize the code and parallelize it for a particular
kind of machine (eg. cluster or many-core), then translate the
resulting IL code into an efficient parallel program for the target
machine.
We are currently using Starlog notation as our IL, so that we can
use the transformations and code generator that Roger Clayton
developed for sequential machines. This automatically guesses
appropriate data structures for each data relation in the program,
and generates sequential Java code.
On the case studies side, we have done graphics programs, LEGO
robotics programs, games and calculation programs in the past (Starlog).
Simon Crosby has developed distributed implementations of some
JStar case studies (Primes and Life) and
evaluated their performance on our cluster computer, Symphony and Paul Beard
transformed those same case studies to run on a GPU using CUDA and OpenCL.
Come back to this site in a few months to
find out more...
People
The people associated with the JStar project include:
- AProf Mark
Utting
- Prof John
Cleary
- Roger Clayton, PhD Student 2001-2004, Data Structure Selection
- Simon Crosby, MSc Student 2008-2010, Cluster Implementation Techniques
- Paul Beard, COMP520 project 2009, CUDA Implementation Techniques
- Craig Berntsen, COMP520 project 2009, OpenGL Interface
- Jordan Schwab, COMP520 project 2009, LINQ-like syntax
- Feifei Lin, COMP591 project 2009, JStar/SWING applications
Example Programs
A number of simple example programs are available including source code in Starlog and JStar
and detailed execution traces using different scheduling algorithms.
Publications
Here are some of the most relevant papers about the Starlog
language, which has similiar semantics to JStar (though the
syntax of JStar is quite different).
See the Starlog
website for additional publications.
-
Datalog as a parallel general purpose programming language
, J. G. Cleary, M. Utting and R. Clayton, Working
paper, Dept. Computer Science, University of Waikato,
Hamilton, New Zealand, 06/2010.
Presents a semantics for JStar/Starlog, together with several
abstract interpreters for the semantics, and proofs that relate
these to traditional fixed-point semantics for locally stratified
datalog programs. This is an updated version of the report below with more
examples and some material about garbage collection and optimization omitted.
This has been submitted for publication.
- A
semantics and implementation of a causal logic programming
language, J. G. Cleary, M. Utting and R. Clayton, Working
paper, Dept. Computer Science, University of Waikato,
Hamilton, New Zealand, 01/2009.
Presents a semantics for JStar/Starlog, together with several
abstract interpreters for the semantics, and proofs that relate
these to traditional fixed-point semantics for locally stratified
datalog programs.
- Compilation of Bottom-Up
Evaluation for a Pure Logic Programming Language, PhD
Thesis, Roger Clayton, University of Waikato, Hamilton, New
Zealand, 2004.
Describes a compiler for Starlog
including a scheme to automatically assign data-structures to
relational tables using both static and run-time performance
information.
- Verification
of Starlog Programs by John Cleary and Mark Utting.
Presented at the Australasian Workshop on Computational Logic
(AWCL'2001), Bond University, Australia.
This presents a simple methodology for proving Starlog
programs correct with respect to predicate calculus
specifications. Example programs: generating all even
numbers, and a resource manager that grants a synchronized token
to one requestor at a time, in a fair fashion.
- Data
Structures considered Harmful by John Cleary, Mark Utting
and Roger Clayton. Presented at the Australasian Workshop on
Computational Logic (AWCL'2000), ANU, Canberra,
Australia.
This describes how Starlog programs avoid a premature
commitment to data structures. Example programs: an engine
simulation with start and stop commands, an assignment example,
calculating even numbers, calculating hamming numbers. The paper
also discusses the selection of efficient data structures for the
hamming number program.
-
Optimising Tabling Structures for Bottom-Up Logic
Programming, by Roger Clayton, John G. Cleary, Bernhard
Pfahringer, and Mark Utting, Preproceedings of the
International Workshop on Logic Based Program Development and
Transformations 2002, pp.57-75. (A two-page extended abstract
of this paper was also published in LNCS 2664).
This paper describes a Starlog compiler that generates Java code,
using SDSL (Starlog Data Structure Language) as an intermediate
stage, with automatic selection of data structures.
- Pseudo-Naive Evaluation.
Donald Smith and Mark Utting, Tenth Australasian Database Conference,
ADC'99, Auckland, NZ, Jan 1999.
Describes one possible evaluation strategy for Starlog.
- Declarative
Updates in Deductive Object-Oriented Databases, by Mengchi
Liu and John Cleary. International Journal of Microcomputer
Applications, 15(3), pages 127-142, 1996.
Describes a Starlog-like language with explicit database insert and delete
commands for putting objects into the database. Only uses integer timestamps.
- Updates in a
temporal logic programming language, by John G. Cleary and
Vinit N. Kaushik. Technical Report 1991-427-11, Department of
Computer Science, University of Calgary, Calgary, Alberta,
Canada.
Introduces Starlog and shows that it is
capable of directly expressing the mutation and change of
clauses within a database. Some short example programs are
described and used to illustrate an effective and efficient
bottom-up technique for executing the language based on
connection graphs, and an intelligent arithmetic constraint
system.
PhD and MSc Topics
There are lots of interesting research topics in the JStar project,
which could form the basis of PhD or MSc work. To illustrate some of
the possible projects, consider the following JStar System Architecture diagram.
This shows how one or more source syntaxes for JStar are translated by
a front-end program into the JStar IL (Intermediate Language),
where lots of analysis and optimization transformations are done.
Some of these transformations convert the program into an
architecture-dependent form (eg. spread the data structures over
many nodes of a cluster computer, or spread the rules over many cores
of a multi-core computer). After further transformations, this IL
can then be translated by a back-end program into some
suitable output syntax for that architecture (eg. Java for the
sequential implementation, MPI for clusters, CUDA for GPUs, and
perhaps Microsoft's ParallelFX library for many-core implementations).
We take a strong experimental approach to parallelization, so
the performance results from each run on the target machine are
used to improve and tune the transformations and to develop heuristics
for deciding automatically which transformations are best for each
kind of program.
We classify the following PhD/MSc topic suggestions into
front-end projects (mostly about language design and
evaluation), back-end projects (mostly about
parallelization transformations and performance evaluation) and
applications projects (using JStar and designing library
APIs and new programming styles that suit those APIs).
Front-End Projects
- Static Analysis: JStar programs have a number of
integrity and type constraints that are needed to ensure that the
causality orderings specified by the programmer are
consistent with the code. Checking these is a non-decidable
problem in general, and in Starlog we used some simple
fixed-point abstract interpretation algorithms and the automated
theorem prover
Simplify
to check the constraints.
This project will involve developing static analysis
algorithms to check the correctness of JStar programs and
to optimize them by deriving information about the types and
ranges of variables etc. This will probably be done with
the aid of modern SMT solvers such as
Yices and
Z3.
This project will require a good theory background and
will suit someone who is interested in type checking,
abstraction interpretation, and automated theorem proving.
- Meta-Compiler for JStar: we are moving towards writing
the next version of the JStar compiler as a declarative program
in JStar, similar to the
Evita
Raced ideas from Berkeley.
This project will be to develop transformations, query
optimizations and code generation strategies in this declarative
compilation framework.
- Example-Based JStar Programming Environment: Relations
are the core data structures that are used in all JStar
programs. This project will experiment with developing
interactive environments that allow relations to be defined
visually and then populated with example tuples, either
automatically (randomly-generated tuples, min-max tuples,
pairwise tuples, etc.) or manually (by typing in several
tuples). These tuples will then be used as test inputs for
each JStar rule that is defined, and the output of the rule
will be compared against the example tuples in the output
relation. The goal is to provide an interactive unit testing
environment for JStar.
- Correctness Proofs for JStar: For the future
(see the Verification of Starlog Programs paper
in the publications section
for some similar verification ideas for Starlog).
- Orderings:
Every JStar (or Starlog) program has an associated ordering
on the tuples in its tables. This project will investigate ways
of specifying these orderings, verifying their correctness,
and automating their extraction from programs.
Back-End Projects
- Parallelization for Many-Core Computers: Our first
parallel back-end will be aimed at many-core computers.
The plan is to generate Java code that uses thread pools,
Future<T> objects, parallel iterators,
concurrent data structures, etc. The goal is to create
lots of tasks at a good level of granularity.
For performance evaluation experiments, we have access
to several 8-core hyperthreaded machine (16 threads).
- Parallelization for GPUs: This project will
research transformations for using GPUs (Graphics Processing
Units) as the execution engine for JStar. Modern GPUs have
dozens of simple cores and can handle thousands of
concurrent threads with almost zero overhead when switching
between threads. A promising back-end target for GPUs is to
generate OpenCL code, using the JavaCL library to interface
that code to Java.
- Parallelization for FPGAs: For the future.
- Garbage collection:
Research the theory of how garbage collection can be done
and what constitutes correct garbage collection.
Implement garbage collectors in both sequential and distributed
environments. Evaluate the performance of different garbage collectors
and the tradeoffs between memory usage, execution time, and algorithmic
complexity.
- Memory locations as data structures:
In Starlog it is possible to specify assignment where a tuple
has different values at different times and the most recent
assignment is used as its current value. In principle
it is possible to compile such a specification to code which
destructively overwrites a single memory location to
store the current value. The project will be to precisely specify
when this is safe, to include a transformation for this into the
compiler, and to evaluate its impact on performance.
- Data (size) dependent data structures:
It is clear that in some applications the appropriate data structures
to use depend on properties of input data which cannot be known at
compile time. Such properties include the size of the data and
the distribution of keys. This project will investigate
ways to gracefully and efficiently switch which data structures are
used as the properties become apparent, will implement such a scheme, and
evaluate its impact on performacne including execution time, memory usage,
and overall system robustness.
- Disk storage as a data structures:
A common technique in programming when data to be processed exceeds
the available RAM is to store the data in external files and to
process it incrementally from there. For example, data might be written to a file
externally sorted then re-read and processed in order.
This project will add such external file processing to the Starlog armamentarium
of potential data structures. It will include specifying, implementing and
evaluating the performance of such techniques.
Application Projects
We need more experience with the JStar style of programming
in many application areas. This typically involves designing
a JStar API for a given application domain, implementing that
API (mostly in Java), writing example JStar programs that use the API,
experimenting with different styles of interaction with the
library, and perhaps performing some usability experiments to
see how easy other programmers find it to understand and use the
libraries.
Here are a few examples of the kinds of projects that are possible.
- Genomics: Develop JStar programs for assembling DNA
fragments into full genomes. This is a CPU and
I/O intensive task, so will be a challenging application for
JStar.
- GUI Games: Develop libraries and
programming styles for developing interactive GUI
applications in JStar. This will probably be done on top of
Java Swing, but the GUI layout will be controlled by relations
defined in the JStar program, similar to how a XAML XML
file can define a .NET GUI. A good test of the new libraries
would be to use them to write some simple interactive games in
JStar.
- Database Applications: Develop JStar
libraries and programming styles for connecting
to relational databases and doing queries and
updates. Since JStar already uses relations as its
programming paradigm for in-memory data, it should be able
to support an elegant connection to external databases.
- CSV Query and Transform: Develop JStar
libraries and example programs for reading and writing CSV
files and space-separated data files directly to/from
JStar in-memory relations.
Since JStar has powerful features for querying and transforming
in-memory tables, this will give us a domain-specific language in
which a three or four line program can query one or several CSV
files in very flexible ways, a bit like using SQL to query a
relational database. This will be useful for analyzing web
log files, querying and transforming simple text databases etc.
This project could also include experimenting with alternative
JStar styles of doing byte-oriented and line-oriented file I/O.
This is a smaller project that is suitable for an honours or
masters student or a COMP477 project.
Please let us know
if you are interested in these topics and would like more
information.
Mark Utting,
Department of Computer Science, University of Waikato, Hamilton, New
Zealand
|