MILK: A Multi-Instance Learning Kit in Java

Note: The algorithms in MILK are now available within WEKA (3.5 branch). This became possible with the introduction of relation-valued attributes in WEKA, so MILK as a separate package is now obsolete.
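
A relation-valued attribute lets each bag be stored as a single row whose value is itself a small table of instances. The plain-Java sketch below (no WEKA dependency, Java 16+ for records) illustrates that regrouping, starting from flat rows that are each tagged with a bag ID. The class and record names are hypothetical, and the sketch assumes every row of a bag carries the same bag-level label. WEKA ships its own filters for this conversion (for example PropositionalToMultiInstance); this is not their code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: turns flat rows tagged with a bag ID into one record
 * per bag. The nested list of instances plays the role that a
 * relation-valued attribute plays in WEKA 3.5+.
 */
public class BagGrouping {

    /** One flat row: bag ID, feature vector and the (bag-level) class label. */
    record Row(String bagId, double[] features, String label) {}

    /** One bag: a single record whose instances field holds the whole relation. */
    record Bag(String bagId, List<double[]> instances, String label) {}

    /** Groups rows by bag ID, keeping bags in order of first appearance. */
    static List<Bag> group(List<Row> rows) {
        Map<String, List<double[]>> members = new LinkedHashMap<>();
        Map<String, String> labels = new LinkedHashMap<>();
        for (Row r : rows) {
            members.computeIfAbsent(r.bagId(), k -> new ArrayList<>()).add(r.features());
            // Assumption: all rows of a bag share the same label, so the first one wins.
            labels.putIfAbsent(r.bagId(), r.label());
        }
        List<Bag> bags = new ArrayList<>();
        for (Map.Entry<String, List<double[]>> e : members.entrySet()) {
            bags.add(new Bag(e.getKey(), e.getValue(), labels.get(e.getKey())));
        }
        return bags;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
            new Row("b1", new double[] {0.1, 0.2}, "pos"),
            new Row("b1", new double[] {0.3, 0.4}, "pos"),
            new Row("b2", new double[] {0.5, 0.6}, "neg"));
        for (Bag b : group(rows)) {
            System.out.println(b.bagId() + " instances=" + b.instances().size() + " label=" + b.label());
        }
    }
}
```

A LinkedHashMap is used so that bags come out in the order they first appear in the flat data, which keeps the output deterministic.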

MILK provides an environment for implementing and comparing multi-instance learning algorithms. It is heavily based on WEKA and requires WEKA 3.4 to run. MILK includes several learning algorithms for multi-instance problems, a tool for visualizing multi-instance data, and a GUI (derived from the WEKA Experimenter) that makes it easy to compare different learning algorithms on multi-instance datasets.
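
Many multi-instance algorithms, going back to the work of Dietterich et al. on the Musk data, rely on the standard multi-instance assumption: a bag is positive if and only if at least one of its instances is positive. The sketch below expresses that rule in plain Java, with a hypothetical instance-level rule passed in as a predicate. It illustrates the assumption only and is not code from MILK; some MILK algorithms may use different assumptions about how instance labels relate to bag labels.

```java
import java.util.List;
import java.util.function.Predicate;

/**
 * Sketch of the standard multi-instance assumption: a bag is positive
 * iff at least one of its instances is classified as positive.
 */
public class StandardMiAssumption {

    /** True if any instance in the bag passes the (hypothetical) instance-level test. */
    static boolean bagIsPositive(List<double[]> bag, Predicate<double[]> instanceIsPositive) {
        // anyMatch stops at the first positive instance; an empty bag is negative.
        return bag.stream().anyMatch(instanceIsPositive);
    }

    public static void main(String[] args) {
        // Hypothetical instance-level rule: positive if the first feature exceeds 0.5.
        Predicate<double[]> rule = x -> x[0] > 0.5;
        List<double[]> bagA = List.of(new double[] {0.1}, new double[] {0.9});
        List<double[]> bagB = List.of(new double[] {0.2}, new double[] {0.3});
        System.out.println(bagIsPositive(bagA, rule)); // true
        System.out.println(bagIsPositive(bagB, rule)); // false
    }
}
```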

The MILK distribution

The following are available:

the README from the MILK distribution,
the Javadoc, and
the MILK distribution itself, packaged as a JAR file.

Most of MILK was written by Xin Xu, and many of its algorithms are described in his MSc thesis. MILK is released under the GNU General Public License.

Please contact Eibe Frank if you would like to contribute code to MILK or if you have a bug fix.

Multi-instance Data in MILK-format

Musk 1 - smaller version of the drug-activity data used by Dietterich et al.
(92 bags, 166 attributes, 476 instances)
Musk 2 - larger version of the drug-activity data used by Dietterich et al.
(102 bags, 166 attributes, 6,598 instances)
Mutagenesis ("easy" version) - predicting mutagenicity (originally an ILP problem)
(188 bags, 7 attributes, 10,468 instances)
Mutagenesis ("hard" version) - predicting mutagenicity (originally an ILP problem)
(42 bags, 7 attributes, 2,132 instances)
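
The counts above also show how differently sized the bags are. A quick calculation of the average number of instances per bag, using only the numbers listed, gives about 5 for Musk 1 but about 65 for Musk 2, which can matter for algorithms whose cost grows with bag size. The helper name below is hypothetical.

```java
/** Average number of instances per bag, computed from the dataset counts listed above. */
public class BagSizes {

    static double averageBagSize(int instances, int bags) {
        return (double) instances / bags;
    }

    public static void main(String[] args) {
        String[] names = {"Musk 1", "Musk 2", "Mutagenesis (easy)", "Mutagenesis (hard)"};
        int[] bags = {92, 102, 188, 42};
        int[] instances = {476, 6598, 10468, 2132};
        for (int i = 0; i < names.length; i++) {
            // Prints e.g. "Musk 1                 5.17 instances per bag"
            System.out.printf("%-20s %6.2f instances per bag%n",
                names[i], averageBagSize(instances[i], bags[i]));
        }
    }
}
```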

Note that more datasets in this format are available here.