MILK: A Multi-Instance Learning Kit in Java
Note: The algorithms in MILK are now available from within WEKA (3.5 branch). This became possible due to the introduction of relation-valued attributes in WEKA. Thus MILK as a separate entity has become obsolete.
MILK provides an environment for implementing and comparing
multi-instance learning algorithms. It is heavily based on WEKA and requires WEKA
3.4 to run. MILK includes several learning algorithms for multi-instance
problems, a tool for visualizing multi-instance data, and a GUI
(derived from the WEKA Experimenter) that makes it easy to compare
different learning algorithms on multi-instance datasets.
The MILK distribution
Here is
Most of MILK was written by Xin Xu, and many of the algorithms in MILK
are described in his MSc thesis. MILK is released under the GNU General
Public License.
Please contact Eibe Frank if you would like to contribute code to MILK
or if you have a bug fix.
Multi-instance Data in MILK Format
- Musk 1 - smaller version of the drug-activity data used by Dietterich et al.
(92 bags, 166 attributes, 476 instances)
- Musk 2 - larger version of the drug-activity data used by Dietterich et al.
(102 bags, 166 attributes, 6,598 instances)
- Mutagenesis ("easy" version) - predicting mutagenicity (originally an ILP problem)
(188 bags, 7 attributes, 10,468 instances)
- Mutagenesis ("hard" version) - predicting mutagenicity (originally an ILP problem)
(42 bags, 7 attributes, 2,132 instances)
Note that more datasets in this format are available here.
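The bag, attribute, and instance counts quoted for each dataset can be read as follows: a dataset is a collection of bags, each bag carries one class label and contains a variable number of instances, and every instance has the same number of attribute values. The sketch below illustrates that structure in plain Java. It does not use MILK or WEKA classes; the names BagSummary, Bag, toyBags, and summary are hypothetical, the toy values are made up for illustration, and it assumes that "attributes" refers to the per-instance feature values only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of multi-instance data; not part of MILK or WEKA.
public class BagSummary {

    /** A bag: one class label shared by a group of fixed-length instances. */
    static final class Bag {
        final String id;
        final int label;
        final List<double[]> instances;

        Bag(String id, int label, List<double[]> instances) {
            this.id = id;
            this.label = label;
            this.instances = instances;
        }
    }

    /** Number of bags in the dataset. */
    static int countBags(List<Bag> bags) {
        return bags.size();
    }

    /** Total number of instances, summed over all bags. */
    static int countInstances(List<Bag> bags) {
        int total = 0;
        for (Bag bag : bags) {
            total += bag.instances.size();
        }
        return total;
    }

    /** Attribute count, taken from the first instance found (0 if none). */
    static int countAttributes(List<Bag> bags) {
        for (Bag bag : bags) {
            for (double[] instance : bag.instances) {
                return instance.length;
            }
        }
        return 0;
    }

    /** Made-up toy data: two bags holding two and three instances. */
    static List<Bag> toyBags() {
        List<Bag> bags = new ArrayList<>();
        bags.add(new Bag("bag1", 1, Arrays.asList(
                new double[] {0.1, 0.2, 0.3},
                new double[] {0.4, 0.5, 0.6})));
        bags.add(new Bag("bag2", 0, Arrays.asList(
                new double[] {1.0, 1.1, 1.2},
                new double[] {1.3, 1.4, 1.5},
                new double[] {1.6, 1.7, 1.8})));
        return bags;
    }

    /** Formats the counts in the same style as the dataset list. */
    static String summary(List<Bag> bags) {
        return String.format("(%d bags, %d attributes, %d instances)",
                countBags(bags), countAttributes(bags), countInstances(bags));
    }

    public static void main(String[] args) {
        // prints "(2 bags, 3 attributes, 5 instances)"
        System.out.println(summary(toyBags()));
    }
}
```

Because the label belongs to the bag rather than to an individual instance, the instance count can be much larger than the bag count, as in Musk 2.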