Collections of datasets
Available separately:
- A jarfile containing 37 classification problems, originally obtained from the UCI repository (datasets-UCI.jar, 1,190,961 Bytes).
- A jarfile containing 37 regression problems, obtained from various sources (datasets-numeric.jar, 169,344 Bytes).
- A jarfile containing 6 agricultural datasets obtained from agricultural researchers in New Zealand (agridatasets.jar, 31,200 Bytes).
- A jarfile containing 30 regression datasets collected by Luis Torgo (regression-datasets.jar, 10,090,266 Bytes).
- A gzip'ed tar containing UCI and UCI KDD datasets (uci-20070111.tar.gz, 17,952,832 Bytes)
- A gzip'ed tar containing StatLib datasets (statlib-20050214.tar.gz, 12,785,582 Bytes)
- A gzip'ed tar containing ordinal, real-world datasets donated by Dr. Arie Ben David (Holon Inst. of Technology/Israel) (datasets-arie_ben_david.tar.gz, 11,348 Bytes)
- A zip file containing 19 multi-class (1-of-n) text datasets donated by George Forman/Hewlett-Packard Labs (19MclassTextWc.zip, 14,084,828 Bytes)
- A bzip'ed tar file containing the Reuters21578 dataset split into separate files according to the ModApte split (reuters21578-ModApte.tar.bz2, 81,745,032 Bytes)
- A zip file containing 41 drug design datasets formed using the Adriana.Code software (www.molecular-networks.com/software/adrianacode), donated by Dr. M. Fatih Amasyali (Yildiz Technical University) (Drug-datasets.zip, 11,376,153 Bytes)
- A zip file containing 80 artificial datasets generated from the Friedman function, donated by Dr. M. Fatih Amasyali (Yildiz Technical University) (Friedman-datasets.zip, 5,802,204 Bytes)
After extracting an archive into a directory with your jar utility (or, for the gzip'ed/bzip'ed tars and zip files, with an archive program that handles tar and zip archives), you can use the datasets with Weka.
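If you prefer to script the extraction step, it can be sketched with the Python standard library alone. This is an illustration, not part of Weka: the helper name `extract_archive` is hypothetical, and it assumes each download is either a JAR/zip file (JAR files are ZIP archives) or a tar file, optionally gzip- or bzip2-compressed. It returns whatever .arff files it finds; some archives (for example the Reuters one) may hold other file types, in which case the list can be empty.

```python
import tarfile
import zipfile
from pathlib import Path
from typing import List, Union


def extract_archive(archive: Union[str, Path], dest: Union[str, Path]) -> List[Path]:
    """Hypothetical helper: unpack one dataset archive into `dest` and
    return the sorted paths of all .arff files found under `dest`."""
    archive, dest = Path(archive), Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    if zipfile.is_zipfile(archive):
        # JAR files are ZIP archives, so this branch covers .jar and .zip.
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(dest)
    elif tarfile.is_tarfile(archive):
        # Mode "r:*" auto-detects gzip (.tar.gz) or bzip2 (.tar.bz2) compression.
        with tarfile.open(archive, "r:*") as tf:
            if hasattr(tarfile, "data_filter"):
                # Newer Pythons: reject absolute paths and links escaping dest.
                tf.extractall(dest, filter="data")
            else:
                tf.extractall(dest)
    else:
        raise ValueError(f"unsupported archive format: {archive}")
    return sorted(dest.rglob("*.arff"))
```

For example, `extract_archive("datasets-UCI.jar", "datasets")` would unpack the UCI jar (assumed to be in the current directory) into `datasets/` and list the ARFF files, which can then be opened in Weka.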
Other datasets in ARFF format: