Some example datasets are included in the Weka distribution.
The following collections are available separately:
- A jarfile containing 37 classification problems, originally obtained from the UCI repository (datasets-UCI.jar, 1,190,961 Bytes).
- A jarfile containing 37 regression problems, obtained from various sources (datasets-numeric.jar, 169,344 Bytes).
- A jarfile containing 6 agricultural datasets obtained from agricultural researchers in New Zealand (agridatasets.jar, 31,200 Bytes).
- A jarfile containing 30 regression datasets collected by Luis Torgo (regression-datasets.jar, 10,090,266 Bytes).
- A gzipped tar file containing UCI and UCI KDD datasets (uci-20070111.tar.gz, 17,952,832 Bytes).
- A gzipped tar file containing StatLib datasets (statlib-20050214.tar.gz, 12,785,582 Bytes).
- A gzipped tar file containing ordinal, real-world datasets donated by Dr. Arie Ben David, Holon Institute of Technology, Israel (datasets-arie_ben_david.tar.gz, 11,348 Bytes).
- A zip file containing 19 multi-class (1-of-n) text datasets donated by George Forman, Hewlett-Packard Labs (19MclassTextWc.zip, 14,084,828 Bytes).
- A bzip2-compressed tar file containing the Reuters21578 dataset split into separate files according to the ModApte split (reuters21578-ModApte.tar.bz2, 81,745,032 Bytes).
- A zip file containing 41 drug design datasets formed using the Adriana.Code software (www.molecular-networks.com/software/adrianacode), donated by Dr. M. Fatih Amasyali, Yildiz Technical University (Drug-datasets.zip, 11,376,153 Bytes).
- A zip file containing 80 artificial datasets generated from the Friedman function, donated by Dr. M. Fatih Amasyali, Yildiz Technical University (Friedman-datasets.zip, 5,802,204 Bytes).
- Protein data sets, maintained by Shuiwang Ji, CS Department, Louisiana State University, USA.
- Kent Ridge Biomedical Data Set Repository, maintained by Jinyan Li and Huiqing Liu, Institute for Infocomm Research, Singapore.
- Repository for Epitope Datasets (RED), maintained by Yasser El-Manzalawy, Iowa State University.
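
Jarfiles use the ZIP archive format, so the jar and zip downloads listed above can be inspected with any ZIP tool before unpacking. The following is a minimal sketch in Python, assuming the archive has already been downloaded to the local directory and that the datasets inside are stored as .arff files (an assumption; the listing above does not state the internal layout). The function names list_arff_entries and extract_arff_entries are hypothetical helpers, not part of Weka.

```python
import zipfile


def list_arff_entries(archive_path):
    """Return the sorted names of all .arff members in a jar or zip archive.

    Assumption: datasets inside the archive use the .arff extension.
    """
    with zipfile.ZipFile(archive_path) as archive:
        return sorted(
            name for name in archive.namelist()
            if name.lower().endswith(".arff")
        )


def extract_arff_entries(archive_path, target_dir):
    """Extract only the .arff members into target_dir and return their names."""
    members = list_arff_entries(archive_path)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target_dir, members=members)
    return members
```

For example, list_arff_entries("datasets-UCI.jar") would return the dataset files found in that jarfile, if it is present in the working directory. The tar-based downloads (.tar.gz, .tar.bz2) are not ZIP archives and would need a tar tool instead.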