Some example datasets are included in the Weka distribution.
- A jarfile containing 37 classification problems, originally obtained from the UCI repository (datasets-UCI.jar, 1,190,961 Bytes).
- A jarfile containing 37 regression problems, obtained from various sources (datasets-numeric.jar, 169,344 Bytes).
- A jarfile containing 6 agricultural datasets obtained from agricultural researchers in New Zealand (agridatasets.jar, 31,200 Bytes).
- A jarfile containing 30 regression datasets collected by Professor Luis Torgo (regression-datasets.jar, 10,090,266 Bytes).
- A gzipped tar archive containing UCI and UCI KDD datasets (uci-20070111.tar.gz, 17,952,832 Bytes).
- A gzipped tar archive containing StatLib datasets (statlib-20050214.tar.gz, 12,785,582 Bytes).
- A gzipped tar archive containing ordinal, real-world datasets donated by Professor Arie Ben David of the Holon Institute of Technology (datasets-arie_ben_david.tar.gz, 11,348 Bytes).
- A zip file containing 19 multi-class (1-of-n) text datasets donated by Dr George Forman when he was at Hewlett-Packard Labs (19MclassTextWc.zip, 14,084,828 Bytes).
- A bzip2-compressed tar file containing the Reuters21578 dataset split into separate files according to the ModApte split (reuters21578-ModApte.tar.bz2, 81,745,032 Bytes).
- A zip file containing 41 drug design datasets formed using the Adriana.Code software, donated by Dr Mehmet Fatih Amasyali (Yildiz Technical University) (Drug-datasets.zip, 11,376,153 Bytes).
- A zip file containing 80 artificial datasets generated from the Friedman function, donated by Dr Mehmet Fatih Amasyali (Yildiz Technical University) (Friedman-datasets.zip, 5,802,204 Bytes).
- A zip file containing a new, image-based version of the classic iris data, with 50 images for each of the three species of iris. Each image is 600x600 pixels. Please see the ARFF file for further information (iris_reloaded.zip, 92,267,000 Bytes).
- Protein datasets made available by Associate Professor Shuiwang Ji when he was a PhD student at Louisiana State University.
- Kent Ridge Biomedical Data Set Repository, which was put together by Professor Jinyan Li and Dr Huiqing Liu while they were at the Institute for Infocomm Research, Singapore.
- Repository for Epitope Datasets (RED), maintained by Professor Yasser El-Manzalawy when he was at Iowa State University.
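
Because a jarfile is an ordinary zip archive, the jar and zip downloads listed above can be inspected without any Weka tooling. The following is a minimal Python sketch, not part of Weka itself. It assumes that an archive has already been downloaded to the working directory and that it stores its datasets as `.arff` files. The helper names `list_arff_entries` and `extract_arff_entries` are hypothetical and chosen only for illustration.

```python
import zipfile
from pathlib import Path


def list_arff_entries(archive_path):
    """Return the sorted names of .arff entries in a jar or zip archive."""
    with zipfile.ZipFile(archive_path) as archive:
        return sorted(
            name for name in archive.namelist()
            if name.lower().endswith(".arff")
        )


def extract_arff_entries(archive_path, target_dir):
    """Extract only the .arff entries into target_dir and return their paths."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    names = list_arff_entries(archive_path)
    with zipfile.ZipFile(archive_path) as archive:
        # Restrict extraction to the ARFF members; manifests etc. are skipped.
        archive.extractall(target, members=names)
    return [target / name for name in names]


if __name__ == "__main__":
    # Assumption: datasets-UCI.jar was downloaded into the current directory.
    jar = Path("datasets-UCI.jar")
    if jar.exists():
        for entry in list_arff_entries(jar):
            print(entry)
```

The same two functions should work for the `.zip` downloads. The `.tar.gz` and `.tar.bz2` archives are not zip files and would need Python's `tarfile` module instead.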