Some example datasets are included in the Weka distribution.
- A jarfile containing 37 classification problems, originally obtained from the UCI repository (datasets-UCI.jar, 1,190,961 Bytes).
- A jarfile containing 37 regression problems, obtained from various sources (datasets-numeric.jar, 169,344 Bytes).
- A jarfile containing 6 agricultural datasets obtained from agricultural researchers in New Zealand (agridatasets.jar, 31,200 Bytes).
- A jarfile containing 30 regression datasets collected by Luis Torgo (regression-datasets.jar, 10,090,266 Bytes).
- A gzipped tar file containing UCI and UCI KDD datasets (uci-20070111.tar.gz, 17,952,832 Bytes).
- A gzipped tar file containing StatLib datasets (statlib-20050214.tar.gz, 12,785,582 Bytes).
- A gzipped tar file containing ordinal, real-world datasets donated by Dr. Arie Ben David (Holon Inst. of Technology/Israel) (datasets-arie_ben_david.tar.gz, 11,348 Bytes).
- A zip file containing 19 multi-class (1-of-n) text datasets donated by George Forman/Hewlett-Packard Labs (19MclassTextWc.zip, 14,084,828 Bytes).
- A bzip2-compressed tar file containing the Reuters21578 dataset split into separate files according to the ModApte split (reuters21578-ModApte.tar.bz2, 81,745,032 Bytes).
- A zip file containing 41 drug design datasets formed using the Adriana.Code software (www.molecular-networks.com/software/adrianacode), donated by Dr. M. Fatih Amasyali (Yildiz Technical University) (Drug-datasets.zip, 11,376,153 Bytes).
- A zip file containing 80 artificial datasets generated from the Friedman function, donated by Dr. M. Fatih Amasyali (Yildiz Technical University) (Friedman-datasets.zip, 5,802,204 Bytes).
- A zip file containing a new, image-based version of the classic iris data, with 50 images for each of the three species of iris. The images have size 600x600. Please see the ARFF file for further information (iris_reloaded.zip, 92,267,000 Bytes).
- Protein data sets, maintained by Shuiwang Ji, CS Department, Louisiana State University/USA.
- Kent Ridge Biomedical Data Set Repository, maintained by Jinyan Li and Huiqing Liu, Institute for Infocomm Research, Singapore.
- Repository for Epitope Datasets (RED), maintained by Yasser El-Manzalawy, Iowa State University.
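
A jarfile is an ordinary zip archive, so the jar and zip downloads listed above can be inspected without unpacking them first. The following is a minimal sketch in Python, not part of Weka itself. It assumes that the archive has already been downloaded to the working directory and that its datasets are stored as `.arff` entries. The helper name `list_arff_members` is hypothetical and chosen only for illustration.

```python
import zipfile


def list_arff_members(archive_path):
    """Return the sorted names of all .arff entries in a jar or zip archive.

    Hypothetical helper for illustration. It relies only on the fact that
    jar files use the zip container format.
    """
    with zipfile.ZipFile(archive_path) as archive:
        return sorted(
            name
            for name in archive.namelist()
            if name.lower().endswith(".arff")
        )


if __name__ == "__main__":
    # Assumption: datasets-UCI.jar was downloaded into the current directory.
    for name in list_arff_members("datasets-UCI.jar"):
        print(name)
```

The gzipped and bzip2-compressed tar files can be handled in a similar way with Python's standard `tarfile` module.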