Weka 3 -- Tips & Tricks


Here are a few things that are useful to know if you are having trouble installing or running Weka on your machine:
  • When you download Weka, make sure that the size of the downloaded file matches the size listed on our webpage. Otherwise Weka will not work properly. Apparently some web browsers have trouble downloading Weka.
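    The size comparison can also be done with a few lines of Java. This is only a sketch: the class name DownloadCheck is made up for illustration, and the expected size must be copied by hand from the download page.

```java
import java.io.File;

class DownloadCheck {

    // Returns true only if the file exists and has exactly the expected size.
    static boolean hasExpectedSize(File file, long expectedBytes) {
        return file.isFile() && file.length() == expectedBytes;
    }

    public static void main(String[] args) {
        if (args.length < 2) {
            System.out.println("Usage: java DownloadCheck <downloaded-file> <expected-size-in-bytes>");
            return;
        }
        File file = new File(args[0]);
        long expected = Long.parseLong(args[1]);
        if (hasExpectedSize(file, expected)) {
            System.out.println("Size matches.");
        } else {
            // File.length() returns 0 if the file does not exist.
            System.out.println("Size mismatch: found " + file.length()
                    + " bytes, expected " + expected + ".");
        }
    }
}
```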
  • Most Java virtual machines only allocate a certain maximum amount of memory to run Java programs. Usually this is much less than the amount of RAM in your computer. However, you can increase the memory available to the virtual machine by setting the appropriate options. With Sun's JDK, for example, you can run

    java -mx100000000 -oss100000000 ...

    to set both the maximum Java heap and stack size to 100,000,000 bytes.
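    To see whether a memory setting actually took effect, you can ask the virtual machine for its heap limit. The following is a minimal sketch; the class name JvmMemory is hypothetical, and the value reported may differ somewhat from the value requested on the command line.

```java
class JvmMemory {

    // Converts a byte count to whole megabytes (1 MB = 1,048,576 bytes), rounding down.
    static long toMegabytes(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // maxMemory() reports the most heap the virtual machine will attempt to use.
        long maxBytes = Runtime.getRuntime().maxMemory();
        System.out.println("Maximum heap: " + maxBytes + " bytes (about "
                + toMegabytes(maxBytes) + " MB)");
    }
}
```

    Run it with the same memory options that you pass to java when starting Weka, and compare the output with the value you asked for.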
  • For maximum enjoyment, use a virtual machine that incorporates a just-in-time compiler. This can speed things up quite significantly. Note also that there can be large differences in execution time between different virtual machines.
  • Submitted by Robert Howard:

    "Weka 3.2 allows a short cut in data preparation that I haven't seen mentioned. If your database outputs its results in CSV (MS Access does) with the column titles in the top row, then one can skip pulling everything into Excel and Word (or WordPad). All one has to do is attempt to read the dataset normally--the ARFF method will rollover to CSV on error. After the data is read, one modifies the base relation (I use the attribute filter to remove an unwanted column). Then when you press "save" you will get a nice ARFF-format file. If you have nominal attributes you'll have to take the file into WordPad to fix them, but that's about all."

    NOTE: you will not be able to save the relation until you modify it in some way. If you don't want to change anything, you can simply uncheck one of the attribute check boxes, click 'Apply Filters', recheck the box, click 'Apply Filters' again, then click 'Save...'
    This CSV-to-ARFF conversion can also be done at the command line, as follows:

    java weka.core.converters.CSVLoader filename.csv > filename.arff

  • One way to figure out why ARFF files are failing to load is to give them to the Instances class. At the command line, type the following:

    java weka.core.Instances filename.arff

    where you substitute the actual name of your file for 'filename'. This should report an error if there is a problem reading the file, or show some statistics if the file is OK. The error message should give some indication of what is wrong.
  • A common problem people have with ARFF files is that labels may contain spaces only if they are enclosed in single quotes. For example, a label such as:
    some value
    should be written as either 'some value' or some_value in the file.
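    If you generate ARFF files from your own code, both fixes are easy to apply automatically. This sketch uses made-up names (ArffLabels, quoteIfNeeded, underscore) and assumes that labels do not themselves contain single quotes, which would need escaping that is not handled here.

```java
class ArffLabels {

    // Wraps the label in single quotes if it contains whitespace;
    // otherwise returns it unchanged.
    static String quoteIfNeeded(String label) {
        for (int i = 0; i < label.length(); i++) {
            if (Character.isWhitespace(label.charAt(i))) {
                return "'" + label + "'";
            }
        }
        return label;
    }

    // Alternative fix: replace each space with an underscore.
    static String underscore(String label) {
        return label.replace(' ', '_');
    }

    public static void main(String[] args) {
        System.out.println(quoteIfNeeded("some value")); // prints 'some value'
        System.out.println(underscore("some value"));    // prints some_value
    }
}
```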
  • Having problems getting Weka to run from a DOS/UNIX command prompt? Most likely your CLASSPATH environment variable is not set correctly. It needs to point to the weka.jar file that you downloaded with Weka (or to the parent of the Weka directory if you have extracted the jar). Under DOS this can be achieved with:

    set CLASSPATH=c:\weka-3-2\weka.jar;%CLASSPATH%

    Under UNIX/Linux something like:

    export CLASSPATH=/home/weka/weka.jar:$CLASSPATH

    An easy way to avoid setting the variable is to specify the classpath when calling Java. For example, if the jar file is located at c:\weka-3-2\weka.jar you can use:

    java -cp c:\weka-3-2\weka.jar weka.classifiers... etc.
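
    If you are unsure whether the classpath is being picked up, a small diagnostic program can print it and test whether a Weka class is visible. This is a sketch only; the class name ClasspathCheck is hypothetical, and weka.core.Instances is used simply because it is a class that ships in weka.jar.

```java
class ClasspathCheck {

    // Returns true if the named class can be found on the current classpath.
    // The class is located but not initialized.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // java.class.path holds the classpath the virtual machine was started with.
        System.out.println("Classpath: " + System.getProperty("java.class.path"));
        System.out.println("weka.core.Instances found: "
                + isOnClasspath("weka.core.Instances"));
    }
}
```

    Run it with the same CLASSPATH setting or -cp value that you use for Weka (adding the directory that contains ClasspathCheck). If it reports false, the jar is not where the classpath says it is.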

  • People often want to tag their instances with identifiers, so they can keep track of them and the predictions made on them.
    If you run from the command line you can use the -p option to output predictions plus any other attributes you are interested in. So it is possible to have a string attribute in your data that acts as an identifier. A problem is that most classifiers don't like String attributes, but you can get around this by using the RemoveType filter (this removes String attributes by default).
    Here's an example. Let's say you have a training file named train.arff and a testing file named test.arff, and both have an identifier String attribute as their 5th attribute. You can get the predictions from J48 along with the identifier strings by issuing the following command (at a DOS/Unix command prompt):

    java weka.classifiers.FilteredClassifier -F weka.filters.unsupervised.attribute.RemoveType -B weka.classifiers.j48.J48 -t train.arff -T test.arff -p 5

    (all on a single line)
    If you want, you can redirect the output to a file by adding " > output.txt" to the end of the line.
    In the Explorer GUI you can try a similar trick with String attribute identifiers. Choose the FilteredClassifier, with RemoveType as the filter, and whatever classifier you prefer. When you visualize the results you will need to click through each instance to see its identifier.
  • Access to visualization from the ClassifierPanel, ClusterPanel and AttributeSelection panel is available from a popup menu. Click the right mouse button over an entry in the Result list to bring up the menu. You will be presented with options for viewing or saving the text output and, depending on the scheme, further options for visualizing errors, clusters, trees, etc.
  • The Explorer and Experimenter can print how much memory is available and run the garbage collector. Just right-click over the Status area in the Explorer/Experimenter.
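
    Similar information can be obtained from your own Java code when running Weka without the GUI. The following sketch uses only the standard Runtime and System classes; it is not Weka's own implementation, and the class name MemoryReport is made up for illustration.

```java
class MemoryReport {

    // Bytes currently in use: the heap claimed by the virtual machine
    // minus the part of it that is still free.
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        System.out.println("Used before GC: " + usedBytes() + " bytes");
        // System.gc() is only a request; the virtual machine decides when to collect.
        System.gc();
        System.out.println("Used after GC:  " + usedBytes() + " bytes");
    }
}
```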