Here are a few things that are useful to know when you are
having trouble installing or running Weka successfully on your
machine:
- When you download Weka, make sure that the resulting file size is
the same as on our webpage. Otherwise things won't work
properly. Apparently some web browsers have trouble downloading Weka.
- Most Java virtual machines only allocate a certain maximum amount
of memory to run Java programs. Usually this is much less than the
amount of RAM in your computer. However, you can extend the memory
available for the virtual machine by setting appropriate options. With
Sun's JDK, for example, you can run
java -mx100000000 -oss100000000 ...
to set both the maximum Java heap and stack size to 100,000,000 bytes.
- For maximum enjoyment, use a virtual machine that incorporates a
just-in-time compiler. This can speed things up quite
significantly. Note also that there can be large differences in
execution time between different virtual machines.
- Submitted by Robert Howard:
"Weka 3.2 allows a short cut in
data preparation that I haven't seen mentioned. If your database
outputs its results in CSV (MS Access does) with the column titles in
the top row, then one can skip pulling everything into Excel and Word
(or WordPad). All one has to do is attempt to read the dataset
normally--the ARFF method will rollover to CSV on error. After the
data is read, one modifies the base relation (I use the attribute
filter to remove an unwanted column). Then when you press "save" you
will get a nice ARFF-format file. If you have nominal attributes
you'll have to take the file into WordPad to fix them, but that's
about all."
NOTE: you will not be able to save the relation
until you modify it in some way. If you don't want to change anything,
you can simply uncheck one of the attribute check boxes, click 'Apply
Filters', recheck the box, click 'Apply Filters' again, then click
'Save...'. Also, this CSV-to-ARFF conversion can be done at the
command line, as follows:
java weka.core.converters.CSVLoader filename.csv > filename.arff
- One way to figure out why arff files are failing to load is to
give them to the Instances class. At the command line type the
following:
java weka.core.Instances filename.arff
where you replace 'filename' with the actual name of your file. This
should return an error if there is a problem reading the file, or show
some statistics if the file is ok. The error message you get should
give some indication of what is wrong.
- A common problem people have with arff files is that labels can
only contain spaces if they are enclosed in single quotes. For example,
a label such as
some value
should be written as either 'some value' or some_value in the file.
- Having problems getting Weka to run from a DOS/UNIX command
prompt? Most likely your CLASSPATH environment variable is not set
correctly - it needs to point to the weka.jar file that you downloaded
with Weka (or the parent of the Weka directory if you have extracted
the jar). Under DOS this can be achieved with:
set CLASSPATH=c:\weka-3-2\weka.jar;%CLASSPATH%
Under UNIX/Linux something like:
export CLASSPATH=/home/weka/weka.jar:$CLASSPATH
An easy way to avoid setting the variable is to specify the
CLASSPATH when calling Java. For example, if the jar file is located
at c:\weka-3-2\weka.jar you can use:
java -cp c:\weka-3-2\weka.jar weka.classifiers... etc.
- People often want to tag their instances with identifiers, so they can keep track of them and the predictions made on them.
If you run from the command line you can use the -p option to output
predictions plus any other attributes you are interested in. So it is
possible to have a string attribute in your data that acts as an identifier. A
problem is that most classifiers don't like String attributes, but you
can get around this by using the RemoveType filter (this removes
String attributes by default).
Here's an example. Let's say you have a training file named train.arff, a
testing file named test.arff, and they have an identifier String
attribute as their 5th attribute. You can get the predictions from J48
along with the identifier strings by issuing the following command (at a
DOS/Unix command prompt):
java weka.classifiers.FilteredClassifier -F weka.filters.unsupervised.attribute.RemoveType -B weka.classifiers.j48.J48 -t train.arff -T test.arff -p 5
(all on a single line)
If you want, you can redirect the output to a file by adding " > output.txt" to the end of the line.
In the Explorer GUI you could try a similar trick with String
attribute identifiers. Choose the FilteredClassifier, with RemoveType
as the filter, and whatever classifier you prefer. When you visualize
the results you will need to click through each instance to see its
identifier.
- Access to visualization from the ClassifierPanel, ClusterPanel
and AttributeSelection panel is available from a popup menu. Click the
right mouse button over an entry in the Result list to bring up the
menu. You will be presented with options for viewing or saving the
text output and---depending on the scheme---further options for
visualizing errors, clusters, trees etc.
- The Explorer and Experimenter can print how much memory is
available and run the garbage collector. Just right-click the Status
area in the Explorer/Experimenter.
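For the download-size tip at the top of this list, the check can also be done programmatically. The following is a minimal sketch, not part of Weka: the class name DownloadCheck and its command-line arguments are hypothetical, and the expected byte count has to be taken from the download page.

```java
import java.io.File;

// Hypothetical helper (not part of Weka): compares the size of a
// downloaded file with the size published on the download page.
class DownloadCheck {

    /** Returns true only if the file exists and is exactly expectedBytes long. */
    static boolean hasExpectedSize(File file, long expectedBytes) {
        return file.isFile() && file.length() == expectedBytes;
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: java DownloadCheck <file> <expected-bytes>");
            return;
        }
        File download = new File(args[0]);
        long expected = Long.parseLong(args[1]);
        if (hasExpectedSize(download, expected)) {
            System.out.println("Size matches.");
        } else {
            System.out.println("Size differs (found " + download.length()
                    + " bytes); try downloading again.");
        }
    }
}
```

A matching size does not prove the file is intact, but a mismatch is a reliable sign of a truncated or corrupted download.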
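To see how much memory a given virtual machine will actually allow (see the tip on memory limits above), a small standalone program can ask the JVM directly. This is a minimal sketch using only the standard java.lang.Runtime class; the class name HeapReport is hypothetical and the program is not part of Weka. Note that on current JVMs the maximum heap size is normally set with -Xmx (for example, java -Xmx100m ...).

```java
// Hypothetical helper (not part of Weka): prints the JVM's memory
// limits and requests a garbage collection.
class HeapReport {

    /** Converts a byte count to whole megabytes (1 MB = 1024 * 1024 bytes). */
    static long toMegabytes(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory(): the most memory the JVM will attempt to use for the heap.
        System.out.println("Max heap:  " + toMegabytes(rt.maxMemory()) + " MB");
        // totalMemory(): heap currently reserved by the JVM.
        System.out.println("Allocated: " + toMegabytes(rt.totalMemory()) + " MB");
        // freeMemory(): unused part of the currently reserved heap.
        System.out.println("Free:      " + toMegabytes(rt.freeMemory()) + " MB");
        // gc() is only a request; the JVM decides whether to collect.
        rt.gc();
        System.out.println("Free after gc request: "
                + toMegabytes(rt.freeMemory()) + " MB");
    }
}
```

Running it with and without a heap option shows whether the option took effect on your virtual machine.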
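The ARFF label rule mentioned above (labels with spaces must be single-quoted or use underscores) can be applied mechanically when generating files. The sketch below is illustrative only: ArffLabel and its methods are hypothetical names, not Weka API, and labels that already contain quote characters are not handled.

```java
// Hypothetical helper (not part of Weka): makes a label with spaces
// safe to write into an ARFF file, in either of the two ways described.
class ArffLabel {

    /** Wraps the label in single quotes if it contains a space. */
    static String quote(String label) {
        return label.indexOf(' ') >= 0 ? "'" + label + "'" : label;
    }

    /** Replaces each run of whitespace with a single underscore. */
    static String underscore(String label) {
        return label.trim().replaceAll("\\s+", "_");
    }

    public static void main(String[] args) {
        String label = "some value";
        System.out.println(quote(label));
        System.out.println(underscore(label));
    }
}
```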
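When Weka will not start from the command line, it can help to confirm what class path the JVM actually received (see the CLASSPATH tip above). The following is a rough sketch using the standard java.class.path system property; the class name ClasspathCheck and the jar name passed to it are assumptions for illustration, and the check only compares file names.

```java
import java.io.File;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Weka): checks whether a jar with the
// given file name appears among the entries of a class path string.
class ClasspathCheck {

    /** Returns true if any class path entry ends in the given jar file name. */
    static boolean containsJar(String classPath, String separator, String jarName) {
        for (String entry : classPath.split(Pattern.quote(separator))) {
            // Strip the directory part, accepting both / and \ as separators.
            int cut = Math.max(entry.lastIndexOf('/'), entry.lastIndexOf('\\'));
            String name = entry.substring(cut + 1);
            if (name.equalsIgnoreCase(jarName)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // The class path as seen by this JVM, whether set via CLASSPATH or -cp.
        String classPath = System.getProperty("java.class.path");
        System.out.println("Class path: " + classPath);
        boolean found = containsJar(classPath, File.pathSeparator, "weka.jar");
        System.out.println(found ? "weka.jar is on the class path."
                                 : "weka.jar is NOT on the class path.");
    }
}
```

Run it the same way you would run Weka (same prompt, same -cp option if you use one) so that it sees the same class path.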