Abstract

IncMine is an extension for MOA that computes frequent closed itemsets from data streams. It implements the method proposed by James Cheng, Yiping Ke and Wilfred Ng in "Maintaining frequent closed itemsets over a sliding window", Journal of Intelligent Information Systems, 2008, Volume 31, Number 3, Pages 191-215. To compute the frequent closed itemsets of each batch of transactions, it uses a version of the CHARM algorithm proposed by Zaki et al.

The extension aims to be a robust, efficient, usable and extensible solution for frequent itemset mining over data streams, and it is fully integrated with MOA's functionality.

The CHARM implementation is taken from SPMF (A Sequential Pattern Mining Framework) by Philippe Fournier-Viger.

More information is available in "Methods for frequent pattern mining in data streams within the MOA system".

Documentation

IncMine is an extension for MOA to compute frequent closed itemsets from data streams.
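
To make the term concrete: an itemset is frequent when enough transactions contain it, and closed when no proper superset has the same support. The sketch below finds the frequent closed itemsets of a tiny batch by brute force. It is only an illustration of the definition, not the CHARM or IncMine algorithm, and the class and method names are hypothetical.

```java
import java.util.*;

// Illustrative sketch only: brute-force frequent closed itemsets on a tiny batch.
// This is NOT the CHARM or IncMine algorithm; all names here are hypothetical.
class ClosedItemsetSketch {

    // Number of transactions that contain every item of the candidate.
    static int support(List<Set<Integer>> batch, Set<Integer> candidate) {
        int count = 0;
        for (Set<Integer> t : batch) {
            if (t.containsAll(candidate)) count++;
        }
        return count;
    }

    // Returns each frequent closed itemset together with its absolute support.
    static Map<Set<Integer>, Integer> frequentClosed(List<Set<Integer>> batch, int minCount) {
        // Collect the distinct items in a fixed order.
        List<Integer> items = new ArrayList<>();
        TreeSet<Integer> all = new TreeSet<>();
        for (Set<Integer> t : batch) all.addAll(t);
        items.addAll(all);

        // Enumerate every non-empty subset (feasible only for a handful of items).
        Map<Set<Integer>, Integer> frequent = new HashMap<>();
        for (int mask = 1; mask < (1 << items.size()); mask++) {
            Set<Integer> candidate = new TreeSet<>();
            for (int i = 0; i < items.size(); i++) {
                if ((mask & (1 << i)) != 0) candidate.add(items.get(i));
            }
            int s = support(batch, candidate);
            if (s >= minCount) frequent.put(candidate, s);
        }

        // Keep an itemset only if no proper superset has the same support.
        // A superset with equal support is itself frequent, so checking
        // against the frequent itemsets is sufficient.
        Map<Set<Integer>, Integer> closed = new HashMap<>();
        for (Map.Entry<Set<Integer>, Integer> e : frequent.entrySet()) {
            boolean isClosed = true;
            for (Map.Entry<Set<Integer>, Integer> f : frequent.entrySet()) {
                if (f.getKey().size() > e.getKey().size()
                        && f.getKey().containsAll(e.getKey())
                        && f.getValue().equals(e.getValue())) {
                    isClosed = false;
                    break;
                }
            }
            if (isClosed) closed.put(e.getKey(), e.getValue());
        }
        return closed;
    }

    public static void main(String[] args) {
        List<Set<Integer>> batch = List.of(
                Set.of(1, 2, 3), Set.of(1, 2), Set.of(1, 2, 4), Set.of(3, 4));
        // With a minimum count of 2, {1} and {2} are frequent but not closed,
        // because {1, 2} occurs in exactly the same three transactions.
        System.out.println(frequentClosed(batch, 2));
    }
}
```

Reporting only closed itemsets loses no information about the frequent ones: the support of any frequent itemset equals the support of its smallest closed superset.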

INSTALL

After downloading MOA and the MOA-IncMine.zip file, extract the zip file into a separate folder. It contains a manual explaining how to use the extension. The dist/ folder contains the Javadoc and the IncMine.jar file, and the src/ folder contains the source code of the extension.

USAGE

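The MOA GUI can be started by typing the following command in a console (the ; classpath separator shown is for Windows; on Linux and macOS use : instead):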

java -cp IncMine.jar;moa.jar -javaagent:sizeofag.jar moa.gui.GUI

To run IncMine, open the Classification tab and select the task moa.tasks.LearnModel or moa.tasks.LearnEvaluateModel.

Figure: LearnModel configuration.

You may use the following parameters with IncMine:

  • -s : Minimum support threshold. Default value: 0.1.
  • -r : Relaxation rate. Default value: 0.5.
  • -l : Segment length, in transactions. Default value: 1000.
  • -w : Window size, in segments. Default value: 10.
  • -m : Maximum itemset length. Default value: -1 (no maximum length).

Figure: IncMine configuration.

EXAMPLES

The following command uses IncMine to extract the frequent closed itemsets from the T40I10D100K stream file with support greater than 0.05 and a relaxation rate of 0.4. It uses a sliding window of 20 segments, with 5000 transactions per segment, and extracts frequent itemsets of length at most 5.

java -cp IncMine.jar;moa.jar -javaagent:sizeofag.jar moa.DoTask "LearnModel -m 100000 -l (IncMine -w 20 -m 5 -s 0.05 -r 0.4 -l 5000) -s (ZakiFileStream -f T40I10D100K.ascii)"
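
The numbers in this command can be checked with a little arithmetic. The sketch below is illustrative only: the class and method names are hypothetical and not part of the IncMine or MOA API. It also assumes that the relaxation rate simply scales the minimum support (r * s) when a single segment is mined, as the relaxed threshold is described by Cheng et al.; the extension may apply it differently.

```java
// Illustrative arithmetic only; the class and method names are hypothetical
// and are not part of the IncMine or MOA API.
class IncMineParameterSketch {

    // Transactions covered by a full window: w segments of l transactions each.
    static long windowTransactions(int windowSize, int segmentLength) {
        return (long) windowSize * segmentLength;
    }

    // Smallest transaction count that reaches a relative threshold over n transactions.
    // The small epsilon guards against floating-point noise before rounding up.
    static long minCount(double relativeThreshold, long n) {
        return (long) Math.ceil(relativeThreshold * n - 1e-9);
    }

    public static void main(String[] args) {
        int w = 20;       // -w : window size, in segments
        int l = 5000;     // -l : segment length, in transactions
        double s = 0.05;  // -s : minimum support threshold
        double r = 0.4;   // -r : relaxation rate

        long window = windowTransactions(w, l);
        System.out.println("transactions per full window: " + window);
        System.out.println("count for support s in one segment: " + minCount(s, l));
        System.out.println("count for support s in a full window: " + minCount(s, window));
        // ASSUMPTION: relaxed threshold r * s applied to a single segment.
        System.out.println("count for relaxed support r*s in one segment: " + minCount(r * s, l));
    }
}
```

Under these assumptions a full window spans 20 * 5000 = 100000 transactions, which appears to match the -m 100000 limit passed to LearnModel, and an itemset needs about 250 occurrences in a segment to reach a support of 0.05.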

ZakiFileStream reads batch datasets as streams. It reads files in the format used by the IBM-Datagen software, which is available on Zaki's web page.
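
As a rough illustration of reading such a file one transaction at a time, the sketch below parses a single line. It is not the ZakiFileStream implementation, and the line layout is an assumption: three leading integer fields (two identifiers and the item count) followed by the item identifiers, all separated by whitespace. Check the layout against your own file before relying on it. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not the ZakiFileStream implementation.
class ZakiLineSketch {

    // ASSUMPTION: each line is "id1 id2 n item1 ... itemN" (whitespace-separated integers).
    static List<Integer> parseItems(String line) {
        String[] tokens = line.trim().split("\\s+");
        if (tokens.length < 3) {
            throw new IllegalArgumentException("expected at least 3 fields: " + line);
        }
        int declared = Integer.parseInt(tokens[2]);
        if (tokens.length - 3 != declared) {
            throw new IllegalArgumentException(
                    "declared " + declared + " items but found " + (tokens.length - 3));
        }
        // Everything after the three header fields is an item identifier.
        List<Integer> items = new ArrayList<>(declared);
        for (int i = 3; i < tokens.length; i++) {
            items.add(Integer.parseInt(tokens[i]));
        }
        return items;
    }

    public static void main(String[] args) {
        // A made-up line in the assumed layout, not taken from T40I10D100K.
        System.out.println(parseItems("1 1 3 25 52 164"));
    }
}
```

Validating the declared item count against the tokens actually present is a cheap way to notice early that a file does not follow the assumed layout.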

Download

MOA-IncMine
Example Dataset: T40I10D100K