IncMine is an extension for MOA to compute frequent closed itemsets from data streams. It implements the method proposed by James Cheng, Yiping Ke and Wilfred Ng in "Maintaining frequent closed itemsets over a sliding window", Journal of Intelligent Information Systems, 2008, Volume 31, Number 3, Pages 191-215. It uses a version of the CHARM algorithm, proposed by Zaki et al., to compute the frequent closed itemsets over each batch of transactions.
The extension is designed to be a robust, efficient and extensible solution for frequent itemset mining over data streams, and it is fully integrated with the MOA functionality.
The CHARM implementation used is taken from SPMF: A Sequential Pattern Mining Framework, by Philippe Fournier-Viger.
More information in: Methods for frequent pattern mining in data streams within the MOA system.
After downloading MOA and the MOA-IncMine.zip file, extract the archive into a separate folder. You will find a manual explaining how to use the extension. The dist/ folder contains the javadoc and the IncMine.jar file. The src folder contains the source code of the extension.
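The MOA GUI can be started by typing the following command in a console (the ; classpath separator shown here is the Windows one; on Linux and macOS use : instead):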
java -cp IncMine.jar;moa.jar -javaagent:sizeofag.jar moa.gui.GUI
To run IncMine you have to use the Classification tab, and select task
You may use the following parameters with IncMine:
- -s : Minimum support threshold. Default value: 0.1.
- -r : Relaxation rate. Default value: 0.5.
- -l : Segment length. Default value: 1000.
- -w : Window size. Default value: 10.
- -m : Maximum itemset length. Default value: -1 (no maximum length).
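To make the relationship between these options concrete, here is a minimal Java sketch of the arithmetic they imply for the default values. The class and method names are hypothetical and not part of IncMine or MOA. It assumes that the window size is counted in segments (as in the example command on this page) and that the relaxed per-segment threshold is the product r × s; check the paper for the exact threshold function used by the algorithm.

```java
// Hypothetical helper, NOT part of IncMine or MOA: plain arithmetic on the options above.
final class IncMineParameterMath {

    /** Transactions covered by a full window of w segments with l transactions each. */
    static long windowTransactions(int windowSize, int segmentLength) {
        return (long) windowSize * segmentLength;
    }

    /** Smallest transaction count that reaches a relative support threshold. */
    static long minCount(double support, long transactions) {
        // Subtract a tiny epsilon so floating-point noise (e.g. 100.00000000000001)
        // does not push the ceiling up by one.
        return (long) Math.ceil(support * transactions - 1e-9);
    }

    public static void main(String[] args) {
        // Default values listed above.
        int w = 10;
        int l = 1000;
        double s = 0.1;
        double r = 0.5;

        long window = windowTransactions(w, l);
        System.out.println("Transactions in a full window: " + window);
        System.out.println("Minimum count over a full window: " + minCount(s, window));
        // Assumption: the relaxed threshold applied to one segment is r * s.
        System.out.println("Assumed relaxed minimum count per segment: " + minCount(r * s, l));
    }
}
```

Lowering -r makes the per-segment threshold more permissive under this assumption, so more candidate itemsets are tracked per segment.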
The following command uses IncMine to extract the frequent closed itemsets from the T40I10D100K stream file with support greater than 0.05 and a relaxation rate of 0.4. It uses a sliding window of 20 segments, with 5000 transactions per segment, and extracts frequent itemsets of maximum length 5.
java -cp IncMine.jar;moa.jar -javaagent:sizeofag.jar moa.DoTask "LearnModel -m 100000 -l (IncMine -w 20 -m 5 -s 0.05 -r 0.4 -l 5000) -s (ZakiFileStream -f T40I10D100K.ascii)"
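The task string in the command above nests the IncMine options and the ZakiFileStream options inside parentheses, which makes it easy to mistype by hand. The following Java sketch shows one way to assemble that string from individual values. The class and method names are hypothetical; the code only builds text and does not call MOA.

```java
// Hypothetical helper, NOT part of IncMine or MOA: it only assembles the task string.
final class IncMineTaskString {

    /** Builds the LearnModel task description used in the example command. */
    static String learnModelTask(int maxInstances, int windowSize, int maxItemsetLength,
                                 double minSupport, double relaxationRate,
                                 int segmentLength, String streamFile) {
        // Learner options, in the same order as the example command.
        String learner = "IncMine"
                + " -w " + windowSize
                + " -m " + maxItemsetLength
                + " -s " + minSupport
                + " -r " + relaxationRate
                + " -l " + segmentLength;
        // Stream options: ZakiFileStream reads the given file as a stream.
        String stream = "ZakiFileStream -f " + streamFile;
        // Nested option groups are wrapped in parentheses.
        return "LearnModel -m " + maxInstances
                + " -l (" + learner + ")"
                + " -s (" + stream + ")";
    }

    public static void main(String[] args) {
        System.out.println(
                learnModelTask(100000, 20, 5, 0.05, 0.4, 5000, "T40I10D100K.ascii"));
    }
}
```

The printed string is the single quoted argument that the command above passes to moa.DoTask. Note that the outer -m, -l and -s belong to LearnModel, while the ones inside the first parentheses belong to IncMine.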
ZakiFileStream reads batch datasets as streams. It reads files in the format used by the IBM-Datagen software, which is available on Zaki's webpage.
Example Dataset: T40I10D100K