Make a comparison of Apriori and FP-Growth algorithms for frequent item set mining in transactional databases. Apply these algorithms to the following data:
TID LIST OF ITEMS
1 Bread, Milk, Sugar, TeaPowder, Cheese, Tomato
2 Onion, Tomato, Chillies, Sugar, Milk
3 Milk, Cake, Biscuits, Cheese, Onion
4 Chillies, Potato, Milk, Cake, Sugar, Bread
5 Bread, Jam, Mik, Butter, Chilles
6 Butter, Cheese, Paneer, Curd, Milk, Biscuits
7 Onion, Paneer, Chilies, Garlic, Milk
8 Bread, Jam, Cake, Biscuits, Tomato
Ans) The apriori algorithm is firstly proposed by R.Aggarwal and R.Srikant in 1994 for mining frequent itemset. Apriori algorithm follows two phases: Generate Phase: In this phase candidate (k+1) itemset is generated using k-itemset; this phase creates Ck candidate set. Prune Phase: In this phase candidate set is pruned to generate large frequent itemset using “minimum support” as the pruning parameter. This phase creates Lk large itemset.
ADVANTAGES: 1. Use large itemset. 2. Easy to implement. 3. Easily parallelized.
DISDVANTAGE: 1. It may need to generate a huge no of candidate sets. So its generation is expensive. 2. Assumes transactional database is memory resident. 3. Support count is expensive because require many database scan. Eg: 1) It consists 8 transactions
1 Bread, Milk, Sugar, TeaPowder, Cheese, Tomato
2 Onion, Tomato, Chillies, Sugar, Milk
3 Milk, Cake, Biscuits, Cheese, Onion
4 Chillies, Potato, Milk, Cake, Sugar, Bread
5 Bread, Jam, Milk, Butter, Chilles
6 Butter, Cheese, Paneer, Curd, Milk, Biscuits
7 Onion, Paneer, Chilies, Garlic, Milk
8 Bread, Jam, Cake, Biscuits, Tomato
Step1: Count no of transactions: 8
Step 2: In this step we remove all the items that are bought less than 2 times from the table
C1
Item set |
Support count |
Bread |
4 |
Milk |
7 |
Sugar |
3 |
TeaPowder |
1 |
Cheese |
3 |
Tomato |
3 |
Onion |
3 |
Chillies |
4 |
Cake |
3 |
Biscuits |
3 |
Potato |
1 |
Jam |
2 |
Butter |
2 |
Paneer |
2 |
Curd |
1 |
Garlic |
1 |
Compare and prune
Item set |
Support count |
Bread |
4 |
Milk |
7 |
Sugar |
3 |
Cheese |
3 |
Tomato |
3 |
Onion |
3 |
Chillies |
4 |
Cake |
3 |
Biscuits |
3 |
Jam |
2 |
Butter |
2 |
Paneer |
2 |
C2
Item set |
Support count |
Bread, Milk |
3 |
Bread, Sugar |
2 |
Bread, Cheese |
1 |
Bread, Tomato |
2 |
Bread, Onion |
0 |
Bread, Chillies |
2 |
Bread, Cake |
2 |
Bread, Biscuits |
1 |
Bread, Jam |
2 |
Bread, Butter |
1 |
Milk, Sugar |
3 |
Milk, Cheese |
3 |
Milk, Tomato |
2 |
Milk, Onion |
3 |
Milk, Chillies |
4 |
Milk, Cake |
2 |
Milk, Biscuits |
2 |
Milk, Jam |
1 |
Milk, Butter |
2 |
Milk, Paneer |
2 |
Sugar, Cheese |
1 |
Sugar, Tomato |
2 |
Sugar, Onion |
1 |
Sugar, Chillies |
2 |
Sugar, Cake |
1 |
Sugar, Biscuits |
0 |
Sugar, Jam |
0 |
Sugar, Butter |
0 |
Sugar, Paneer |
0 |
Cheese, Tomato |
1 |
Cheese, Onion |
1 |
Cheese, Chillies |
0 |
Cheese, Cake |
1 |
Cheese, Biscuits |
2 |
Cheese, Jam |
0 |
Cheese, Butter |
1 |
Cheese, Paneer |
1 |
Tomato, Onion |
1 |
Tomato, Chillies |
1 |
Tomato, Cake |
1 |
Tomato,Biscuits |
1 |
Tomato, Jam |
1 |
Tomato, Butter |
0 |
Tomato, Paneer |
0 |
Onion, Chillies |
2 |
Onion, Cake |
1 |
Onion, Biscuits |
1 |
Onion, Jam |
0 |
Onion, Butter |
0 |
Onion, Paneer |
1 |
Chillies, Cake |
1 |
Chillies,Biscuits |
0 |
Chillies, Jam |
1 |
Chillies, Butter |
0 |
Chillies, Paneer |
1 |
Cake, Biscuits |
2 |
Cake, Jam |
1 |
Cake, Butter |
0 |
Cake, Paneer |
0 |
Biscuits, Jam |
1 |
Biscuits, Butter |
1 |
Biscuits, Paneer |
1 |
Jam, Butter |
1 |
Jam, Paneer |
0 |
Butter, Paneer |
1 |
Compare and prune
Item set |
Support count |
Bread, Milk |
3 |
Bread, Sugar |
2 |
Bread, Tomato |
2 |
Bread, Chillies |
2 |
Bread, Cake |
2 |
Bread, Jam |
2 |
Milk, Sugar |
3 |
Milk, Cheese |
3 |
Milk, Tomato |
2 |
Milk, Onion |
3 |
Milk, Chillies |
4 |
Milk, Cake |
2 |
Milk, Biscuits |
2 |
Milk, Butter |
2 |
Milk, Paneer |
2 |
Sugar, Tomato |
2 |
Sugar, Chillies |
2 |
Cheese, Biscuits |
2 |
Onion, Chillies |
2 |
Cake, Biscuits |
2 |
C3
Item set |
Support count |
Bread, Milk,
Sugar |
2 |
Bread, Milk, Tomato |
1 |
Bread, Milk,
Chillies |
2 |
Bread, Milk, Cake |
1 |
Bread, Milk,
Jam |
1 |
Milk, Sugar,
Cheese |
1 |
Milk, Sugar,
Tomato |
2 |
Milk, Sugar, Onion |
1 |
Milk, Sugar,
Chillies |
2 |
Milk, Sugar, Cake |
1 |
Milk, Sugar, Biscuits |
0 |
Milk, Sugar, Butter |
0 |
Milk, Sugar, Paneer |
0 |
Sugar, Tomato, Chillies |
1 |
Compare and prune
Item set |
Support count |
Bread, Milk,
Sugar |
2 |
Bread, Milk,
Chillies |
2 |
Milk, Sugar,
Tomato |
2 |
Milk, Sugar,
Chillies |
2 |
According
to above statement Bread, Milk, Sugar,
Chilies, Tomato is generated whose minimum
support is less than 2.so this is not frequent
Thus the set of three items that are bought together most frequently are
Bread, Milk,
Sugar |
Bread, Milk,
Chillies |
Milk, Sugar,
Tomato |
Milk, Sugar,
Chillies |
The FP-Growth Algorithm, proposed by Han in, is an efficient and scalable method for mining the complete set of frequent patterns by pattern fragment growth, using an extended prefix-tree structure for storing compressed and crucial information about frequent patterns named frequent pattern tree (FP-tree). Major steps in FP-growth is
Step1- It firstly compresses the database showing frequent item set in to FP-tree. FP-tree is built using 2 passes over the dataset. Step2: It divides the FP-tree in to a set of conditional database and mines each database separately, thus extract frequent item sets from FP-tree directly. It consist of one root labeled as null, a set of item prefix sub trees as the children of the root, and a frequent .item header table. Each node in the item prefix sub tree consists of three fields: item-name, count and node link where--- item-name registers which item the node represents; count registers the number of transactions represented by the portion of path reaching this node, node link links to the next node in the FP- tree. Each item in the header table consists of two fields---item name and head of node link, which points to the first node in the FP-tree carrying the item name
ADVANTAGES: 1. It compresses the database. 2. Require only 2 pass over database. 3. There is no candidate generation. 4. Faster than apriori. 5. Reduces search cost
DISADVANTAGE: 1. It may not fit in main memory. 2. FP tree is expensive to build. i. takes time to build but once built frequent itemset can be obtained easily. ii. Support can only be calculated once the entire dataset is added to fp-tree.
Parameters |
Apriori Algorithm |
FP-growth Algorithm |
Technique |
Use Apriori
property and join
and prune property |
It constructs conditional frequent pattern tree and conditional pattern base from database which satisfy minimum support |
Memory utilization |
Due to large no. of candidate generation require large
memory space. |
Due to compact structure and no candidate generation require less memory. |
Number of scans |
Multiple scans
for generating candidate set. |
Scan the database only
twice and twice only. |
Time |
Execution time is more as time is wasted in producing candidate every time. |
Execution time
is lesser than
the Apriori algorithm |
Eg: 1) It consists 8 transactions
1 Bread, Milk, Sugar, TeaPowder, Cheese, Tomato
2 Onion, Tomato, Chillies, Sugar, Milk
3 Milk, Cake, Biscuits, Cheese, Onion
4 Chillies, Potato, Milk, Cake, Sugar, Bread
5 Bread, Jam, Milk, Butter, Chilles
6 Butter, Cheese, Paneer, Curd, Milk, Biscuits
7 Onion, Paneer, Chilies, Garlic, Milk
8 Bread, Jam, Cake, Biscuits, Tomato
First we scan the database and determine the set of frequent items (1-itemsets) and their support Counts (frequencies): L={{Milk:7},{Bread:4},{Chillies:4},{sugar:3},{Cheese:3},{Tomato:3},{Onion:3},{Cake:3},{Biscuits:3},{jam:2},{Butter:2},{Paneer:2}}
Each tranction items should arranged in Ordered Item set. like Bread, Milk, Sugar, TeaPowder, Cheese, Tomato converted into Milk,Bread,Sugar,Cheese, Tomato.
Here, all the items are simply linked one after the other in the order of occurrence in the set and initialize the support count for each item as 1.
ow, for each item, the Conditional Pattern Base is computed which is path labels of all the paths which lead to any node of the given item in the frequent-pattern tree. Note that the items in the below table are arranged in the ascending order of their frequencies.
Item |
Support count |
Node- Link |
Milk |
7 |
|
Bread |
4 |
|
Chilies |
4 |
|
Sugar |
3 |
|
Cheese |
3 |
|
Tomato |
3 |
|
Onion |
3 |
|
Cake |
3 |
|
Biscuits |
3 |
|
Jam |
2 |
|
Butter |
2 |
|
Paneer |
2 |
|
No comments:
Post a Comment