Madhava Rao Maganti: Comparison of Apriori and FP-Growth algorithms

Make a comparison of Apriori and FP-Growth algorithms for frequent item set mining in transactional databases. Apply these algorithms to the following data:

TID LIST OF ITEMS

1 Bread, Milk, Sugar, TeaPowder, Cheese, Tomato

2 Onion, Tomato, Chillies, Sugar, Milk

3 Milk, Cake, Biscuits, Cheese, Onion

4 Chillies, Potato, Milk, Cake, Sugar, Bread

5 Bread, Jam, Mik, Butter, Chilles

6 Butter, Cheese, Paneer, Curd, Milk, Biscuits

7 Onion, Paneer, Chilies, Garlic, Milk

8 Bread, Jam, Cake, Biscuits, Tomato

Ans) The apriori algorithm is firstly proposed by R.Aggarwal and R.Srikant in 1994 for mining frequent itemset. Apriori algorithm follows two phases: Generate Phase: In this phase candidate (k+1) itemset is generated using k-itemset; this phase creates Ck candidate set. Prune Phase: In this phase candidate set is pruned to generate large frequent itemset using “minimum support” as the pruning parameter. This phase creates Lk large itemset.

ADVANTAGES: 1. Use large itemset. 2. Easy to implement. 3. Easily parallelized.

DISDVANTAGE: 1. It may need to generate a huge no of candidate sets. So its generation is expensive. 2. Assumes transactional database is memory resident. 3. Support count is expensive because require many database scan. Eg: 1) It consists 8 transactions

1 Bread, Milk, Sugar, TeaPowder, Cheese, Tomato

2 Onion, Tomato, Chillies, Sugar, Milk

3 Milk, Cake, Biscuits, Cheese, Onion

4 Chillies, Potato, Milk, Cake, Sugar, Bread

5 Bread, Jam, Milk, Butter, Chilles

6 Butter, Cheese, Paneer, Curd, Milk, Biscuits

7 Onion, Paneer, Chilies, Garlic, Milk

8 Bread, Jam, Cake, Biscuits, Tomato

Step1: Count no of transactions: 8

Step 2: In this step we remove all the items that are bought less than 2 times from the table

Item set	Support count
Bread	4
Milk	7
Sugar	3
TeaPowder	1
Cheese	3
Tomato	3
Onion	3
Chillies	4
Cake	3
Biscuits	3
Potato	1
Jam	2
Butter	2
Paneer	2
Curd	1
Garlic	1

Compare and prune

Item set	Support count
Bread	4
Milk	7
Sugar	3
Cheese	3
Tomato	3
Onion	3
Chillies	4
Cake	3
Biscuits	3
Jam	2
Butter	2
Paneer	2

Item set	Support count
Bread, Milk	3
Bread, Sugar	2
Bread, Cheese	1
Bread, Tomato	2
Bread, Onion	0
Bread, Chillies	2
Bread, Cake	2
Bread, Biscuits	1
Bread, Jam	2
Bread, Butter	1
Milk, Sugar	3
Milk, Cheese	3
Milk, Tomato	2
Milk, Onion	3
Milk, Chillies	4
Milk, Cake	2
Milk, Biscuits	2
Milk, Jam	1
Milk, Butter	2
Milk, Paneer	2

Sugar, Cheese	1
Sugar, Tomato	2
Sugar, Onion	1
Sugar, Chillies	2
Sugar, Cake	1
Sugar, Biscuits	0
Sugar, Jam	0
Sugar, Butter	0
Sugar, Paneer	0
Cheese, Tomato	1
Cheese, Onion	1
Cheese, Chillies	0
Cheese, Cake	1
Cheese, Biscuits	2
Cheese, Jam	0
Cheese, Butter	1
Cheese, Paneer	1
Tomato, Onion	1
Tomato, Chillies	1
Tomato, Cake	1
Tomato,Biscuits	1
Tomato, Jam	1
Tomato, Butter	0
Tomato, Paneer	0
Onion, Chillies	2
Onion, Cake	1
Onion, Biscuits	1
Onion, Jam	0
Onion, Butter	0
Onion, Paneer	1
Chillies, Cake	1
Chillies,Biscuits	0
Chillies, Jam	1
Chillies, Butter	0
Chillies, Paneer	1
Cake, Biscuits	2
Cake, Jam	1
Cake, Butter	0
Cake, Paneer	0
Biscuits, Jam	1
Biscuits, Butter	1
Biscuits, Paneer	1
Jam, Butter	1
Jam, Paneer	0
Butter, Paneer	1

Compare and prune

Item set	Support count
Bread, Milk	3
Bread, Sugar	2
Bread, Tomato	2
Bread, Chillies	2
Bread, Cake	2
Bread, Jam	2
Milk, Sugar	3
Milk, Cheese	3
Milk, Tomato	2

Milk, Onion	3
Milk, Chillies	4
Milk, Cake	2
Milk, Biscuits	2
Milk, Butter	2
Milk, Paneer	2
Sugar, Tomato	2
Sugar, Chillies	2
Cheese, Biscuits	2
Onion, Chillies	2
Cake, Biscuits	2

Item set	Support count
Bread, Milk, Sugar	2
Bread, Milk, Tomato	1
Bread, Milk, Chillies	2
Bread, Milk, Cake	1
Bread, Milk, Jam	1
Milk, Sugar, Cheese	1
Milk, Sugar, Tomato	2
Milk, Sugar, Onion	1
Milk, Sugar, Chillies	2
Milk, Sugar, Cake	1
Milk, Sugar, Biscuits	0
Milk, Sugar, Butter	0
Milk, Sugar, Paneer	0
Sugar, Tomato, Chillies	1

Compare and prune

Item set	Support count
Bread, Milk, Sugar	2
Bread, Milk, Chillies	2
Milk, Sugar, Tomato	2
Milk, Sugar, Chillies	2

According to above statement Bread, Milk, Sugar, Chilies, Tomato is generated whose minimum support is less than 2.so this is not frequent

Thus the set of three items that are bought together most frequently are

Bread, Milk, Sugar

Bread, Milk, Chillies

Milk, Sugar, Tomato

Milk, Sugar, Chillies

The FP-Growth Algorithm, proposed by Han in, is an efficient and scalable method for mining the complete set of frequent patterns by pattern fragment growth, using an extended prefix-tree structure for storing compressed and crucial information about frequent patterns named frequent pattern tree (FP-tree). Major steps in FP-growth is

Step1- It firstly compresses the database showing frequent item set in to FP-tree. FP-tree is built using 2 passes over the dataset. Step2: It divides the FP-tree in to a set of conditional database and mines each database separately, thus extract frequent item sets from FP-tree directly. It consist of one root labeled as null, a set of item prefix sub trees as the children of the root, and a frequent .item header table. Each node in the item prefix sub tree consists of three fields: item-name, count and node link where--- item-name registers which item the node represents; count registers the number of transactions represented by the portion of path reaching this node, node link links to the next node in the FP- tree. Each item in the header table consists of two fields---item name and head of node link, which points to the first node in the FP-tree carrying the item name

ADVANTAGES: 1. It compresses the database. 2. Require only 2 pass over database. 3. There is no candidate generation. 4. Faster than apriori. 5. Reduces search cost

DISADVANTAGE: 1. It may not fit in main memory. 2. FP tree is expensive to build. i. takes time to build but once built frequent itemset can be obtained easily. ii. Support can only be calculated once the entire dataset is added to fp-tree.

Parameters	Apriori Algorithm	FP-growth Algorithm
Technique	Use Apriori property and join and prune property	It constructs conditional frequent pattern tree and conditional pattern base from database which satisfy minimum support
Memory utilization	Due to large no. of candidate generation require large memory space.	Due to compact structure and no candidate generation require less memory.
Number of scans	Multiple scans for generating candidate set.	Scan the database only twice and twice only.
Time	Execution time is more as time is wasted in producing candidate every time.	Execution time is lesser than the Apriori algorithm

Eg: 1) It consists 8 transactions

1 Bread, Milk, Sugar, TeaPowder, Cheese, Tomato

2 Onion, Tomato, Chillies, Sugar, Milk

3 Milk, Cake, Biscuits, Cheese, Onion

4 Chillies, Potato, Milk, Cake, Sugar, Bread

5 Bread, Jam, Milk, Butter, Chilles

6 Butter, Cheese, Paneer, Curd, Milk, Biscuits

7 Onion, Paneer, Chilies, Garlic, Milk

8 Bread, Jam, Cake, Biscuits, Tomato

First we scan the database and determine the set of frequent items (1-itemsets) and their support Counts (frequencies): L={{Milk:7},{Bread:4},{Chillies:4},{sugar:3},{Cheese:3},{Tomato:3},{Onion:3},{Cake:3},{Biscuits:3},{jam:2},{Butter:2},{Paneer:2}}

Each tranction items should arranged in Ordered Item set. like Bread, Milk, Sugar, TeaPowder, Cheese, Tomato converted into Milk,Bread,Sugar,Cheese, Tomato.

Here, all the items are simply linked one after the other in the order of occurrence in the set and initialize the support count for each item as 1.

ow, for each item, the Conditional Pattern Base is computed which is path labels of all the paths which lead to any node of the given item in the frequent-pattern tree. Note that the items in the below table are arranged in the ascending order of their frequencies.

Item	Support count	Node- Link
Milk	7
Bread	4
Chilies	4
Sugar	3
Cheese	3
Tomato	3
Onion	3
Cake	3
Biscuits	3
Jam	2
Butter	2
Paneer	2

Madhava Rao Maganti

7/12/21

Comparison of Apriori and FP-Growth algorithms

No comments:

Blog Archive

Report Abuse