7/12/21

Comparison of Apriori and FP-Growth algorithms

  Make a comparison of Apriori and FP-Growth algorithms for frequent item set mining in transactional databases. Apply these algorithms to the following data:

TID LIST OF ITEMS

1 Bread, Milk, Sugar, TeaPowder, Cheese, Tomato

2 Onion, Tomato, Chillies, Sugar, Milk

3 Milk, Cake, Biscuits, Cheese, Onion

4 Chillies, Potato, Milk, Cake, Sugar, Bread

5 Bread, Jam, Mik, Butter, Chilles

6 Butter, Cheese, Paneer, Curd, Milk, Biscuits

7 Onion, Paneer, Chilies, Garlic, Milk

8 Bread, Jam, Cake, Biscuits, Tomato

Ans) The apriori algorithm is firstly proposed by R.Aggarwal and R.Srikant in 1994 for mining frequent itemset. Apriori algorithm follows two phases: Generate Phase: In this phase candidate (k+1) itemset is generated using k-itemset; this phase creates Ck candidate set. Prune Phase: In this phase candidate set is pruned to generate large frequent itemset using “minimum support” as the pruning parameter. This phase creates Lk large itemset. 

ADVANTAGES: 1. Use large itemset. 2. Easy to implement. 3. Easily parallelized. 

DISDVANTAGE: 1. It may need to generate a huge no of candidate sets. So its generation is expensive. 2. Assumes transactional database is memory resident. 3. Support count is expensive because require many database scan. Eg: 1) It consists 8 transactions

1 Bread, Milk, Sugar, TeaPowder, Cheese, Tomato

2 Onion, Tomato, Chillies, Sugar, Milk

3 Milk, Cake, Biscuits, Cheese, Onion

4 Chillies, Potato, Milk, Cake, Sugar, Bread

5 Bread, Jam, Milk, Butter, Chilles

6 Butter, Cheese, Paneer, Curd, Milk, Biscuits

7 Onion, Paneer, Chilies, Garlic, Milk

8 Bread, Jam, Cake, Biscuits, Tomato

Step1: Count no of transactions: 8 

Step 2: In this step we remove all the items that are bought less than 2 times from the table

C1

Item set

Support count

Bread

4

Milk

7

Sugar

3

TeaPowder

1

Cheese

3

Tomato

3

Onion

3

Chillies

4

Cake

3

Biscuits

3

Potato

1

Jam

2

Butter

2

Paneer

2

Curd

1

Garlic

1

Compare and prune

Item set

Support count

Bread

4

Milk

7

Sugar

3

Cheese

3

Tomato

3

Onion

3

Chillies

4

Cake

3

Biscuits

3

Jam

2

Butter

2

Paneer

2

C2

Item set

Support count

Bread, Milk

3

Bread, Sugar

2

Bread, Cheese

1

Bread, Tomato

2

Bread, Onion

0

Bread, Chillies

2

Bread, Cake

2

Bread, Biscuits

1

Bread, Jam

2

Bread, Butter

1

Milk, Sugar

3

Milk, Cheese

3

Milk, Tomato

2

Milk, Onion

3

Milk, Chillies

4

Milk, Cake

2

Milk, Biscuits

2

Milk, Jam

1

Milk, Butter

2

Milk, Paneer

2

Sugar, Cheese

1

Sugar, Tomato

2

Sugar, Onion

1

Sugar, Chillies

2

Sugar, Cake

1

Sugar, Biscuits

0

Sugar, Jam

0

Sugar, Butter

0

Sugar, Paneer

0

Cheese, Tomato

1

Cheese, Onion

1

Cheese, Chillies

0

Cheese, Cake

1

Cheese, Biscuits

2

Cheese, Jam

0

Cheese, Butter

1

Cheese, Paneer

1

Tomato, Onion

1

Tomato, Chillies

1

Tomato, Cake

1

Tomato,Biscuits

1

Tomato, Jam

1

Tomato, Butter

0

Tomato, Paneer

0

Onion, Chillies

2

Onion, Cake

1

Onion, Biscuits

1

Onion, Jam

0

Onion, Butter

0

Onion, Paneer

1

Chillies, Cake

1

Chillies,Biscuits

0

Chillies, Jam

1

Chillies, Butter

0

Chillies, Paneer

1

Cake, Biscuits

2

Cake, Jam

1

Cake, Butter

0

Cake, Paneer

0

Biscuits, Jam

1

Biscuits, Butter

1

Biscuits, Paneer

1

Jam, Butter

1

Jam, Paneer

0

Butter, Paneer

1

 

Compare and prune

 

Item set

Support count

Bread, Milk

3

Bread, Sugar

2

Bread, Tomato

2

Bread, Chillies

2

Bread, Cake

2

Bread, Jam

2

Milk, Sugar

3

Milk, Cheese

3

Milk, Tomato

2

Milk, Onion

3

Milk, Chillies

4

Milk, Cake

2

Milk, Biscuits

2

Milk, Butter

2

Milk, Paneer

2

Sugar, Tomato

2

Sugar, Chillies

2

Cheese, Biscuits

2

Onion, Chillies

2

Cake, Biscuits

2

C3

Item set

Support count

Bread, Milk, Sugar

2

Bread, Milk, Tomato

1

Bread, Milk, Chillies

2

Bread, Milk, Cake

1

Bread, Milk, Jam

1

Milk, Sugar, Cheese

1

Milk, Sugar, Tomato

2

Milk, Sugar, Onion

1

Milk, Sugar, Chillies

2

Milk, Sugar, Cake

1

Milk, Sugar, Biscuits

0

Milk, Sugar, Butter

0

Milk, Sugar, Paneer

0

Sugar, Tomato, Chillies

1

 

Compare and prune

Item set

Support count

Bread, Milk, Sugar

2

Bread, Milk, Chillies

2

Milk, Sugar, Tomato

2

Milk, Sugar, Chillies

2

 

According to above statement Bread, Milk, Sugar, Chilies, Tomato is generated whose minimum support is less than 2.so this is not frequent

Thus the set of three items that are bought together most frequently are

 

Bread, Milk, Sugar

Bread, Milk, Chillies

Milk, Sugar, Tomato

Milk, Sugar, Chillies

The FP-Growth Algorithm, proposed by Han in, is an efficient and scalable method for mining the complete set of frequent patterns by pattern fragment growth, using an extended prefix-tree structure for storing compressed and crucial information about frequent patterns named frequent pattern tree (FP-tree). Major steps in FP-growth is

Step1- It firstly compresses the database showing frequent item set in to FP-tree. FP-tree is built using 2 passes over the dataset. Step2: It divides the FP-tree in to a set of conditional database and mines each database separately, thus extract frequent item sets from FP-tree directly. It consist of one root labeled as null, a set of item prefix sub trees as the children of the root, and a frequent .item header table. Each node in the item prefix sub tree consists of three fields: item-name, count and node link where--- item-name registers which item the node represents; count registers the number of transactions represented by the portion of path reaching this node, node link links to the next node in the FP- tree. Each item in the header table consists of two fields---item name and head of node link, which points to the first node in the FP-tree carrying the item name 

ADVANTAGES: 1. It compresses the database. 2. Require only 2 pass over database. 3. There is no candidate generation. 4. Faster than apriori. 5. Reduces search cost 

DISADVANTAGE: 1. It may not fit in main memory. 2. FP tree is expensive to build. i. takes time to build but once built frequent itemset can be obtained easily. ii. Support can only be calculated once the entire dataset is added to fp-tree. 

Parameters

Apriori

Algorithm

FP-growth

Algorithm

Technique

Use Apriori property and join and prune property

It constructs conditional frequent pattern tree and conditional pattern base from database which satisfy

minimum support

Memory utilization

Due to large no. of candidate generation require large memory space.

Due to compact structure and no candidate generation require less memory.

Number of scans

Multiple scans for generating candidate set.

Scan the database only twice and twice only.

Time

Execution time is more as time is wasted in producing candidate every time.

Execution time is lesser than the Apriori algorithm

Eg: 1) It consists 8 transactions

1 Bread, Milk, Sugar, TeaPowder, Cheese, Tomato

2 Onion, Tomato, Chillies, Sugar, Milk

3 Milk, Cake, Biscuits, Cheese, Onion

4 Chillies, Potato, Milk, Cake, Sugar, Bread

5 Bread, Jam, Milk, Butter, Chilles

6 Butter, Cheese, Paneer, Curd, Milk, Biscuits

7 Onion, Paneer, Chilies, Garlic, Milk

8 Bread, Jam, Cake, Biscuits, Tomato 

First we scan the database and determine the set of frequent items (1-itemsets) and their support Counts (frequencies): L={{Milk:7},{Bread:4},{Chillies:4},{sugar:3},{Cheese:3},{Tomato:3},{Onion:3},{Cake:3},{Biscuits:3},{jam:2},{Butter:2},{Paneer:2}}

Each tranction items should arranged in Ordered Item set. like Bread, Milk, Sugar, TeaPowder, Cheese, Tomato converted into Milk,Bread,Sugar,Cheese, Tomato.

Here, all the items are simply linked one after the other in the order of occurrence in the set and initialize the support count for each item as 1.

ow, for each item, the Conditional Pattern Base is computed which is path labels of all the paths which lead to any node of the given item in the frequent-pattern tree. Note that the items in the below table are arranged in the ascending order of their frequencies.

Item

Support count

Node- Link

Milk

7

 

Bread

4

 

Chilies

4

 

Sugar

3

 

Cheese

3

 

Tomato

3

 

Onion

3

 

Cake

3

 

Biscuits

3

 

Jam

2

 

Butter

2

 

Paneer

2

 



No comments:

Blog Archive