jDataLab Jie Wang

This is Part 3 of a series on association rule mining with the R packages arules and arulesViz. To run the scripts below, you must first complete Part 1 and Part 2.

The Basket Data

In Part 2, Read Transaction Data, we read the following five shopping baskets into R as an object of the arules transactions class.

 f,a,c,d,g,l,m,p
 a,b,c,f,l,m,o
 b,f,h,j,o
 b,c,k,s,p
 a,f,c,e,l,p,m,n
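If the object from Part 2 is not at hand, the same five baskets can be rebuilt in memory. The following is a minimal sketch, assuming the arules package is installed. The variable name baskets is only illustrative, and coercing a list with as() stands in for the file-reading step of Part 2.

```r
library(arules)

# The five baskets listed above, one character vector per basket
baskets <- list(
  c("f", "a", "c", "d", "g", "l", "m", "p"),
  c("a", "b", "c", "f", "l", "m", "o"),
  c("b", "f", "h", "j", "o"),
  c("b", "c", "k", "s", "p"),
  c("a", "f", "c", "e", "l", "p", "m", "n")
)

# Coerce the list to the arules transactions class
transactions <- as(baskets, "transactions")

# Print the baskets to confirm they match the listing
inspect(transactions)
```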

To find the frequent 1-itemsets, set the minimum support to 0.5 and both minlen and maxlen to 1. Setting the parameter target to "frequent itemsets" tells apriori to return itemsets rather than rules.

The following script assigns to itemsets all the 1-itemsets whose support is at least 50%.

#all the 1-itemsets having at least a support of 0.5
itemsets <- apriori(
  transactions, 
  parameter = list(minlen=1, maxlen=1, support=0.5, target="frequent itemsets")
)
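The support of a 1-itemset is simply the fraction of baskets that contain the item, so the result can be cross-checked by hand. The sketch below uses base R only and does not call arules. The names baskets, item_counts, item_support and frequent are illustrative, and the counting assumes that no item is repeated within a basket.

```r
# The five baskets from the listing above
baskets <- list(
  c("f", "a", "c", "d", "g", "l", "m", "p"),
  c("a", "b", "c", "f", "l", "m", "o"),
  c("b", "f", "h", "j", "o"),
  c("b", "c", "k", "s", "p"),
  c("a", "f", "c", "e", "l", "p", "m", "n")
)

# Count the baskets containing each item
# (valid because items are unique within a basket)
item_counts <- table(unlist(baskets))

# Support = number of baskets containing the item / number of baskets
item_support <- item_counts / length(baskets)

# Keep only the items that meet the minimum support of 0.5
frequent <- item_support[item_support >= 0.5]
print(sort(frequent, decreasing = TRUE))
```

Seven items pass the threshold: f and c appear in four of the five baskets, and a, b, l, m and p appear in three.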

A Summary of the Frequent k-Itemsets

To display a summary of the frequent 1-itemsets, call summary on itemsets.

summary(itemsets)
## set of 7 itemsets
## 
## most frequent items:
##       a       b       c       f       l (Other) 
##       1       1       1       1       1       2 
## 
## element (itemset/transaction) length distribution:sizes
## 1 
## 7 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       1       1       1       1       1       1 
## 
## summary of quality measures:
##     support           count      
##  Min.   :0.6000   Min.   :3.000  
##  1st Qu.:0.6000   1st Qu.:3.000  
##  Median :0.6000   Median :3.000  
##  Mean   :0.6571   Mean   :3.286  
##  3rd Qu.:0.7000   3rd Qu.:3.500  
##  Max.   :0.8000   Max.   :4.000  
## 
## includes transaction ID lists: FALSE 
## 
## mining info:
##          data ntransactions support confidence
##  transactions             5     0.5          1

The summary shows that there are seven frequent 1-itemsets, with support ranging from 0.6 to 0.8.
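The same figures can also be pulled out programmatically instead of being read off the printed summary. This is a minimal sketch, assuming arules is installed. It rebuilds the baskets so that it runs on its own, and the names support_values, n_itemsets and support_range are illustrative.

```r
library(arules)

# Rebuild the five baskets so the sketch is self-contained
transactions <- as(list(
  c("f", "a", "c", "d", "g", "l", "m", "p"),
  c("a", "b", "c", "f", "l", "m", "o"),
  c("b", "f", "h", "j", "o"),
  c("b", "c", "k", "s", "p"),
  c("a", "f", "c", "e", "l", "p", "m", "n")
), "transactions")

itemsets <- apriori(
  transactions,
  parameter = list(minlen = 1, maxlen = 1, support = 0.5,
                   target = "frequent itemsets"),
  control = list(verbose = FALSE)  # suppress the progress log
)

# quality() returns a data frame with one row per itemset
support_values <- quality(itemsets)$support

n_itemsets <- length(itemsets)          # number of frequent 1-itemsets
support_range <- range(support_values)  # minimum and maximum support

print(n_itemsets)
print(support_range)
```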

The Top-N Frequent k-Itemsets

To print all 1-itemsets in descending order of support, sort them by support and pass the result to inspect:

#print all 1-itemsets in descending order of support
inspect(sort(itemsets, by="support"))
##     items support count
## [1] {f}   0.8     4    
## [2] {c}   0.8     4    
## [3] {b}   0.6     3    
## [4] {p}   0.6     3    
## [5] {a}   0.6     3    
## [6] {m}   0.6     3    
## [7] {l}   0.6     3

To print only the top five 1-itemsets in descending order of support, wrap the sorted itemsets in head:

#print top-5 1-itemsets in descending order of support
inspect(head(sort(itemsets, by="support"), 5))
##     items support count
## [1] {f}   0.8     4    
## [2] {c}   0.8     4    
## [3] {b}   0.6     3    
## [4] {p}   0.6     3    
## [5] {a}   0.6     3
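The sort-then-head pattern generalizes to any N. The sketch below wraps it in a small helper. The function top_n_itemsets is hypothetical and not part of arules. It relies only on sort with its by and decreasing arguments and on head, and it assumes arules is installed.

```r
library(arules)

# Hypothetical helper: the first n itemsets after sorting by a quality measure
top_n_itemsets <- function(itemsets, n, by = "support", decreasing = TRUE) {
  head(sort(itemsets, by = by, decreasing = decreasing), n)
}

# Rebuild the five baskets so the sketch is self-contained
transactions <- as(list(
  c("f", "a", "c", "d", "g", "l", "m", "p"),
  c("a", "b", "c", "f", "l", "m", "o"),
  c("b", "f", "h", "j", "o"),
  c("b", "c", "k", "s", "p"),
  c("a", "f", "c", "e", "l", "p", "m", "n")
), "transactions")

itemsets <- apriori(
  transactions,
  parameter = list(minlen = 1, maxlen = 1, support = 0.5,
                   target = "frequent itemsets"),
  control = list(verbose = FALSE)  # suppress the progress log
)

# The two most frequent 1-itemsets
inspect(top_n_itemsets(itemsets, 2))

# The three least frequent of the frequent 1-itemsets
inspect(top_n_itemsets(itemsets, 3, decreasing = FALSE))
```

Several itemsets tie on support here, so the order among tied entries should not be relied on.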

Exercise

Write a script that returns all the 2-itemsets whose support is at least 50%, finds the minimum and maximum support, counts the frequent 2-itemsets, and prints all the itemsets.