This is a simple guide to running the function read.transactions, which coerces shopping basket data into the format required by the packages arules and arulesViz.


The Sample Basket Data

Raw shopping basket data is often stored with one transaction per line, each line listing the items of that transaction separated by a delimiter. The following example stores five baskets, one per line.

f,a,c,d,g,l,m,p
a,b,c,f,l,m,o
b,f,h,j,o
b,c,k,s,p
a,f,c,e,l,p,m,n

The letters a through s are the names of the available shopping items (i, q, and r do not occur in any basket). Assume that the sample basket data is stored in a plain text file named baskets.
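To follow along, the sample file can be created from R itself. The snippet below is a minimal sketch that assumes the file should be written to the current working directory under the name baskets; the variable name basket_lines is chosen here for illustration only.

```r
# Write the five sample baskets to a plain text file named "baskets"
# in the current working directory, one transaction per line.
basket_lines <- c(
  "f,a,c,d,g,l,m,p",
  "a,b,c,f,l,m,o",
  "b,f,h,j,o",
  "b,c,k,s,p",
  "a,f,c,e,l,p,m,n"
)
writeLines(basket_lines, con = "baskets")

# Quick check: read the file back and count its lines.
length(readLines("baskets"))
```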


Convert the Sample Data into the Transactions Class

The arules package provides the function read.transactions, which reads basket data into an object of class transactions.

The following script reads the basket data into a transactions object.

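transactions <- arules::read.transactions(
  file = "baskets",      # path to the plain text file
  format = "basket",     # one transaction per line
  sep = ",",             # items are comma-separated
  cols = NULL,           # no transaction ID column
  rm.duplicates = TRUE,  # drop repeated items within a basket
  skip = 0               # do not skip any leading lines
)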

The parameter format is a character string indicating the format of the data set.

  • For ‘basket’ format, each line in the transaction data file represents a transaction where the items (item labels) are separated by the characters specified by sep.

  • For ‘single’ format, each line corresponds to a single item, containing at least ids for the transaction and the item.

The parameter sep is a character string specifying how fields are separated in the data file. We use ‘,’ in the sample basket data.

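The parameter cols, for the ‘basket’ format, is the number of the column that holds the transaction IDs, or NULL if the file has no such column.

The parameter rm.duplicates is a logical value indicating whether duplicate items within a transaction should be removed.

The parameter skip is the number of lines to skip at the top of the file before reading data.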
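To make the ‘single’ format concrete, the sketch below rewrites the five sample baskets as one (transaction id, item) pair per line and reads them back. The file name baskets_single and the numeric transaction ids 1 through 5 are assumptions made for this example; the arules call is guarded so the base R part runs even where arules is not installed.

```r
# The five sample baskets, one transaction per element.
basket_lines <- c(
  "f,a,c,d,g,l,m,p",
  "a,b,c,f,l,m,o",
  "b,f,h,j,o",
  "b,c,k,s,p",
  "a,f,c,e,l,p,m,n"
)

# Split each line into items and build one (tid, item) row per item.
items <- strsplit(basket_lines, ",", fixed = TRUE)
single <- data.frame(
  tid  = rep(seq_along(items), lengths(items)),  # hypothetical transaction ids
  item = unlist(items)
)
head(single, 3)

# Write the pairs without header, quotes, or row names.
write.table(single, file = "baskets_single", sep = ",",
            quote = FALSE, row.names = FALSE, col.names = FALSE)

# Read the file in 'single' format: column 1 holds the transaction id,
# column 2 holds the item.
if (requireNamespace("arules", quietly = TRUE)) {
  transactions_single <- arules::read.transactions(
    file = "baskets_single",
    format = "single",
    sep = ",",
    cols = c(1, 2)
  )
  print(length(transactions_single))
}
```

In the ‘single’ format, cols is a vector of length two naming the transaction id column and the item column, rather than a single column number.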


Validate the Data

Inspect the Data Structure

To check that the baskets were read correctly, inspect the structure of the transactions object.

str(transactions)
## Formal class 'transactions' [package "arules"] with 3 slots
##   ..@ data       :Formal class 'ngCMatrix' [package "Matrix"] with 5 slots
##   .. .. ..@ i       : int [1:33] 0 2 3 5 6 10 11 14 0 1 ...
##   .. .. ..@ p       : int [1:6] 0 8 15 20 25 33
##   .. .. ..@ Dim     : int [1:2] 16 5
##   .. .. ..@ Dimnames:List of 2
##   .. .. .. ..$ : NULL
##   .. .. .. ..$ : NULL
##   .. .. ..@ factors : list()
##   ..@ itemInfo   :'data.frame':	16 obs. of  1 variable:
##   .. ..$ labels: chr [1:16] "a" "b" "c" "d" ...
##   ..@ itemsetInfo:'data.frame':	0 obs. of  0 variables

The Item Names

To validate the item names, run the command:

transactions@itemInfo$labels
##  [1] "a" "b" "c" "d" "e" "f" "g" "h" "j" "k" "l" "m" "n" "o" "p" "s"

The result lists the 16 distinct item names, from a through s.

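Find the Number of Transactions

The sparse matrix in transactions@data has one row per item and one column per transaction, so the second element of its Dim slot is the number of transactions.

# Examine the number of transactions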
transactions@data@Dim[2]
## [1] 5
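The same checks can also be made without reaching into the slots, using accessor functions exported by arules. The sketch below is self-contained: it writes the sample baskets to a temporary file (the name basket_file is chosen for this example) and is guarded so that it does nothing where arules is not installed.

```r
# Write the sample baskets to a temporary file so the example is self-contained.
basket_file <- tempfile()
writeLines(c("f,a,c,d,g,l,m,p", "a,b,c,f,l,m,o", "b,f,h,j,o",
             "b,c,k,s,p", "a,f,c,e,l,p,m,n"), con = basket_file)

if (requireNamespace("arules", quietly = TRUE)) {
  transactions <- arules::read.transactions(
    file = basket_file, format = "basket", sep = ","
  )
  print(length(transactions))              # number of transactions
  print(arules::itemLabels(transactions))  # item names
  print(arules::size(transactions))        # number of items per transaction
  arules::inspect(transactions[1])         # items of the first transaction
}
```

Accessors are less sensitive to changes in the internal representation than direct slot access, which is why they are usually preferred outside of exploratory work.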

List the Items of a Specific Transaction

The slot transactions@data@p stores the column pointers of the sparse matrix: 0-based offsets into transactions@data@i that mark where the items of each transaction begin and end.

transactions@data@p
## [1]  0  8 15 20 25 33

To locate the items of the ith transaction, read elements i and i + 1 of this vector. For the first transaction:

transactions@data@p[1:2]
## [1] 0 8

The two numbers delimit a range in the slot transactions@data@i. Because the offsets are 0-based and R indexes from 1, the first transaction occupies positions 0 + 1 = 1 through 8.

transactions@data@i[1:8]
## [1]  0  2  3  5  6 10 11 14

The item indexes are 0-based as well, so add 1 before looking up the item labels:

itemIndex <- transactions@data@i[1:8] + 1
transactions@itemInfo$labels[itemIndex]
## [1] "a" "c" "d" "f" "g" "l" "m" "p"

Compare the items above to the first line in the original basket data:

f,a,c,d,g,l,m,p

They are the same items, listed here in alphabetical order rather than in file order. The data conversion was successful.
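The lookup shown for the first transaction generalizes to any transaction k. The sketch below rebuilds vectors equivalent to the slots @data@p, @data@i, and @itemInfo$labels from the sample baskets in base R, then wraps the lookup in a helper. The helper name items_of is hypothetical, and the reconstruction only illustrates the layout; it is not how arules builds the matrix internally.

```r
basket_lines <- c(
  "f,a,c,d,g,l,m,p",
  "a,b,c,f,l,m,o",
  "b,f,h,j,o",
  "b,c,k,s,p",
  "a,f,c,e,l,p,m,n"
)

# Split each line into items; sort and de-duplicate within each basket.
baskets <- lapply(strsplit(basket_lines, ",", fixed = TRUE),
                  function(x) sort(unique(x)))

# Sorted distinct item names (plays the role of @itemInfo$labels).
labels <- sort(unique(unlist(baskets)))

# 0-based item indexes, concatenated basket by basket (like @data@i).
i <- unlist(lapply(baskets, function(x) match(x, labels) - 1L))

# 0-based column pointers (like @data@p): where each basket starts and ends in i.
p <- c(0L, cumsum(lengths(baskets)))

# Hypothetical helper: item labels of the k-th transaction.
items_of <- function(k, p, i, labels) {
  # Shift the 0-based offsets to R's 1-based positions:
  # transaction k occupies positions p[k] + 1 through p[k + 1] of i.
  pos <- p[k] + seq_len(p[k + 1] - p[k])
  labels[i[pos] + 1]
}

p
items_of(1, p, i, labels)
items_of(3, p, i, labels)
```

With a real transactions object, the same helper could be called as items_of(k, transactions@data@p, transactions@data@i, transactions@itemInfo$labels).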