Jie Wang

This is a short guide to using the function read.transactions to convert shopping basket data into the transactions format required by the R packages arules and arulesViz.

The letters a through s are the names of the available shopping items (16 distinct items; not every letter in that range occurs). Assume that the sample basket data is stored in a plain-text file named baskets.

Convert the Sample Data into the Transactions Class

The arules package provides the function read.transactions, which reads basket data into an object of class transactions.

The following script reads the file baskets into a transactions object.

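# read.transactions() is provided by the arules package
transactions <- arules::read.transactions(
  file = "baskets",      # plain-text file, one basket per line
  format = "basket",     # each line lists all items of one transaction
  sep = ",",             # items are separated by commas
  cols = NULL,           # no transaction-ID column in this file
  rm.duplicates = TRUE,  # logical: drop repeated items within a basket
  skip = 0               # no lines to skip before the data
)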

The parameter format is a character string giving the layout of the data file:

  • For the ‘basket’ format, each line of the file represents one transaction, and its items (item labels) are separated by the character(s) specified by sep.
  • For the ‘single’ format, each line holds a single item and contains at least a transaction ID and an item label.
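As a sketch of the ‘single’ format, the snippet below writes a small, made-up file (the transaction IDs and items are hypothetical, not the article's data) in which each line holds a transaction ID and one item, then reads it with cols = c(1, 2) to indicate which field is which.

```r
library(arules)

# Hypothetical 'single'-format data: "transactionID,item" on each line
single_file <- tempfile(fileext = ".csv")
writeLines(c("1,f", "1,a", "1,c", "2,b", "2,c"), single_file)

trans_single <- read.transactions(
  file = single_file,
  format = "single",
  sep = ",",
  cols = c(1, 2)  # field 1 = transaction ID, field 2 = item label
)

length(trans_single)      # number of transactions (one per distinct ID)
as(trans_single, "list")  # items grouped per transaction ID
```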

The parameter sep is a character string specifying how fields are separated in the data file. The sample basket data uses a comma (‘,’).

The parameter skip is the number of lines to skip at the beginning of the file before reading data.
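A self-contained sketch of sep and skip working together: it writes a hypothetical basket file whose first line is a header, then skips that line so it is not read as a basket. The file contents are made up for illustration.

```r
library(arules)

# Hypothetical file: one header line, then comma-separated baskets
basket_file <- tempfile(fileext = ".txt")
writeLines(c("items bought", "f,a,c", "b,c", "a,b,c"), basket_file)

demo <- read.transactions(
  file = basket_file,
  format = "basket",
  sep = ",",  # items within a line are separated by commas
  skip = 1    # ignore the header line before reading baskets
)

length(demo)      # 3: the header was not read as a basket
itemLabels(demo)  # labels come back sorted, not in file order
```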

Validate the Data

Inspect the data structure

To check that the baskets were read correctly, look at the structure of the transactions object:

str(transactions)
## Formal class 'transactions' [package "arules"] with 3 slots
##   ..@ data       :Formal class 'ngCMatrix' [package "Matrix"] with 5 slots
##   .. .. ..@ i       : int [1:33] 0 2 3 5 6 10 11 14 0 1 ...
##   .. .. ..@ p       : int [1:6] 0 8 15 20 25 33
##   .. .. ..@ Dim     : int [1:2] 16 5
##   .. .. ..@ Dimnames:List of 2
##   .. .. .. ..$ : NULL
##   .. .. .. ..$ : NULL
##   .. .. ..@ factors : list()
##   ..@ itemInfo   :'data.frame':	16 obs. of  1 variable:
##   .. ..$ labels: chr [1:16] "a" "b" "c" "d" ...
##   ..@ itemsetInfo:'data.frame':	0 obs. of  0 variables
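The str() output shows the items stored in an ngCMatrix with 16 rows (items) and 5 columns (transactions). arules also provides dim() and summary() methods that report this information in transaction-by-item orientation. A minimal sketch on hypothetical baskets (not the article's data):

```r
library(arules)

# Hypothetical baskets, coerced directly from a list of character vectors
tr <- as(list(c("f", "a", "c"), c("b", "c"), c("a", "b", "c")), "transactions")

dim(tr)       # transactions x items
dim(tr@data)  # items x transactions: the internal matrix is transposed
summary(tr)   # item frequencies, density and basket-size distribution
```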

The Item Names

To validate the item names, run the command:

transactions@itemInfo$labels
##  [1] "a" "b" "c" "d" "e" "f" "g" "h" "j" "k" "l" "m" "n" "o" "p" "s"

The result lists the 16 item labels, a through s, in alphabetical order.
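Reading the itemInfo slot works, but arules also exports an accessor, itemLabels(), which avoids depending on the internal slot layout. A sketch using hypothetical baskets:

```r
library(arules)

# Hypothetical baskets; labels are stored in sorted order, not file order
tr <- as(list(c("f", "a", "c"), c("b", "c")), "transactions")

itemLabels(tr)                                 # accessor for the item labels
identical(itemLabels(tr), tr@itemInfo$labels)  # same values as the slot used above
```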

Find Number of Transactions

# number of transactions: the internal matrix has one column per transaction
transactions@data@Dim[2]
## [1] 5
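Because each transaction is one column of the internal matrix, the column-pointer vector transactions@data@p also yields the count: it has one more entry than there are transactions, and its successive differences are the basket sizes. The base-R sketch below uses the p values printed for the sample data; in arules itself, length(transactions) and size(transactions) should return the same two results.

```r
# Column pointers copied from the transactions@data@p output for the sample data
p <- c(0L, 8L, 15L, 20L, 25L, 33L)

n_transactions <- length(p) - 1  # one pointer per transaction, plus a final end marker
basket_sizes <- diff(p)          # number of items in each transaction

n_transactions  # 5
basket_sizes    # 8 7 5 5 8
```

The sizes sum to 33, which matches the length of transactions@data@i in the str() output.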

List the Items of a Specific Transaction

The slot transactions@data@p stores column pointers: 0-based positions in transactions@data@i that mark where each transaction's item indexes start and end.

transactions@data@p
## [1]  0  8 15 20 25 33

To find where the items of the k-th transaction are stored, take the k-th and (k+1)-th entries of p. For the first transaction:

transactions@data@p[1:2]
## [1] 0 8

These two values bound the first transaction's entries in transactions@data@i. Because p is 0-based, the items occupy positions p[1] + 1 = 1 through p[2] = 8 in R's 1-based indexing.

transactions@data@i[1:8]
## [1]  0  2  3  5  6 10 11 14

The values in i are 0-based item (row) indexes, so add 1 before using them to look up the item labels:

# i holds 0-based item indexes; add 1 to index the 1-based label vector
itemIndex <- transactions@data@i[1:8] + 1
transactions@itemInfo$labels[itemIndex]
## [1] "a" "c" "d" "f" "g" "l" "m" "p"
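The same two steps generalize to any transaction k. The helper below, items_of(), is a hypothetical name introduced here, not an arules function; it assumes p and i are the 0-based slots shown above. The test values are only those printed in this post, so just the first transaction is reconstructed.

```r
# Hypothetical helper: item labels of transaction k from the sparse-matrix slots.
# p and i are 0-based (as stored by the Matrix package); R indexing is 1-based.
items_of <- function(k, p, i, labels) {
  start <- p[k] + 1          # first position of transaction k in i
  end <- p[k + 1]            # last position of transaction k in i
  if (end < start) {
    return(character(0))     # empty transaction
  }
  labels[i[start:end] + 1]   # shift 0-based item indexes to 1-based
}

# Values printed earlier in this post for the sample data
labels <- c("a", "b", "c", "d", "e", "f", "g", "h",
            "j", "k", "l", "m", "n", "o", "p", "s")
p <- c(0L, 8L, 15L, 20L, 25L, 33L)
i_first <- c(0L, 2L, 3L, 5L, 6L, 10L, 11L, 14L)  # first transaction's entries only

items_of(1, p, i_first, labels)  # "a" "c" "d" "f" "g" "l" "m" "p"
```

With the full object, items_of(k, transactions@data@p, transactions@data@i, transactions@itemInfo$labels) should return the items of any transaction; arules' own as(transactions, "list")[[k]] gives the same items without touching internal slots.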

Compare the items above to the first line in the original basket data:

f,a,c,d,g,l,m,p

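They contain the same items. read.transactions stores each basket's items in label order rather than file order, which is why f appears after d. The first basket was converted correctly.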
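For more than one basket, the comparison can be automated. The sketch below writes a hypothetical basket file (made-up contents standing in for baskets), reads it with read.transactions, re-parses the raw lines with base R, and checks that every basket contains the same set of items. The comparison is set-based because, as the output above shows, items come back in label order rather than file order.

```r
library(arules)

# Hypothetical basket file standing in for "baskets"
basket_file <- tempfile(fileext = ".txt")
writeLines(c("f,a,c", "b,c", "a,b,c"), basket_file)
tr <- read.transactions(basket_file, format = "basket", sep = ",")

# Re-parse the raw lines independently of arules
raw_baskets <- strsplit(readLines(basket_file), ",", fixed = TRUE)
parsed <- as(tr, "list")  # one character vector of item labels per transaction

# Compare basket by basket as sets, since item order is not preserved
all_match <- length(raw_baskets) == length(parsed) &&
  all(mapply(setequal, raw_baskets, parsed))
all_match  # TRUE when every basket survived the conversion
```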
