R: How to incorporate item categories into transaction data

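Problem description

In R, I want to use the following data frame to create transaction data so that I can run apriori from the arules package. It has a transaction ID, an item ID, and a category ID, which is the parent of the item.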

Transaction_ID  Item_ID Category_ID
T01 A001    A01
T01 A002    A01
T02 A001    A01
T02 A003    A02
T02 A002    A01
T03 A005    A03
T05 A004    A03
T05 A002    A01
T05 A005    A03
T04 A001    A01
T04 A003    A02

I want to incorporate the category ID into the transaction data as a level above the labels (items), the way the Groceries data set does.

str(Groceries)
Formal class 'transactions' [package "arules"] with 3 slots
  ..@ data       :Formal class 'ngCMatrix' [package "Matrix"] with 5 slots
  .. .. ..@ i       : int [1:43367] 13 60 69 78 14 29 98 24 15 29 ...
  .. .. ..@ p       : int [1:9836] 0 4 7 8 12 16 21 22 27 28 ...
  .. .. ..@ Dim     : int [1:2] 169 9835
  .. .. ..@ Dimnames:List of 2
  .. .. .. ..$ : NULL
  .. .. .. ..$ : NULL
  .. .. ..@ factors : list()
  ..@ itemInfo   :'data.frame': 169 obs. of  3 variables:
  .. ..$ labels: chr [1:169] "frankfurter" "sausage" "liver loaf" "ham" ...
  .. ..$ level2: Factor w/ 55 levels "baby food","bags",..: 44 44 44 44 44 44 44 42 42 41 ...
  .. ..$ level1: Factor w/ 10 levels "canned food",..: 6 6 6 6 6 6 6 6 6 6 ...
  ..@ itemsetInfo:'data.frame': 0 obs. of  0 variables

However, read.transactions only lets you import the transaction ID and the item ID through its cols argument. I also tried this:

transaction_by_item<-split(df[,c("Item_ID","Category_ID")],df$Transaction_ID)
basket <- as(transaction_by_item, "transactions")

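This gives an error: Error in asMethod(object) : can coerce list with atomic components only

If I split the transactions using the item ID alone, it works:

transaction_by_item<-split(df$Item_ID,df$Transaction_ID)

Does anyone know how to combine the item ID (labels) and the category ID (levels) when creating the transaction data? Thanks.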
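The difference between the two calls can be seen in base R alone. The following is a minimal sketch, assuming a shortened copy of the data above stored in a data frame named df: splitting a two-column data frame yields a list of data frames, while splitting a single column yields a list of atomic vectors, which is what the coercion to transactions expects.

```r
# Shortened copy of the question's data (first five rows only)
df <- data.frame(
  Transaction_ID = c("T01", "T01", "T02", "T02", "T02"),
  Item_ID        = c("A001", "A002", "A001", "A003", "A002"),
  Category_ID    = c("A01", "A01", "A01", "A02", "A01"),
  stringsAsFactors = FALSE
)

# Splitting two columns: every list element is itself a data frame
by_df <- split(df[, c("Item_ID", "Category_ID")], df$Transaction_ID)
class(by_df[["T01"]])

# Splitting one column: every list element is an atomic character vector
by_vec <- split(df$Item_ID, df$Transaction_ID)
class(by_vec[["T01"]])
```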

Tags: r, arules

Solution


Maybe this will help. First, let's look at the arules function itemInfo():

library(arules)
data("Groceries")
head(itemInfo(Groceries))
             labels  level2           level1
1       frankfurter sausage meat and sausage
2           sausage sausage meat and sausage
3        liver loaf sausage meat and sausage
4               ham sausage meat and sausage
5              meat sausage meat and sausage
6 finished products sausage meat and sausage

As you said, Groceries has several levels. In your case, the transactions start with labels only:

trans4 <- as(split(dats[,"Item_ID"], dats[,"Transaction_ID"]), "transactions")
str(trans4)
itemInfo(trans4)
  labels
1   A001
2   A002
3   A003
4   A004
5   A005
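Storing the category as item metadata presumes that every item belongs to exactly one category. A small base R check along these lines could be run first; it is only a sketch, and the names n_cat and ambiguous are made up for illustration.

```r
# Same toy data as in the question
dats <- data.frame(
  Transaction_ID = c("T01", "T01", "T02", "T02", "T02", "T03",
                     "T05", "T05", "T05", "T04", "T04"),
  Item_ID        = c("A001", "A002", "A001", "A003", "A002", "A005",
                     "A004", "A002", "A005", "A001", "A003"),
  Category_ID    = c("A01", "A01", "A01", "A02", "A01", "A03",
                     "A03", "A01", "A03", "A01", "A02"),
  stringsAsFactors = FALSE
)

# Count the distinct categories seen for each item
n_cat <- tapply(dats$Category_ID, dats$Item_ID, function(x) length(unique(x)))

# Items mapped to more than one category would make the hierarchy ambiguous
ambiguous <- names(n_cat)[n_cat > 1]
ambiguous
```

If ambiguous is not empty, the mapping needs to be cleaned up before it is attached to the items.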

Now attach the category to that item information. The rows of itemInfo must follow the item order already stored in trans4, so align the lookup table with itemLabels(trans4) instead of relying on the order in which items first appear in dats:

library(dplyr)
labels_ <- dats %>% select(Item_ID, Category_ID) %>% distinct()
# distinct() keeps first-appearance order (A005 before A004), so reorder to match trans4
idx <- match(itemLabels(trans4), labels_$Item_ID)
itemInfo(trans4) <- data.frame(labels = itemLabels(trans4), level1 = labels_$Category_ID[idx], stringsAsFactors = FALSE)

Now:

itemInfo(trans4)
  labels level1
1   A001    A01
2   A002    A01
3   A003    A02
4   A004    A03
5   A005    A03

And:

str(trans4)
Formal class 'transactions' [package "arules"] with 3 slots
  ..@ data       :Formal class 'ngCMatrix' [package "Matrix"] with 5 slots
  .. .. ..@ i       : int [1:11] 0 1 0 1 2 4 0 2 1 3 ...
  .. .. ..@ p       : int [1:6] 0 2 5 6 8 11
  .. .. ..@ Dim     : int [1:2] 5 5
  .. .. ..@ Dimnames:List of 2
  .. .. .. ..$ : NULL
  .. .. .. ..$ : NULL
  .. .. ..@ factors : list()
  ..@ itemInfo   :'data.frame': 5 obs. of  2 variables:
  .. ..$ labels: chr [1:5] "A001" "A002" "A003" "A004" ...
  .. ..$ level1: Factor w/ 3 levels "A01","A02","A03": 1 1 2 3 3    # here we go!!!
  ..@ itemsetInfo:'data.frame': 5 obs. of  1 variable:
  .. ..$ transactionID: chr [1:5] "T01" "T02" "T03" "T04" ...
With the data:

dats <- structure(list(Transaction_ID = structure(c(1L, 1L, 2L, 2L, 2L, 
3L, 5L, 5L, 5L, 4L, 4L), .Label = c("T01", "T02", "T03", "T04", 
"T05"), class = "factor"), Item_ID = structure(c(1L, 2L, 1L, 
3L, 2L, 5L, 4L, 2L, 5L, 1L, 3L), .Label = c("A001", "A002", "A003", 
"A004", "A005"), class = "factor"), Category_ID = structure(c(1L, 
1L, 1L, 2L, 1L, 3L, 3L, 1L, 3L, 1L, 2L), .Label = c("A01", "A02", 
"A03"), class = "factor")), class = "data.frame", row.names = c(NA, 
-11L))
