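r - How to incorporate item categories into transaction data in R
Question
In R, I want to create transaction data from the following data frame so that I can run apriori from the arules package. It has a transaction ID, an item ID, and a category ID, which is the parent of the item.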
Transaction_ID Item_ID Category_ID
T01 A001 A01
T01 A002 A01
T02 A001 A01
T02 A003 A02
T02 A002 A01
T03 A005 A03
T05 A004 A03
T05 A002 A01
T05 A005 A03
T04 A001 A01
T04 A003 A02
I want to incorporate the category ID into the transaction data as a level above the labels (items), as in the Groceries data.
str(Groceries)
Formal class 'transactions' [package "arules"] with 3 slots
..@ data :Formal class 'ngCMatrix' [package "Matrix"] with 5 slots
.. .. ..@ i : int [1:43367] 13 60 69 78 14 29 98 24 15 29 ...
.. .. ..@ p : int [1:9836] 0 4 7 8 12 16 21 22 27 28 ...
.. .. ..@ Dim : int [1:2] 169 9835
.. .. ..@ Dimnames:List of 2
.. .. .. ..$ : NULL
.. .. .. ..$ : NULL
.. .. ..@ factors : list()
..@ itemInfo :'data.frame': 169 obs. of 3 variables:
.. ..$ labels: chr [1:169] "frankfurter" "sausage" "liver loaf" "ham" ...
.. ..$ level2: Factor w/ 55 levels "baby food","bags",..: 44 44 44 44 44 44 44 42 42 41 ...
.. ..$ level1: Factor w/ 10 levels "canned food",..: 6 6 6 6 6 6 6 6 6 6 ...
..@ itemsetInfo:'data.frame': 0 obs. of 0 variables
However, read.transactions only lets you import the transaction ID and the item ID through the cols argument. I also tried this:
transaction_by_item<-split(df[,c("Item_ID","Category_ID")],df$Transaction_ID)
basket <- as(transaction_by_item, "transactions")
It gives an error:
Error in asMethod(object) : can coerce list with atomic components only
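If I split the transactions using only the item ID, it works:
transaction_by_item<-split(df$Item_ID,df$Transaction_ID)
Does anyone know how to incorporate both the item ID (labels) and the category ID (level) when creating transaction data? Thanks.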
Solution
Maybe this will help. First, a look at the arules function itemInfo():
library(arules)
data("Groceries")
head(itemInfo(Groceries))
             labels  level2           level1
1       frankfurter sausage meat and sausage
2           sausage sausage meat and sausage
3        liver loaf sausage meat and sausage
4               ham sausage meat and sausage
5              meat sausage meat and sausage
6 finished products sausage meat and sausage
Now, as you said, Groceries has several levels, whereas in your case there is only the label:
trans4 <- as(split(dats[,"Item_ID"], dats[,"Transaction_ID"]), "transactions")
str(trans4)
itemInfo(trans4)
labels
1 A001
2 A002
3 A003
4 A004
5 A005
Now you have to add the category level to the item info. The replacement is positional, so the rows must be in the same order as the items in trans4:
library(dplyr)
labels_ <- dats %>% select(Item_ID, Category_ID) %>% distinct()
# distinct() keeps first-appearance order (A005 before A004), so realign to the items of trans4
labels_ <- labels_[match(itemLabels(trans4), labels_$Item_ID), ]
itemInfo(trans4) <- data.frame(labels = labels_$Item_ID, level1 = labels_$Category_ID)
Now:
itemInfo(trans4)
labels level1
1 A001 A01
2 A002 A01
3 A003 A02
4 A004 A03
5 A005 A03
And:
str(trans4)
Formal class 'transactions' [package "arules"] with 3 slots
..@ data :Formal class 'ngCMatrix' [package "Matrix"] with 5 slots
.. .. ..@ i : int [1:11] 0 1 0 1 2 4 0 2 1 3 ...
.. .. ..@ p : int [1:6] 0 2 5 6 8 11
.. .. ..@ Dim : int [1:2] 5 5
.. .. ..@ Dimnames:List of 2
.. .. .. ..$ : NULL
.. .. .. ..$ : NULL
.. .. ..@ factors : list()
..@ itemInfo :'data.frame': 5 obs. of 2 variables:
.. ..$ labels: Factor w/ 5 levels "A001","A002",..: 1 2 3 4 5
.. ..$ level1: Factor w/ 3 levels "A01","A02","A03": 1 1 2 3 3 # here we go!!!
..@ itemsetInfo:'data.frame': 5 obs. of 1 variable:
.. ..$ transactionID: chr [1:5] "T01" "T02" "T03" "T04" ...
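Once a level1 column sits in itemInfo, the category level can be used for mining at category granularity. The following is only a sketch: it assumes an arules version whose aggregate() method accepts a grouping vector in by, and the names trans, item_cat and trans_cat as well as the support and confidence thresholds are illustrative choices, not part of the original answer. The toy data is rebuilt inside the block so that it runs on its own.

```r
library(arules)

# Same toy data as in the question, rebuilt so this block is self-contained
dats <- data.frame(
  Transaction_ID = c("T01", "T01", "T02", "T02", "T02", "T03",
                     "T05", "T05", "T05", "T04", "T04"),
  Item_ID        = c("A001", "A002", "A001", "A003", "A002", "A005",
                     "A004", "A002", "A005", "A001", "A003"),
  Category_ID    = c("A01", "A01", "A01", "A02", "A01", "A03",
                     "A03", "A01", "A03", "A01", "A02"),
  stringsAsFactors = FALSE
)

# Item-level transactions, one basket per transaction ID
trans <- as(split(dats$Item_ID, dats$Transaction_ID), "transactions")

# Look up each item's category in the order the items appear in `trans`
item_cat <- unique(dats[, c("Item_ID", "Category_ID")])
itemInfo(trans)$level1 <- factor(
  item_cat$Category_ID[match(itemLabels(trans), item_cat$Item_ID)]
)

# Collapse items into their categories: one item column per category
trans_cat <- aggregate(trans, by = itemInfo(trans)$level1)
inspect(trans_cat)

# Mine rules on the category level (thresholds are arbitrary for this toy data)
rules <- apriori(
  trans_cat,
  parameter = list(support = 0.4, confidence = 0.6),
  control   = list(verbose = FALSE)
)
inspect(rules)
```

Looking the category up with match() against itemLabels(trans) avoids depending on the row order of the lookup table, which is the usual source of silently mislabelled items.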
With the data:
dats <- structure(list(Transaction_ID = structure(c(1L, 1L, 2L, 2L, 2L,
3L, 5L, 5L, 5L, 4L, 4L), .Label = c("T01", "T02", "T03", "T04",
"T05"), class = "factor"), Item_ID = structure(c(1L, 2L, 1L,
3L, 2L, 5L, 4L, 2L, 5L, 1L, 3L), .Label = c("A001", "A002", "A003",
"A004", "A005"), class = "factor"), Category_ID = structure(c(1L,
1L, 1L, 2L, 1L, 3L, 3L, 1L, 3L, 1L, 2L), .Label = c("A01", "A02",
"A03"), class = "factor")), class = "data.frame", row.names = c(NA,
-11L))
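As for the error in the question: the message says the list must have atomic components, and splitting a two-column data frame produces a list of data frames instead. A small base R sketch of the difference, using a shortened, made-up copy of the data (the names by_df and by_vec are illustrative):

```r
# Shortened toy version of the question's data frame
df <- data.frame(
  Transaction_ID = c("T01", "T01", "T02"),
  Item_ID        = c("A001", "A002", "A001"),
  Category_ID    = c("A01", "A01", "A01"),
  stringsAsFactors = FALSE
)

# Splitting two columns: every list element is itself a data frame
by_df <- split(df[, c("Item_ID", "Category_ID")], df$Transaction_ID)
class(by_df[[1]])
is.atomic(by_df[[1]])

# Splitting one column: every list element is an atomic vector,
# which is the shape the list-to-transactions coercion expects
by_vec <- split(df$Item_ID, df$Transaction_ID)
class(by_vec[[1]])
is.atomic(by_vec[[1]])
```

This is why the category cannot travel through split() together with the item and has to be attached afterwards via itemInfo.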