r - Filtering transaction level data
Problem description
I am dealing with a data frame containing transaction-level data. It contains two fields, bill_id and product.
The data represents products purchased at the bill level, and a particular bill_id is repeated as many times as the number of products purchased in that bill. For example, if 5 items have been purchased in bill_id 12345, the data for this bill will look like this:
bill_id product
12345 A
12345 B
12345 C
12345 D
12345 E
My objective is to extract the rows of every bill that contains a certain product.
The following is an example of how I currently perform this task:
library(dplyr)
set.seed(1)
# Sample data
dat <- data.frame(bill_id = sample(1:500, size = 1000, replace = TRUE),
                  product = sample(LETTERS, size = 1000, replace = TRUE),
                  stringsAsFactors = FALSE) %>%
  arrange(bill_id, product)
# vector of bill_ids of product A
bills_productA <- dat %>%
  filter(product == "A") %>%
  pull(bill_id) %>%
  unique()
# data for bill_ids in vector bills_productA
dat_subset <- dat %>%
  filter(bill_id %in% bills_productA)
This leads to the creation of an intermediate vector of bill_ids (bills_productA) and a two-step filtering process (first find the ids of bills containing the product, and then find all transactions of these bills).
Is there a more efficient way of performing this task?
Solution
You can filter on bill_id in a single step by subsetting it directly inside filter():
library(dplyr)
dat_subset1 <- dat %>% filter(bill_id %in% unique(bill_id[product == "A"]))
identical(dat_subset, dat_subset1)
#[1] TRUE
This would also work without unique(), but it is better to keep the lookup vector short.
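As a rough check of the point about unique(), and as one possible alternative, the sketch below uses a small hand-made data frame (dat_small is a hypothetical name, not part of the original question). It compares the one-step filter with and without unique(), and against a grouped filter that keeps every bill in which any row is product "A". The grouped form is not from the original answer; it is a different dplyr idiom for the same result, and it may be slower than %in% when there are many distinct bills.

```r
library(dplyr)

# Hypothetical toy data: bills 1 and 3 contain product "A", bill 2 does not
dat_small <- data.frame(
  bill_id = c(1, 1, 2, 2, 3, 3, 3),
  product = c("A", "B", "C", "D", "A", "E", "F"),
  stringsAsFactors = FALSE
)

# One-step filter, with and without unique() on the lookup vector
with_unique <- dat_small %>%
  filter(bill_id %in% unique(bill_id[product == "A"]))
without_unique <- dat_small %>%
  filter(bill_id %in% bill_id[product == "A"])

# Grouped alternative: keep all rows of any bill that has at least one "A"
grouped <- dat_small %>%
  group_by(bill_id) %>%
  filter(any(product == "A")) %>%
  ungroup()

# Compare the results column by column
same_rows_unique <- identical(with_unique, without_unique)
same_rows_grouped <- identical(with_unique$bill_id, grouped$bill_id) &&
  identical(with_unique$product, grouped$product)
```

The grouped version reads closer to the stated goal ("bills containing product A"), while the %in% version avoids grouping altogether.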