How to filter rows (corrections in transactions) - R

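Problem description

I have a transactional dataset (Online Retail from the UCI Machine Learning Repository). The dataset contains transactions (1 transaction = 1 row) and their corrections (the same row with a different invoice number (InvoiceNo) and a negative Quantity).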

Example:

Transactions

I tried:

Transactions <- data3[duplicated(data3[,c(data3$StockCode, data3$Description, data3$UnitPrice, data3$CustomerID, data3$Country)])]

But the code does not work as I expected. Can anyone suggest how to do this? Thank you.

Link to the original dataset: http://archive.ics.uci.edu/ml/datasets/online+retail

Tags: r, data-cleaning

Solution


If you want to pull out all corrections, select the records where Quantity < 0:

df <- data.frame(InvoiceNo=c("C551685","551697"),
                 StockCode=c("POST","POST"),
                 Description=c("POSTAGE","POSTAGE"),
                 Quantity=c(-1,1),
                 InvoiceDate=c("5/3/2011 12:51","5/3/2011 13:46"),
                 UnitPrice=c(8142.75,8142.75),
                 CustomerID=c(16029,16029),
                 Country=c("United Kingdom", "United Kingdom"), 
                 stringsAsFactors=FALSE)

df
#  InvoiceNo StockCode Description Quantity    InvoiceDate UnitPrice CustomerID        Country
#1   C551685      POST     POSTAGE       -1 5/3/2011 12:51   8142.75      16029 United Kingdom
#2    551697      POST     POSTAGE        1 5/3/2011 13:46   8142.75      16029 United Kingdom

# Select only the corrections (rows with negative Quantity):
df[df$Quantity < 0, ]
#  InvoiceNo StockCode Description Quantity    InvoiceDate UnitPrice CustomerID        Country
#1   C551685      POST     POSTAGE       -1 5/3/2011 12:51   8142.75      16029 United Kingdom
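
In the sample rows, the correction also carries a "C" prefix on InvoiceNo. A minimal sketch that cross-checks the two signals, assuming the prefix convention holds throughout your copy of the data; the names `is_negative`, `is_cancel` and `mismatch` are illustrative and not part of the original answer:

```r
# Two of the sample rows, rebuilt here so the block runs on its own.
df <- data.frame(InvoiceNo = c("C551685", "551697"),
                 Quantity  = c(-1, 1),
                 stringsAsFactors = FALSE)

# Two independent signals for a correction.
is_negative <- df$Quantity < 0                # negative quantity
is_cancel   <- startsWith(df$InvoiceNo, "C")  # assumed "C" prefix convention

# Rows where the two signals disagree deserve a manual look.
mismatch <- df[is_negative != is_cancel, ]
nrow(mismatch)
```

If `mismatch` is not empty on the full dataset, filtering on Quantity alone may catch rows that are not invoice cancellations, or miss some that are.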

If you want only the regular transactions instead (that is, to drop the corrections), then:

# Select only the regular transactions (rows with positive Quantity):
df[df$Quantity > 0, ]
#  InvoiceNo StockCode Description Quantity    InvoiceDate UnitPrice CustomerID        Country
#2    551697      POST     POSTAGE        1 5/3/2011 13:46   8142.75      16029 United Kingdom
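
Filtering on Quantity alone still leaves the sale that a correction cancels. If the goal is to drop both rows of such a pair, one possible approach is to build a matching key from the columns that identify a line item, selected by name. This is only a sketch under stated assumptions: the choice of key columns, the `key`, `cancelled_keys` and `net` names, and the third sample row (invoice 551700) are made up for illustration.

```r
# First two rows come from the example above; the third is a hypothetical
# unrelated sale added so that something survives the filter.
df <- data.frame(InvoiceNo  = c("C551685", "551697", "551700"),
                 StockCode  = c("POST", "POST", "22423"),
                 Quantity   = c(-1, 1, 2),
                 UnitPrice  = c(8142.75, 8142.75, 12.75),
                 CustomerID = c(16029, 16029, 16029),
                 stringsAsFactors = FALSE)

# Assumed matching rule: same item, customer, price and absolute quantity.
key <- paste(df$StockCode, df$CustomerID, df$UnitPrice, abs(df$Quantity),
             sep = "|")

# Keys that appear on at least one correction row.
cancelled_keys <- key[df$Quantity < 0]

# Drop the corrections and every row that shares a key with one.
net <- df[!(key %in% cancelled_keys), ]
net
```

Note that this drops every sale sharing a key with a correction, even if only one of several identical sales was cancelled, and it does not look at InvoiceDate (in the sample, the correction is timestamped before the sale it matches).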
