A way to pull every row before a specific event occurs

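Problem description

I am having trouble figuring out how to extract all rows of a data frame that come before a given event. I know I can easily pull out the first occurrence of the event, but in my case I need every row that precedes it. It is also a bit tricky because the rows are grouped by customer, so this has to be done per customer. Some customers also have more than one positive event.

So far I have almost managed this with a for loop, but some incorrect rows are being pulled out.

Essentially, I have two datasets: one in which the positive case appears only once per customer, and another in which it appears several times.

Dataset in which the positive case appears only once.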

data.frame(ID = c("103", "103", "103", "103", "103", "107", "107", "107", "107", "107"), 
           Items = c("638,193,1937,193", "3918,38327,1938,200", "860", "3982,392,3019,3928", "4038,291,493,029,192", "3604,1361,453,2782", "117", "860", "291", "203"), 
           rank = c(1,2,3,4,5,1,2,3,4,5), 
           Ordercount = c(0,0,1,0,0,0,0,1,0,0))

Dataset in which the positive case appears more than once.

data.frame(ID = c("103", "103", "103", "103", "103", "107", "107", "107", "107", "107"), 
           Items = c("638,193,1937,193", "3918,38327,1938,200", "860", "3982,392,3019,3928", "4038,291,493,029,192", "3604,1361,453,2782", "117", "860", "291", "203"), 
           rank = c(1,2,3,4,5,1,2,3,4,5), 
           Ordercount = c(0,0,1,0,1,0,0,0,1,1))

Expected output

#First Case
data.frame(ID = c("103", "103","107", "107"), 
           Items = c("638,193,1937,193", "3918,38327,1938,200","3604,1361,453,2782", "117"), 
           rank = c(1,2,1,2), 
           Ordercount = c(0,0,0,0))
  ID               Items rank Ordercount
1 103    638,193,1937,193    1          0
2 103 3918,38327,1938,200    2          0
3 107  3604,1361,453,2782    1          0
4 107                 117    2          0
# Second Case

data.frame(ID = c("103", "103","107", "107", "107"), 
           Items = c("638,193,1937,193", "3918,38327,1938,200","3604,1361,453,2782", "117", "860"), 
           rank = c(1,2,1,2,3), 
           Ordercount = c(0,0,0,0,0))

  ID               Items rank Ordercount
1 103    638,193,1937,193    1          0
2 103 3918,38327,1938,200    2          0
3 107  3604,1361,453,2782    1          0
4 107                 117    2          0
5 107                 860    3          0

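Tags: r

Solution

You can use match to find the first occurrence of Ordercount == 1. To find it within each ID group, you can use ave as follows, where x is one of the data frames above: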

x[as.logical(ave(x$Ordercount, x$ID, FUN=function(x) seq_along(x) < match(1, x))),]

#First Case
#   ID               Items rank Ordercount
#1 103    638,193,1937,193    1          0
#2 103 3918,38327,1938,200    2          0
#6 107  3604,1361,453,2782    1          0
#7 107                 117    2          0

# Second Case
#   ID               Items rank Ordercount
#1 103    638,193,1937,193    1          0
#2 103 3918,38327,1938,200    2          0
#6 107  3604,1361,453,2782    1          0
#7 107                 117    2          0
#8 107                 860    3          0
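
For anyone who wants to run the answer end to end, here is a minimal sketch that rebuilds both datasets from the question and applies the same match/ave idea. The helper name rows_before_first_event and the variables x1, x2, res1 and res2 are hypothetical names, not from the original answer. The sketch assumes the rows are already in rank order within each ID. It also adds one thing the one-liner does not have: nomatch is set so that a customer with no positive event keeps all of its rows instead of yielding NA rows.

```r
# First case from the question: exactly one positive event per customer.
x1 <- data.frame(
  ID = c("103", "103", "103", "103", "103", "107", "107", "107", "107", "107"),
  Items = c("638,193,1937,193", "3918,38327,1938,200", "860", "3982,392,3019,3928",
            "4038,291,493,029,192", "3604,1361,453,2782", "117", "860", "291", "203"),
  rank = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5),
  Ordercount = c(0, 0, 1, 0, 0, 0, 0, 1, 0, 0)
)

# Second case: same rows, but several positive events per customer.
x2 <- x1
x2$Ordercount <- c(0, 0, 1, 0, 1, 0, 0, 0, 1, 1)

# Hypothetical helper: keep the rows that precede the first Ordercount == 1
# within each ID. Assumes rows are already sorted by rank inside each ID.
rows_before_first_event <- function(df) {
  keep <- ave(df$Ordercount, df$ID, FUN = function(v) {
    # Position of the first 1 in this group; if there is none, point one
    # past the end so that every row of the group is kept.
    first <- match(1, v, nomatch = length(v) + 1)
    seq_along(v) < first
  })
  # ave() returns the flags as numeric 0/1, so convert back to logical.
  df[as.logical(keep), ]
}

res1 <- rows_before_first_event(x1)
res2 <- rows_before_first_event(x2)
print(res1)
print(res2)
```

If the data might not be sorted, ordering it first, for example with x1[order(x1$ID, x1$rank), ], should make the "before" comparison meaningful.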
