Delete rows based on the type of a previous row

Problem description

I am trying to delete rows based on the type of a previous row. Suppose my data.frame looks like this:

| Date       | Time     | Type            | Gross | Sender email     | Receiver email |
| 2018.07.12 | 12:45:13 | Website Payment | 30    | aaa@customer.com | admin@site.com |
| 2018.07.21 | 16:19:34 | Website Payment | 30    | bbb@customer.com | admin@site.com |
| 2018.07.22 | 18:21:17 | Payment Refund  | -30   | admin@site.com   | bbb@custom.com |
| 2018.07.24 | 07:10:00 | Website Payment | 30    | bbb@customer.com | admin@site.com |
| 2018.08.17 | 15:17:40 | Website Payment | 30    | ccc@custom.com   | admin@site.com |

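I want to remove the transactions that were refunded, together with the refund rows themselves, so that this remains: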

| Date       | Time     | Type            | Gross | Sender email     | Receiver email |
| 2018.07.12 | 12:45:13 | Website Payment | 30    | aaa@customer.com | admin@site.com |
| 2018.07.24 | 07:10:00 | Website Payment | 30    | bbb@customer.com | admin@site.com |
| 2018.08.17 | 15:17:40 | Website Payment | 30    | ccc@custom.com   | admin@site.com |

Any help would be greatly appreciated!

Tags: r, dataframe

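Solution

I have a simple solution, though it may not be the most elegant or the fastest. In your example, you can first find the rows where a refund happens, then find who was refunded and which earlier purchase the refund cancels, and finally delete those rows. The code could look like this: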

delete_refund <- function(transaction_matrix) {

  # Find the rows in which a refund happens (negative gross amount)
  index_refund <- which(transaction_matrix[, "Gross"] < 0)

  # For each refund, find the earlier purchase that it cancels
  all_refund_purchase <- integer(0)
  for (row in index_refund) {
    earlier <- seq_len(row)
    # Earlier rows with the same amount, sent by the person who receives the refund
    one_purchase <- which(
      (transaction_matrix[earlier, "Gross"] ==
        abs(transaction_matrix[row, "Gross"])) &
      (transaction_matrix[earlier, "Sender_email"] ==
        transaction_matrix[row, "Receiver_email"]))
    # One person may have several refunds, so skip purchases that are
    # already recorded in all_refund_purchase
    one_purchase <- one_purchase[!(one_purchase %in% all_refund_purchase)]
    # Someone may buy several things at the same price and refund only some
    # of them, so one_purchase may have more than one element; keep the most
    # recent one (nothing is added if no purchase matches)
    all_refund_purchase <- c(all_refund_purchase,
      one_purchase[length(one_purchase)])
  }

  # Select the rows to keep rather than negating the indices: a negated
  # empty index would select zero rows when there are no refunds
  keep <- setdiff(seq_len(nrow(transaction_matrix)),
    c(index_refund, all_refund_purchase))
  return(transaction_matrix[keep, ])
}
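One detail worth checking when rows are dropped by index in R is the case where nothing needs to be dropped. The sketch below uses a hypothetical two-row table named `payments` (not from the question) to show that negating an empty index selects zero rows, and that building the set of rows to keep with `setdiff` avoids this.

```r
# Hypothetical table with no refunds (all Gross values are positive)
payments <- data.frame(Gross = c(30, 30),
                       Sender_email = c("aaa@customer.com", "bbb@customer.com"),
                       stringsAsFactors = FALSE)

index_refund <- which(payments[, "Gross"] < 0)   # integer(0): nothing to drop

# Negating an empty index is still an empty index, so every row is lost
dropped_all <- payments[-index_refund, ]

# Selecting the rows to keep behaves correctly for an empty index
keep <- setdiff(seq_len(nrow(payments)), index_refund)
kept_all <- payments[keep, ]

nrow(dropped_all)   # 0
nrow(kept_all)      # 2
```

The same `setdiff` pattern applies to any combination of row indices that may turn out to be empty.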

Since no reproducible data sample was provided, I tested it on a simple example that I created.

df <- data.frame(date = 1:4, Gross = c(30, 30, -30, 30),
    Sender_email = c('bbb@customer.com', 'ccc@customer.com',
      'admin@site.com', 'bbb@customer.com'),
    Receiver_email = c('admin@site.com', 'admin@site.com',
      'bbb@customer.com', 'admin@site.com'),
    stringsAsFactors = FALSE)

  date Gross     Sender_email   Receiver_email
1    1    30 bbb@customer.com   admin@site.com
2    2    30 ccc@customer.com   admin@site.com
3    3   -30   admin@site.com bbb@customer.com
4    4    30 bbb@customer.com   admin@site.com

The result of delete_refund(df) is

  date Gross     Sender_email Receiver_email
2    2    30 ccc@customer.com admin@site.com
4    4    30 bbb@customer.com admin@site.com

This matches what the original poster asked for.
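To try the function on the question's own table, the data could be entered roughly as follows. This is only a sketch under a few assumptions: the column names `Sender_email` and `Receiver_email` are chosen to match what `delete_refund` expects, the `Type` labels are written in English, and the refund recipient is entered as `bbb@customer.com` on the assumption that the differing address in the question's refund row is a typo (otherwise the refund would not match any purchaser).

```r
# The question's table as a data.frame (column names are assumptions chosen
# to match delete_refund(); the refund recipient is assumed to be
# bbb@customer.com so that it matches the purchaser's address)
transactions <- data.frame(
  Date = c("2018.07.12", "2018.07.21", "2018.07.22", "2018.07.24", "2018.08.17"),
  Time = c("12:45:13", "16:19:34", "18:21:17", "07:10:00", "15:17:40"),
  Type = c("Website Payment", "Website Payment", "Payment Refund",
           "Website Payment", "Website Payment"),
  Gross = c(30, 30, -30, 30, 30),
  Sender_email = c("aaa@customer.com", "bbb@customer.com", "admin@site.com",
                   "bbb@customer.com", "ccc@custom.com"),
  Receiver_email = c("admin@site.com", "admin@site.com", "bbb@customer.com",
                     "admin@site.com", "admin@site.com"),
  stringsAsFactors = FALSE
)

# Run the function only if it has already been defined in this session
if (exists("delete_refund")) {
  print(delete_refund(transactions))
}
```

With the function defined as above, the purchase on 2018.07.21 and its refund on 2018.07.22 are dropped, leaving the three rows dated 2018.07.12, 2018.07.24 and 2018.08.17 that the question asks for.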

