R: Find duplicates and subtract across 2 different rows and columns

Problem description

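I have the dataset below, and I need to subtract the refund from the premium (Premium - Refund) across the 2 rows that share a duplicated Product_Code. The premium comes from the row with "New" status, and the refund comes from the row with "Canceled" status. Please see my dataset below.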

Product_Code <- c("1A","1B","1D","1A","1C","1D","1F","1G","1B","1H")
Status <- c("New", "New","New","Canceled","New","Canceled","New","New",
            "Canceled", "New")
Premium <- c(1200,1500,2000,0,1000,0,1400,1600,0,1300)
Refund <- c(0,0,0,800,0,1500,0,0,900,0)
DataSet <- data.frame(Product_Code, Status, Premium, Refund)
> DataSet
   Product_Code   Status Premium Refund
1            1A      New    1200      0
2            1B      New    1500      0
3            1D      New    2000      0
4            1A Canceled       0    800
5            1C      New    1000      0
6            1D Canceled       0   1500
7            1F      New    1400      0
8            1G      New    1600      0
9            1B Canceled       0    900
10           1H      New    1300      0

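The result I want is a new dataset. If a Product_Code is duplicated (it has both a "New" and a "Canceled" row), the refund from the "Canceled" row is subtracted from the premium of the "New" row. The rows are then collapsed to a single row per Product_Code, dropping the Canceled row. The new premium is Premium (from the New row) - Refund (from the Canceled row). Please see the desired output below.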

> DataSet
  Product_Code Status Premium Refund
1           1A    New     400      0
2           1B    New     600      0
3           1D    New     500      0
4           1C    New    1000      0
5           1F    New    1400      0
6           1G    New    1600      0
7           1H    New    1300      0

Tags: r

Solution


Using dplyr, we can group by Product_Code, sum the Premium and Refund columns within each group, and subtract one sum from the other.

library(dplyr)

DataSet %>%
  group_by(Product_Code) %>%
  summarise(Status = "New", 
            Premium = sum(Premium) - sum(Refund), 
            Refund = 0)

# A tibble: 7 x 4
#  Product_Code Status Premium Refund
#  <fct>        <chr>    <dbl>  <dbl>
#1 1A           New        400      0
#2 1B           New        600      0
#3 1C           New       1000      0
#4 1D           New        500      0
#5 1F           New       1400      0
#6 1G           New       1600      0
#7 1H           New       1300      0
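
To see why summing works here: in the example data, Premium is 0 on every Canceled row and Refund is 0 on every New row, so within one Product_Code the difference of the two sums reduces to the New premium minus the Canceled refund. A minimal base R check for a single code, 1A, using values copied from the question (the names premium_1A, refund_1A and net_1A are illustrative only):

```r
# Rows 1 and 4 of the example data: Product_Code "1A"
premium_1A <- c(1200, 0)   # New row, Canceled row
refund_1A  <- c(0, 800)    # New row, Canceled row

# Same expression as in the summarise() call, applied to one group
net_1A <- sum(premium_1A) - sum(refund_1A)
net_1A
```

This should give 400, matching the 1A row of the desired output. The shortcut relies on those zeros; if a Canceled row carried a non-zero Premium, it would be added to the total as well.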

We can also apply the same logic with data.table

library(data.table)

setDT(DataSet)
DataSet[,.(Premium = sum(Premium) - sum(Refund), Status = "New", 
           Refund = 0), Product_Code]

And in base R, using aggregate

transform(aggregate(Premium~Product_Code, transform(DataSet, 
          Premium = Premium - Refund), sum), Status = "New", Refund = 0)
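
One difference from the desired output is worth noting: the dplyr and aggregate results are sorted by Product_Code, whereas the question lists the codes in order of first appearance (1A, 1B, 1D, 1C, ...). If that order matters, the following base R sketch is one possible way to keep it. It assumes the same example data as the question; the names net, codes and result are introduced here for illustration only.

```r
# Rebuild the example data from the question
DataSet <- data.frame(
  Product_Code = c("1A", "1B", "1D", "1A", "1C", "1D", "1F", "1G", "1B", "1H"),
  Status  = c("New", "New", "New", "Canceled", "New", "Canceled", "New", "New",
              "Canceled", "New"),
  Premium = c(1200, 1500, 2000, 0, 1000, 0, 1400, 1600, 0, 1300),
  Refund  = c(0, 0, 0, 800, 0, 1500, 0, 0, 900, 0),
  stringsAsFactors = FALSE
)

# Net premium per product code: sum of (Premium - Refund) within each code.
# tapply() returns a vector named by Product_Code, sorted by code.
net <- tapply(DataSet$Premium - DataSet$Refund, DataSet$Product_Code, sum)

# Product codes in order of first appearance in the data
codes <- unique(DataSet$Product_Code)

# Index the named vector by code to restore the first-appearance order
result <- data.frame(
  Product_Code = codes,
  Status  = "New",
  Premium = as.numeric(net[codes]),
  Refund  = 0,
  stringsAsFactors = FALSE
)
result
```

Indexing net by name with codes is what restores the original order; the arithmetic is the same as in the answers above.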
