首页 > 解决方案 > 有条件地将数据框中的值替换为另一个数据框中的值

问题描述

我有一个从 excel 表导入的主表,它定义了我的分析应该进行的聚合级别。它定义了药物是否应汇总到药物名称或分开以包括剂型和强度。

这是我使用的主表的示例

df1 <- data.frame(Drug = c("Drug A","Drug B","Drug B","Drug B","Drug C","Drug C","Drug C"),
              Strength = c("All","All","All","All","All","All","All"),
              Dosage_Form = c("All","CAP","SOLN","INJ","CAP","INJ","SOLN"), stringsAsFactors = FALSE)

Drug    Strength   Dosage Form
Drug A  All        All
Drug B  All        CAPS
Drug B  All        SOLN
Drug B  All        INJ
Drug C  All        CAP
Drug C  All        INJ
Drug C  All        SOLN

这告诉我,药物 A 的销售不应细分为包括各种强度,而相反,药物 B 和 C 应该对这两种剂型有单独的行。

我的销售数据来自数据仓库,数据是在药物、强度和剂型级别提取的,此数据的示例如下:

    df2 <- data.frame(Drug = c("Drug A","Drug A","Drug A","Drug B","Drug B","Drug B","Drug C","Drug C","Drug C"),
              STRENGTH = c("80 MCG","80 MCG","80 MCG","80 MCG","80 MCG","80 MCG","80 MCG","80 MCG","80 MCG"),
              DOSAGE_FORM = c("SOLN","CAP","INJ","CAP","INJ","SOLN","CAP","INJ","SOLN"),
              UNITS_SOLD = c(60,100,300,50,20,10,40,20,80), stringsAsFactors = FALSE)

数据如下:

Drug   Strength Dosage_Form Units
Drug A  80 MCG  SOLN         60
Drug A  80 MCG  CAP         100
Drug A  80 MCG  INJ         300
Drug B  80 MCG  CAP          50
Drug B  80 MCG  INJ          20
Drug B  80 MCG  SOLN         10
Drug C  80 MCG  CAP          40
Drug C  80 MCG  INJ          20
Drug C  80 MCG  SOLN         80

我正在寻找一种可以实现以下伪代码的方法:

For each drug in df1 replace the strength and dosage form in df2 with the respective values from df1

My final result should look as follows:

Drug   Strength Dosage_Form Units
Drug A  All     All          60
Drug A  All     All         100
Drug A  All     All         300
Drug B  All     CAP          50
Drug B  All     INJ          20
Drug B  All     SOLN         10
Drug C  All     CAP          40
Drug C  All     INJ          20
Drug C  All     SOLN         80

Once I have the strengths and dosage forms aligned with the master sheet I can then aggregate the data to the correct levels using group by and summarise

标签: rdplyr

解决方案


We can do a join and do the replacement

library(data.table)
setDT(df2)[, STRENGTH := "All"]
df2[setDT(df1)[Strength == Dosage_Form], 
          DOSAGE_FORM :=  Dosage_Form, on = .(Drug)]

df2
#      Drug STRENGTH DOSAGE_FORM UNITS_SOLD
#1: Drug A      All         All         60
#2: Drug A      All         All        100
#3: Drug A      All         All        300
#4: Drug B      All         CAP         50
#5: Drug B      All         INJ         20
#6: Drug B      All        SOLN         10
#7: Drug C      All         CAP         40
#8: Drug C      All         INJ         20
#9: Drug C      All        SOLN         80

Or using base R

df1s <- subset(df1, Strength == Dosage_Form)
i1 <- df2$Drug %in%df1s$Drug
df2$STRENGTH <- "All"
df2$DOSAGE_FORM[i1] <- df1s$Dosage_Form
  

推荐阅读