r - 有条件地将数据框中的值替换为另一个数据框中的值
问题描述
我有一个从 excel 表导入的主表,它定义了我的分析应该进行的聚合级别。它定义了药物是否应汇总到药物名称或分开以包括剂型和强度。
这是我使用的主表的示例
df1 <- data.frame(Drug = c("Drug A","Drug B","Drug B","Drug B","Drug C","Drug C","Drug C"),
Strength = c("All","All","All","All","All","All","All"),
Dosage_Form = c("All","CAP","SOLN","INJ","CAP","INJ","SOLN"), stringsAsFactors = FALSE)
Drug Strength Dosage Form
Drug A All All
Drug B All CAPS
Drug B All SOLN
Drug B All INJ
Drug C All CAP
Drug C All INJ
Drug C All SOLN
这告诉我,药物 A 的销售不应细分为包括各种强度,而相反,药物 B 和 C 应该对这两种剂型有单独的行。
我的销售数据来自数据仓库,数据是在药物、强度和剂型级别提取的,此数据的示例如下:
df2 <- data.frame(Drug = c("Drug A","Drug A","Drug A","Drug B","Drug B","Drug B","Drug C","Drug C","Drug C"),
STRENGTH = c("80 MCG","80 MCG","80 MCG","80 MCG","80 MCG","80 MCG","80 MCG","80 MCG","80 MCG"),
DOSAGE_FORM = c("SOLN","CAP","INJ","CAP","INJ","SOLN","CAP","INJ","SOLN"),
UNITS_SOLD = c(60,100,300,50,20,10,40,20,80), stringsAsFactors = FALSE)
数据如下:
Drug Strength Dosage_Form Units
Drug A 80 MCG SOLN 60
Drug A 80 MCG CAP 100
Drug A 80 MCG INJ 300
Drug B 80 MCG CAP 50
Drug B 80 MCG INJ 20
Drug B 80 MCG SOLN 10
Drug C 80 MCG CAP 40
Drug C 80 MCG INJ 20
Drug C 80 MCG SOLN 80
我正在寻找一种可以实现以下伪代码的方法:
For each drug in df1
replace the strength and dosage form in df2
with the respective values from df1
My final result should look as follows:
Drug Strength Dosage_Form Units
Drug A All All 60
Drug A All All 100
Drug A All All 300
Drug B All CAP 50
Drug B All INJ 20
Drug B All SOLN 10
Drug C All CAP 40
Drug C All INJ 20
Drug C All SOLN 80
Once I have the strengths and dosage forms aligned with the master sheet I can then aggregate the data to the correct levels using group by and summarise
解决方案
We can do a join and do the replacement
library(data.table)
setDT(df2)[, STRENGTH := "All"]
df2[setDT(df1)[Strength == Dosage_Form],
DOSAGE_FORM := Dosage_Form, on = .(Drug)]
df2
# Drug STRENGTH DOSAGE_FORM UNITS_SOLD
#1: Drug A All All 60
#2: Drug A All All 100
#3: Drug A All All 300
#4: Drug B All CAP 50
#5: Drug B All INJ 20
#6: Drug B All SOLN 10
#7: Drug C All CAP 40
#8: Drug C All INJ 20
#9: Drug C All SOLN 80
Or using base R
df1s <- subset(df1, Strength == Dosage_Form)
i1 <- df2$Drug %in%df1s$Drug
df2$STRENGTH <- "All"
df2$DOSAGE_FORM[i1] <- df1s$Dosage_Form
推荐阅读
- postgresql - (类别列中的项目顺序不正确)编写 SQL 查询以生成前 5 个类别的排名列表
- python - KeyError:“['Hours', 'Game'] 不在索引中”
- npm - 停止调试器后如何让vscode同时停止nodemon和npm?
- javascript - 如何在 ag-grid 的页脚中显示特定的列总和
- node.js - 为什么MongoDB创建一个空文档?
- java - 无法将更新的对象值从客户端发送到服务器,服务器读取不存在的值
- c# - 如何在编辑模式下防止gridview行选择。(asp.net服务器端)
- angular - Angular 中的 tns-core-modules 突然模块解决错误
- python - 在 Airflow 中将文件上传到 GCS 存储桶
- c++ - c++ 在抛出'std::out_of_range' std::vector 的实例后调用终止