r - Sampling rows from a different data.frame
Question
data1 = data.frame("Group1" = sample(1:2, 100, replace = TRUE),
                   "Group2" = sample(c('a', 'b'), 100, replace = TRUE),
                   "V1" = sample(1:3, 100, replace = TRUE),
                   "V2" = sample(0:1, 100, replace = TRUE),
                   "V3" = sample(1:5, 100, replace = TRUE),
                   "V4" = sample(1:2, 100, replace = TRUE))
data2 = data.frame("Group1" = c(1, 1, 2, 2),
                   "Group2" = c('a', 'b', 'a', 'b'),
                   "Size" = c(9, 7, 6, 10),
                   "V1" = NA,
                   "V2" = NA,
                   "V3" = NA,
                   "V4" = NA)
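I have 'data1', which contains my data, and 'data2', which has 'Group1', 'Group2' and 'Size'. I want to group my data by 'Group1' and 'Group2' and, for each group, randomly draw 'Size' rows from 'data1' to fill in V1-V4 in 'data2'.
The desired output looks something like this, but with the NA values filled in from 'data1':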
library(dplyr)
library(tidyr)
data3 = data2 %>%
  uncount(Size)
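For readers without tidyr at hand, the shape that `uncount(Size)` produces can be sketched in base R: each row of 'data2' is repeated 'Size' times and the 'Size' column is dropped (uncount's default), leaving 32 rows whose V1-V4 are still NA. This is only an illustration of the target shape; `skeleton` is an illustrative name, not from the question.

```r
# Same data2 as in the question (V1-V4 are placeholders to be filled)
data2 <- data.frame(Group1 = c(1, 1, 2, 2),
                    Group2 = c("a", "b", "a", "b"),
                    Size   = c(9, 7, 6, 10),
                    V1 = NA, V2 = NA, V3 = NA, V4 = NA)

# Repeat each row index Size times, then drop Size, mimicking uncount(Size)
skeleton <- data2[rep(seq_len(nrow(data2)), times = data2$Size),
                  setdiff(names(data2), "Size")]
rownames(skeleton) <- NULL

nrow(skeleton)                          # 9 + 7 + 6 + 10 = 32
table(skeleton$Group1, skeleton$Group2) # rows per (Group1, Group2) pair
```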
Answer
library(data.table)
setDT(data1)
setDT(data2)
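# sample indices from each group:
# the join returns one row per row of data1 (in data1's order) together with
# its group's Size, so .I in the next step is a row number of data1;
# sample(.I, Size) then draws Size of those row numbers per group,
# without replacement
i <-
  data2[data1, on = .(Group1, Group2)
        ][, .(i_samp = sample(.I, Size)), by = .(Group1, Group2, Size)
          ][, i_samp]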
# subset to sampled indices
merge(data1[i], data2[, .(Group1, Group2, Size)])
# Group1 Group2 V1 V2 V3 V4 Size
# 1: 1 a 3 1 2 2 9
# 2: 1 a 3 1 5 1 9
# 3: 1 a 2 1 4 2 9
# 4: 1 a 3 1 1 1 9
# 5: 1 a 3 1 4 1 9
# 6: 1 a 1 0 3 1 9
# 7: 1 a 3 1 1 1 9
# 8: 1 a 1 1 1 2 9
# 9: 1 a 2 0 2 1 9
# 10: 1 b 2 0 5 2 7
# 11: 1 b 3 0 5 2 7
# 12: 1 b 3 1 4 2 7
# 13: 1 b 1 1 1 1 7
# 14: 1 b 1 1 4 1 7
# 15: 1 b 1 0 1 1 7
# 16: 1 b 1 0 3 1 7
# 17: 2 a 2 0 5 1 6
# 18: 2 a 1 0 5 1 6
# 19: 2 a 3 1 1 2 6
# 20: 2 a 1 0 2 1 6
# 21: 2 a 3 1 1 2 6
# 22: 2 a 1 1 3 2 6
# 23: 2 b 3 0 2 1 10
# 24: 2 b 2 1 5 1 10
# 25: 2 b 3 0 1 1 10
# 26: 2 b 3 1 2 1 10
# 27: 2 b 2 0 5 1 10
# 28: 2 b 2 0 2 1 10
# 29: 2 b 2 0 2 2 10
# 30: 2 b 1 0 1 1 10
# 31: 2 b 3 0 5 1 10
# 32: 2 b 3 0 5 1 10
# Group1 Group2 V1 V2 V3 V4 Size
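One caveat about `sample(.I, Size)` in the answer above: base R's `sample()` treats a single positive number `x` as `1:x`. If a group in 'data1' ever contains exactly one row, `sample(.I, Size)` would draw from `1:<that row number>` instead of returning that row. Sampling positions with `sample.int()` and then indexing avoids this; inside the data.table call the equivalent j expression would be `.I[sample.int(.N, Size)]`. Separately, sampling without replacement fails when a group has fewer rows than its 'Size', so `replace = TRUE` is needed if that can happen. A small base-R illustration, where `sample_idx` is a hypothetical helper name:

```r
set.seed(42)

idx <- 57L  # pretend this group contains a single row, row 57

# Pitfall: with a length-one numeric x, sample(x, 1) draws from 1:57,
# so it only rarely returns 57 itself
sample(idx, 1)

# Hypothetical helper: sample positions within x, then index into x
sample_idx <- function(x, size, replace = FALSE) {
  x[sample.int(length(x), size, replace = replace)]
}

sample_idx(idx, 1)                  # always 57
sample_idx(c(3L, 8L, 15L), 2)       # two distinct values from c(3, 8, 15)
sample_idx(57L, 3, replace = TRUE)  # replace = TRUE allows size > group size
```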
Input data used: the same 'data1' and 'data2' as in the question.
Here is a more parameterized version, in which you can explicitly set the columns to fill and the keys used to join the two tables:
fill_key <- c('Group1', 'Group2')
columns_to_fill <- paste0('V', 1:4)
# sample indices from each group
i <-
data2[data1, on = (fill_key)
][, .(i_samp = sample(.I, Size)), by = c(fill_key, 'Size')
][, i_samp]
# subset to sampled indices
merge(data1[i, c(fill_key, columns_to_fill), with = FALSE],
data2[, c(fill_key, 'Size'), with = FALSE])
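If neither data.table nor dplyr is available, the same idea can be sketched in base R. The function below, `fill_by_group`, is hypothetical (not from any package); it takes the same `fill_key` and `columns_to_fill` parameters as the version above, plus a `replace` switch for groups smaller than their 'Size'. It is a minimal sketch assuming 'Size' holds non-negative whole numbers and that the key values paste unambiguously into strings.

```r
# Hypothetical helper (fill_by_group is not from any package): for each row
# of `lookup`, draw lookup[[size_col]] rows from the matching group of `data`
# and return the key columns, the filled columns and the size column.
fill_by_group <- function(data, lookup, fill_key, columns_to_fill,
                          size_col = "Size", replace = FALSE) {
  # Use plain data.frames so `[` behaves the same for data.table inputs
  data   <- as.data.frame(data)
  lookup <- as.data.frame(lookup)

  # One string key per row, built from the key columns
  make_key <- function(df) {
    do.call(paste, c(unname(as.list(df[fill_key])), sep = "\x1f"))
  }
  lookup_key <- make_key(lookup)

  # Row numbers of `data`, split by group key
  rows_by_group <- split(seq_len(nrow(data)), make_key(data))

  pieces <- lapply(seq_len(nrow(lookup)), function(j) {
    rows <- rows_by_group[[lookup_key[j]]]
    if (is.null(rows)) stop("no rows in `data` for row ", j, " of `lookup`")
    n <- lookup[[size_col]][j]
    # sample.int() picks positions, avoiding sample()'s length-one pitfall;
    # it errors if n > length(rows) and replace = FALSE
    picked <- rows[sample.int(length(rows), n, replace = replace)]
    out <- data[picked, c(fill_key, columns_to_fill), drop = FALSE]
    out[[size_col]] <- rep(n, length(picked))
    out
  })

  result <- do.call(rbind, pieces)
  rownames(result) <- NULL
  result
}
```

With the question's inputs, `fill_by_group(data1, data2, fill_key, columns_to_fill)` should return 32 rows (9, 7, 6 and 10 for the four groups); the rows are drawn at random, so call `set.seed()` first if the result needs to be reproducible.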