r - How do I select a value in a df based on three conditions on three different variables?
Problem description
I have a data frame like the following:
set.seed(123)
df <- data.frame(Delay=rep(-5:5, times=4, each=1),
ID= rep(c("A","B","C","D"), times=1, each=11),
variable=rep(c("R2","SE"), times=11, each=1),
value=sample(seq(0, 1, by=0.01), 44, replace=TRUE))
df$ID <- as.factor(df$ID)
df$variable <- as.factor(df$variable)
head(df)
Delay ID variable value
1 -5 A R2 0.30
2 -4 A SE 0.78
3 -3 A R2 0.50
4 -2 A SE 0.13
5 -1 A R2 0.66
6 0 A SE 0.41
I want to get the value of Delay for the row where ID=="B", variable=="R2", and value is at its minimum.
How can I find this value?
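Solution
The approach itself does not depend on the R version, but the result shown here does: the sample data is randomly generated, and what sample() returns for a given seed evidently changed somewhere between R-3.5.3 and R-4.0.0.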
R-3.5.3
with(df[order(df$value),], Delay[ID == "B" & variable == "R2"])
# [1] -2 0 2 -4 4
with(df[order(df$value),], Delay[ID == "B" & variable == "R2"][1])
# [1] -2
dput(df)
# structure(list(Delay = c(-5L, -4L, -3L, -2L, -1L, 0L, 1L, 2L, 3L, 4L, 5L, -5L, -4L, -3L, -2L, -1L, 0L, 1L, 2L, 3L, 4L, 5L, -5L, -4L, -3L, -2L, -1L, 0L, 1L, 2L, 3L, 4L, 5L, -5L, -4L, -3L, -2L, -1L, 0L, 1L, 2L, 3L, 4L, 5L), ID = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("A", "B", "C", "D"), class = "factor"), variable = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L), class = "factor", .Label = c("R2", "SE")), value = c(0.29, 0.79, 0.41, 0.89, 0.94, 0.04, 0.53, 0.9, 0.55, 0.46, 0.96, 0.45, 0.68, 0.57, 0.1, 0.9, 0.24, 0.04, 0.33, 0.96, 0.89, 0.69, 0.64, 1, 0.66, 0.71, 0.54, 0.6, 0.29, 0.14, 0.97, 0.91, 0.69, 0.8, 0.02, 0.48, 0.76, 0.21, 0.32, 0.23, 0.14, 0.41, 0.41, 0.37)), row.names = c(NA, -44L), class = "data.frame")
R-4.0.0
with(df[order(df$value),], Delay[ID == "B" & variable == "R2"])
# [1] 4 -4 -2 0 2
with(df[order(df$value),], Delay[ID == "B" & variable == "R2"][1])
# [1] 4
dput(df)
# structure(list(Delay = c(-5L, -4L, -3L, -2L, -1L, 0L, 1L, 2L, 3L, 4L, 5L, -5L, -4L, -3L, -2L, -1L, 0L, 1L, 2L, 3L, 4L, 5L, -5L, -4L, -3L, -2L, -1L, 0L, 1L, 2L, 3L, 4L, 5L, -5L, -4L, -3L, -2L, -1L, 0L, 1L, 2L, 3L, 4L, 5L), ID = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("A", "B", "C", "D"), class = "factor"), variable = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L), .Label = c("R2", "SE"), class = "factor"), value = c(0.3, 0.78, 0.5, 0.13, 0.66, 0.41, 0.49, 0.42, 1, 0.13, 0.24, 0.89, 0.9, 0.68, 0.9, 0.56, 0.91, 0.08, 0.92, 0.98, 0.71, 0.25, 0.06, 0.41, 0.08, 0.82, 0.35, 0.77, 0.8, 0.42, 0.75, 0.14, 0.31, 0.06, 0.08, 0.4, 0.73, 0.22, 0.26, 0.59, 0.52, 0.06, 0.52, 0.26)), row.names = c(NA, -44L), class = "data.frame")
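As an alternative to sorting the whole data frame, the same lookup can be sketched with which.min() on the filtered rows. This is only an illustration: the names df_b, sub and best_delay are made up here, and the data is just the ID == "B" block copied by hand from the R-4.0.0 dput() output above.

```r
# Hypothetical subset: the eleven ID == "B" rows from the R-4.0.0 data above.
df_b <- data.frame(
  Delay    = -5:5,
  ID       = "B",
  variable = rep(c("SE", "R2"), length.out = 11),
  value    = c(0.89, 0.90, 0.68, 0.90, 0.56, 0.91, 0.08, 0.92, 0.98, 0.71, 0.25)
)

# Keep only the rows that satisfy both conditions.
sub <- df_b[df_b$ID == "B" & df_b$variable == "R2", ]

# which.min() returns the position of the first minimum of value;
# use it to pick the matching Delay.
best_delay <- sub$Delay[which.min(sub$value)]
best_delay
# [1] 4
```

If several rows tie for the minimum, which.min() returns only the first of them, whereas the order()-based version above shows the full ranking.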
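Where they differ
The "randomness" of the data is sensitive to the R version.
If you're curious, the three leftmost (non-random) columns are identical; only the value column differs. Combining the two dfs (with each value column named after its R version) gives: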
df
# Delay ID variable R-3.5.3 R-4.0.0
# 1 -5 A R2 0.29 0.30
# 2 -4 A SE 0.79 0.78
# 3 -3 A R2 0.41 0.50
# 4 -2 A SE 0.89 0.13
# 5 -1 A R2 0.94 0.66
# 6 0 A SE 0.04 0.41
# 7 1 A R2 0.53 0.49
# 8 2 A SE 0.90 0.42
# 9 3 A R2 0.55 1.00
# 10 4 A SE 0.46 0.13
# 11 5 A R2 0.96 0.24
# 12 -5 B SE 0.45 0.89
# 13 -4 B R2 0.68 0.90
# 14 -3 B SE 0.57 0.68
# 15 -2 B R2 0.10 0.90
# 16 -1 B SE 0.90 0.56
# 17 0 B R2 0.24 0.91
# 18 1 B SE 0.04 0.08
# 19 2 B R2 0.33 0.92
# 20 3 B SE 0.96 0.98
# 21 4 B R2 0.89 0.71
# 22 5 B SE 0.69 0.25
# 23 -5 C R2 0.64 0.06
# 24 -4 C SE 1.00 0.41
# 25 -3 C R2 0.66 0.08
# 26 -2 C SE 0.71 0.82
# 27 -1 C R2 0.54 0.35
# 28 0 C SE 0.60 0.77
# 29 1 C R2 0.29 0.80
# 30 2 C SE 0.14 0.42
# 31 3 C R2 0.97 0.75
# 32 4 C SE 0.91 0.14
# 33 5 C R2 0.69 0.31
# 34 -5 D SE 0.80 0.06
# 35 -4 D R2 0.02 0.08
# 36 -3 D SE 0.48 0.40
# 37 -2 D R2 0.76 0.73
# 38 -1 D SE 0.21 0.22
# 39 0 D R2 0.32 0.26
# 40 1 D SE 0.23 0.59
# 41 2 D R2 0.14 0.52
# 42 3 D SE 0.41 0.06
# 43 4 D R2 0.41 0.52
# 44 5 D SE 0.37 0.26
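A combined table like the one above could be built roughly as follows. This is a sketch under assumptions: df_353 and df_400 are hypothetical names for the data frames produced under each R version, and only their first three rows (copied from the dput() output above) are included to keep it short.

```r
# Hypothetical stand-ins for the two data frames (first three rows only).
df_353 <- data.frame(Delay = -5:-3, ID = "A",
                     variable = c("R2", "SE", "R2"),
                     value = c(0.29, 0.79, 0.41))
df_400 <- data.frame(Delay = -5:-3, ID = "A",
                     variable = c("R2", "SE", "R2"),
                     value = c(0.30, 0.78, 0.50))

# The non-random columns should be identical in both versions.
key_cols <- c("Delay", "ID", "variable")
stopifnot(identical(df_353[key_cols], df_400[key_cols]))

# Put the two value columns side by side, named after the R version.
combined <- cbind(df_353[key_cols],
                  `R-3.5.3` = df_353$value,
                  `R-4.0.0` = df_400$value)
combined
```

The stopifnot() call is there so that the side-by-side comparison is only made when the key columns really line up row for row.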
Why they differ
As @KonradRudolph suggested, this changed in R 3.6.0, whose release notes say:
* The default method for generating from a discrete uniform
distribution (used in sample(), for instance) has been changed.
This addresses the fact, pointed out by Ottoboni and Stark, that
the previous method made sample() noticeably non-uniform on large
populations. See PR#17494 for a discussion. The previous method
can be requested using RNGkind() or RNGversion() if necessary for
reproduction of old results. Thanks to Duncan Murdoch for
contributing the patch and Gabe Becker for further assistance.
The output of RNGkind() has been changed to also return the
'kind' used by sample().
(Sources: https://stat.ethz.ch/pipermail/r-announce/2019/000641.html and https://cran.r-project.org/doc/manuals/r-release/NEWS.3.html)
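The release note mentions RNGversion() as a way to reproduce old results. A minimal sketch of that, assuming it is run on R 3.6.0 or later with the default RNG settings; pool, value_new and value_old are names made up for this example.

```r
# Assumes R >= 3.6.0, where RNGversion() also selects the sample() algorithm.
pool <- seq(0, 1, by = 0.01)

# Current default sampler (the one introduced in R 3.6.0).
set.seed(123)
value_new <- sample(pool, 44, replace = TRUE)

# Request the pre-3.6.0 behaviour; R warns that this sampler is non-uniform.
suppressWarnings(RNGversion("3.5.3"))
set.seed(123)
value_old <- sample(pool, 44, replace = TRUE)

# Restore the defaults of the running R version.
RNGversion(as.character(getRversion()))

# Compare the first few draws from each sampler.
head(cbind(value_old, value_new))
```

Under those assumptions, value_old should match the R-3.5.3 column of the table above and value_new the R-4.0.0 column. Restoring the RNG version afterwards keeps the old sampler from leaking into later code in the same session.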