r - Select a random row in case of ties in a grouped df
Problem description
I have a data frame like the one below:
df <- data.frame(group_var = c("a", "a", "b", "b"),
                 summ_var = c("x", "y", "z", "w"),
                 val = c(100, 100, 150, 200))
df
  group_var summ_var val
1         a        x 100
2         a        y 100
3         b        z 150
4         b        w 200
For each group_var, I want to select exactly one summ_var with the minimum val.
I have tried the following code:
library(dplyr)
df %>%
  group_by(group_var) %>%
  filter(val == min(val)) %>%
  ungroup()
  group_var summ_var   val
  <fct>     <fct>    <dbl>
1 a         x          100
2 a         y          100
3 b         z          150
This gives me multiple summ_var values for group_var = a, since val == min(val) is TRUE for more than one value of summ_var. How do I randomly select one of those summ_var values for group_var = a?
My desired output looks like the one below, in which a random value of summ_var is picked within each group whenever there is a tie.
  group_var summ_var   val
  <fct>     <fct>    <dbl>
1 a         x          100
2 b         z          150
This is just a reproducible example. In reality I may have more than two tied values, so I am looking for a generalised approach. Any help is appreciated.
Solution
With dplyr, you can do:
df %>%
  group_by(group_var) %>%
  slice(which.min(rank(val, ties.method = "random")))
  group_var summ_var   val
  <fct>     <fct>    <dbl>
1 a         x          100
2 b         z          150
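To see why this picks a random row among the ties, here is a small base R sketch on the val values of group a (the names vals, r, and picks are illustrative only). With ties.method = "random", rank() gives tied values a random ordering of the ranks they share, so which.min() returns the position of whichever tied row happened to receive rank 1.

```r
# Values of `val` within group "a": both rows tie for the minimum.
vals <- c(100, 100)

# Ties are broken at random, so r is either c(1, 2) or c(2, 1).
r <- rank(vals, ties.method = "random")

# Position of the row that received rank 1.
which.min(r)

# Repeating the draw many times shows that both positions get selected.
picks <- replicate(1000, which.min(rank(vals, ties.method = "random")))
table(picks)
```

Because the result depends on the random number generator, calling set.seed() beforehand should make the pick reproducible.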
Or:
df %>%
  group_by(group_var) %>%
  filter(val == min(val)) %>%
  sample_frac(1) %>%
  slice(1)
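In dplyr 1.0.0 and later, sample_frac() is superseded by slice_sample(), so the second approach could also be written as in the sketch below. It assumes a recent dplyr version; the set.seed() call and its seed value are arbitrary and only there to make the random pick reproducible.

```r
library(dplyr)

df <- data.frame(group_var = c("a", "a", "b", "b"),
                 summ_var = c("x", "y", "z", "w"),
                 val = c(100, 100, 150, 200))

set.seed(42)  # arbitrary seed, only for reproducibility

result <- df %>%
  group_by(group_var) %>%
  filter(val == min(val)) %>%  # keep every row tied for the group minimum
  slice_sample(n = 1) %>%      # draw one of the tied rows at random
  ungroup()

result
```

This works for any number of tied rows per group, since slice_sample(n = 1) draws one row from however many rows survive the filter.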