r - Group rows by column values and keep the rows with the minimum value in R
Problem description
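In the dataset below, I first want to find the rows that have the same values in columns U and D. Then, for each such group of rows sharing the same U and D, I want to keep the row that has the minimum value for columns Mean, Min and Max. In my data, these three columns always reach their minimum in the same row within each group of rows where U and D match.
I tried the group() function, but it did not produce the output I want. Please suggest an efficient approach.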
Input data
data <- structure(list(A = c(0.18, 0.18, 0.18, 0.18, 0.18, 0.18, 0.18,
0.18, NA, NA, NA, NA, NA, NA), B = c(0.33, 0.33, 0.33, 0.33,
0.33, 0.33, 0.33, 0.33, 1, 2, 2, 2, 3, 4), C = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "Yes", class = "factor"),
U = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 2L), .Label = c("ABC-001", "PQR-001"), class = "factor"),
D = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 2L), .Label = c("ABC", "PQR"), class = "factor"),
E = structure(c(1L, 2L, 3L, 4L, 4L, 5L, 5L, 6L, 1L, 1L, 2L,
2L, 3L, 3L), .Label = c("A", "B", "C", "D", "E", "F"), class = "factor"),
F = c(22000014L, 22000031L, 22000033L, 22000025L, 22000028L,
22000020L, 22000021L, 22000015L, 11100076L, 11200076L, 11100077L,
11200077L, 11100078L, 11200078L), G = c(0, 0, 0, 0, 0, 0,
0, 0, -0.1, -0.1, -0.1, -0.1, 0.2, 0.2), H = c(100, 100,
100, 100, 100, 100, 100, 100, 1.2, 1.2, 1.2, 1.2, 0.9, 0.9
), I = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 2L), .Label = c("us", "V"), class = "factor"),
Mean = c(38.72, 37.52111111, 38.44166667, 39.23666667, 39.35888889,
38.96, 38.95333333, 38.41777778, 0.691707061, 0.691554561,
0.691516833, 0.691423506, 0.763736, 0.764015761), Min = c(34.05,
33.25, 33.31, 35.14, 33.91, 33.78, 33.78, 33.75, 0.6911166,
0.6908743, 0.6908813, 0.6907286, 0.7609318, 0.7616949), Max = c(43.83,
42.12, 43.57, 44.03, 44.88, 44.03, 44.02, 43.52, 0.692533,
0.6922278, 0.6923681, 0.6919283, 0.7674736, 0.7668633)), class = "data.frame", row.names = c(NA,
-14L))
Expected output
output <- read.table(header = TRUE, text = " A B C U D E F G H I Mean Min Max
0.18 0.33 Yes ABC-001 ABC B 22000031 0 100 us 37.52111111 33.25 42.12
NA 2 Yes PQR-001 PQR B 11200077 -0.1 1.2 V 0.691423506 0.6907286 0.6919283
")
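Solution
You can do this in base R with order and duplicated:
# Sort by Mean so the smallest value in each group comes first
data = data[order(data$Mean),]
# Keep only the first row for each (U, D) combination
output = data[!duplicated(data[c("U","D")]),]
output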
A B C U D E F G H I Mean Min Max
12 NA 2.00 Yes PQR-001 PQR B 11200077 -0.1 1.2 V 0.6914235 0.6907286 0.6919283
2 0.18 0.33 Yes ABC-001 ABC B 22000031 0.0 100.0 us 37.5211111 33.2500000 42.1200000
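To see why the order and duplicated combination above keeps the minimum row, here is a minimal sketch on a small made-up data frame. The name toy and its values are illustrative assumptions, not part of the question's data. Unlike the snippet above, it leaves the input untouched and restores the original row order at the end.

```r
# Hypothetical toy data: two (U, D) groups with two rows each.
toy <- data.frame(
  U    = c("ABC-001", "ABC-001", "PQR-001", "PQR-001"),
  D    = c("ABC", "ABC", "PQR", "PQR"),
  Mean = c(38.7, 37.5, 0.692, 0.691)
)

# Sort a copy so the smallest Mean comes first overall,
# and therefore first within each group as well.
sorted <- toy[order(toy$Mean), ]

# duplicated() flags every repeat of a (U, D) pair after its first occurrence,
# so negating it keeps only the first (smallest-Mean) row of each group.
kept <- sorted[!duplicated(sorted[c("U", "D")]), ]

# Optional: put the kept rows back in their original order.
kept <- kept[order(as.integer(rownames(kept))), ]
print(kept)
```

Because the question states that Mean, Min and Max reach their minimum in the same row, sorting by Mean alone is enough here. If that assumption did not hold, the sort key would have to be chosen deliberately.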
If you prefer dplyr:
library(dplyr)
data %>% group_by(U, D) %>% slice(which.min(Mean))
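As a variation on the dplyr pipeline above, slice_min() expresses the same idea more directly. This is a sketch that assumes dplyr 1.0.0 or later (where slice_min() is available). The toy data frame is made up for illustration.

```r
library(dplyr)

# Hypothetical toy data: two (U, D) groups with two rows each.
toy <- data.frame(
  U    = c("ABC-001", "ABC-001", "PQR-001", "PQR-001"),
  D    = c("ABC", "ABC", "PQR", "PQR"),
  Mean = c(38.7, 37.5, 0.692, 0.691)
)

result <- toy %>%
  group_by(U, D) %>%
  # Keep the single row with the smallest Mean in each group;
  # with_ties = FALSE returns one row even if the minimum is tied.
  slice_min(Mean, n = 1, with_ties = FALSE) %>%
  # Drop the grouping so later operations act on the whole table.
  ungroup()

print(result)
```

With with_ties = FALSE this returns one row per group, like slice(which.min(Mean)). Leaving with_ties at its default of TRUE would instead keep every row tied for the minimum.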