R - Conditionally select within repeated IDs and index

Problem description

I have a data frame with repeated IDs and different variables, as shown below:

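x <- 1:10
# Three rows for ID 20, two for 55, five for 45, matching the printed output below
ID <- c(20, 20, 20, 55, 55, 45, 45, 45, 45, 45)
fruit <- c("Orange", "Apple", "Pear", "Apple", "Blueberries", "Apple", "Banana", "Banana", "Strawberry", "Pear")
# data.frame() keeps X and ID numeric; cbind() would coerce everything to a character matrix
df <- data.frame(X = x, ID, fruit)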

> df
X   ID   fruit
1   20   Orange
2   20   Apple
3   20   Pear
4   55   Apple
5   55   Blueberries
6   45   Apple
7   45   Banana
8   45   Banana
9   45   Strawberry
10  45   Pear

Within each repeated ID, I need to select one row according to a hierarchy of the fruit values (e.g. Orange > Blueberries > Pear > Banana > Apple > Strawberry), to get:

X   ID   fruit
1   20   Orange
5   55   Blueberries
10  45   Pear

Honestly, I have no good or simple idea of how to do this. Any suggestions?

Tags: r

Solution


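We arrange by 'ID', then by 'fruit' converted to a factor whose levels follow the order specified in the OP's post, then by 'X' in descending order (so that, among rows with the same fruit, the largest 'X' comes first). After that we group by 'ID' and slice the first row of each group.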

library(dplyr)
df %>% 
  arrange(ID, factor(fruit, levels = c('Orange', 'Blueberries', 'Pear', 
             'Banana','Apple', 'Strawberry')), desc(X)) %>% 
  group_by(ID) %>% 
  slice(1)
# A tibble: 3 x 3
# Groups:   ID [3]
#      X    ID fruit      
#  <int> <int> <chr>      
#1     1    20 Orange     
#2    10    45 Pear       
#3     5    55 Blueberries
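
If dplyr is not available, the same idea can probably be sketched in base R. This is only an illustration and not part of the original answer: the names `priority`, `rank_in_hierarchy`, `sorted`, and `result` are made up here, and the data frame is rebuilt from the values in the Data section so that the snippet runs on its own.

```r
# Sample data, rebuilt from the values in the "Data" section of this answer
df <- data.frame(
  X = 1:10,
  ID = c(20L, 20L, 20L, 55L, 55L, 45L, 45L, 45L, 45L, 45L),
  fruit = c("Orange", "Apple", "Pear", "Apple", "Blueberries",
            "Apple", "Banana", "Banana", "Strawberry", "Pear")
)

# Hierarchy from the question, highest priority first (hypothetical variable name)
priority <- c("Orange", "Blueberries", "Pear", "Banana", "Apple", "Strawberry")

# Position of each fruit in the hierarchy: 1 = highest priority
rank_in_hierarchy <- match(df$fruit, priority)

# Sort by ID, then by hierarchy rank, then by X in descending order
sorted <- df[order(df$ID, rank_in_hierarchy, -df$X), ]

# duplicated() marks every row after the first for each ID,
# so negating it keeps the top-ranked row per ID
result <- sorted[!duplicated(sorted$ID), ]
print(result)
```

`match()` plays the role of the `factor(..., levels = ...)` call above: both turn the fruit names into their position in the hierarchy so that an ordinary sort puts the preferred fruit first. A fruit that is missing from `priority` would get `NA` and be sorted last by `order()`.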

Data

df <- structure(list(X = 1:10, ID = c(20L, 20L, 20L, 55L, 55L, 45L, 
45L, 45L, 45L, 45L), fruit = c("Orange", "Apple", "Pear", "Apple", 
"Blueberries", "Apple", "Banana", "Banana", "Strawberry", "Pear"
 )), class = "data.frame", row.names = c(NA, -10L))
