dplyr::summarise: pull a value based on the max of another column

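Problem description

Based on the reproducible code below, how can I conditionally add an Address column to the summary, taking the Address from the row where LeastNEmployees equals max(LeastNEmployees)?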

library(readr)  # read_csv()
library(dplyr)  # left_join(), group_by(), summarize(), across()
dat_url <- "https://gender-pay-gap.service.gov.uk/viewing/download-data/2019"
dat <- read_csv(dat_url)

# 2. Convert EmployerSize: map each size band to its lower bound
df = data.frame(EmployerSize=c('Less than 250','250 to 499', '500 to 999', '1000 to 4999', '5000 to 19,999', '20,000 or more'),
               LeastNEmployees = c(1,250,500, 1000, 5000, 20000))

a <- dat %>% 
   left_join(df, c('EmployerSize' = 'EmployerSize')) %>% 
   group_by(ResponsiblePerson) %>% 
   summarize(
     across(where(is.numeric) & !starts_with("Least"), mean),
     across(c("EmployerName","SicCodes"), ~toString(.x)),
     LeastNEmployees = max(LeastNEmployees))
     

Tags: r, dplyr

Solution


Here is a conditional approach that uses which to find the row holding the maximum:

a <- dat %>% 
  left_join(df, c('EmployerSize' = 'EmployerSize')) %>% 
  group_by(ResponsiblePerson) %>% 
  summarize(
    across(where(is.numeric) & !starts_with("Least"), mean),
    across(c("EmployerName","SicCodes"), ~toString(.x)),
    # Pick Address before LeastNEmployees is overwritten by its summary value;
    # which.max() returns the first position of the maximum, so ties yield one row.
    Address = Address[which.max(LeastNEmployees)],
    LeastNEmployees = max(LeastNEmployees))
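One detail worth checking is the order of expressions inside summarize(): once LeastNEmployees has been reassigned to its maximum, later expressions see that single summarized value rather than the original column. The following is a minimal sketch on hypothetical toy data (the `toy` tibble and its values are made up for illustration and are not part of the gender pay gap download) that compares the two orderings.

```r
library(dplyr)

# Hypothetical toy data standing in for the real download
toy <- tibble(
  ResponsiblePerson = c("Ann", "Ann", "Bob", "Bob", "Bob"),
  Address           = c("A1", "A2", "B1", "B2", "B3"),
  LeastNEmployees   = c(250, 1000, 500, 20000, 1)
)

# Address is picked first, while LeastNEmployees is still the full column
res <- toy %>%
  group_by(ResponsiblePerson) %>%
  summarize(
    Address = Address[which.max(LeastNEmployees)],
    LeastNEmployees = max(LeastNEmployees)
  )

# LeastNEmployees is summarized first, so the comparison below sees a
# length-one value and the subset falls back to the first Address per group
wrong <- toy %>%
  group_by(ResponsiblePerson) %>%
  summarize(
    LeastNEmployees = max(LeastNEmployees),
    Address = Address[which(LeastNEmployees == max(LeastNEmployees))]
  )

print(res)
print(wrong)
```

If whole rows are wanted rather than a summary, `slice_max(LeastNEmployees, n = 1, with_ties = FALSE)` after `group_by()` may be a simpler alternative, assuming dplyr 1.0.0 or later.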
