Non-random selection from a column of names and descriptions

Problem description

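I have a data.frame (df) with a column of student projects based on concrete, bridges, or water (some include more than one of these terms). Each project has already been assigned to a professor, but I need to choose a second marker for each project. I therefore want to match professors to each other when both of their projects contain one of the words "concrete", "bridges", or "water".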

Conditions: nobody can mark their own project. A professor may appear as second marker any number of times.

df <- data.frame(
  professor = c("Hellen", "Ben", "Ethel", "Jim", "Connor", "Juan", "Lucy"),
  project = c("Bridges with stone", "Waterways", "Concrete with steel",
              "Structure of concrete bridges", "Public health and water",
              "Masonry of bridges", "3D concrete")
)

A potential solution would look like this:

data.frame(
  professor = c("Hellen", "Ben", "Ethel", "Jim", "Connor", "Juan", "Lucy"),
  project = c("Bridges with stone", "Waterways", "Concrete with steel",
              "Structure of concrete bridges", "Public health and water",
              "Masonry of bridges", "3D concrete"),
  second_Marker = c("Juan", "Connor", "Jim", "Lucy", "Ben", "Hellen", "Ethel")
)


Tags: r, dplyr

Solution


Here is one way to do it:

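Extract a keyword value from project using str_extract, then, within each keyword, pick a random professor from that group who is not the project's own professor.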
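To make the grouping key concrete, the extraction step can be sketched on its own. This is a minimal illustration that assumes the stringr package is installed; the `project` vector is copied from the question. Note that str_extract keeps only the first match in each string, so a title containing two keywords is keyed by whichever appears first.

```r
library(stringr)

# Project titles copied from the question
project <- c("Bridges with stone", "Waterways", "Concrete with steel",
             "Structure of concrete bridges", "Public health and water",
             "Masonry of bridges", "3D concrete")

# str_extract() returns only the first match per string, so
# "Structure of concrete bridges" is keyed as "concrete", not "bridges"
keyword <- tolower(str_extract(project,
                               regex("concrete|bridges|water", ignore_case = TRUE)))

keyword
# "bridges" "water" "concrete" "concrete" "water" "bridges" "concrete"
```

A project whose title contains none of the three words would get NA as its keyword.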

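library(tidyverse)

df %>%
  # Group by the first keyword found in each project title (case-insensitive)
  group_by(keyword = tolower(str_extract(project, 
             regex('(concrete|bridges|water)', ignore_case = TRUE)))) %>%
  # Within each group, draw one professor at random, excluding the project's own
  mutate(second_Marker = map_chr(professor, 
                         ~sample(setdiff(professor, .x), 1))) %>%
  ungroup()

# The draw is random, so results vary between runs; one possible result: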

#  professor project                       keyword  second_Marker
#  <chr>     <chr>                         <chr>    <chr>        
#1 Hellen    Bridges with stone            bridges  Juan         
#2 Ben       Waterways                     water    Connor       
#3 Ethel     Concrete with steel           concrete Jim          
#4 Jim       Structure of concrete bridges concrete Lucy         
#5 Connor    Public health and water       water    Ben          
#6 Juan      Masonry of bridges            bridges  Hellen       
#7 Lucy      3D concrete                   concrete Ethel        
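Because the grouping above uses only the first keyword in each title, a project such as "Structure of concrete bridges" is never paired with the bridges projects, and a keyword group with a single professor would make sample() fail. One possible variation, sketched here in base R, treats two projects as compatible if they share any keyword and returns NA when no candidate exists. `pick_second_marker` and `has_kw` are hypothetical names introduced for this illustration, not part of any package, and the rule "any shared keyword" is an assumption about what the question intends.

```r
# Data from the question
df <- data.frame(
  professor = c("Hellen", "Ben", "Ethel", "Jim", "Connor", "Juan", "Lucy"),
  project   = c("Bridges with stone", "Waterways", "Concrete with steel",
                "Structure of concrete bridges", "Public health and water",
                "Masonry of bridges", "3D concrete"),
  stringsAsFactors = FALSE
)

keywords <- c("concrete", "bridges", "water")

# Logical matrix: one row per project, one column per keyword
has_kw <- sapply(keywords, function(k) grepl(k, df$project, ignore.case = TRUE))

# Hypothetical helper: choose a second marker for the project in row i
pick_second_marker <- function(i) {
  # Projects that share at least one keyword with project i
  shares <- rowSums(has_kw[, has_kw[i, ], drop = FALSE]) > 0
  # Nobody may mark their own project
  candidates <- setdiff(df$professor[shares], df$professor[i])
  if (length(candidates) == 0) return(NA_character_)
  # Draw by index so a single remaining candidate is returned as-is
  candidates[sample.int(length(candidates), 1)]
}

set.seed(1)  # only to make the random draw reproducible
df$second_Marker <- vapply(seq_len(nrow(df)), pick_second_marker, character(1))
df
```

With this rule, Jim's candidates include the professors of both the concrete and the bridges projects. As in the original answer, a professor can be drawn as second marker more than once.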
