r - Non-random selection from a column of names and descriptions
Problem description
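I have a data.frame (df) containing a column of student projects based on concrete, bridges, or water (some include more than one term). Each project is allocated by a professor, but I need to choose a second marker for it. I would therefore like to match each project to the name of another professor whose own project contains the same term, such as "concrete", "bridges", or "water".
Conditions: no one can mark their own project. A professor may appear as second marker any number of times.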
df<-data.frame(professor=c("Hellen", "Ben","Ethel", "Jim","Connor", "Juan","Lucy"), project=c("Bridges with stone", "Waterways","Concrete with steel","Structure of concrete bridges","Public health and water","Masonry of bridges","3D concrete"))
A potential solution looks like this:
data.frame(professor=c("Hellen", "Ben","Ethel", "Jim","Connor", "Juan","Lucy"), project=c("Bridges with stone", "Waterways","Concrete with steel","Structure of concrete bridges","Public health and water","Masonry of bridges","3D concrete"),second_Marker=c("Juan","Connor","Jim","Lucy","Ben","Hellen","Ethel"))
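The two conditions (no self-marking, and a shared term between the two projects) can be checked mechanically. The following is a minimal sketch in base R, not part of the original question: the helper names keyword_matrix and is_valid_marking are hypothetical, and it assumes each professor appears exactly once in the data.

```r
# Candidate result copied from the question.
result <- data.frame(
  professor = c("Hellen", "Ben", "Ethel", "Jim", "Connor", "Juan", "Lucy"),
  project = c("Bridges with stone", "Waterways", "Concrete with steel",
              "Structure of concrete bridges", "Public health and water",
              "Masonry of bridges", "3D concrete"),
  second_Marker = c("Juan", "Connor", "Jim", "Lucy", "Ben", "Hellen", "Ethel"),
  stringsAsFactors = FALSE
)

keywords <- c("concrete", "bridges", "water")

# Hypothetical helper: logical matrix with one row per project and one
# column per keyword; TRUE where the title mentions the keyword
# (case-insensitive substring match).
keyword_matrix <- function(projects, keywords) {
  m <- sapply(keywords, function(k) grepl(k, projects, ignore.case = TRUE))
  matrix(m, nrow = length(projects), dimnames = list(NULL, keywords))
}

# Hypothetical helper: TRUE only if every row satisfies both conditions.
# Assumes professor names are unique, so match() finds the marker's project.
is_valid_marking <- function(d, keywords) {
  km <- keyword_matrix(d$project, keywords)
  marker_row <- match(d$second_Marker, d$professor)
  not_self <- d$second_Marker != d$professor
  shares_kw <- rowSums(km & km[marker_row, , drop = FALSE]) > 0
  all(not_self & shares_kw)
}

is_valid_marking(result, keywords)
```

A check like this is handy when the assignment is random, since it can be rerun on every draw.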
Solution
Here is one way to do it: extract a keyword value from each project using str_extract, then, for each keyword, pick a random match that is not the professor's own name.
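library(tidyverse)

df %>%
  # Group rows by the first keyword found in each project title (lower-cased).
  group_by(keyword = tolower(str_extract(project,
             regex('(concrete|bridges|water)', ignore_case = TRUE)))) %>%
  # Within each group, draw one professor other than the row's own.
  mutate(second_Marker = map_chr(professor,
             ~sample(setdiff(professor, .x), 1))) %>%
  ungroup()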
# One possible result (the draw within the concrete group is random):
#   professor project                       keyword  second_Marker
#   <chr>     <chr>                         <chr>    <chr>
# 1 Hellen    Bridges with stone            bridges  Juan
# 2 Ben       Waterways                     water    Connor
# 3 Ethel     Concrete with steel           concrete Jim
# 4 Jim       Structure of concrete bridges concrete Lucy
# 5 Connor    Public health and water       water    Ben
# 6 Juan      Masonry of bridges            bridges  Hellen
# 7 Lucy      3D concrete                   concrete Ethel
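Two limitations of the approach above are worth noting. str_extract keeps only the first keyword in a title, so "Structure of concrete bridges" is treated as a concrete project only, and sample() raises an error when a keyword group contains a single professor. The sketch below is one possible variant in base R, not the answer's method: it considers every keyword a project mentions and returns NA when no other professor qualifies. The helper name pick_marker and the seed value are assumptions made for illustration.

```r
set.seed(42)  # arbitrary seed, used only to make the draw repeatable

df <- data.frame(
  professor = c("Hellen", "Ben", "Ethel", "Jim", "Connor", "Juan", "Lucy"),
  project = c("Bridges with stone", "Waterways", "Concrete with steel",
              "Structure of concrete bridges", "Public health and water",
              "Masonry of bridges", "3D concrete"),
  stringsAsFactors = FALSE
)

keywords <- c("concrete", "bridges", "water")

# has_kw[i, k] is TRUE when project i mentions keyword k (case-insensitive).
has_kw <- sapply(keywords, function(k) grepl(k, df$project, ignore.case = TRUE))

# Hypothetical helper: draw a second marker for row i from all professors
# whose project shares at least one keyword with project i.
pick_marker <- function(i) {
  shares <- apply(has_kw, 1, function(row) any(row & has_kw[i, ]))
  candidates <- df$professor[shares & df$professor != df$professor[i]]
  if (length(candidates) == 0) return(NA_character_)  # nobody eligible
  # Index with sample.int so a single candidate is returned as-is.
  candidates[sample.int(length(candidates), 1)]
}

df$second_Marker <- vapply(seq_len(nrow(df)), pick_marker, character(1))
df
```

With this variant Jim's project can be marked by any professor working on concrete or bridges, and a project whose keyword nobody else shares yields NA instead of stopping with an error.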