r - 创建一个包含计算字段的新表,其中包含来自两个不同表的数据
问题描述
我必须数据表:
第一个表格显示了导师的姓名和分配给他们每个人的所有学生。
mentor student_name
Dr. Brown Michael
Dr. Brown Diana
Dr. Brown Peter
Dr. Brown Christopher
Dr. Brown Stacy
Ms. Lindblom Rose
Ms. Lindblom Anne
Ms. Lindblom Steven
Ms. Lindblom Gloria
Mr. Apple Juan
Mr. Apple Francis
Mr. Apple David
Mr. Apple Sonja
Mr. Apple Dakota
Mr. Apple Latoya
Mr. Apple Avril
Mr. Apple James
Mr. Apple Stewart
Mr. Apple Sophia
第二个表格显示了导师和他们的学生之间的一对一辅导课程
mentor date_of_tutoring student_name
Dr. Brown 07/14/2019 Peter
Dr. Brown 07/15/2019 Christopher
Ms. Lindblom 06/28/2019 Gloria
Mr. Apple 06/20/2019 Sophia
Mr. Apple 06/22/2019 Latoya
Mr. Apple 06/25/2019 Juan
Mr. Apple 06/26/2019 Avril
在学年期间,每位导师都需要与每位学生进行一次指导会议。
我想创建一个新表来显示指导任务完成的百分比。当每位导师与分配给他们的所有学生进行一对一的辅导课程时,他们将完成任务 (100%)。
例如,根据表 2 的数据,以及分配给每个导师的学生人数,我想创建一个像这样的新表:
teacher %_mentoring_completed
Dr. Brown 40%
Ms. Lindblom 25%
Mr. Apple 40%
解决方案
一个选项是与按“导师”分组的两个数据集进行连接,获得mean
非 NA 逻辑向量
library(dplyr)
library(stringr)
left_join(df1, df2) %>%
group_by(mentor) %>%
summarise(PercentageMentoringCompleted = str_c(100 *
mean(!is.na(date_of_tutoring)), "%"))
# A tibble: 3 x 2
# mentor PercentageMentoringCompleted
# <chr> <chr>
#1 Dr. Brown 40%
#2 Mr. Apple 40%
#3 Ms. Lindblom 25%
或另一种选择是使用count
library(purrr)
list(df2, df1) %>% map(~ .x %>%
dplyr::count(mentor)) %>%
reduce(inner_join, by = 'mentor') %>%
transmute(mentor, perc = 100 * n.x/n.y)
或者base R
使用merge
andaggregate
aggregate(PercentageMentoringCompleted ~ mentor,
transform(merge(df1, df2, all.x = TRUE),
PercentageMentoringCompleted = !is.na(date_of_tutoring)), mean)
数据
df1 <- structure(list(mentor = c("Dr. Brown", "Dr. Brown", "Dr. Brown",
"Dr. Brown", "Dr. Brown", "Ms. Lindblom", "Ms. Lindblom", "Ms. Lindblom",
"Ms. Lindblom", "Mr. Apple", "Mr. Apple", "Mr. Apple", "Mr. Apple",
"Mr. Apple", "Mr. Apple", "Mr. Apple", "Mr. Apple", "Mr. Apple",
"Mr. Apple"), student_name = c("Michael", "Diana", "Peter", "Christopher",
"Stacy", "Rose", "Anne", "Steven", "Gloria", "Juan", "Francis",
"David", "Sonja", "Dakota", "Latoya", "Avril", "James", "Stewart",
"Sophia")), class = "data.frame", row.names = c(NA, -19L))
df2 <- structure(list(mentor = c("Dr. Brown", "Dr. Brown", "Ms. Lindblom",
"Mr. Apple", "Mr. Apple", "Mr. Apple", "Mr. Apple"),
date_of_tutoring = c("07/14/2019",
"07/15/2019", "06/28/2019", "06/20/2019", "06/22/2019", "06/25/2019",
"06/26/2019"), student_name = c("Peter", "Christopher", "Gloria",
"Sophia", "Latoya", "Juan", "Avril")), class = "data.frame", row.names = c(NA,
-7L))
推荐阅读
- javascript - 以角度填充数据下拉onclick表格行(基于表格行数据)
- mongodb - MongoDB Redash - FieldPath 字段名称可能不包含“。”
- sql - 在 Laravel 6 中使用 sql 查询统计每个部门的工人数
- ios - 在选项卡之间移动时如何保留推送视图?
- powershell - 如何通过 PowerShell 或 Graph API 在 Micosoft Teams 中设置成员设置“允许成员上传自定义应用程序”
- lstm - 带手套的 Bi-LSTM - 词形还原问题
- python - 如何基于包含时间戳的 csv 文件名加载数据帧
- r - 从去年到给定年份的外推行可用
- python - Python从字符串中的开头到第一个特定字符删除?
- php - 使用 PHP/MySQLi 查询返回错误结果的 Ajax 调用