r - 创建一个包含计算字段的新表，其中包含来自两个不同表的数据

问题描述

我必须数据表：

第一个表格显示了导师的姓名和分配给他们每个人的所有学生。

    mentor          student_name
    Dr. Brown       Michael
    Dr. Brown       Diana
    Dr. Brown       Peter
    Dr. Brown       Christopher
    Dr. Brown       Stacy
    Ms. Lindblom    Rose
    Ms. Lindblom    Anne
    Ms. Lindblom    Steven
    Ms. Lindblom    Gloria
    Mr. Apple       Juan
    Mr. Apple       Francis
    Mr. Apple       David
    Mr. Apple       Sonja
    Mr. Apple       Dakota
    Mr. Apple       Latoya
    Mr. Apple       Avril
    Mr. Apple       James
    Mr. Apple       Stewart
    Mr. Apple       Sophia

第二个表格显示了导师和他们的学生之间的一对一辅导课程

 mentor         date_of_tutoring    student_name
 Dr. Brown      07/14/2019          Peter
 Dr. Brown      07/15/2019          Christopher
 Ms. Lindblom   06/28/2019          Gloria
 Mr. Apple      06/20/2019          Sophia
 Mr. Apple      06/22/2019          Latoya
 Mr. Apple      06/25/2019          Juan
 Mr. Apple      06/26/2019          Avril

在学年期间，每位导师都需要与每位学生进行一次指导会议。

我想创建一个新表来显示指导任务完成的百分比。当每位导师与分配给他们的所有学生进行一对一的辅导课程时，他们将完成任务 (100%)。

例如，根据表 2 的数据，以及分配给每个导师的学生人数，我想创建一个像这样的新表：

 teacher           %_mentoring_completed
 Dr. Brown          40%
 Ms. Lindblom       25%
 Mr. Apple          40%

标签： r

解决方案

一个选项是与按“导师”分组的两个数据集进行连接，获得mean非 NA 逻辑向量

library(dplyr)
library(stringr)
left_join(df1, df2) %>% 
   group_by(mentor) %>% 
    summarise(PercentageMentoringCompleted = str_c(100 * 
             mean(!is.na(date_of_tutoring)), "%"))
# A tibble: 3 x 2
#  mentor       PercentageMentoringCompleted
#  <chr>        <chr>                       
#1 Dr. Brown    40%                         
#2 Mr. Apple    40%                         
#3 Ms. Lindblom 25%

或另一种选择是使用count

library(purrr)
list(df2, df1) %>% map(~ .x %>% 
           dplyr::count(mentor)) %>% 
           reduce(inner_join, by = 'mentor') %>%
           transmute(mentor, perc = 100 * n.x/n.y)

或者base R使用mergeandaggregate

aggregate(PercentageMentoringCompleted ~ mentor,
  transform(merge(df1, df2, all.x = TRUE), 
       PercentageMentoringCompleted = !is.na(date_of_tutoring)), mean)

数据

df1 <- structure(list(mentor = c("Dr. Brown", "Dr. Brown", "Dr. Brown", 
"Dr. Brown", "Dr. Brown", "Ms. Lindblom", "Ms. Lindblom", "Ms. Lindblom", 
"Ms. Lindblom", "Mr. Apple", "Mr. Apple", "Mr. Apple", "Mr. Apple", 
"Mr. Apple", "Mr. Apple", "Mr. Apple", "Mr. Apple", "Mr. Apple", 
"Mr. Apple"), student_name = c("Michael", "Diana", "Peter", "Christopher", 
"Stacy", "Rose", "Anne", "Steven", "Gloria", "Juan", "Francis", 
"David", "Sonja", "Dakota", "Latoya", "Avril", "James", "Stewart", 
"Sophia")), class = "data.frame", row.names = c(NA, -19L))

df2 <- structure(list(mentor = c("Dr. Brown", "Dr. Brown", "Ms. Lindblom", 
"Mr. Apple", "Mr. Apple", "Mr. Apple", "Mr. Apple"), 
 date_of_tutoring = c("07/14/2019", 
"07/15/2019", "06/28/2019", "06/20/2019", "06/22/2019", "06/25/2019", 
"06/26/2019"), student_name = c("Peter", "Christopher", "Gloria", 
"Sophia", "Latoya", "Juan", "Avril")), class = "data.frame", row.names = c(NA, 
-7L))

r - 创建一个包含计算字段的新表，其中包含来自两个不同表的数据

问题描述

解决方案

数据

推荐阅读