Find and count string occurrences in a separate table in R and store them in a new column

Problem description

I have two data frames. Below are simplified examples that should be easy to reproduce.

df1 <- data.frame(School = c("Omaha South", "Omaha Central", "Grand Island"), 
                  Enrollment = c(2166, 2051, 1982))
df2 <- data.frame('Away Score' = c(25, 57, 76), 
                  'Away Team' = c("Omaha South", "Omaha Central", "Grand Island"),
                  'Home Score' = c(52, 88, 69), 
                  'Home Team' = c("Omaha Central", "Grand Island", "Omaha South"),
                  Date = c("1/11/2020", "1/12/2020", "1/13/2020"),
                  Winner = c("Omaha Central", "Grand Island", "Grand Island"),
                  Loser = c("Omaha South", "Omaha Central", "Omaha South"))

My goal is to create a new column in df1 called "Wins" that looks up each school in df1 and counts how many times that school is listed in the "Winner" column of df2.

So I would like df1 to end up looking like this:

df1 <- data.frame(School = c("Omaha South", "Omaha Central", "Grand Island"), 
                  Enrollment = c(2166, 2051, 1982),
                  Wins = c(0, 1, 2))

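I have tried many solutions without success, including sqldf. My latest attempt is below, but it fails with the error: no applicable method for 'group_by_' applied to an object of class "NULL".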

df$Wins %>%
     group_by(df2$Winner) %>%
     mutate(Count=n())
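That error most likely comes from the left-hand side of the pipe: df$Wins refers to a column that does not exist yet, so it evaluates to NULL and group_by() receives nothing to group. The counting has to start from df2$Winner instead. A minimal base-R sketch, assuming school names match exactly; the data frames are rebuilt with only the columns this block needs so it runs on its own:

```r
# Minimal versions of the question's data (only the columns used here).
df1 <- data.frame(School = c("Omaha South", "Omaha Central", "Grand Island"),
                  Enrollment = c(2166, 2051, 1982),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Winner = c("Omaha Central", "Grand Island", "Grand Island"),
                  stringsAsFactors = FALSE)

# A column that has not been created yet is simply NULL; this is what
# the failing pipeline handed to group_by().
is.null(df1$Wins)  # TRUE

# For each school in df1, count the rows of df2 whose Winner equals it.
df1$Wins <- vapply(df1$School,
                   function(s) sum(df2$Winner == s),
                   integer(1), USE.NAMES = FALSE)
df1
```

This scans df2 once per school, which is fine for a handful of schools; with many schools, a single table() call over df2$Winner is cheaper.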

Tags: r, count, dplyr, sqldf

Solution


One way, using dplyr and joins:

library(dplyr)

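df1 %>%
  # attach every game each school won; schools with no wins get a row of NAs
  left_join(df2, by = c('School' = 'Winner')) %>%
  # drop those NA rows (assumes df2 itself has no missing values)
  na.omit() %>%
  # one row per school with its number of wins
  count(School, name = "wins") %>%
  # join back to df1 so schools with zero wins reappear (with NA)
  right_join(df1, by = "School") %>%
  # turn those NA counts into 0
  mutate(wins = replace(wins, is.na(wins), 0))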
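For comparison, the same count-then-left-join idea can be sketched in base R without dplyr: tabulate df2$Winner once, merge with all.x = TRUE so schools without wins are kept, and set their missing counts to 0. This is an alternative sketch rather than part of the answer above; the data frames are rebuilt minimally so the block runs on its own.

```r
df1 <- data.frame(School = c("Omaha South", "Omaha Central", "Grand Island"),
                  Enrollment = c(2166, 2051, 1982),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Winner = c("Omaha Central", "Grand Island", "Grand Island"),
                  stringsAsFactors = FALSE)

# One row per school that won at least once, with columns Winner and Wins.
wins <- as.data.frame(table(Winner = df2$Winner),
                      responseName = "Wins", stringsAsFactors = FALSE)

# Left join: keep every school in df1, even those absent from `wins`.
out <- merge(df1, wins, by.x = "School", by.y = "Winner", all.x = TRUE)

# Schools that never won come back as NA; count them as 0 wins.
out$Wins[is.na(out$Wins)] <- 0L
out
```

Note that merge() sorts the result by the key column by default, so the rows come back in alphabetical order of School rather than in df1's order.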

With base R, we count how often each school won using table, convert the counts to a data frame with stack, and then merge the result with df1:

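# table() counts wins per school; using df1$School as levels keeps zero-win schools.
# stack() yields columns `values` (the count) and `ind` (the school name).
merge(df1, stack(table(factor(df2$Winner, levels = df1$School))),
      by.x = 'School', by.y = "ind")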
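The merged result names the count column values. If the goal is simply to add a Wins column to df1 in its original row order, a minimal variant of the same table() idea works, assuming every school name in df1 is unique (they become factor levels, which must be distinct):

```r
df1 <- data.frame(School = c("Omaha South", "Omaha Central", "Grand Island"),
                  Enrollment = c(2166, 2051, 1982),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Winner = c("Omaha Central", "Grand Island", "Grand Island"),
                  stringsAsFactors = FALSE)

# Levels follow df1's row order, so the counts line up with df1's rows
# and schools with no wins get 0 instead of being dropped.
counts <- table(factor(df2$Winner, levels = df1$School))

# Strip the table attributes and store the counts as a plain integer column.
df1$Wins <- as.vector(counts)
df1
```

Any winner that is not listed in df1$School becomes NA in the factor and is silently ignored by table().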
