Find values in data frame and return them in new column in data frame

Problem description

The data frame has 12 columns: id, season, week, season_type, start_date, home_team, home_points, away_team, away_points, Line, home_cover, away_cover.

        id season  week season_type start_date home_team        home_points away_team             away_points  Line home_cover away_cover
      <dbl>  <dbl> <dbl> <chr>            <dbl> <chr>                  <dbl> <chr>                       <dbl> <dbl> <chr>      <chr>     
1 400603840   2015     1 regular         42251. South Carolina            17 North Carolina                 13  -3.5 Y          N         
2 400763593   2015     1 regular         42251. UCF                       14 Florida International          15 -17   N          Y         
3 400763399   2015     1 regular         42251. Central Michigan          13 Oklahoma State                 24  20.5 Y          N         
4 400603839   2015     1 regular         42251  Vanderbilt                12 Western Kentucky               14 -17.5 N          Y         
5 400756883   2015     1 regular         42251. Utah                      24 Michigan                       17  -3   Y          N         
6 400763398   2015     1 regular         42251. Minnesota                 17 TCU                            23  16   Y          N         

What I want is to find which team each away_team and home_team played the week before, and I cannot figure it out for the life of me.

Tags: r, dataframe, function, loops

Solution


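I would suggest dplyr for this, since I think it makes the flow easier to follow.

First, though, your sample data does not span multiple weeks, so it is not enough to demonstrate the approach. Here is a small sample data set (don't worry about understanding this part; it only exists to create fake data):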

library(dplyr)
set.seed(42)
dat <- bind_rows(lapply(1:4, function(w) data.frame(season=2021, week=w, home_team=sample(LETTERS[1:4]), away_team=sample(LETTERS[5:8]))))
dat
#    season week home_team away_team
# 1    2021    1         A         F
# 2    2021    1         D         H
# 3    2021    1         C         G
# 4    2021    1         B         E
# 5    2021    2         D         H
# 6    2021    2         C         E
# 7    2021    2         B         G
# 8    2021    2         A         F
# 9    2021    3         D         G
# 10   2021    3         B         E
# 11   2021    3         C         H
# 12   2021    3         A         F
# 13   2021    4         D         E
# 14   2021    4         A         F
# 15   2021    4         C         G
# 16   2021    4         B         H

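From here, we will (1) shift the data forward by one week, then (2) join it twice. (For good background on merge/join concepts, see "How to join (merge) data frames (inner, outer, left, right)" and "What's the difference between INNER JOIN, LEFT JOIN, RIGHT JOIN and FULL JOIN?".)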

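# Shift every game forward one week so it lines up with the following week's rows
shifted <- dat %>%
  transmute(week = week + 1, home_prevaway = home_team, away_prevaway = away_team)
# The first join adds away_prevaway (the home team's opponent last week);
# the second join adds home_prevaway (the away team's opponent last week)
left_join(dat, shifted, by = c("week", "home_team" = "home_prevaway")) %>%
  left_join(., shifted, by = c("week", "away_team" = "away_prevaway"))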
#    season week home_team away_team away_prevaway home_prevaway
# 1    2021    1         A         F          <NA>          <NA>
# 2    2021    1         D         H          <NA>          <NA>
# 3    2021    1         C         G          <NA>          <NA>
# 4    2021    1         B         E          <NA>          <NA>
# 5    2021    2         D         H             H             D
# 6    2021    2         C         E             G             B
# 7    2021    2         B         G             E             C
# 8    2021    2         A         F             F             A
# 9    2021    3         D         G             H             B
# 10   2021    3         B         E             G             C
# 11   2021    3         C         H             E             D
# 12   2021    3         A         F             F             A
# 13   2021    4         D         E             G             B
# 14   2021    4         A         F             F             A
# 15   2021    4         C         G             H             D
# 16   2021    4         B         H             E             C
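
The week 1 rows are NA because left_join keeps every row of the left table and fills in NA where the right table has no match, and nothing was shifted into week 1. A minimal sketch of that behaviour, using two tiny made-up tables (`left`, `right` and the `prev` column are hypothetical names, not part of the data above):

```r
library(dplyr)

# Hypothetical toy tables, only to illustrate how left_join treats unmatched rows
left  <- data.frame(week = c(1, 2), team = c("A", "A"), stringsAsFactors = FALSE)
right <- data.frame(week = 2, team = "A", prev = "E", stringsAsFactors = FALSE)

# Every row of `left` is kept; week 1 has no partner in `right`, so prev is NA
joined <- left_join(left, right, by = c("week", "team"))
joined
```

An inner_join would drop the week 1 row instead, which is why left_join is the safer choice when every game should stay in the result.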

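away_prevaway is the team that the current home team played the week before (it was the away side in that game); likewise, home_prevaway is the team that the current away team played the week before.

To verify: in week 1, team "B" (home) played team "E" (away), and team "D" played team "H". In week 2, away_prevaway on B's row is "E" and on D's row is "H". (Given the random data, the fact that teams A and D face the same away team in consecutive weeks is an inconvenient coincidence.)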
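
Note that the joins above only look for a home team among last week's home teams (and an away team among last week's away teams). That works for the fake data because A to D are always at home, but in a real schedule teams switch sides. One possible way to handle that, offered as a sketch rather than a definitive solution, is to first build a one-row-per-team table and join against it. The data and the names `games`, `long`, `prev`, `home_prev_opponent` and `away_prev_opponent` are hypothetical, and `season` is added to the join keys on the assumption that the real data spans several seasons:

```r
library(dplyr)

# Hypothetical toy schedule in which teams switch between home and away
games <- data.frame(
  season    = 2021,
  week      = c(1, 1, 2, 2),
  home_team = c("A", "C", "B", "D"),
  away_team = c("B", "D", "C", "A"),
  stringsAsFactors = FALSE
)

# One row per team per game, regardless of which side the team was on
long <- bind_rows(
  transmute(games, season, week, team = home_team, opponent = away_team),
  transmute(games, season, week, team = away_team, opponent = home_team)
)

# Shift forward one week so each row reads as "this team's opponent last week"
prev <- transmute(long, season, week = week + 1, team, prev_opponent = opponent)

# Look up last week's opponent once for the home team and once for the away team
result <- games %>%
  left_join(rename(prev, home_prev_opponent = prev_opponent),
            by = c("season", "week", "home_team" = "team")) %>%
  left_join(rename(prev, away_prev_opponent = prev_opponent),
            by = c("season", "week", "away_team" = "team"))
result
```

This still means "the week before" literally: a team coming off a bye week gets NA rather than its most recent opponent.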

