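r - How can I extract rows from multiple (3) data frames when they have a matching number in one column, and output them in a fourth?
Problem description
I am trying to extract rows from three data frames, batters_16, batters_17, and batters_18, which look like this: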
player_id player_name launch_speed launch_angle
1 443558 Nelson Cruz 94.4 11.1
2 519317 Giancarlo Stanton 93.8 14.0
3 408234 Miguel Cabrera 93.6 12.3
4 452095 Tyler Flowers 93.2 12.9
5 407812 Matt Holliday 93.0 8.3
6 120074 David Ortiz 92.8 16.6
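I want to sort the players into separate data frames according to whether their player_id appears in all 3 years (frames), in exactly two frames (for example, batters_18 and batters_16 but not batters_17), or in only one of the three frames. This should give me 7 data frames in total. How can I accomplish this? I wrote a function that tries to separate them with %in% and then run the calculations, but I have had no luck getting it to work: the output has only 3 columns of seemingly random numbers, and I often get an error like the one below.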
Warning message:
In if (playerid %in% b18$player_id == FALSE & playerid %in%
b17$player_id == : the condition has length > 1 and only the first
element will be used
Here is the function I wrote, for reference.
# to combine batting stats from the 3 seasons in the appropriate categories
# but with a weighting of 45% in 2018, 35% in 2017, and 20% in 2016 for sake
# of favoring recent form and performance, but in each season all players have
# at least 50 events
combine.batting.stats <- function(b18, b17, b16, playerID_map){
#using the stats for each year along with the player ID map
b18 = read.csv("~/HITS/batters_18.csv")
b17 = read.csv("~/HITS/batters_17.csv")
b16 = read.csv("~/HITS/batters_16.csv")
playerID_map = read.csv("~/HITS/playerID_map.csv")
playerid = playerID_map$MLBID
average_launch_speed = 0
average_launch_angle = 0
# so first my weights with the scenarios being
# exists in all 3 years, exists in exactly two, and finally exists in exactly one
# the check for whether something is in a data frame is as below
# SOMETHING %in% DATAFRAME$COLUMN
# this should be used to code three different scenarios where I weight
# the value of season stats depending on how many seasons they qualify in
if(playerid %in% b18$player_id == TRUE & playerid %in% b17$player_id == TRUE
& playerid %in% b16$player_id == TRUE) {
#calculation for case of 3 year player
# 18 is 45%, 17 is 35%, and 16 is 20%
average_launch_speed = (((b18$launch_speed * 0.45) + (b17$launch_speed * 0.35)
+ (b16$launch_speed * 0.2)) / 3)
average_launch_angle = (((b18$launch_angle * 0.45) + (b17$launch_angle * 0.35)
+ (b16$launch_angle * 0.2)) / 3)
}
if(playerid %in% b18$player_id == TRUE & playerid %in% b17$player_id == TRUE
& playerid %in% b16$player_id == FALSE) {
#calculation for player in b18 and b17 but not b16....should be extended to
#other 2 year player situations that is b17 and b16 but not b18 as well as
#b18 and b16 but not b17 (which I would like to skew even more to b18 stats)
#than players who have played the most recent 2 years to reflect potential
#post injury change
average_launch_speed = (((b18$launch_speed * 0.6) + (b17$launch_speed * 0.4))
/ 2)
average_launch_angle = (((b18$launch_angle * 0.6) + (b17$launch_angle * 0.4))
/ 2)
}
if(playerid %in% b18$player_id == TRUE & playerid %in% b17$player_id == FALSE & playerid %in% b16$player_id == TRUE) {
#in b18 and b16 but not b17
average_launch_speed = (((b18$launch_speed * 0.6) + (b16$launch_speed * 0.4))
/ 2)
average_launch_angle = (((b18$launch_angle * 0.6) + (b16$launch_angle * 0.4))
/ 2)
}
if(playerid %in% b18$player_id == FALSE & playerid %in% b17$player_id == TRUE
& playerid %in% b16$player_id == TRUE) {
#in b17 and b16 but not b18
average_launch_speed = (((b17$launch_speed * 0.6) + (b16$launch_speed * 0.4))
/ 2)
average_launch_angle = (((b17$launch_angle * 0.6) + (b16$launch_angle * 0.4))
/ 2)
}
# next are those in only one single frame/year
# this one is only in 18
if(playerid %in% b18$player_id == TRUE & playerid %in% b17$player_id == FALSE
& playerid %in% b16$player_id == FALSE){
average_launch_speed = b18$launch_speed
average_launch_angle = b18$launch_angle
}
# only in b17
if(playerid %in% b18$player_id == FALSE & playerid %in% b17$player_id == TRUE
& playerid %in% b16$player_id == FALSE){
average_launch_speed = b17$launch_speed
average_launch_angle = b17$launch_angle
}
#only in b16
if(playerid %in% b18$player_id == FALSE & playerid %in% b17$player_id == FALSE
& playerid %in% b16$player_id == TRUE){
average_launch_speed = b16$launch_speed
average_launch_angle = b16$launch_angle
}
combined_stats = list(playerid, average_launch_speed, average_launch_angle)
# returning a data frame from the function
write.csv(combined_stats, "combined_stats_1.csv", col.names = TRUE, row.names = FALSE)
}
Solution
Let's first combine all the data sets into a single tidy data set:
batters_16$year<-2016
batters_17$year<-2017
batters_18$year<-2018
batters<-rbind(batters_16,batters_17,batters_18)
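As a quick sanity check on the stacked data, the sketch below uses three small made-up frames (the player_id values and measurements are invented for illustration, not taken from the question's data) and counts how many seasons each player appears in. Note that rbind() needs the three frames to have the same column names.

```r
# Toy stand-ins for the real frames; all values are invented for illustration.
batters_16 <- data.frame(player_id = c(1, 2, 3, 5), launch_speed = c(90, 91, 92, 93))
batters_17 <- data.frame(player_id = c(1, 2, 4, 6), launch_speed = c(92, 93, 94, 95))
batters_18 <- data.frame(player_id = c(1, 3, 4, 7), launch_speed = c(94, 95, 96, 97))

# Tag each frame with its season, then stack them into one long frame
batters_16$year <- 2016
batters_17$year <- 2017
batters_18$year <- 2018
batters <- rbind(batters_16, batters_17, batters_18)

# Number of rows (seasons) per player in the stacked frame
seasons_per_player <- table(batters$player_id)
print(seasons_per_player)
```

A count of 3 means the player appears in every season, while a count of 1 means he appears in only one frame.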
Now it is easy to do what you want with `dplyr`:
library(dplyr)
batters <- batters %>% group_by(player_id)
filter(batters,any(year==2016) & all(year!=2017 & year!=2018)) # only 2016
filter(batters,any(year==2016) & any(year==2017) & all(year!=2018)) # only 2016 and 2017
etc...
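The remaining combinations follow the same pattern, but writing out seven filters by hand is easy to get wrong. As an alternative sketch (written in base R so it runs without extra packages; this is not the answerer's method), each player can be labelled with the set of seasons he appears in, and the stacked frame split on that label. The toy data and the helper names make_frame, key, groups, weights_for and weighted are hypothetical. The weights come from the question (45/35/20 for three seasons, 60/40 for two, favoring the more recent season), and the sketch assumes one row per player per season. Because the weights already sum to 1, it does not divide again by the number of seasons, which is an assumption about what the asker intended.

```r
# Toy stand-ins for the real frames; all values are invented for illustration.
make_frame <- function(ids, speeds, angles, year) {
  data.frame(player_id = ids, launch_speed = speeds,
             launch_angle = angles, year = year)
}
batters <- rbind(
  make_frame(c(1, 2, 3, 5), c(90, 91, 92, 93), c(10, 11, 12, 13), 2016),
  make_frame(c(1, 2, 4, 6), c(92, 93, 94, 95), c(12, 13, 14, 15), 2017),
  make_frame(c(1, 3, 4, 7), c(94, 95, 96, 97), c(14, 15, 16, 17), 2018)
)

# One key per player listing the seasons he appears in, e.g. "2016_2018"
key <- tapply(batters$year, batters$player_id,
              function(y) paste(sort(unique(y)), collapse = "_"))
batters$seasons <- as.vector(key[as.character(batters$player_id)])

# One data frame per combination of seasons (seven with this toy data)
groups <- split(batters, batters$seasons)
print(names(groups))

# Weights by number of seasons, oldest season first (taken from the question)
weights_for <- list("1" = 1, "2" = c(0.4, 0.6), "3" = c(0.20, 0.35, 0.45))

# Weighted launch speed and angle per player (assumes one row per season)
weighted <- do.call(rbind, lapply(split(batters, batters$player_id), function(d) {
  d <- d[order(d$year), ]
  w <- weights_for[[as.character(nrow(d))]]
  data.frame(player_id = d$player_id[1], seasons = d$seasons[1],
             launch_speed = sum(w * d$launch_speed),
             launch_angle = sum(w * d$launch_angle))
}))
print(weighted)
```

For example, groups[["2016_2018"]] holds the rows for players who appear in 2016 and 2018 but not 2017. The question also mentions wanting to skew that particular case further toward 2018; doing so would mean choosing weights by the seasons key instead of by the number of seasons.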