How do I calculate average rates from a table?

Question

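Compute a table, but with rates computed over 1999-2001. Keep only rows from 1999-2001 where players have 100 or more plate appearances, calculate each player's singles rate and BB rate per season, then calculate each player's average singles rate (mean_singles) and average BB rate (mean_bb) over those three seasons.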
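Written out, the quantities the task asks for look roughly like this. This assumes, as the code in this post does, that plate appearances are approximated by AB + BB, and that a player's average is an unweighted mean of the rates in that player's qualifying rows S_i (rows from 1999-2001 with PA >= 100; Lahman's Batting table has one row per stint, so a player traded mid-season contributes more than one row for that year):

$$
\mathrm{PA}_{i,s} = \mathrm{AB}_{i,s} + \mathrm{BB}_{i,s}, \qquad
\mathrm{singles}_{i,s} = \frac{\mathrm{H}_{i,s} - \mathrm{X2B}_{i,s} - \mathrm{X3B}_{i,s} - \mathrm{HR}_{i,s}}{\mathrm{PA}_{i,s}}, \qquad
\mathrm{bb}_{i,s} = \frac{\mathrm{BB}_{i,s}}{\mathrm{PA}_{i,s}}
$$

$$
\mathrm{mean\_singles}_i = \frac{1}{|S_i|} \sum_{s \in S_i} \mathrm{singles}_{i,s}, \qquad
\mathrm{mean\_bb}_i = \frac{1}{|S_i|} \sum_{s \in S_i} \mathrm{bb}_{i,s}
$$

The follow-up question then asks for the number of players i with mean_singles_i > 0.2, i.e. a count of players, not of rows.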

How many players had an average singles rate (mean_singles) greater than 0.2 over 1999-2001?

library(tidyverse) 
library(Lahman)  

bat_02 <- Batting %>% filter(yearID %in% c("1999","2000","2001")) %>%
    mutate(pa = AB + BB, singles = (H - X2B - X3B - HR)/pa, bb = BB/pa) %>%
    filter(pa >= 100) %>%
    select(playerID, singles, bb)
        
bat_02 <- bat_02 %>% filter(singles > .2)
nrow(bat_02)

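I filtered the table to keep players with 100 or more plate appearances in 1999-2001, then filtered the rows with the condition singles > 0.2. The code above gives me an output of 133, which is incorrect. Is there a mistake in my code?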
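The result is consistent with two issues in the code above: the 0.2 threshold is applied to each season's rate rather than to each player's three-season average, and nrow() then counts player-season rows rather than players. The toy example below uses made-up rates for three hypothetical players (not Lahman data) to show how the two counts diverge; it sticks to base R so it runs without extra packages.

```r
# Hypothetical per-season singles rates for three made-up players (not Lahman data)
seasons <- data.frame(
  playerID = c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
  yearID   = rep(1999:2001, times = 3),
  singles  = c(0.25, 0.15, 0.16, 0.21, 0.22, 0.23, 0.19, 0.19, 0.25)
)

# What the question's code counts: player-season rows with singles > 0.2
rows_over <- nrow(seasons[seasons$singles > 0.2, ])

# What the task asks for: average each player's rates first, then count players
player_means <- aggregate(singles ~ playerID, data = seasons, FUN = mean)
players_over <- sum(player_means$singles > 0.2)

rows_over     # 5 player-season rows
players_over  # 2 players (B averages 0.22, C averages 0.21)
```

Player A has one season above 0.2 but averages about 0.187, so it contributes to the row count but not to the player count, while B contributes three rows but is only one player.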

Tags: r

Answer


Here is my take on this problem.

library(Lahman)
library(dplyr)

str(Batting)

Batting %>%
  # Compute a table but with rates computed over 1999-2001
  filter(yearID %in% 1999:2001) %>%

  # Keep only rows from 1999-2001 where players have 100 or more plate appearances
  mutate(pa = AB + BB) %>%
  filter(pa >= 100) %>%

  # Calculate each player's singles rate and BB rate per season (one value per row;
  # mutate() avoids summarise() returning several rows for players with multiple stints)
  mutate(singles = (H - X2B - X3B - HR) / pa, bb = BB / pa) %>%

  # Then average the rates. Note: grouping by yearID gives per-season averages
  # across all players, not per-player averages over the three seasons
  group_by(yearID) %>%
  summarise(mean_single = mean(singles), mean_bb = mean(bb))

# A tibble: 3 x 3
  yearID mean_single mean_bb
   <int>       <dbl>   <dbl>
1   1999       0.137  0.0780
2   2000       0.140  0.0765
3   2001       0.132  0.0634

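Or perhaps the question only wants the overall rates. In that case, replace the final group_by(yearID) and summarise() steps of the pipeline above with: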

  # then average the rates over all qualifying player-season rows in 1999-2001
  # (one overall value, not one value per player)
  ungroup() %>%
  summarise(mean_single = mean(singles, na.rm = TRUE), mean_bb = mean(bb, na.rm = TRUE))
# A tibble: 1 x 2
  mean_single mean_bb
        <dbl>   <dbl>
1       0.136  0.0726
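If the task is read as asking for per-player averages over 1999-2001, one way to get the count is to group by playerID (rather than yearID) and summarise before applying the 0.2 threshold. The sketch below wraps that in a helper, count_high_singles(), which is a hypothetical name introduced here for illustration. It assumes a data frame with the Lahman Batting columns used above (playerID, yearID, AB, BB, H, X2B, X3B, HR) and keeps the simplified PA = AB + BB definition from the question.

```r
library(dplyr)

# Hypothetical helper: count players whose average singles rate over
# 1999-2001 exceeds `threshold`, using the definitions from the question.
count_high_singles <- function(batting, threshold = 0.2) {
  per_player <- batting %>%
    filter(yearID %in% 1999:2001) %>%
    # Simplified plate appearances, as in the question: PA = AB + BB
    mutate(pa = AB + BB,
           singles = (H - X2B - X3B - HR) / pa,
           bb = BB / pa) %>%
    # Keep only rows (seasons, or stints within a season) with 100+ PA
    filter(pa >= 100) %>%
    # Average each player's rates over their qualifying rows
    group_by(playerID) %>%
    summarise(mean_singles = mean(singles),
              mean_bb = mean(bb),
              .groups = "drop")
  # Count players, not player-season rows
  sum(per_player$mean_singles > threshold)
}
```

With the Lahman package loaded, count_high_singles(Batting) returns the number of players; the count is not reproduced here. This takes an unweighted mean of the per-row rates; dividing each player's total singles by total PA instead would weight seasons by playing time and could give a different count.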
