R Function involving two for loops - baseball data

Problem description

For those into sports: I am working on a function that adds a pitch-count column to a pitcher's data, counting the pitches within each game of a given season.

For example's sake, the data is a data frame called pitcher that contains a game_date and an sv_id (the date/timestamp of the pitch). My goal is to sort the sv_id values in ascending order within each unique game_date and then add a column that numbers the pitches in that order. For example, if 3 pitches were thrown on game_date = 9/9/2018 with sv_id values 090918_031456, 090918_031613, and 090918_031534, I would first sort them chronologically (090918_031456, 090918_031534, 090918_031613) and then add a new column with the values 1, 2, 3 respectively to act as a pitch count.

Below is my function so far. I originally thought I would make a list of lists, but now I am not sure that is the right way to go about this. Please help! This is also my first time posting here, so any advice is appreciated. Thank you!
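A minimal sketch of the target for the single 9/9/2018 game in the example, assuming sv_id is stored as a character string. Within one game every sv_id shares the same date prefix and a zero-padded HHMMSS time, so sorting the strings puts the pitches in chronological order, and `seq_along()` supplies the 1, 2, 3 pitch count. The names `sorted_ids` and `pitches` are illustrative only.

```r
# The three sv_id values from the 9/9/2018 example, in the order they appear in the data
sv_id <- c("090918_031456", "090918_031613", "090918_031534")

# Within one game the strings share a date prefix and a zero-padded time,
# so sorting them as text also sorts them by time
sorted_ids <- sort(sv_id)

# Pitch count 1, 2, 3 in chronological order
pitches <- data.frame(sv_id = sorted_ids, PC = seq_along(sorted_ids),
                      stringsAsFactors = FALSE)
print(pitches)
```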

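```r
library(dplyr)

# Add a pitch count column: 1, 2, 3, ... per game_date, in sv_id order
pitchCount <- function(pitcher) {
  gameUnique <- unique(pitcher$game_date)
  PC <- list()
  for (j in seq_along(gameUnique)) {
    # All pitches thrown in the j-th game
    PCLocal <- filter(pitcher, game_date == gameUnique[j])
    # Sort chronologically; the sorted result has to be assigned back
    PCLocal <- PCLocal[order(PCLocal$sv_id), ]
    PCLocal$PC <- NA_integer_
    for (i in seq_along(PCLocal$sv_id)) {
      PCLocal$PC[i] <- i
    }
    # Store the whole per-game data frame in the list
    PC[[j]] <- PCLocal
  }
  # Stack the per-game data frames back into one
  bind_rows(PC)
}

pitcher <- pitchCount(pitcher)
```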
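On the list-of-lists question: a list does work if each element is a whole data frame for one game rather than a list of single values. A minimal base R sketch, assuming `pitcher` has a `game_date` column and a character `sv_id` column; the helper `number_pitches` and the five sample rows are hypothetical.

```r
# Hypothetical helper: split by game, number each game's pitches, recombine
number_pitches <- function(pitcher) {
  # One data frame per game_date, collected in a list
  games <- split(pitcher, pitcher$game_date)
  games <- lapply(games, function(g) {
    g <- g[order(g$sv_id), ]   # chronological order within the game
    g$PC <- seq_len(nrow(g))   # pitch count 1..n
    g
  })
  # Stack the per-game data frames back into one
  out <- do.call(rbind, games)
  rownames(out) <- NULL
  out
}

# Made-up example with two games
pitcher <- data.frame(
  game_date = as.Date(c("2018-09-09", "2018-05-17", "2018-09-09",
                        "2018-05-17", "2018-09-09")),
  sv_id = c("090918_031456", "090918_031213", "090918_031613",
            "090918_031156", "090918_031534"),
  stringsAsFactors = FALSE
)
res <- number_pitches(pitcher)
print(res)
```

Here `split()` builds the per-game list in one step and `lapply()` takes the place of the outer loop, so nothing has to be grown element by element.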

Tags: r, for-loop

Solution


I'm not sure what your data looks like, but from your description I'm assuming the following:

```
> df
# A tibble: 9 x 2
  game_date  sv_id        
  <date>     <chr>        
1 2018-09-09 090918_031456
2 2018-09-09 090918_031613
3 2018-09-09 090918_031534
4 2018-05-17 090918_031156
5 2018-05-17 090918_031213
6 2018-06-30 090918_031177
7 2018-06-30 090918_031211
8 2018-06-30 090918_031144
9 2018-06-30 090918_031203
```
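To make the example reproducible, the sample data above can be rebuilt as a plain data frame. This sketch assumes `game_date` is a Date and `sv_id` a character string, matching the `<date>` and `<chr>` column types in the printout; `tibble::tibble()` could be used in place of `data.frame()` to get a tibble instead.

```r
# Sample data from the printout above: one row per pitch
df <- data.frame(
  game_date = as.Date(c("2018-09-09", "2018-09-09", "2018-09-09",
                        "2018-05-17", "2018-05-17",
                        "2018-06-30", "2018-06-30", "2018-06-30", "2018-06-30")),
  sv_id = c("090918_031456", "090918_031613", "090918_031534",
            "090918_031156", "090918_031213",
            "090918_031177", "090918_031211", "090918_031144", "090918_031203"),
  stringsAsFactors = FALSE
)
str(df)
```

Posting data in this form (or the output of `dput(df)`) lets others run an answer against it directly.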

Then you can use dplyr to get the result you want:

```r
library(dplyr)
df <- df %>% 
    group_by(game_date) %>% 
    arrange(sv_id, .by_group = TRUE) %>%  # sort each game's pitches chronologically
    mutate(count = row_number())          # number the pitches 1, 2, 3, ... within each game_date
```

The output is:

```
# A tibble: 9 x 3
# Groups:   game_date [3]
  game_date  sv_id         count
  <date>     <chr>         <int>
1 2018-05-17 090918_031156     1
2 2018-05-17 090918_031213     2
3 2018-06-30 090918_031144     1
4 2018-06-30 090918_031177     2
5 2018-06-30 090918_031203     3
6 2018-06-30 090918_031211     4
7 2018-09-09 090918_031456     1
8 2018-09-09 090918_031534     2
9 2018-09-09 090918_031613     3
```
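For readers who prefer not to load dplyr, here is a base R sketch of the running pitch count the question asks for (1, 2, 3, … within each game_date, in sv_id order). It assumes character `sv_id` values and uses `ave()` to apply `seq_along()` inside each game; the helper `pitch_count_base` is hypothetical and the five rows are taken from the sample data.

```r
# Hypothetical helper: running pitch count per game_date using base R only
pitch_count_base <- function(df) {
  # Sort by game, then chronologically by sv_id within each game
  df <- df[order(df$game_date, df$sv_id), ]
  # ave() applies seq_along() inside each game_date group: 1, 2, 3, ...
  df$count <- ave(seq_along(df$sv_id), df$game_date, FUN = seq_along)
  rownames(df) <- NULL
  df
}

# Five rows taken from the sample data (two games)
df <- data.frame(
  game_date = as.Date(c("2018-09-09", "2018-09-09", "2018-09-09",
                        "2018-05-17", "2018-05-17")),
  sv_id = c("090918_031456", "090918_031613", "090918_031534",
            "090918_031156", "090918_031213"),
  stringsAsFactors = FALSE
)
res <- pitch_count_base(df)
print(res)
```

`order()` does the sorting and `ave()` does the per-group numbering, so no explicit loop is needed.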

I hope this helps.

