R: Looping through a set of values in one data frame to update a second data frame

Question

Updated with a more realistic example; this time duplicates have been added to interp_b.

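I am trying to populate a field in one data frame (interp_b) using values from a second data frame (bait). I want to look at the obs_datetime of each row in interp_b and determine the most recent time the station was baited before that obs_datetime. This will later be used to calculate the time since baiting for each obs_datetime. The bait times are in the bait data frame, in the column bait_datetime. The result should go into a field called latestbait_datetime in the interp_b data frame.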

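I am picturing an iterative process in which latestbait_datetime in interp_b is recalculated repeatedly until the last row of the bait data frame is reached. The for loop I tried clearly runs through the rows and performs the specified calculation, but I cannot get the output in the format I want: it produces separate output for each iteration instead of overwriting and updating the interp_b data frame.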

Here is some code to build the two data frames, interp_b and bait (please excuse my inelegance):

# interp_b dataframe----

interp_b <- structure(list(plot_station_year = c("Cow_C2_2019", "RidingStable_C3_2018", 
"RidingStable_C3_2018", "Raf_C1_2018", "Metcalfe_C2_2019"), obs_datetime = structure(c(1559487600, 
1544954400, 1541084400, 1515160800, 1567756800), class = c("POSIXct", 
"POSIXt"), tzone = "UTC"), latestbait_datetime = structure(c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_), class = c("POSIXct", 
"POSIXt"))), class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -5L))


# bait dataframe----

bait <- structure(list(plot_station_year = c("Cow_C2_2019", "Cow_C2_2019", 
"RidingStable_C3_2018", "Raf_C1_2018"), bait_datetime = structure(c(1557500400, 
1559746800, 1543676400, 1491318000), class = c("POSIXct", "POSIXt"
), tzone = "UTC")), class = c("spec_tbl_df", "tbl_df", "tbl", 
"data.frame"), row.names = c(NA, -4L), spec = structure(list(
    cols = list(plot_station_year = structure(list(), class = c("collector_character", 
    "collector")), bait_datetime = structure(list(format = "%d-%m-%Y %H:%M"), class = c("collector_datetime", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
    "collector")), skip = 1), class = "col_spec"))


The desired result looks like this:

[image: desired result]

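Below are my two attempts. The first produces a data frame containing only the final run of the loop; the second produces a data frame containing the results of every run (as you would expect from bind_rows).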

library(tidyverse)

#attempt #1----
for (i in 1:nrow(bait)) { 

  print(paste("row =",i))

  interpbait <- interp_b %>% 
    mutate(latestbait_datetime = if_else((plot_station_year == bait$plot_station_year[i] & (obs_datetime >= bait$bait_datetime[i] & (is.na(latestbait_datetime) | latestbait_datetime < bait$bait_datetime[i]))), bait$bait_datetime[i], latestbait_datetime))

}


#attempt #2----
resultb <- data.frame()

for (i in 1:nrow(bait)) { 

  print(paste("row =",i))

  interpbait2 <- interp_b %>% 
    mutate(latestbait_datetime = if_else((plot_station_year == bait$plot_station_year[i] & (obs_datetime >= bait$bait_datetime[i] & (is.na(latestbait_datetime) | latestbait_datetime < bait$bait_datetime[i]))), bait$bait_datetime[i], latestbait_datetime))

  resultb <- bind_rows(resultb, interpbait2)

  print(resultb)
}

Any help would be greatly appreciated.

Tags: r, loops, dataframe, for-loop, iteration

Solution


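I'm not sure how long this will take to run, but here is a tidyverse solution. For each row in interp_b, we filter the bait data frame to the matching plot_station_year and keep only the datetimes earlier than that row's obs_datetime. We then arrange the filtered bait data by datetime in descending order (so the most recent date comes first) and slice the first row, which leaves only the most recent date. Finally, we "pull" the datetime out of the data frame and assign it to interp_b.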

library(tidyverse)
library(progress) # for progress bar

# create progress bar to update, so that you can estimate the amount of time it will take to finish the entire loop
pb <- progress_bar$new(total = nrow(interp_b))

for (i in 1:nrow(interp_b)) {

  last_time_baited <- bait %>% 
    #filter bait dataframe to appropriate plot, station, year based on
    # the row in interp_b
    filter(plot_station_year == interp_b$plot_station_year[i],
           # ensure all datetimes are less than that row in interp_b
           bait_datetime < interp_b$obs_datetime[i]) %>% 
    # arrange by datetime (most recent datetimes first)
    arrange(desc(bait_datetime)) %>% 
    # take the top row - this will be the most recent date-time that
    # the plot-station was baited
    slice(1) %>% 
    # "pull" that value out of the dataframe so you have a value, 
    # not a tibble
    pull(bait_datetime)

  # update the row in interp_b with the date_time baited
  interp_b$latestbait_datetime[i] <- last_time_baited

  pb$tick() # print progress
}

The resulting table (interp_b) matches your expected output:

# A tibble: 5 x 3
  plot_station_year    obs_datetime        latestbait_datetime
  <chr>                <dttm>              <dttm>             
1 Cow_C2_2019          2019-06-02 15:00:00 2019-05-10 11:00:00
2 RidingStable_C3_2018 2018-12-16 10:00:00 2018-12-01 10:00:00
3 RidingStable_C3_2018 2018-11-01 15:00:00 NA                 
4 Raf_C1_2018          2018-01-05 14:00:00 2017-04-04 11:00:00
5 Metcalfe_C2_2019     2019-09-06 08:00:00 NA  
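
For comparison, the loop from the question can also be made to work by writing each pass back into interp_b itself. In attempt 1, every iteration starts again from the unmodified interp_b and stores its result in a separate object (interpbait), so only the update from the last bait row survives. The following is a minimal base R sketch of that idea, not part of the original answer: it rebuilds the sample data as plain data frames with as.POSIXct, and it keeps the >= comparison from the question (the loop above uses a strict <), which makes no difference for this sample data.

```r
# Sample data rebuilt as plain data frames (assumption: UTC for all columns).
to_utc <- function(x) as.POSIXct(x, origin = "1970-01-01", tz = "UTC")

interp_b <- data.frame(
  plot_station_year = c("Cow_C2_2019", "RidingStable_C3_2018",
                        "RidingStable_C3_2018", "Raf_C1_2018",
                        "Metcalfe_C2_2019"),
  obs_datetime = to_utc(c(1559487600, 1544954400, 1541084400,
                          1515160800, 1567756800)),
  latestbait_datetime = to_utc(rep(NA_real_, 5)),
  stringsAsFactors = FALSE
)

bait <- data.frame(
  plot_station_year = c("Cow_C2_2019", "Cow_C2_2019",
                        "RidingStable_C3_2018", "Raf_C1_2018"),
  bait_datetime = to_utc(c(1557500400, 1559746800, 1543676400, 1491318000)),
  stringsAsFactors = FALSE
)

for (i in seq_len(nrow(bait))) {
  # Rows of interp_b that this bait event applies to: same station, observed
  # at or after the bait time, and no later bait time recorded so far.
  hit <- interp_b$plot_station_year == bait$plot_station_year[i] &
    interp_b$obs_datetime >= bait$bait_datetime[i] &
    (is.na(interp_b$latestbait_datetime) |
       interp_b$latestbait_datetime < bait$bait_datetime[i])

  # Write the update back into interp_b so the next iteration builds on it.
  interp_b$latestbait_datetime[hit] <- bait$bait_datetime[i]
}

print(interp_b)
```

Because each iteration modifies interp_b in place, the order of rows in bait does not matter: a later bait time replaces an earlier one only when it is still at or before the observation.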
