首页 > 解决方案 > 线性插值以填充不大于 2 天的间隙的 NoData

问题描述

我正在使用列表(2014-2018 年每年的数据帧)列表(测量深度)的嵌套列表(地面传感器),我想对每个数据帧执行线性插值。这是数据集的概述,以便您了解它的外观:

str(G1OUT_gwFERN) 

$ SE13 :List of 3
  ..$ d20:List of 5
  .. ..$ 2014:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2014-01-01" "2014-01-01" "2014-01-01" ...
  .. .. ..$ SWC : num [1:8760] 46 45.9 46 45.9 45.9 ...
  .. ..$ 2015:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2015-01-01" "2015-01-01" "2015-01-01" ...
  .. .. ..$ SWC : num [1:8760] 49.8 49.8 49.8 49.8 49.8 ...
  .. ..$ 2016:'data.frame': 8784 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8784], format: "2016-01-01" "2016-01-01" "2016-01-01" ...
  .. .. ..$ SWC : num [1:8784] 48.2 48.2 48.1 48.1 48.1 ...
  .. ..$ 2017:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2017-01-01" "2017-01-01" "2017-01-01" ...
  .. .. ..$ SWC : num [1:8760] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
  .. ..$ 2018:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2018-01-01" "2018-01-01" "2018-01-01" ...
  .. .. ..$ SWC : num [1:8760] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
  ..$ d50:List of 5
  .. ..$ 2014:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2014-01-01" "2014-01-01" "2014-01-01" ...
  .. .. ..$ SWC : num [1:8760] 35.2 35.2 35.2 35.2 35.2 ...
  .. ..$ 2015:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2015-01-01" "2015-01-01" "2015-01-01" ...
  .. .. ..$ SWC : num [1:8760] 34.8 34.8 34.7 34.7 34.8 ...
  .. ..$ 2016:'data.frame': 8784 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8784], format: "2016-01-01" "2016-01-01" "2016-01-01" ...
  .. .. ..$ SWC : num [1:8784] 34.2 34.2 34.1 34.1 34.1 ...
  .. ..$ 2017:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2017-01-01" "2017-01-01" "2017-01-01" ...
  .. .. ..$ SWC : num [1:8760] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
  .. ..$ 2018:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2018-01-01" "2018-01-01" "2018-01-01" ...
  .. .. ..$ SWC : num [1:8760] 36.4 36.4 36.3 36.3 36.3 ...
  ..$ d5 :List of 5
  .. ..$ 2014:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2014-01-01" "2014-01-01" "2014-01-01" ...
  .. .. ..$ SWC : num [1:8760] 32.5 32.4 32.4 32.4 32.4 ...
  .. ..$ 2015:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2015-01-01" "2015-01-01" "2015-01-01" ...
  .. .. ..$ SWC : num [1:8760] 32.1 32.1 32.1 32.1 32.1 ...
  .. ..$ 2016:'data.frame': 8784 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8784], format: "2016-01-01" "2016-01-01" "2016-01-01" ...
  .. .. ..$ SWC : num [1:8784] 30.3 30.3 30.3 30.2 30.2 ...
  .. ..$ 2017:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2017-01-01" "2017-01-01" "2017-01-01" ...
  .. .. ..$ SWC : num [1:8760] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
  .. ..$ 2018:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2018-01-01" "2018-01-01" "2018-01-01" ...
  .. .. ..$ SWC : num [1:8760] 31.1 31.2 31.1 31.1 31.1 ...
 $ SE14 :List of 3
  ..$ d20:List of 5
  .. ..$ 2014:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2014-01-01" "2014-01-01" "2014-01-01" ...
  .. .. ..$ SWC : num [1:8760] 52.5 52.5 52.5 52.5 52.4 ...
  .. ..$ 2015:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2015-01-01" "2015-01-01" "2015-01-01" ...
  .. .. ..$ SWC : num [1:8760] 53.7 53.7 53.7 53.7 53.7 ...
  .. ..$ 2016:'data.frame': 8784 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8784], format: "2016-01-01" "2016-01-01" "2016-01-01" ...
  .. .. ..$ SWC : num [1:8784] 52.3 52.2 52.3 52.3 52.2 ...
  .. ..$ 2017:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2017-01-01" "2017-01-01" "2017-01-01" ...
  .. .. ..$ SWC : num [1:8760] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
  .. ..$ 2018:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2018-01-01" "2018-01-01" "2018-01-01" ...
  .. .. ..$ SWC : num [1:8760] 55 55 55 55.1 55 ...
  ..$ d50:List of 5
  .. ..$ 2014:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2014-01-01" "2014-01-01" "2014-01-01" ...
  .. .. ..$ SWC : num [1:8760] 27.9 27.9 27.9 27.9 27.9 ...
  .. ..$ 2015:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2015-01-01" "2015-01-01" "2015-01-01" ...
  .. .. ..$ SWC : num [1:8760] 28.5 28.5 28.5 28.5 28.5 ...
  .. ..$ 2016:'data.frame': 8784 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8784], format: "2016-01-01" "2016-01-01" "2016-01-01" ...
  .. .. ..$ SWC : num [1:8784] 26.7 26.7 26.7 26.6 26.7 ...
  .. ..$ 2017:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2017-01-01" "2017-01-01" "2017-01-01" ...
  .. .. ..$ SWC : num [1:8760] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
  .. ..$ 2018:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2018-01-01" "2018-01-01" "2018-01-01" ...
  .. .. ..$ SWC : num [1:8760] 29.4 29.4 29.4 29.4 29.5 ...
  ..$ d5 :List of 5
  .. ..$ 2014:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2014-01-01" "2014-01-01" "2014-01-01" ...
  .. .. ..$ SWC : num [1:8760] 39.8 39.8 39.7 39.6 39.7 ...
  .. ..$ 2015:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2015-01-01" "2015-01-01" "2015-01-01" ...
  .. .. ..$ SWC : num [1:8760] 42.2 42.3 42.3 42.3 42.3 ...
  .. ..$ 2016:'data.frame': 8784 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8784], format: "2016-01-01" "2016-01-01" "2016-01-01" ...
  .. .. ..$ SWC : num [1:8784] 36.6 36.6 36.5 36.6 36.5 ...
  .. ..$ 2017:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2017-01-01" "2017-01-01" "2017-01-01" ...
  .. .. ..$ SWC : num [1:8760] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
  .. ..$ 2018:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2018-01-01" "2018-01-01" "2018-01-01" ...
  .. .. ..$ SWC : num [1:8760] 56.5 56.5 56.5 56.5 56.3 ...

我从列表中提取了其中一个数据框的一部分并使用了dput(),因此您可以使用一些玩具数据:

    toydat <-  structure(list(Date = structure(c(16277, 16277, 16277, 16277, 
16277, 16277, 16277, 16277, 16277, 16277, 16277, 16277, 16277, 
16277, 16277, 16277, 16277, 16277, 16277, 16278, 16278, 16278, 
16278, 16278, 16278, 16278, 16278, 16278, 16278, 16278, 16278, 
16278, 16278, 16278, 16278, 16278, 16278, 16278, 16278, 16278, 
16278, 16278, 16278, 16279, 16279, 16279, 16279, 16279, 16279, 
16279, 16279, 16279, 16279, 16279, 16279, 16279, 16279, 16279, 
16279, 16279, 16279, 16279, 16279, 16279, 16279, 16279, 16279, 
16280, 16280, 16280, 16280, 16280, 16280, 16280, 16280, 16280, 
16280, 16280, 16280, 16280, 16280, 16280, 16280, 16280, 16280, 
16280, 16280, 16280, 16280, 16280, 16280, 16281, 16281, 16281, 
16281, 16281, 16281, 16281, 16281, 16281, 16281), class = "Date"), 
    SWC = c(NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, 
    NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, 
    NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, 
    NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, 
    NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, 
    NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, 
    NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, 19.627243, 19.543659, 
    19.593796, 19.534379, 19.59937, 19.51582, 19.482441, 19.51582, 
    19.571497, 19.645825, 20.83435, 21.116572, 22.688702, 22.216629, 
    21.54243, 21.229946, 21.003335, 20.833735, 20.74902, 20.608045, 
    20.512311, 20.411049)), row.names = 48774:48874, class = "data.frame")

测量是每小时进行一次,所以我一天有 24 次测量。数据帧中的一些值是 NoData 值,所以我想使用线性插值来填补这些空白。但是,如果 NoData 值和实际值之间的差距不大于 2 天,我只想使用线性插值。关于玩具数据,这意味着如果缺少 7 月 28 日和 29 日(2014-07-28 和 2014-07-29)的值,我只想填补这些天的空白,而不是 27 日, 26日、25日……7月等等。如果差距大于 2 天,我想保留 NoData 值,因为稍后我将使用线性回归来填补这些差距,但这不应该是本文的主题。

我已经尝试过以下事情:

我使用了包中的na.approx()功能zoo。我输入:

na.approx(toydat$SWC, na.rm = FALSE)

但这只是像以前一样返回数据并且不进行插值(我输入了 $SWC,因为我只想插值该列)。我想如果我添加rule = 2到代码中,它会在 NaN 值之后获取最后一个值,并将该值用于所有我不想要的 NaN 值。我也尝试过使用maxgap = 48,因为我认为这样可以确保只插入 48 个值。但是,由于无论如何我都无法正确插值,所以什么也没发生。

我真的很感激一些帮助。

标签: rinterpolationno-data

解决方案


这是一个使用 Base R 的凌乱的方法,它将在 data.frame 的开头处理 na 值并在 data.frame 的末尾推断出 na 值,它确实假设频率是相同的。注意:df_list 是您列表的代理

# Linear interpolation function handling ties,
# returns interpolated vector the same length 
# a the input vector: -> vector
l_interp_vec <- function(na_vec){
  approx(x = na_vec, method = "linear", ties = "constant", n = length(na_vec))$y
}

# Applied to a dataframe, replacing NA values
# in each of the numeric vectors,
# with interpolated values.
# input is dataframe: -> dataframe()
interped_df <- function(df){
  data.frame(lapply(df, function(x) {
  if (is.numeric(x)) {
    # Store a scalar of min row where x isn't NA: -> min_non_na
    min_non_na <- min(which(!(is.na(x))))
    # Store a scalar of max row where x isn't NA: -> max_non_na
    max_non_na <- max(which(!(is.na(x))))
    # Store scalar of the number of rows needed to impute prior
    # to first NA value: -> ru_lower
    ru_lower <- ifelse(min_non_na > 1, min_non_na - 1, min_non_na)
    # Store scalar of the number of rows needed to impute after
    # the last non-NA value: -> ru_upper
    ru_upper <- ifelse(max_non_na == length(x),
                       length(x) - 1,
                       (length(x) - (max_non_na + 1)))

    # Store a vector of the ramp to function: -> l_ramp_up:
    ramp_up <-
      as.numeric(cumsum(rep(x[min_non_na] / (min_non_na), ru_lower)))

    # Apply the interpolation function on vector "x": -> y
    y <-
      as.numeric(l_interp_vec(as.numeric(x[min_non_na:max_non_na])))

    # Create a vector that combines the ramp_up vector
    # and y if the first NA is at row 1: -> z
    if (length(ramp_up) > 1 & max_non_na != length(x)) {
      # Create a vector interpolations if there are
      # multiple NA values after the last value: -> lower_l_int
      lower_l_int <-
        as.numeric(cumsum(rep(mean(diff(
          c(ramp_up, y)
        )),
        ru_upper + 1)) +
          as.numeric(x[max_non_na]))

      # Store the linear interpolations in  a vector: -> z
      z <- as.numeric(c(ramp_up, y, lower_l_int))

    } else if (length(ramp_up) > 1 & max_non_na == length(x)) {
      # Store the linear interpolations in  a vector: -> z
      z <- as.numeric(c(ramp_up, y))

    } else if (min_non_na == 1 & max_non_na != length(x)) {
      # Create a vector interpolations if there are
      # multiple NA values after the last value: -> lower_l_int
      lower_l_int <-
        as.numeric(cumsum(rep(mean(diff(
          c(ramp_up, y)
        )),
        ru_upper + 1)) +
          as.numeric(x[max_non_na]))

      # Store the linear interpolations in  a vector: -> z
      z <- as.numeric(c(y, lower_l_int))

    } else{
      # Store the linear interpolations in  a vector: -> z
      z <- as.numeric(y)

    }

    # Interpolate between points in x, return new x:
    return(as.numeric(ifelse(is.na(x), z, x)))

  } else{
    x

  }

}))}

df_list_interped_extrapped <- lapply(df_list, interped_df)

推荐阅读