r - 线性插值以填充不大于 2 天的间隙的 NoData
问题描述
我正在使用列表(2014-2018 年每年的数据帧)列表(测量深度)的嵌套列表(地面传感器),我想对每个数据帧执行线性插值。这是数据集的概述,以便您了解它的外观:
str(G1OUT_gwFERN)
$ SE13 :List of 3
..$ d20:List of 5
.. ..$ 2014:'data.frame': 8760 obs. of 2 variables:
.. .. ..$ Date: Date[1:8760], format: "2014-01-01" "2014-01-01" "2014-01-01" ...
.. .. ..$ SWC : num [1:8760] 46 45.9 46 45.9 45.9 ...
.. ..$ 2015:'data.frame': 8760 obs. of 2 variables:
.. .. ..$ Date: Date[1:8760], format: "2015-01-01" "2015-01-01" "2015-01-01" ...
.. .. ..$ SWC : num [1:8760] 49.8 49.8 49.8 49.8 49.8 ...
.. ..$ 2016:'data.frame': 8784 obs. of 2 variables:
.. .. ..$ Date: Date[1:8784], format: "2016-01-01" "2016-01-01" "2016-01-01" ...
.. .. ..$ SWC : num [1:8784] 48.2 48.2 48.1 48.1 48.1 ...
.. ..$ 2017:'data.frame': 8760 obs. of 2 variables:
.. .. ..$ Date: Date[1:8760], format: "2017-01-01" "2017-01-01" "2017-01-01" ...
.. .. ..$ SWC : num [1:8760] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
.. ..$ 2018:'data.frame': 8760 obs. of 2 variables:
.. .. ..$ Date: Date[1:8760], format: "2018-01-01" "2018-01-01" "2018-01-01" ...
.. .. ..$ SWC : num [1:8760] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
..$ d50:List of 5
.. ..$ 2014:'data.frame': 8760 obs. of 2 variables:
.. .. ..$ Date: Date[1:8760], format: "2014-01-01" "2014-01-01" "2014-01-01" ...
.. .. ..$ SWC : num [1:8760] 35.2 35.2 35.2 35.2 35.2 ...
.. ..$ 2015:'data.frame': 8760 obs. of 2 variables:
.. .. ..$ Date: Date[1:8760], format: "2015-01-01" "2015-01-01" "2015-01-01" ...
.. .. ..$ SWC : num [1:8760] 34.8 34.8 34.7 34.7 34.8 ...
.. ..$ 2016:'data.frame': 8784 obs. of 2 variables:
.. .. ..$ Date: Date[1:8784], format: "2016-01-01" "2016-01-01" "2016-01-01" ...
.. .. ..$ SWC : num [1:8784] 34.2 34.2 34.1 34.1 34.1 ...
.. ..$ 2017:'data.frame': 8760 obs. of 2 variables:
.. .. ..$ Date: Date[1:8760], format: "2017-01-01" "2017-01-01" "2017-01-01" ...
.. .. ..$ SWC : num [1:8760] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
.. ..$ 2018:'data.frame': 8760 obs. of 2 variables:
.. .. ..$ Date: Date[1:8760], format: "2018-01-01" "2018-01-01" "2018-01-01" ...
.. .. ..$ SWC : num [1:8760] 36.4 36.4 36.3 36.3 36.3 ...
..$ d5 :List of 5
.. ..$ 2014:'data.frame': 8760 obs. of 2 variables:
.. .. ..$ Date: Date[1:8760], format: "2014-01-01" "2014-01-01" "2014-01-01" ...
.. .. ..$ SWC : num [1:8760] 32.5 32.4 32.4 32.4 32.4 ...
.. ..$ 2015:'data.frame': 8760 obs. of 2 variables:
.. .. ..$ Date: Date[1:8760], format: "2015-01-01" "2015-01-01" "2015-01-01" ...
.. .. ..$ SWC : num [1:8760] 32.1 32.1 32.1 32.1 32.1 ...
.. ..$ 2016:'data.frame': 8784 obs. of 2 variables:
.. .. ..$ Date: Date[1:8784], format: "2016-01-01" "2016-01-01" "2016-01-01" ...
.. .. ..$ SWC : num [1:8784] 30.3 30.3 30.3 30.2 30.2 ...
.. ..$ 2017:'data.frame': 8760 obs. of 2 variables:
.. .. ..$ Date: Date[1:8760], format: "2017-01-01" "2017-01-01" "2017-01-01" ...
.. .. ..$ SWC : num [1:8760] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
.. ..$ 2018:'data.frame': 8760 obs. of 2 variables:
.. .. ..$ Date: Date[1:8760], format: "2018-01-01" "2018-01-01" "2018-01-01" ...
.. .. ..$ SWC : num [1:8760] 31.1 31.2 31.1 31.1 31.1 ...
$ SE14 :List of 3
..$ d20:List of 5
.. ..$ 2014:'data.frame': 8760 obs. of 2 variables:
.. .. ..$ Date: Date[1:8760], format: "2014-01-01" "2014-01-01" "2014-01-01" ...
.. .. ..$ SWC : num [1:8760] 52.5 52.5 52.5 52.5 52.4 ...
.. ..$ 2015:'data.frame': 8760 obs. of 2 variables:
.. .. ..$ Date: Date[1:8760], format: "2015-01-01" "2015-01-01" "2015-01-01" ...
.. .. ..$ SWC : num [1:8760] 53.7 53.7 53.7 53.7 53.7 ...
.. ..$ 2016:'data.frame': 8784 obs. of 2 variables:
.. .. ..$ Date: Date[1:8784], format: "2016-01-01" "2016-01-01" "2016-01-01" ...
.. .. ..$ SWC : num [1:8784] 52.3 52.2 52.3 52.3 52.2 ...
.. ..$ 2017:'data.frame': 8760 obs. of 2 variables:
.. .. ..$ Date: Date[1:8760], format: "2017-01-01" "2017-01-01" "2017-01-01" ...
.. .. ..$ SWC : num [1:8760] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
.. ..$ 2018:'data.frame': 8760 obs. of 2 variables:
.. .. ..$ Date: Date[1:8760], format: "2018-01-01" "2018-01-01" "2018-01-01" ...
.. .. ..$ SWC : num [1:8760] 55 55 55 55.1 55 ...
..$ d50:List of 5
.. ..$ 2014:'data.frame': 8760 obs. of 2 variables:
.. .. ..$ Date: Date[1:8760], format: "2014-01-01" "2014-01-01" "2014-01-01" ...
.. .. ..$ SWC : num [1:8760] 27.9 27.9 27.9 27.9 27.9 ...
.. ..$ 2015:'data.frame': 8760 obs. of 2 variables:
.. .. ..$ Date: Date[1:8760], format: "2015-01-01" "2015-01-01" "2015-01-01" ...
.. .. ..$ SWC : num [1:8760] 28.5 28.5 28.5 28.5 28.5 ...
.. ..$ 2016:'data.frame': 8784 obs. of 2 variables:
.. .. ..$ Date: Date[1:8784], format: "2016-01-01" "2016-01-01" "2016-01-01" ...
.. .. ..$ SWC : num [1:8784] 26.7 26.7 26.7 26.6 26.7 ...
.. ..$ 2017:'data.frame': 8760 obs. of 2 variables:
.. .. ..$ Date: Date[1:8760], format: "2017-01-01" "2017-01-01" "2017-01-01" ...
.. .. ..$ SWC : num [1:8760] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
.. ..$ 2018:'data.frame': 8760 obs. of 2 variables:
.. .. ..$ Date: Date[1:8760], format: "2018-01-01" "2018-01-01" "2018-01-01" ...
.. .. ..$ SWC : num [1:8760] 29.4 29.4 29.4 29.4 29.5 ...
..$ d5 :List of 5
.. ..$ 2014:'data.frame': 8760 obs. of 2 variables:
.. .. ..$ Date: Date[1:8760], format: "2014-01-01" "2014-01-01" "2014-01-01" ...
.. .. ..$ SWC : num [1:8760] 39.8 39.8 39.7 39.6 39.7 ...
.. ..$ 2015:'data.frame': 8760 obs. of 2 variables:
.. .. ..$ Date: Date[1:8760], format: "2015-01-01" "2015-01-01" "2015-01-01" ...
.. .. ..$ SWC : num [1:8760] 42.2 42.3 42.3 42.3 42.3 ...
.. ..$ 2016:'data.frame': 8784 obs. of 2 variables:
.. .. ..$ Date: Date[1:8784], format: "2016-01-01" "2016-01-01" "2016-01-01" ...
.. .. ..$ SWC : num [1:8784] 36.6 36.6 36.5 36.6 36.5 ...
.. ..$ 2017:'data.frame': 8760 obs. of 2 variables:
.. .. ..$ Date: Date[1:8760], format: "2017-01-01" "2017-01-01" "2017-01-01" ...
.. .. ..$ SWC : num [1:8760] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
.. ..$ 2018:'data.frame': 8760 obs. of 2 variables:
.. .. ..$ Date: Date[1:8760], format: "2018-01-01" "2018-01-01" "2018-01-01" ...
.. .. ..$ SWC : num [1:8760] 56.5 56.5 56.5 56.5 56.3 ...
我从列表中提取了其中一个数据框的一部分并使用了dput()
,因此您可以使用一些玩具数据:
toydat <- structure(list(Date = structure(c(16277, 16277, 16277, 16277,
16277, 16277, 16277, 16277, 16277, 16277, 16277, 16277, 16277,
16277, 16277, 16277, 16277, 16277, 16277, 16278, 16278, 16278,
16278, 16278, 16278, 16278, 16278, 16278, 16278, 16278, 16278,
16278, 16278, 16278, 16278, 16278, 16278, 16278, 16278, 16278,
16278, 16278, 16278, 16279, 16279, 16279, 16279, 16279, 16279,
16279, 16279, 16279, 16279, 16279, 16279, 16279, 16279, 16279,
16279, 16279, 16279, 16279, 16279, 16279, 16279, 16279, 16279,
16280, 16280, 16280, 16280, 16280, 16280, 16280, 16280, 16280,
16280, 16280, 16280, 16280, 16280, 16280, 16280, 16280, 16280,
16280, 16280, 16280, 16280, 16280, 16280, 16281, 16281, 16281,
16281, 16281, 16281, 16281, 16281, 16281, 16281), class = "Date"),
SWC = c(NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN,
NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN,
NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN,
NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN,
NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN,
NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN,
NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, 19.627243, 19.543659,
19.593796, 19.534379, 19.59937, 19.51582, 19.482441, 19.51582,
19.571497, 19.645825, 20.83435, 21.116572, 22.688702, 22.216629,
21.54243, 21.229946, 21.003335, 20.833735, 20.74902, 20.608045,
20.512311, 20.411049)), row.names = 48774:48874, class = "data.frame")
测量是每小时进行一次,所以我一天有 24 次测量。数据帧中的一些值是 NoData 值,所以我想使用线性插值来填补这些空白。但是,如果 NoData 值和实际值之间的差距不大于 2 天,我只想使用线性插值。关于玩具数据,这意味着如果缺少 7 月 28 日和 29 日(2014-07-28 和 2014-07-29)的值,我只想填补这些天的空白,而不是 27 日, 26日、25日……7月等等。如果差距大于 2 天,我想保留 NoData 值,因为稍后我将使用线性回归来填补这些差距,但这不应该是本文的主题。
我已经尝试过以下事情:
我使用了包中的na.approx()
功能zoo
。我输入:
na.approx(toydat$SWC, na.rm = FALSE)
但这只是像以前一样返回数据并且不进行插值(我输入了 $SWC,因为我只想插值该列)。我想如果我添加rule = 2
到代码中,它会在 NaN 值之后获取最后一个值,并将该值用于所有我不想要的 NaN 值。我也尝试过使用maxgap = 48
,因为我认为这样可以确保只插入 48 个值。但是,由于无论如何我都无法正确插值,所以什么也没发生。
我真的很感激一些帮助。
解决方案
这是一个使用 Base R 的凌乱的方法,它将在 data.frame 的开头处理 na 值并在 data.frame 的末尾推断出 na 值,它确实假设频率是相同的。注意:df_list 是您列表的代理
# Linear interpolation function handling ties,
# returns interpolated vector the same length
# a the input vector: -> vector
l_interp_vec <- function(na_vec){
approx(x = na_vec, method = "linear", ties = "constant", n = length(na_vec))$y
}
# Applied to a dataframe, replacing NA values
# in each of the numeric vectors,
# with interpolated values.
# input is dataframe: -> dataframe()
interped_df <- function(df){
data.frame(lapply(df, function(x) {
if (is.numeric(x)) {
# Store a scalar of min row where x isn't NA: -> min_non_na
min_non_na <- min(which(!(is.na(x))))
# Store a scalar of max row where x isn't NA: -> max_non_na
max_non_na <- max(which(!(is.na(x))))
# Store scalar of the number of rows needed to impute prior
# to first NA value: -> ru_lower
ru_lower <- ifelse(min_non_na > 1, min_non_na - 1, min_non_na)
# Store scalar of the number of rows needed to impute after
# the last non-NA value: -> ru_upper
ru_upper <- ifelse(max_non_na == length(x),
length(x) - 1,
(length(x) - (max_non_na + 1)))
# Store a vector of the ramp to function: -> l_ramp_up:
ramp_up <-
as.numeric(cumsum(rep(x[min_non_na] / (min_non_na), ru_lower)))
# Apply the interpolation function on vector "x": -> y
y <-
as.numeric(l_interp_vec(as.numeric(x[min_non_na:max_non_na])))
# Create a vector that combines the ramp_up vector
# and y if the first NA is at row 1: -> z
if (length(ramp_up) > 1 & max_non_na != length(x)) {
# Create a vector interpolations if there are
# multiple NA values after the last value: -> lower_l_int
lower_l_int <-
as.numeric(cumsum(rep(mean(diff(
c(ramp_up, y)
)),
ru_upper + 1)) +
as.numeric(x[max_non_na]))
# Store the linear interpolations in a vector: -> z
z <- as.numeric(c(ramp_up, y, lower_l_int))
} else if (length(ramp_up) > 1 & max_non_na == length(x)) {
# Store the linear interpolations in a vector: -> z
z <- as.numeric(c(ramp_up, y))
} else if (min_non_na == 1 & max_non_na != length(x)) {
# Create a vector interpolations if there are
# multiple NA values after the last value: -> lower_l_int
lower_l_int <-
as.numeric(cumsum(rep(mean(diff(
c(ramp_up, y)
)),
ru_upper + 1)) +
as.numeric(x[max_non_na]))
# Store the linear interpolations in a vector: -> z
z <- as.numeric(c(y, lower_l_int))
} else{
# Store the linear interpolations in a vector: -> z
z <- as.numeric(y)
}
# Interpolate between points in x, return new x:
return(as.numeric(ifelse(is.na(x), z, x)))
} else{
x
}
}))}
df_list_interped_extrapped <- lapply(df_list, interped_df)
推荐阅读
- automata - 为语言构建 NPDA
- postgresql - 两个表的一个外键会引起混乱吗?
- c# - IOS 上不一致的 Bouncy Castle ECDSA 签名/验证行为
- javascript - 延迟加载和 ScrollIntoView() Angular2(版本 7)
- raspberry-pi - 使用 Raspbian Buster 在新的 RaspberryPi 4 上构建 Redis 失败
- azure - Azure DataBricks 中的 Azure Devops 集成错误
- c# - 如何将东西放在我的 RequestBody 中以供我的 POST 方法查找?
- powerbi - 我想要一个避免被酒吧聊天轴过滤的措施
- c# - 将 if 语句条件添加到 Linq 语句
- javascript - 从 AJAX 调用返回的 blob 下载 PDF