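r - How to create random gaps of different lengths in a time series?
Problem description
For my master's thesis I have to evaluate different gap-filling methods on an existing dataset. To do that, I need to add artificial gaps of different lengths (1h, 5h, ...) so that I can fill them with the different methods. Is there a simple function that does this?
Here is a sample of the data frame: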
df <- structure(list(DateTime = structure(c(1420074000, 1420077600,
1420081200, 1420084800, 1420088400, 1420092000, 1420095600, 1420099200,
1420102800, 1420106400), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
`Dd 1-1` = c(0.0186269166666667, 0.0242605625, 0.00373020138888889,
0.000966965277777778, 0.0119253611111111, 0.0495888958333333,
0.02014125, 0.0306862638888889, 0.0324395694444444, 0.0191942152777778
), `Dd 1-3` = c(0.0242500833333333, 0.0349086388888889, 0,
0.00135595138888889, 0.0221090138888889, 0.0600941527777778,
0.0462282986111111, 0.0171887638888889, 0.0481975347222222,
0.0226582152777778), `Dd 1-5` = c(0.0212732152777778, 0.0284445347222222,
0.00276098611111111, 0.0142581875, 0.0276248958333333, 0.0328644027777778,
0.0495009166666667, 0.0173377777777778, 0.0384788194444444,
0.017663875), luecken = c(0.0186269166666667, 0.0242605625,
0.00373020138888889, 0.000966965277777778, 0.0119253611111111,
0.0495888958333333, 0.02014125, 0.0306862638888889, 0.0324395694444444,
0.0191942152777778)), row.names = c(NA, 10L), class = c("tbl_df",
"tbl", "data.frame"))
Solution
If I understood your question correctly, one possible solution is:
set.seed(4) # make it reproducible
del <- sort(sample(1:nrow(df), 4, replace=FALSE)) # draw 4 random indices from the total number of rows and sort them
del2 <- del[diff(del) != 1] # drop indices that differ by 1 (meaning "connected")
df[del2, c(2:5)] <- NA # set columns 2 to 5 to NA for the indices computed above
   DateTime            `Dd 1-1` `Dd 1-3` `Dd 1-5`  luecken
   <dttm>                 <dbl>    <dbl>    <dbl>    <dbl>
 1 2015-01-01 01:00:00   0.0186   0.0243   0.0213   0.0186
 2 2015-01-01 02:00:00   0.0243   0.0349   0.0284   0.0243
 3 2015-01-01 03:00:00       NA       NA       NA       NA
 4 2015-01-01 04:00:00 0.000967  0.00136   0.0143 0.000967
 5 2015-01-01 05:00:00   0.0119   0.0221   0.0276   0.0119
 6 2015-01-01 06:00:00   0.0496   0.0601   0.0329   0.0496
 7 2015-01-01 07:00:00   0.0201   0.0462   0.0495   0.0201
 8 2015-01-01 08:00:00   0.0307   0.0172   0.0173   0.0307
 9 2015-01-01 09:00:00       NA       NA       NA       NA
10 2015-01-01 10:00:00   0.0192   0.0227   0.0177   0.0192
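Just to be clear: the step that cleans up connected gaps is not entirely correct. If the sampled indices happen to be 1 to 4, for example, the filter discards the connected indices wholesale (here it would leave nothing) instead of keeping one of them. On large data, however, it should be a sufficient solution as long as you do not plan to delete many values relative to the size of the whole dataset.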
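To make the caveat about connected indices concrete, here is a minimal sketch of one way to keep exactly one index per run of consecutive values. The example vector is made up for illustration and is not the set of indices sampled in the answer. Because diff(del) is one element shorter than del, the sketch prepends TRUE so that the logical vector lines up with del.

```r
# Hypothetical example indices with connected runs (1, 2, 3) and (7, 8)
del <- c(1L, 2L, 3L, 7L, 8L, 10L)

# diff(del) has length(del) - 1 elements; prepending TRUE aligns it with del
# and keeps the first index of every run of consecutive values
del2 <- del[c(TRUE, diff(del) != 1)]
del2
# 1 7 10
```

This keeps isolated gaps of one row each, at the cost of ending up with fewer gaps than were sampled whenever two sampled indices are neighbours.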
Now for how to create larger gaps (I will use 3h, since your example data has only 10 rows):
set.seed(4)
del <- sort(sample(1:nrow(df), 3, replace=FALSE))
del2 <- del[diff(del) > 3] # require a difference larger than the maximum gap size wanted
del3 <- c(del2, del2 + 1, del2 + 2) # add +1 and +2 to get the indices that follow each gap start
del4 <- del3[del3 <= nrow(df)] # make sure no index is out of bounds (max index is 10, even if a gap starts at row 10)
df[del4, c(2:5)] <- NA
   DateTime            `Dd 1-1` `Dd 1-3` `Dd 1-5`  luecken
   <dttm>                 <dbl>    <dbl>    <dbl>    <dbl>
 1 2015-01-01 01:00:00   0.0186   0.0243   0.0213   0.0186
 2 2015-01-01 02:00:00   0.0243   0.0349   0.0284   0.0243
 3 2015-01-01 03:00:00       NA       NA       NA       NA
 4 2015-01-01 04:00:00       NA       NA       NA       NA
 5 2015-01-01 05:00:00       NA       NA       NA       NA
 6 2015-01-01 06:00:00   0.0496   0.0601   0.0329   0.0496
 7 2015-01-01 07:00:00   0.0201   0.0462   0.0495   0.0201
 8 2015-01-01 08:00:00   0.0307   0.0172   0.0173   0.0307
 9 2015-01-01 09:00:00       NA       NA       NA       NA
10 2015-01-01 10:00:00       NA       NA       NA       NA
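The 3h approach above could be generalized to any gap length. What follows is only a sketch under assumptions: make_gap_index, n_gaps and gap_len are hypothetical names, and the toy data frame with a single value column stands in for the real data. Gap starts are filtered in a loop so that gaps never touch or overlap, which means the result may contain fewer gaps than requested.

```r
# Hypothetical helper: row indices for up to n_gaps gaps of gap_len rows each
make_gap_index <- function(n_rows, n_gaps, gap_len) {
  n_starts <- n_rows - gap_len + 1          # last start that still fits a whole gap
  stopifnot(n_gaps >= 1, gap_len >= 1, n_gaps <= n_starts)
  starts <- sort(sample(seq_len(n_starts), n_gaps))
  kept <- integer(0)
  for (s in starts) {
    # keep a start only if at least one observed row separates it from the previous gap
    if (length(kept) == 0 || s - kept[length(kept)] > gap_len) {
      kept <- c(kept, s)
    }
  }
  # expand every kept start into gap_len consecutive row indices
  unlist(lapply(kept, function(s) s:(s + gap_len - 1)))
}

# Toy hourly series (assumed data, not the data frame from the question)
set.seed(4)
df <- data.frame(
  DateTime = seq(as.POSIXct("2015-01-01 01:00:00", tz = "UTC"),
                 by = "hour", length.out = 10),
  value = runif(10)
)

idx <- make_gap_index(nrow(df), n_gaps = 2, gap_len = 3)
df_gaps <- df                     # keep the original for comparing fill methods later
df_gaps[idx, "value"] <- NA
df_gaps
```

Restricting the start positions to rows where a full gap still fits avoids the truncated gap at the end of the data that the out-of-bounds check above produces. Keeping df untouched next to df_gaps leaves the true values available for evaluating each filling method.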