pandas - 偏移读数偏移的时间序列
问题描述
我想从每小时开始每 15 分钟获取一次读数,给定一组每小时读数但与每小时偏移几分钟的读数。
我的第一种方法是使用 resample 到 15 分钟,但我没有得到预期的结果:
因此,如果读数是按小时计算的,则重新采样可以正常工作:
left_key = pd.to_datetime(['2020-12-01 00:00',
'2020-12-01 01:00',
'2020-12-01 02:00',
'2020-12-01 03:00',
'2020-12-01 04:00',
'2020-12-01 05:00'])
left_data = pd.Series([12,12,13,15,16,15], index=left_key, name='master')
resampled = left_data.resample('15min')
resampled.interpolate(method='spline', order=2)
产生我需要的东西:
2020-12-01 00:00:00 12.000000
2020-12-01 00:15:00 11.777455
2020-12-01 00:30:00 12.079464
2020-12-01 00:45:00 12.370313
2020-12-01 01:00:00 12.000000
2020-12-01 01:15:00 12.918527
2020-12-01 01:30:00 13.175893
但如果读数偏离小时:
left_key = pd.to_datetime(['2020-12-01 00:06',
'2020-12-01 01:06',
'2020-12-01 02:06',
'2020-12-01 03:06',
'2020-12-01 04:06',
'2020-12-01 05:06'])
left_data = pd.Series([12,12,13,15,16,15], index=left_key, name='master')
resampled = left_data.resample('15min')
resampled.interpolate(method='spline', order=2)
现在我没有数据
2020-12-01 00:00:00 NaN
2020-12-01 00:15:00 NaN
2020-12-01 00:30:00 NaN
2020-12-01 00:45:00 NaN
2020-12-01 01:00:00 NaN
如果我每小时重新采样一次,它只会将读数向后移动
resampled = left_data.resample('H')
resampled.interpolate(method='spline', order=2)
2020-12-01 00:00:00 12
2020-12-01 01:00:00 12
2020-12-01 02:00:00 13
2020-12-01 03:00:00 15
2020-12-01 04:00:00 16
2020-12-01 05:00:00 15
有没有办法让重新采样来插入读数,这样我就可以得到正确的小时值?(这个问题有更好的标题吗!)
更新
虽然这些解决方案有效,但它不适用于大量数据。1000 行对我的机器来说太多了!即使减少初始重采样大小也需要大量内存和时间来完成。
这是这个问题的另一种解决方案:将一个时间序列插入到自定义时间序列上
# create a new index for the ranges of datetimes required
starts = df.index.min()
starts = datetime(starts.year, starts.month, starts.day, starts.hour,15*(starts.minute // 15))
master = pd.date_range(starts, df.index.max(), freq="15min")
# will need this to identify original data rows later
df['tag'] = True
# merge with original data and interpolate missing rows
idx = df.index.union(master)
df2 = df.reindex(idx).interpolate('index')
# now remove the things we don't want
df2.drop(df2.index[0], inplace=True) # first value will be NaN (unless has real data)
# use the tag column to remove the original data and then drop that column
df2 = df2[df2['tag'].isna()]
df2.drop(columns=['tag',], inplace=True)
这要快得多!
解决方案
好的。这不是所有解决方案中最漂亮的,但它在过去对我有用。这是一个技巧,包括在应用您想要的之前以可忽略的时间间隔重新采样两次。首先,您需要按时设置索引(日期)。
left_key = pd.to_datetime(['2020-12-01 00:06',
'2020-12-01 01:06',
'2020-12-01 02:06',
'2020-12-01 03:06',
'2020-12-01 04:06',
'2020-12-01 05:06'])
left_data = pd.Series([12,12,13,15,16,15])
df = pd.DataFrame({'Dates':left_key , 'Values':left_data})
df.set_index('Dates', inplace=True)
df1 = df.resample('1ms').interpolate(method='spline', order=2).resample('15min').first()
这使
Values
Dates
2020-12-01 00:00:00 12.000000
2020-12-01 00:15:00 11.653527
2020-12-01 00:30:00 11.960000
2020-12-01 00:45:00 12.255313
2020-12-01 01:00:00 12.539464
2020-12-01 01:15:00 12.812455
2020-12-01 01:30:00 13.074286
2020-12-01 01:45:00 13.324955
2020-12-01 02:00:00 13.564464
2020-12-01 02:15:00 13.792813
2020-12-01 02:30:00 14.010000
2020-12-01 02:45:00 14.216027
2020-12-01 03:00:00 14.410893
2020-12-01 03:15:00 14.594598
2020-12-01 03:30:00 14.767143
2020-12-01 03:45:00 14.928527
2020-12-01 04:00:00 15.078750
2020-12-01 04:15:00 15.217812
2020-12-01 04:30:00 15.345714
2020-12-01 04:45:00 15.462455
2020-12-01 05:00:00 15.568036
然后,你连接你原来的 df
frames = [df, df1]
df2 = pd.concat(frames)
df2.sort_values('Dates')
返回
Values
Dates
2020-12-01 00:00:00 12.000000
2020-12-01 00:06:00 12.000000
2020-12-01 00:15:00 11.653527
2020-12-01 00:30:00 11.960000
2020-12-01 00:45:00 12.255313
2020-12-01 01:00:00 12.539464
2020-12-01 01:06:00 12.000000
2020-12-01 01:15:00 12.812455
2020-12-01 01:30:00 13.074286
2020-12-01 01:45:00 13.324955
2020-12-01 02:00:00 13.564464
2020-12-01 02:06:00 13.000000
2020-12-01 02:15:00 13.792813
2020-12-01 02:30:00 14.010000
2020-12-01 02:45:00 14.216027
2020-12-01 03:00:00 14.410893
2020-12-01 03:06:00 15.000000
2020-12-01 03:15:00 14.594598
2020-12-01 03:30:00 14.767143
2020-12-01 03:45:00 14.928527
2020-12-01 04:00:00 15.078750
2020-12-01 04:06:00 16.000000
2020-12-01 04:15:00 15.217812
2020-12-01 04:30:00 15.345714
2020-12-01 04:45:00 15.462455
2020-12-01 05:00:00 15.568036
2020-12-01 05:06:00 15.000000
推荐阅读
- spring - Spring Boot 未从 YML 文件加载属性
- angular - 打字稿连接字符串来绘制Html
- c# - GetGeotagAsync 对通过 Folder.CreateFileQueryWithOptions w/ OrderByName 创建的列表中的文件失败
- datetime - 如何在所选柱之后打印柱 x 柱的日期?
- google-bigquery - BigQuery 中分析函数的可扩展性问题是否有解决方案?
- python - 使用 Python 'requests' 库的 HTTPS POST 请求出现不一致的 IOError 异常
- sql-server - 从 UWP 的安装包创建本地 SQL Server 数据库
- ios - 如何将 iOS 应用连接到 Google Cloud SQL?
- python - 从子类访问命名元组
- sql-server - 无法从 SQL Server 代理运行 SQL Server SSIS 包