首页 > 解决方案 > 偏移读数偏移的时间序列

问题描述

我想从每小时开始每 15 分钟获取一次读数,给定一组每小时读数但与每小时偏移几分钟的读数。

我的第一种方法是使用 resample 到 15 分钟,但我没有得到预期的结果:

因此,如果读数是按小时计算的,则重新采样可以正常工作:

left_key =  pd.to_datetime(['2020-12-01 00:00',
               '2020-12-01 01:00',
               '2020-12-01 02:00',
               '2020-12-01 03:00',
               '2020-12-01 04:00',
               '2020-12-01 05:00'])
left_data = pd.Series([12,12,13,15,16,15], index=left_key, name='master')

resampled = left_data.resample('15min')
resampled.interpolate(method='spline', order=2)

产生我需要的东西:

2020-12-01 00:00:00    12.000000
2020-12-01 00:15:00    11.777455
2020-12-01 00:30:00    12.079464
2020-12-01 00:45:00    12.370313
2020-12-01 01:00:00    12.000000
2020-12-01 01:15:00    12.918527
2020-12-01 01:30:00    13.175893

但如果读数偏离小时:

left_key =  pd.to_datetime(['2020-12-01 00:06',
               '2020-12-01 01:06',
               '2020-12-01 02:06',
               '2020-12-01 03:06',
               '2020-12-01 04:06',
               '2020-12-01 05:06'])
left_data = pd.Series([12,12,13,15,16,15], index=left_key, name='master')

resampled = left_data.resample('15min')
resampled.interpolate(method='spline', order=2)

现在我没有数据

2020-12-01 00:00:00   NaN
2020-12-01 00:15:00   NaN
2020-12-01 00:30:00   NaN
2020-12-01 00:45:00   NaN
2020-12-01 01:00:00   NaN

如果我每小时重新采样一次,它只会将读数向后移动

resampled = left_data.resample('H')
resampled.interpolate(method='spline', order=2)


2020-12-01 00:00:00    12
2020-12-01 01:00:00    12
2020-12-01 02:00:00    13
2020-12-01 03:00:00    15
2020-12-01 04:00:00    16
2020-12-01 05:00:00    15

有没有办法让重新采样来插入读数,这样我就可以得到正确的小时值?(这个问题有更好的标题吗!)

更新

虽然这些解决方案有效,但它不适用于大量数据。1000 行对我的机器来说太多了!即使减少初始重采样大小也需要大量内存和时间来完成。

这是这个问题的另一种解决方案:将一个时间序列插入到自定义时间序列上

# create a new index for the ranges of datetimes required
starts = df.index.min()
starts = datetime(starts.year, starts.month, starts.day, starts.hour,15*(starts.minute // 15))
master = pd.date_range(starts, df.index.max(), freq="15min")

# will need this to identify original data rows later
df['tag'] = True

# merge with original data and interpolate missing rows
idx = df.index.union(master)
df2 = df.reindex(idx).interpolate('index')

# now remove the things we don't want
df2.drop(df2.index[0], inplace=True)  # first value will be NaN (unless has real data)
# use the tag column to remove the original data and then drop that column
df2 = df2[df2['tag'].isna()]
df2.drop(columns=['tag',], inplace=True)

这要快得多!

标签: pandastime-series

解决方案


好的。这不是所有解决方案中最漂亮的,但它在过去对我有用。这是一个技巧,包括在应用您想要的之前以可忽略的时间间隔重新采样两次。首先,您需要按时设置索引(日期)。

left_key =  pd.to_datetime(['2020-12-01 00:06',
               '2020-12-01 01:06',
               '2020-12-01 02:06',
               '2020-12-01 03:06',
               '2020-12-01 04:06',
               '2020-12-01 05:06'])
left_data = pd.Series([12,12,13,15,16,15])

df = pd.DataFrame({'Dates':left_key , 'Values':left_data})
df.set_index('Dates', inplace=True)    
df1 = df.resample('1ms').interpolate(method='spline', order=2).resample('15min').first()

这使

                        Values
Dates                         
2020-12-01 00:00:00  12.000000
2020-12-01 00:15:00  11.653527
2020-12-01 00:30:00  11.960000
2020-12-01 00:45:00  12.255313
2020-12-01 01:00:00  12.539464
2020-12-01 01:15:00  12.812455
2020-12-01 01:30:00  13.074286
2020-12-01 01:45:00  13.324955
2020-12-01 02:00:00  13.564464
2020-12-01 02:15:00  13.792813
2020-12-01 02:30:00  14.010000
2020-12-01 02:45:00  14.216027
2020-12-01 03:00:00  14.410893
2020-12-01 03:15:00  14.594598
2020-12-01 03:30:00  14.767143
2020-12-01 03:45:00  14.928527
2020-12-01 04:00:00  15.078750
2020-12-01 04:15:00  15.217812
2020-12-01 04:30:00  15.345714
2020-12-01 04:45:00  15.462455
2020-12-01 05:00:00  15.568036

然后,你连接你原来的 df

frames = [df, df1]
df2 = pd.concat(frames)
df2.sort_values('Dates')

返回

                        Values
Dates                         
2020-12-01 00:00:00  12.000000
2020-12-01 00:06:00  12.000000
2020-12-01 00:15:00  11.653527
2020-12-01 00:30:00  11.960000
2020-12-01 00:45:00  12.255313
2020-12-01 01:00:00  12.539464
2020-12-01 01:06:00  12.000000
2020-12-01 01:15:00  12.812455
2020-12-01 01:30:00  13.074286
2020-12-01 01:45:00  13.324955
2020-12-01 02:00:00  13.564464
2020-12-01 02:06:00  13.000000
2020-12-01 02:15:00  13.792813
2020-12-01 02:30:00  14.010000
2020-12-01 02:45:00  14.216027
2020-12-01 03:00:00  14.410893
2020-12-01 03:06:00  15.000000
2020-12-01 03:15:00  14.594598
2020-12-01 03:30:00  14.767143
2020-12-01 03:45:00  14.928527
2020-12-01 04:00:00  15.078750
2020-12-01 04:06:00  16.000000
2020-12-01 04:15:00  15.217812
2020-12-01 04:30:00  15.345714
2020-12-01 04:45:00  15.462455
2020-12-01 05:00:00  15.568036
2020-12-01 05:06:00  15.000000

推荐阅读