How to exclude certain hours with pd.date_range

Problem description

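Hi, I have a question about using pd.date_range(). I am building an ARIMA model and need to forecast some price values one step at a time. For example, at time 2021-01-04 11:20, I want to generate the next 4 date indexes with freq = '5Min', so I wrote the following code: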

pd.date_range(start = '2021-01-04 11:20', periods = 5, freq = '5Min')

This gives

['2021-01-04 11:20', '2021-01-04 11:25', '2021-01-04 11:30', '2021-01-04 11:35', '2021-01-04 11:40']

But the market reopens in the afternoon. After 11:30, the market next opens at '2021-01-04 15:00', so the series should be:

['2021-01-04 11:20', '2021-01-04 11:25', '2021-01-04 15:00', '2021-01-04 15:05', '2021-01-04 15:10'].

So how can I customize the frequency so that I can exclude certain "hour ranges" within a day?

Thanks! I would really appreciate any help!

Tags: python, pandas, date-range

Solution


Use DatetimeIndex.indexer_between_time to get the positions of the timestamps to exclude, then filter those values out with np.isin (with invert=True) in boolean indexing:

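import numpy as np
import pandas as pd

r = pd.date_range(start = '2021-01-04 00:00', periods = 100, freq = '30Min')

# Positions of the timestamps inside each window to exclude (endpoints included);
# the last window wraps around midnight
ind = (r.indexer_between_time('11:30','13:30').tolist() +
       r.indexer_between_time('15:00','21:00').tolist() +
       r.indexer_between_time('23:00','09:00').tolist())
# print(ind)

# Keep every position that is not listed in ind
out = r[np.isin(np.arange(len(r)), ind, invert=True)]
print(out)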
DatetimeIndex(['2021-01-04 09:30:00', '2021-01-04 10:00:00',
               '2021-01-04 10:30:00', '2021-01-04 11:00:00',
               '2021-01-04 14:00:00', '2021-01-04 14:30:00',
               '2021-01-04 21:30:00', '2021-01-04 22:00:00',
               '2021-01-04 22:30:00', '2021-01-05 09:30:00',
               '2021-01-05 10:00:00', '2021-01-05 10:30:00',
               '2021-01-05 11:00:00', '2021-01-05 14:00:00',
               '2021-01-05 14:30:00', '2021-01-05 21:30:00',
               '2021-01-05 22:00:00', '2021-01-05 22:30:00'],
              dtype='datetime64[ns]', freq=None)
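
To make the two steps above easier to follow, here is a minimal sketch on a much smaller index. The eight-timestamp range and the single 11:30 to 13:00 window are assumptions chosen only for illustration; they are not taken from the question.

```python
import numpy as np
import pandas as pd

# Small illustrative index: 10:00, 10:30, ..., 13:30 (8 timestamps)
r = pd.date_range(start="2021-01-04 10:00", periods=8, freq="30min")

# Integer positions of the timestamps that fall inside the window
# (both endpoints are included by default)
pos = r.indexer_between_time("11:30", "13:00")

# Boolean mask that is True for every position NOT in pos
keep = np.isin(np.arange(len(r)), pos, invert=True)

out = r[keep]
print(pos)
print(out)
```

indexer_between_time returns positions rather than timestamps, which is why np.arange(len(r)) is needed to turn them into a mask over the whole index.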

Another idea is to use a boolean mask:

from datetime import time

import pandas as pd

r = pd.date_range(start = '2021-01-04 00:00', periods = 100, freq = '30Min')

# r.time is an array of datetime.time objects; the strict comparisons
# drop the boundary timestamps of each window
t = r.time
m = (((t > time(hour=9, minute=0)) & (t < time(hour=11, minute=30))) |
     ((t > time(hour=13, minute=30)) & (t < time(hour=15, minute=0))) |
     ((t > time(hour=21, minute=0)) & (t < time(hour=23, minute=0))))

print(m)

out = r[m]
print(out)
DatetimeIndex(['2021-01-04 09:30:00', '2021-01-04 10:00:00',
               '2021-01-04 10:30:00', '2021-01-04 11:00:00',
               '2021-01-04 14:00:00', '2021-01-04 14:30:00',
               '2021-01-04 21:30:00', '2021-01-04 22:00:00',
               '2021-01-04 22:30:00', '2021-01-05 09:30:00',
               '2021-01-05 10:00:00', '2021-01-05 10:30:00',
               '2021-01-05 11:00:00', '2021-01-05 14:00:00',
               '2021-01-05 14:30:00', '2021-01-05 21:30:00',
               '2021-01-05 22:00:00', '2021-01-05 22:30:00'],
              dtype='datetime64[ns]', freq=None)
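
The mask idea can also be adapted to the case in the question, where a fixed number of timestamps is wanted from a given start. The following is only a sketch: the helper name next_open_stamps, the closed window of 11:30 (inclusive) to 15:00 (exclusive), and the strategy of doubling the candidate range until enough timestamps survive are all assumptions made for illustration, inferred from the example in the question.

```python
from datetime import time

import pandas as pd


def next_open_stamps(start, periods, freq="5min",
                     closed_from=time(11, 30), closed_to=time(15, 0)):
    """Return the first `periods` timestamps from `start` outside the closed window.

    Hypothetical helper: the window [closed_from, closed_to) is an assumption.
    It must not cover the whole day, otherwise the loop never ends.
    """
    n = periods
    while True:
        # Over-generate candidates on the regular grid
        candidates = pd.date_range(start=start, periods=n, freq=freq)
        t = candidates.time  # array of datetime.time objects
        closed = (t >= closed_from) & (t < closed_to)
        kept = candidates[~closed]
        if len(kept) >= periods:
            return kept[:periods]
        # Not enough open timestamps yet: try a longer candidate range
        n *= 2


idx = next_open_stamps("2021-01-04 11:20", periods=5)
print(idx)
```

Doubling the candidate range keeps the helper short; for long closed windows it may be cheaper to compute the required length directly.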

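A further alternative uses numpy.r_ to concatenate the positions returned for the windows to keep and index by them. Note that the result is ordered window by window rather than chronologically: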

# Positions inside each window to keep; both endpoints are excluded
# (r and np are the index and the numpy import defined earlier)
ind1 = (np.r_[r.indexer_between_time('9:00','11:30', include_start=False, include_end=False),
              r.indexer_between_time('13:30','15:00', include_start=False, include_end=False),
              r.indexer_between_time('21:00','23:00', include_start=False, include_end=False)])

out = r[ind1]
print(out)
DatetimeIndex(['2021-01-04 09:30:00', '2021-01-04 10:00:00',
               '2021-01-04 10:30:00', '2021-01-04 11:00:00',
               '2021-01-05 09:30:00', '2021-01-05 10:00:00',
               '2021-01-05 10:30:00', '2021-01-05 11:00:00',
               '2021-01-04 14:00:00', '2021-01-04 14:30:00',
               '2021-01-05 14:00:00', '2021-01-05 14:30:00',
               '2021-01-04 21:30:00', '2021-01-04 22:00:00',
               '2021-01-04 22:30:00', '2021-01-05 21:30:00',
               '2021-01-05 22:00:00', '2021-01-05 22:30:00'],
              dtype='datetime64[ns]', freq=None)
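
If chronological order matters, the concatenated positions can be sorted before indexing. This is a minimal sketch, not part of the original answer; it assumes the same 30-minute grid and, to keep it short, uses windows with the default inclusive endpoints that select the same timestamps on that grid.

```python
import numpy as np
import pandas as pd

r = pd.date_range(start="2021-01-04 00:00", periods=100, freq="30min")

# Positions for each window to keep, concatenated window by window
pos = np.r_[r.indexer_between_time("09:30", "11:00"),
            r.indexer_between_time("14:00", "14:30"),
            r.indexer_between_time("21:30", "22:30")]

unsorted_out = r[pos]        # grouped by window, not chronological
out = r[np.sort(pos)]        # chronological order restored

print(unsorted_out.is_monotonic_increasing)
print(out.is_monotonic_increasing)
```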
