Pandas: how to check which date_range values fall within the time ranges of a pd.Interval column

Problem description

I have a DataFrame representing (multiple) hourly intervals that are not free:

import pandas as pd
df = pd.DataFrame(
    {
        'reserved': [
                        pd.Interval(pd.Timestamp(2011,11,9,8), pd.Timestamp(2011,11,9,12), closed='left'),
                        pd.Interval(pd.Timestamp(2011,11,9,13), pd.Timestamp(2011,11,9,21), closed='left')
                    ],
        'value': [1, 2]
})

|    | reserved                                   |   value |
|---:|:-------------------------------------------|--------:|
|  0 | [2011-11-09 08:00:00, 2011-11-09 12:00:00) |       1 |
|  1 | [2011-11-09 13:00:00, 2011-11-09 21:00:00) |       2 |

I need to find the hours between 07:00 and 23:00 that are not reserved. Something like this:

working_hours = pd.date_range('2011-11-09 07', '2011-11-09 23', freq='1H')

DatetimeIndex(['2011-11-09 07:00:00', '2011-11-09 08:00:00',
               '2011-11-09 09:00:00', '2011-11-09 10:00:00',
               '2011-11-09 11:00:00', '2011-11-09 12:00:00',
               '2011-11-09 13:00:00', '2011-11-09 14:00:00',
               '2011-11-09 15:00:00', '2011-11-09 16:00:00',
               '2011-11-09 17:00:00', '2011-11-09 18:00:00',
               '2011-11-09 19:00:00', '2011-11-09 20:00:00',
               '2011-11-09 21:00:00', '2011-11-09 22:00:00',
               '2011-11-09 23:00:00'],
              dtype='datetime64[ns]', freq='H')

Which working_hours are not part of any df.reserved interval? In the example above, 2011-11-09 07:00:00 is not reserved, 2011-11-09 12:00:00 is not reserved, and neither are the hours 2011-11-09 21:00:00 - 2011-11-09 22:00:00.

I was hoping I could do something like this:

pd.Timestamp(2011,11,9,8) in df.reserved

But this always returns False.

What I need is something like working_hours not in df.reserved, which would give:

2011-11-09 07:00:00
2011-11-09 12:00:00
2011-11-09 21:00:00
2011-11-09 22:00:00

How can I do this?

Tags: python, pandas, time-series, intervals, between

Solution


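Use Index.get_indexer to get the position of the matching interval for each timestamp. It returns -1 where nothing matches, so you can use that to filter working_hours: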

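# assumes working_hours covers 2011-11-09, the same day as df.reserved
i = pd.IntervalIndex(df.reserved)
# get_indexer returns -1 for timestamps that fall in no reserved interval
s = pd.Series(working_hours[i.get_indexer(working_hours) == -1])
print(s)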
0   2011-11-09 07:00:00
1   2011-11-09 12:00:00
2   2011-11-09 21:00:00
3   2011-11-09 22:00:00
4   2011-11-09 23:00:00
dtype: datetime64[ns]
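
To see why comparing against -1 works, it can help to print the positions that get_indexer returns. The sketch below rebuilds the example intervals with IntervalIndex.from_arrays and passes the hourly step as a pd.Timedelta. Both choices, and the names reserved, positions and free_hours, are illustrative assumptions rather than part of the answer above.

```python
import pandas as pd

# Same two reserved slots as in the question, closed on the left: [start, end)
reserved = pd.IntervalIndex.from_arrays(
    pd.to_datetime(["2011-11-09 08:00", "2011-11-09 13:00"]),
    pd.to_datetime(["2011-11-09 12:00", "2011-11-09 21:00"]),
    closed="left",
)

# Hourly timestamps from 07:00 to 23:00 on the same day
working_hours = pd.date_range(
    "2011-11-09 07:00", "2011-11-09 23:00", freq=pd.Timedelta(hours=1)
)

# For each timestamp: position of the interval containing it, or -1 if none
positions = reserved.get_indexer(working_hours)
print(positions.tolist())
# [-1, 0, 0, 0, 0, -1, 1, 1, 1, 1, 1, 1, 1, 1, -1, -1, -1]

# Keep only the timestamps that matched no interval
free_hours = working_hours[positions == -1]
print(free_hours.strftime("%H:%M").tolist())
# ['07:00', '12:00', '21:00', '22:00', '23:00']
```

Because the intervals are closed on the left, 12:00 and 21:00 count as free while 08:00 and 13:00 do not. As far as I know, get_indexer on an IntervalIndex expects non-overlapping intervals; if reservations can overlap, IntervalIndex.get_indexer_non_unique is probably the method to look at.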

Alternatively, use IntervalIndex.contains with Index.map over the range, and filter with boolean indexing:

working_hours = pd.date_range(pd.Timestamp(2011,11,9,7),
                              pd.Timestamp(2011,11,9,23), freq='1H')

i = pd.IntervalIndex(df.reserved)
# True for each timestamp that no reserved interval contains
m = working_hours.map(lambda x: not i.contains(x).any())

s = working_hours.to_series()[m].reset_index(drop=True)
print(s)
0   2011-11-09 07:00:00
1   2011-11-09 12:00:00
2   2011-11-09 21:00:00
3   2011-11-09 22:00:00
4   2011-11-09 23:00:00
dtype: datetime64[ns]
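
As for why the `in` test from the question always returns False: on a Series, `in` checks the index labels, not the values. The sketch below illustrates this and shows one way to test a single timestamp with IntervalIndex.contains. The variable names (ts, found_by_in, reserved_at_8 and so on) are made up for the illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "reserved": [
        pd.Interval(pd.Timestamp(2011, 11, 9, 8), pd.Timestamp(2011, 11, 9, 12), closed="left"),
        pd.Interval(pd.Timestamp(2011, 11, 9, 13), pd.Timestamp(2011, 11, 9, 21), closed="left"),
    ],
    "value": [1, 2],
})

ts = pd.Timestamp(2011, 11, 9, 8)

# `in` looks at the index labels (0 and 1 here), so a Timestamp is never found
found_by_in = ts in df["reserved"]
label_found = 0 in df["reserved"]
print(found_by_in, label_found)  # False True

# contains() tests the scalar against every interval; any() collapses the result
intervals = pd.IntervalIndex(df["reserved"])
reserved_at_8 = bool(intervals.contains(ts).any())
reserved_at_12 = bool(intervals.contains(pd.Timestamp(2011, 11, 9, 12)).any())
print(reserved_at_8, reserved_at_12)  # True False
```

The 12:00 check comes out False because the first interval is open on the right and the second only starts at 13:00.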
