pandas - 如何按来自不同 DF 的日期时间范围在 Pandas 中分组
问题描述
我卡住了,无法解决这个问题……我有 2 个数据框。一个有日期时间间隔,另一个有日期时间和值。我需要根据日期时间范围获取 MIN() 值。
import pandas as pd
timeseries = pd.DataFrame(
[
['2018-01-01T00:00:00.000000000', '2018-01-01T03:00:00.000000000'],
['2018-01-02T00:00:00.000000000', '2018-01-02T03:00:00.000000000'],
['2018-01-03T00:00:00.000000000', '2018-01-03T03:00:00.000000000'],
], dtype='datetime64[ns]', columns=['Start DT', 'End DT'])
values = pd.DataFrame(
[
['2018-01-01T00:00:00.000000000', 1],
['2018-01-01T01:00:00.000000000', 2],
['2018-01-01T02:00:00.000000000', 0],
['2018-01-02T00:00:00.000000000', -1],
['2018-01-02T01:00:00.000000000', 3],
['2018-01-02T02:00:00.000000000', 10],
['2018-01-03T00:00:00.000000000', 7],
['2018-01-03T01:00:00.000000000', 11],
['2018-01-03T02:00:00.000000000', 2],
], columns=['DT', 'Value'])
所需输出:
Start DT End DT Min
0 2018-01-01 2018-01-01 03:00:00 0
1 2018-01-02 2018-01-02 03:00:00 -1
2 2018-01-03 2018-01-03 03:00:00 2
和想法?
解决方案
使用IntervalIndex
按timeseries
列创建,然后按Index.get_indexer
、聚合min
和最后添加到列以获取位置timeseries
:
s = pd.IntervalIndex.from_arrays(timeseries['Start DT'],
timeseries['End DT'],
closed='both')
values['new'] = timeseries.index[s.get_indexer(values['DT'])]
print (values)
DT Value new
0 2018-01-01 00:00:00 1 0
1 2018-01-01 01:00:00 2 0
2 2018-01-01 02:00:00 0 0
3 2018-01-02 00:00:00 -1 1
4 2018-01-02 01:00:00 3 1
5 2018-01-02 02:00:00 10 1
6 2018-01-03 00:00:00 7 2
7 2018-01-03 01:00:00 11 2
8 2018-01-03 02:00:00 2 2
df = timeseries.join(values.groupby('new')['Value'].min().rename('Min'))
print (df)
Start DT End DT Min
0 2018-01-01 2018-01-01 03:00:00 0
1 2018-01-02 2018-01-02 03:00:00 -1
2 2018-01-03 2018-01-03 03:00:00 2
编辑:如果没有匹配添加缺失值-1
,则选择最后一个索引值,这里2
:
timeseries = pd.DataFrame(
[
['2018-01-01T00:00:00.000000000', '2018-01-01T03:00:00.000000000'],
['2018-01-02T00:00:00.000000000', '2018-01-02T03:00:00.000000000'],
['2018-01-03T00:00:00.000000000', '2018-01-03T03:00:00.000000000'],
], dtype='datetime64[ns]', columns=['Start DT', 'End DT'])
values = pd.DataFrame(
[ ['2017-12-31T00:00:00.000000000', -10],
['2018-01-01T00:00:00.000000000', 1],
['2018-01-01T01:00:00.000000000', 2],
['2018-01-01T02:00:00.000000000', 0],
['2018-01-02T00:00:00.000000000', -1],
['2018-01-02T01:00:00.000000000', 3],
['2018-01-02T02:00:00.000000000', 10],
['2018-01-03T00:00:00.000000000', 7],
['2018-01-03T01:00:00.000000000', 11],
['2018-01-03T02:00:00.000000000', 2],
], columns=['DT', 'Value'])
values['DT'] = pd.to_datetime(values['DT'])
print (values)
DT Value
0 2017-12-31 00:00:00 -10
1 2018-01-01 00:00:00 1
2 2018-01-01 01:00:00 2
3 2018-01-01 02:00:00 0
4 2018-01-02 00:00:00 -1
5 2018-01-02 01:00:00 3
6 2018-01-02 02:00:00 10
7 2018-01-03 00:00:00 7
8 2018-01-03 01:00:00 11
9 2018-01-03 02:00:00 2
s = pd.IntervalIndex.from_arrays(timeseries['Start DT'],
timeseries['End DT'], closed='both')
pos = s.get_indexer(values['DT'])
values['new'] = timeseries.index[pos].where(pos != -1)
print (values)
DT Value new
0 2017-12-31 00:00:00 -10 NaN
1 2018-01-01 00:00:00 1 0.0
2 2018-01-01 01:00:00 2 0.0
3 2018-01-01 02:00:00 0 0.0
4 2018-01-02 00:00:00 -1 1.0
5 2018-01-02 01:00:00 3 1.0
6 2018-01-02 02:00:00 10 1.0
7 2018-01-03 00:00:00 7 2.0
8 2018-01-03 01:00:00 11 2.0
9 2018-01-03 02:00:00 2 2.0
df = timeseries.join(values.dropna(subset=['new']).groupby('new')['Value'].min().rename('Min'))
print (df)
Start DT End DT Min
0 2018-01-01 2018-01-01 03:00:00 0
1 2018-01-02 2018-01-02 03:00:00 -1
2 2018-01-03 2018-01-03 03:00:00 2
推荐阅读
- html - What's the most optimal way of connecting divs with a line to represent a hierarchy?
- html - 用 position:absolute 和 position:relative 填充不工作
- javascript - React Native Navigation 子级父级
- image - 如何显示像gmail显示的图像?
- regex - 尝试匹配字符串时的正则表达式困境
- tensorflow2.0 - 创建 SparseCategoricalAccuracy 的修改版本,得到 ValueError: tf.function-decorated function tried to create variables on non-first call
- json - 如何使用 set_fact 打印数组变量以在 ansible add_host 模块中使用
- usb - 无需重启即可重启 USB 端口 (DevCon)
- azure-databricks - Azure Databricks 到 Azure SQL DW 连接
- javascript - 我不能调用 this.function: TypeError: this.fetchData is not a function