Datetimelike index in `pandas`

Problem description

I have a DataFrame loaded with a cumulative rainfall time series:

df = pd.read_csv(csv_file, parse_dates=[['date', 'time']], dayfirst=True, index_col=0)

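(I can't share the source data. It is read through an adapter object that presents the data to read_csv as a text file with .csv content; the source files are in a proprietary format, but that is irrelevant to the question. The end result is a DataFrame with a datetime index and float values; the dates could be dummies.)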

The rainfall is then resampled to one-minute intervals:

rainfall_differences = df['rainfall'].diff()
rainfall_differences = rainfall_differences.resample('1min', label='right', closed='right').sum()

All of this works as expected. My question, however, is about the difference between these two statements:

x = rainfall_differences.rolling('90min').sum()
y = rainfall_differences.rolling('1.5h').sum()

The first works, but the second throws an exception:

  File "<<path>>/my_file.py", line 68, in load_rainfalls
    result[duration_label] = rainfall_differences.rolling(duration_label).sum()
  File "<<path>>\lib\site-packages\pandas\core\generic.py", line 10386, in rolling
    closed=closed,
  File "<<path>>\lib\site-packages\pandas\core\window\rolling.py", line 94, in __init__
    self.validate()
  File "<<path>>\lib\site-packages\pandas\core\window\rolling.py", line 1836, in validate
    freq = self._validate_freq()
  File "<<path>>\lib\site-packages\pandas\core\window\rolling.py", line 1888, in _validate_freq
    f"passed window {self.window} is not "
ValueError: passed window 1.5h is not compatible with a datetimelike index

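My question: why is a window of '90min' compatible with a datetimelike index when a window of '1.5h' is not, even though both evaluate to the same value, Timedelta('0 days 01:30:00'), when passed to pandas.to_timedelta()?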

Note: I know how to work around this, but that is not my question. I want to know why the workaround is needed at all. For example:

index_duration = str(int(pd.to_timedelta('1.5 hour').total_seconds() / 60)) + 'min'
y = rainfall_differences.rolling(index_duration).sum()

Tags: python, pandas

Solution

I think you need to change h to H:

y = rainfall_differences.rolling('1.5H').sum()

I think the reason is that lowercase h is not a valid offset alias:

Alias   Description
H       hourly frequency
T, min  minutely frequency
S       secondly frequency
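
To see which spellings the installed pandas accepts, a small probe along these lines may help. It is only a sketch: `try_alias` is a hypothetical helper name, and the accepted spellings have changed between pandas releases, so no particular output is assumed here.

```python
import warnings

from pandas.tseries.frequencies import to_offset


def try_alias(alias):
    """Return the parsed offset for a frequency string, or None if pandas rejects it."""
    with warnings.catch_warnings():
        # Some pandas versions warn about deprecated spellings; silence that here.
        warnings.simplefilter("ignore")
        try:
            return to_offset(alias)
        except ValueError:
            return None


# Which of these parse depends on the installed pandas version.
results = {alias: try_alias(alias) for alias in ("90min", "1.5H", "1.5h")}
for alias, offset in results.items():
    print(f"{alias!r:>8} -> {offset!r}")
```

A spelling that comes back as None is one that rolling() would also be expected to reject as a window on a datetimelike index.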

Sample:

import pandas as pd

# '10min' is the same minutely frequency as the '10T' alias
rng = pd.date_range('2017-04-03', periods=5, freq='10min')
rainfall_differences = pd.DataFrame({'a': range(5)}, index=rng)
print(rainfall_differences)
                     a
2017-04-03 00:00:00  0
2017-04-03 00:10:00  1
2017-04-03 00:20:00  2
2017-04-03 00:30:00  3
2017-04-03 00:40:00  4

y = rainfall_differences.rolling('1.5H').sum()
print(y)
                        a
2017-04-03 00:00:00   0.0
2017-04-03 00:10:00   1.0
2017-04-03 00:20:00   3.0
2017-04-03 00:30:00   6.0
2017-04-03 00:40:00  10.0
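
Since the question notes that pandas.to_timedelta() parses both spellings to the same value, another option that does not depend on how the alias is spelled is to pass the Timedelta itself as the window. This is a minimal sketch, assuming rolling() accepts a Timedelta window on a datetime index (as far as I know it does); the series is made-up data, not the asker's.

```python
import pandas as pd

# Made-up data: 12 readings of 1.0, ten minutes apart.
rng = pd.date_range("2017-04-03", periods=12, freq="10min")
rainfall_differences = pd.Series(1.0, index=rng)

window = pd.to_timedelta("1.5h")  # Timedelta('0 days 01:30:00')
x = rainfall_differences.rolling("90min").sum()
y = rainfall_differences.rolling(window).sum()

# Both checks compare the string window against the Timedelta window.
print(window == pd.to_timedelta("90min"))
print(x.equals(y))
```

This keeps a label such as '1.5h' usable for display while the window itself is built from the parsed duration.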
