python - Convert `pandas` frequency string to `DateOffset`
Problem description
I have a timezone-aware pandas `DatetimeIndex`, which I would like to advance by one timestep, where the timestep is specified by its `.freq` attribute. However, doing this does not respect the time zone information (here, the daylight-saving change on 2020-03-29):
```python
import pandas as pd
i = pd.date_range('2020-03-28', freq='D', periods=3, tz='Europe/Amsterdam')
# DatetimeIndex(['2020-03-28 00:00:00+01:00', '2020-03-29 00:00:00+01:00',
#                '2020-03-30 00:00:00+02:00'],
#               dtype='datetime64[ns, Europe/Amsterdam]', freq='D')
i + i.freq
# Not what I want; second timestamp is advanced by 24h instead of 23h and is no longer at midnight:
# DatetimeIndex(['2020-03-29 00:00:00+01:00', '2020-03-30 01:00:00+02:00',
#                '2020-03-31 00:00:00+02:00'],
#               dtype='datetime64[ns, Europe/Amsterdam]', freq='D')
```
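The 23-hour day can be checked directly. A minimal sketch reusing the index `i` from above: the two local midnights on either side of the DST switch are only 23 absolute hours apart, so adding a fixed 24 hours overshoots to 01:00 local time.

```python
import pandas as pd

i = pd.date_range('2020-03-28', freq='D', periods=3, tz='Europe/Amsterdam')

# Subtracting tz-aware timestamps gives the absolute elapsed time
gap = i[2] - i[1]
print(gap)  # 0 days 23:00:00

# Adding an absolute 24-hour Timedelta lands one hour past local midnight
overshoot = i[1] + pd.Timedelta(hours=24)
print(overshoot)  # 2020-03-30 01:00:00+02:00
```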
What does work is using `pd.DateOffset`:
```python
i + pd.DateOffset(days=1)
# What I want; all timestamps at midnight (I just need to re-set the .freq attribute):
# DatetimeIndex(['2020-03-29 00:00:00+01:00', '2020-03-30 00:00:00+02:00',
#                '2020-03-31 00:00:00+02:00'],
#               dtype='datetime64[ns, Europe/Amsterdam]', freq=None)
```
However, as I don't know in advance what the frequency of the index will be, I'd like to use the value of `i.freq` to get the correct `DateOffset`. Is there a way to do this (apart from using a long `if ... elif ... elif ...` block)?
Other solutions are also welcome, of course.
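One way to avoid the `if`/`elif` chain, sketched under the assumption that only day-based frequencies cause trouble: in the pandas version used here, `Day` behaves as a fixed 24-hour step, while calendar offsets such as `MonthBegin` are, as far as I know, already applied to local wall-clock time. The helper name `to_calendar_offset` is hypothetical; it rewrites `Day(n)` as `pd.DateOffset(days=n)` and passes every other offset through unchanged.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset


def to_calendar_offset(freq):
    """Hypothetical helper: map a frequency (string or offset) to an offset
    that steps in local wall-clock time.

    Assumption: only Day-based frequencies need rewriting; other offsets
    (e.g. MonthBegin) are returned unchanged.
    """
    offset = to_offset(freq)
    if isinstance(offset, pd.offsets.Day):
        # Day(n) as n calendar days of wall-clock time, not n * 24 hours
        return pd.DateOffset(days=offset.n)
    return offset


i = pd.date_range('2020-03-28', freq='D', periods=3, tz='Europe/Amsterdam')
shifted = i + to_calendar_offset(i.freq)
print(shifted)
# expected: 2020-03-29, 2020-03-30 and 2020-03-31, all at local midnight
```

Because the helper also accepts strings, `to_calendar_offset('2D')` gives `pd.DateOffset(days=2)` and `to_calendar_offset('MS')` gives `MonthBegin`.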
The only other related question I found suggests `to_offset`, but I cannot use it here:
```python
i + pd.tseries.frequencies.to_offset(i.freq)
# Not what I want:
# DatetimeIndex(['2020-03-29 00:00:00+01:00', '2020-03-30 01:00:00+02:00',
#                '2020-03-31 00:00:00+02:00'],
#               dtype='datetime64[ns, Europe/Amsterdam]', freq='D')
```
(In fact, `pd.tseries.frequencies.to_offset(i.freq)` returns exactly `i.freq`.)
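A quick check of why `to_offset` does not help: it parses a frequency string (or passes an offset through) into the same offset the index already stores, so adding it is identical to `i + i.freq`.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset

i = pd.date_range('2020-03-28', freq='D', periods=3, tz='Europe/Amsterdam')

# Parsing 'D' and passing the existing offset through both give the same Day offset
print(to_offset('D') == i.freq)     # True
print(to_offset(i.freq) == i.freq)  # True

# Hence both additions produce identical results
print((i + to_offset(i.freq)).equals(i + i.freq))  # True
```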
Many thanks.
EDIT (1)
As suggested in the comments, using `.shift(1)` works in some cases, including the case stated above...
```python
i.shift(1)
# What I want; all timestamps at midnight:
# DatetimeIndex(['2020-03-29 00:00:00+01:00', '2020-03-30 00:00:00+02:00',
#                '2020-03-31 00:00:00+02:00'],
#               dtype='datetime64[ns, Europe/Amsterdam]', freq='D')
```
...but not in all. In fact, advancing the start date of my original index by one day causes a timestamp to be dropped, and the remaining ones are wrong:
```python
i2 = pd.date_range('2020-03-29', freq='D', periods=3, tz='Europe/Amsterdam')
# DatetimeIndex(['2020-03-29 00:00:00+01:00', '2020-03-30 00:00:00+02:00',
#                '2020-03-31 00:00:00+02:00'],
#               dtype='datetime64[ns, Europe/Amsterdam]', freq='D')
i2.shift(1)
# Not what I want: timestamps not at midnight, and one got dropped!
# DatetimeIndex(['2020-03-30 01:00:00+02:00', '2020-03-31 01:00:00+02:00'],
#               dtype='datetime64[ns, Europe/Amsterdam]', freq='D')
```
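For comparison, a sketch applying the calendar-day `pd.DateOffset` from earlier to `i2` instead of `shift`: the step is taken in local wall-clock time, so all three timestamps are kept and each stays at midnight.

```python
import pandas as pd

i2 = pd.date_range('2020-03-29', freq='D', periods=3, tz='Europe/Amsterdam')

# Step one calendar day in local time instead of shifting by the Day frequency
advanced = i2 + pd.DateOffset(days=1)
print(advanced)
# expected: 2020-03-30, 2020-03-31 and 2020-04-01, all at 00:00 local time
```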
EDIT (2)
As suggested in the answer by @MrFruppes, passing the `.nanos` attribute of `i.freq` to `pd.DateOffset` works...
```python
i + pd.DateOffset(nanoseconds=i.freq.nanos)
# What I want; all timestamps at midnight (I just need to re-set the .freq attribute):
# DatetimeIndex(['2020-03-29 00:00:00+01:00', '2020-03-30 00:00:00+02:00',
#                '2020-03-31 00:00:00+02:00'],
#               dtype='datetime64[ns, Europe/Amsterdam]', freq=None)
```
...but it breaks when advancing to the beginning of the next month:
```python
i3 = pd.date_range('2020-03-01', freq='MS', periods=3, tz='Europe/Amsterdam')
# DatetimeIndex(['2020-03-01 00:00:00+01:00', '2020-04-01 00:00:00+02:00',
#                '2020-05-01 00:00:00+02:00'],
#               dtype='datetime64[ns, Europe/Amsterdam]', freq='MS')
i3 + pd.DateOffset(nanoseconds=i3.freq.nanos)
Traceback (most recent call last):
  File "<ipython-input-58-f3a32c654a6e>", line 1, in <module>
    i3 + pd.DateOffset(nanoseconds=i3.freq.nanos)
  File "pandas\_libs\tslibs\offsets.pyx", line 690, in pandas._libs.tslibs.offsets.BaseOffset.nanos.__get__
ValueError: <MonthBegin> is a non-fixed frequency
```
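The traceback shows that `.nanos` raises `ValueError` for non-fixed frequencies. One way to cover both cases, sketched as a hypothetical helper `step_offset`: use the `nanos`-based `DateOffset` when the frequency is fixed, and otherwise fall back to the offset itself, assuming (as for `MonthBegin`) that non-fixed offsets are already applied in local wall-clock time.

```python
import pandas as pd


def step_offset(freq):
    """Hypothetical helper: an offset that advances by one step of `freq`."""
    try:
        # Fixed frequencies (e.g. 'D'): wrap the step length in a DateOffset
        return pd.DateOffset(nanoseconds=freq.nanos)
    except ValueError:
        # Non-fixed frequencies (e.g. 'MS') raise here; use the offset as-is
        return freq


i3 = pd.date_range('2020-03-01', freq='MS', periods=3, tz='Europe/Amsterdam')
print(i3 + step_offset(i3.freq))
# expected: 2020-04-01, 2020-05-01 and 2020-06-01, all at 00:00 local time
```

This sketch targets daily and coarser data. With sub-daily fixed frequencies such as hourly, stepping in wall-clock time could land on local times that do not exist during the DST switch.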
Solution
If you have a fixed frequency, you can use the `nanos` property of the freq. For example:
```python
import pandas as pd
i = pd.date_range('2020-03-29', freq='D', periods=3, tz='Europe/Amsterdam')
# DatetimeIndex(['2020-03-29 00:00:00+01:00', '2020-03-30 00:00:00+02:00',
#                '2020-03-31 00:00:00+02:00'],
#               dtype='datetime64[ns, Europe/Amsterdam]', freq='D')
i + pd.DateOffset(nanoseconds=i.freq.nanos)
# DatetimeIndex(['2020-03-30 00:00:00+02:00', '2020-03-31 00:00:00+02:00',
#                '2020-04-01 00:00:00+02:00'],
#               dtype='datetime64[ns, Europe/Amsterdam]', freq=None)
```
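The shifted index comes back with `freq=None`, and the question mentions re-setting the `.freq` attribute. A minimal sketch, assuming the shifted values are regular enough for pandas to infer the frequency: rebuild the index with `freq='infer'`.

```python
import pandas as pd

i = pd.date_range('2020-03-29', freq='D', periods=3, tz='Europe/Amsterdam')
shifted = i + pd.DateOffset(days=1)  # freq is None after the addition

# Rebuild the index and let pandas infer the frequency from the values
restored = pd.DatetimeIndex(shifted, freq='infer')
print(restored.freqstr)  # expected: D
```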