python - DatetimeIndex.to_period raises ValueError for many offset aliases
Question
I am trying to solve what should be a very simple problem, but I am stuck. I have a simple DataFrame with a DatetimeIndex, like this:
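import numpy as np
import pandas as pd

# 63 daily rows; closed=None (the default) keeps both endpoints
df = pd.DataFrame(
    index=pd.date_range(
        start='2017-01-01',
        end='2017-03-04', closed=None),
    data=np.arange(63), columns=['val']).rename_axis(index='date')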
In [179]: df
Out[179]:
val
date
2017-01-01 0
2017-01-02 1
2017-01-03 2
2017-01-04 3
2017-01-05 4
... ...
2017-02-28 58
2017-03-01 59
2017-03-02 60
2017-03-03 61
2017-03-04 62
[63 rows x 1 columns]
I want to sum these values by week, semi-month, month, and so on. So I tried:
In [180]: df.to_period('W').groupby('date').sum()
Out[180]:
val
date
2016-12-26/2017-01-01 0
2017-01-02/2017-01-08 28
2017-01-09/2017-01-15 77
2017-01-16/2017-01-22 126
2017-01-23/2017-01-29 175
2017-01-30/2017-02-05 224
2017-02-06/2017-02-12 273
2017-02-13/2017-02-19 322
2017-02-20/2017-02-26 371
2017-02-27/2017-03-05 357
This works for offset aliases such as Y, M, D, W, T, S, L, U and N, but it fails for SM, SMS and other aliases listed here: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases
It raises a ValueError:
In [181]: df.to_period('SMS').groupby('date').sum()
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
pandas/_libs/tslibs/frequencies.pyx in pandas._libs.tslibs.frequencies._period_str_to_code()

KeyError: 'SMS-15'

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-181-6779559a0596> in <module>
----> 1 df.to_period('SMS').groupby('date').sum()

~/.virtualenvs/py36/lib/python3.6/site-packages/pandas/core/frame.py in to_period(self, freq, axis, copy)
   8350         axis = self._get_axis_number(axis)
   8351         if axis == 0:
-> 8352             new_data.set_axis(1, self.index.to_period(freq=freq))
   8353         elif axis == 1:
   8354             new_data.set_axis(0, self.columns.to_period(freq=freq))

~/.virtualenvs/py36/lib/python3.6/site-packages/pandas/core/accessor.py in f(self, *args, **kwargs)
     91     def _create_delegator_method(name):
     92         def f(self, *args, **kwargs):
---> 93             return self._delegate_method(name, *args, **kwargs)
     94
     95         f.__name__ = name

~/.virtualenvs/py36/lib/python3.6/site-packages/pandas/core/indexes/datetimelike.py in _delegate_method(self, name, *args, **kwargs)
    811
    812     def _delegate_method(self, name, *args, **kwargs):
--> 813         result = operator.methodcaller(name, *args, **kwargs)(self._data)
    814         if name not in self._raw_methods:
    815             result = Index(result, name=self.name)

~/.virtualenvs/py36/lib/python3.6/site-packages/pandas/core/arrays/datetimes.py in to_period(self, freq)
   1280             freq = get_period_alias(freq)
   1281
-> 1282         return PeriodArray._from_datetime64(self._data, freq, tz=self.tz)
   1283
   1284     def to_perioddelta(self, freq):

~/.virtualenvs/py36/lib/python3.6/site-packages/pandas/core/arrays/period.py in _from_datetime64(cls, data, freq, tz)
    273         PeriodArray[freq]
    274         """
--> 275         data, freq = dt64arr_to_periodarr(data, freq, tz)
    276         return cls(data, freq=freq)
    277

~/.virtualenvs/py36/lib/python3.6/site-packages/pandas/core/arrays/period.py in dt64arr_to_periodarr(data, freq, tz)
    914         data = data._values
    915
--> 916     base, mult = libfrequencies.get_freq_code(freq)
    917     return libperiod.dt64arr_to_periodarr(data.view("i8"), base, tz), freq
    918

pandas/_libs/tslibs/frequencies.pyx in pandas._libs.tslibs.frequencies.get_freq_code()

pandas/_libs/tslibs/frequencies.pyx in pandas._libs.tslibs.frequencies.get_freq_code()

pandas/_libs/tslibs/frequencies.pyx in pandas._libs.tslibs.frequencies.get_freq_code()

pandas/_libs/tslibs/frequencies.pyx in pandas._libs.tslibs.frequencies._period_str_to_code()

ValueError: Invalid frequency: SMS-15
I am using Python 3.6.5 and pandas version '0.25.1'.
Solution
Use DataFrame.resample here:
print (df.resample('W').sum())
val
date
2017-01-01 0
2017-01-08 28
2017-01-15 77
2017-01-22 126
2017-01-29 175
2017-02-05 224
2017-02-12 273
2017-02-19 322
2017-02-26 371
2017-03-05 357
print (df.resample('SM').sum())
val
date
2016-12-31 91
2017-01-15 344
2017-01-31 555
2017-02-15 663
2017-02-28 300
print (df.resample('SMS').sum())
val
date
2017-01-01 91
2017-01-15 374
2017-02-01 525
2017-02-15 721
2017-03-01 242
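To see why resample helps, it may be useful to check which aliases to_period accepts at all. The traceback in the question suggests that SMS is an offset alias with no matching period frequency, while resample bins on the offset directly and never converts to periods. Below is a minimal sketch of that check. The helper name accepted_by_to_period is hypothetical, and the exception types it catches are an assumption, since the exact exception may differ between pandas versions.

```python
import numpy as np
import pandas as pd

# Same data as in the question: 63 daily values, 2017-01-01 to 2017-03-04.
df = pd.DataFrame(
    {"val": np.arange(63)},
    index=pd.date_range("2017-01-01", "2017-03-04", name="date"),
)


def accepted_by_to_period(alias):
    """Hypothetical helper: True if DatetimeIndex.to_period accepts the alias."""
    try:
        df.index.to_period(alias)
    except (ValueError, AttributeError, TypeError):
        # Assumption: caught broadly because the exception type
        # may not be the same in every pandas release.
        return False
    return True


print("W  :", accepted_by_to_period("W"))    # weekly is a period frequency
print("SMS:", accepted_by_to_period("SMS"))  # semi-month start is offset-only

# resample works on the offset itself, so SMS is fine here.
sms_sums = df.resample("SMS").sum()
print(sms_sums)
```

The sums printed at the end should match the SMS table above, with one row per half-month bin labelled by its start date.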
An alternative with groupby and Grouper:
print (df.groupby(pd.Grouper(freq='W')).sum())
print (df.groupby(pd.Grouper(freq='SM')).sum())
print (df.groupby(pd.Grouper(freq='SMS')).sum())
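As a quick sanity check, the two approaches can be compared side by side. The sketch below rebuilds the question's data and compares resample with groupby plus Grouper for two of the aliases. SM is left out on purpose: as far as I know, recent pandas releases spell it SME, so treat that alias as version dependent.

```python
import numpy as np
import pandas as pd

# Same data as in the question: 63 daily values, 2017-01-01 to 2017-03-04.
df = pd.DataFrame(
    {"val": np.arange(63)},
    index=pd.date_range("2017-01-01", "2017-03-04", name="date"),
)

matches = {}
for alias in ["W", "SMS"]:
    by_resample = df.resample(alias).sum()
    by_grouper = df.groupby(pd.Grouper(freq=alias)).sum()
    # Compare bin labels and sums produced by the two routes.
    matches[alias] = by_resample.equals(by_grouper)

print(matches)
```

If both entries are True, the choice between the two is mostly a matter of style. Grouper is handy when the time bins need to be combined with other grouping keys in the same groupby call.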