Setting the frequency "MS" on a pandas DatetimeIndex - python

Question

I have this DataFrame in pandas:

df = pd.read_csv('data_stack.csv',index_col='month',parse_dates=True)


If I look at the index's freq attribute, it is automatically inferred as None:

DatetimeIndex(['2018-09-01', '2018-08-01', '2018-07-01', '2018-06-01',
               '2018-05-01', '2018-04-01', '2018-03-01', '2018-02-01',
               '2018-01-01', '2017-12-01',
               ...
               '2018-11-01', '2019-01-01', '2018-12-01', '2018-11-01',
               '2019-01-01', '2018-12-01', '2018-11-01', '2019-01-01',
               '2018-12-01', '2018-11-01'],
              dtype='datetime64[ns]', name='month', length=4325, freq=None)

I want to set it to month start, "MS":

df.index.freq = 'MS'

But I get this error:

ValueError                                Traceback (most recent call last)
<ipython-input-99-0dc1e7b74d6b> in <module>
----> 1 df.index.freq = 'MS'

~/opt/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/extension.py in fset(self, value)
     64 
     65             def fset(self, value):
---> 66                 setattr(self._data, name, value)
     67 
     68             fget.__name__ = name

~/opt/anaconda3/lib/python3.7/site-packages/pandas/core/arrays/datetimelike.py in freq(self, value)
    925         if value is not None:
    926             value = frequencies.to_offset(value)
--> 927             self._validate_frequency(self, value)
    928 
    929         self._freq = value

~/opt/anaconda3/lib/python3.7/site-packages/pandas/core/arrays/datetimelike.py in _validate_frequency(cls, index, freq, **kwargs)
   1001             #  message.
   1002             raise ValueError(
-> 1003                 f"Inferred frequency {inferred} from passed values "
   1004                 f"does not conform to passed frequency {freq.freqstr}"
   1005             )

ValueError: Inferred frequency None from passed values does not conform to passed frequency MS

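I looked for similar cases and found this one: pandas.DatetimeIndex frequency is None and can't be set

I tried what it suggests, but I get the same error. Can anyone tell me why?

The data is in this repository: https://github.com/jordi-crespo/stack-questions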

Tags: python, pandas

Answer


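There is no frequency because your index contains duplicate values. So I think the only way to get a frequency on an index like this is to aggregate the data somehow, for example: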

>>> df.resample('MS').mean().index
DatetimeIndex(['2017-01-01', '2017-02-01', '2017-03-01', '2017-04-01',
               '2017-05-01', '2017-06-01', '2017-07-01', '2017-08-01',
               '2017-09-01', '2017-10-01', '2017-11-01', '2017-12-01',
               '2018-01-01', '2018-02-01', '2018-03-01', '2018-04-01',
               '2018-05-01', '2018-06-01', '2018-07-01', '2018-08-01',
               '2018-09-01', '2018-10-01', '2018-11-01', '2018-12-01',
               '2019-01-01'],
              dtype='datetime64[ns]', name='month', freq='MS')

This gives you an index with the desired frequency. But I'm not sure whether this is what you actually want.
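To make the point concrete, here is a minimal sketch that reproduces the situation on a tiny, made-up frame instead of the real data_stack.csv. The column name `value` and the six sample rows are assumptions for illustration only. It checks that the index is not unique, shows that assigning "MS" fails, and then aggregates with `resample` so that every month appears exactly once.

```python
import pandas as pd

# Hypothetical stand-in for data_stack.csv: each month appears twice,
# and the rows are not sorted by date.
df = pd.DataFrame(
    {"value": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]},
    index=pd.DatetimeIndex(
        ["2018-03-01", "2018-02-01", "2018-01-01",
         "2018-03-01", "2018-02-01", "2018-01-01"],
        name="month",
    ),
)

# Duplicated timestamps mean no regular frequency can be inferred.
print(df.index.is_unique)

# Assigning a frequency that the values do not conform to raises ValueError.
try:
    df.index.freq = "MS"
    freq_error = None
except ValueError as exc:
    freq_error = exc
print(freq_error)

# Aggregating to one row per month start yields an index that carries freq "MS".
monthly = df.resample("MS").mean()
print(monthly.index.freqstr)
print(monthly)
```

Whether `mean()` is the right aggregation depends on what the duplicated rows represent. `sum()`, `first()`, or grouping by another column before resampling may fit the data better.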

