Pandas read_csv() parses dates fine, but I can't index by date

Problem description

This is strange.

Data (CSV):

Date,    Hr 1,Hr 2,Hr 3,..
20070701,1128,1072,1173,..
20070702,1131,1092,1287,..

Ordinary use of pd.read_csv():

import pandas as pd

df = pd.read_csv(filename,
                 parse_dates=['Date'],
                 index_col=['Date'])

The dates appear to be parsed into the index correctly:

print(df.index[:2])

Output:

DatetimeIndex(['2007-07-01', '2007-07-02'], dtype='datetime64[ns]', name='Date', freq=None)

Now, what if I try to select a single day?

print(df['2007-7-1']) # or any variation on "2007-07-01" etc

Output:

Traceback (most recent call last):
  File "/Users/mjw/opt/anaconda3/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 2646, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: '2007-7-1'
During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "my_file.py", line 108, in <module>
    print(df['2007-7-1'])
  File "/Users/mjw/opt/anaconda3/lib/python3.8/site-packages/pandas/core/frame.py", line 2800, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/Users/mjw/opt/anaconda3/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 2648, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: '2007-7-1'

I also tried making sure the DatetimeIndex frequency was set correctly:

df = df.asfreq('d')

I get the same error.

Indexing by year and month works, though, and so does indexing by year-month-day after selecting a column:

print(df['2007-7']) # works
print(df['Hr 1']['2007-7-1']) # works

But this does not:

print(df['2007-7-1']['Hr 1'])

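I could write a custom date parser, but the point is that I shouldn't have to. "yyyymmdd" is neither difficult nor unusual. Come on, pandas.

Thanks in advance!

Tags: python, pandas, csv, datetime

Solution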


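Use .loc. As the traceback shows, df['2007-7-1'] is looked up among the columns (self.columns.get_loc(key)), which is why it raises a KeyError. To select a row by its index label, use .loc: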

print(df.loc["2007-07-01"])

Prints:

    Hr 1    1128
Hr 2        1072
Hr 3        1173
Name: 2007-07-01 00:00:00, dtype: int64

For the value in the "Hr 2" column:

print(df.loc["2007-07-01", "Hr 2"])

Prints:

1072
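
To try this without the original file, here is a minimal self-contained sketch. It assumes a trimmed inline copy of the sample data (only three hour columns, and no extra spaces after the commas in the header); the variable names csv_text, row, value and july are illustrative only and not from the question.

```python
import io

import pandas as pd

# Hypothetical inline copy of the sample data; the real file has more columns.
csv_text = """Date,Hr 1,Hr 2,Hr 3
20070701,1128,1072,1173
20070702,1131,1092,1287
"""

df = pd.read_csv(io.StringIO(csv_text),
                 parse_dates=["Date"],
                 index_col="Date")

# df["2007-07-01"] would search the column labels and raise KeyError.
# .loc looks the string up in the row index instead.
row = df.loc["2007-07-01"]             # one day, all columns
value = df.loc["2007-07-01", "Hr 2"]   # one day, one column
july = df.loc["2007-07"]               # partial string: every row in July 2007

print(value)
print(len(july))
```

As far as I know, recent pandas releases no longer accept row selection by partial date string through df[...] at all, so .loc is the consistent choice for both the full-date and the year-month lookups. If the header really does contain spaces after the commas, passing skipinitialspace=True to read_csv should keep them out of the column names.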
