python - Pandas read_csv() parses dates fine, but I can't index by date
Question
This is strange.
Data (csv):
Date, Hr 1,Hr 2,Hr 3,..
20070701,1128,1072,1173,..
20070702,1131,1092,1287,..
Ordinary use of pd.read_csv():
import pandas as pd

df = pd.read_csv(filename,
                 parse_dates=['Date'],
                 index_col=['Date'])
The dates appear to be parsed into the index just fine:
print(df.index[:2])
Output:
DatetimeIndex(['2007-07-01', '2007-07-02'], dtype='datetime64[ns]', name='Date', freq=None)
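For anyone who wants to reproduce this without the original file, here is a minimal sketch. The `CSV_TEXT` string is a hypothetical in-memory stand-in for the CSV above: only the two rows and three columns shown, with the elided columns dropped and the header written without the stray space.

```python
import io

import pandas as pd

# Hypothetical stand-in for the file in the question (two rows, three columns).
CSV_TEXT = """Date,Hr 1,Hr 2,Hr 3
20070701,1128,1072,1173
20070702,1131,1092,1287
"""

# Same call as in the question, reading from memory instead of a file.
df = pd.read_csv(io.StringIO(CSV_TEXT), parse_dates=["Date"], index_col="Date")

# Confirm that the index really is a DatetimeIndex holding the expected dates.
print(type(df.index).__name__)
print(df.index[:2])
```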
Now, what if I try to index a single day?
print(df['2007-7-1']) # or any variation on "2007-07-01" etc
Output:
Traceback (most recent call last):
File "/Users/mjw/opt/anaconda3/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 2646, in get_loc
return self._engine.get_loc(key)
File "pandas/_libs/index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: '2007-7-1'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "my_file.py", line 108, in <module>
print(df['2007-7-1'])
File "/Users/mjw/opt/anaconda3/lib/python3.8/site-packages/pandas/core/frame.py", line 2800, in __getitem__
indexer = self.columns.get_loc(key)
File "/Users/mjw/opt/anaconda3/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 2648, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas/_libs/index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: '2007-7-1'
I also tried making sure the DatetimeIndex frequency was set correctly:
df = df.asfreq('d')
I get the same garbage.
But indexing by year and month works fine, and so does indexing by year-month-day after selecting a column:
print(df['2007-7']) # works
print(df['Hr 1']['2007-7-1']) # works
But this does not:
print(df['2007-7-1']['Hr 1'])
I could write a custom date parser, but the point is that I shouldn't have to. "yyyymmdd" is neither difficult nor unusual. Come on, pandas.
Thanks in advance!
Answer
Use .loc:
print(df.loc["2007-07-01"])
Prints:
Hr 1 1128
Hr 2 1072
Hr 3 1173
Name: 2007-07-01 00:00:00, dtype: int64
For the value in the "Hr 2" column:
print(df.loc["2007-07-01", "Hr 2"])
Prints:
1072
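Some background on why this helps: the traceback in the question ends in `self.columns.get_loc(key)`, so `df['2007-7-1']` was being looked up as a column name, whereas `.loc` always selects by row label first. The sketch below collects the `.loc` variants in one runnable script. It assumes a hypothetical in-memory copy of the two sample rows (`CSV_TEXT`) rather than the real file. As far as I know, the `df['2007-7']` row shortcut from the question no longer works in recent pandas versions, so `.loc` is also the safer choice for month-level selection.

```python
import io

import pandas as pd

# Hypothetical stand-in for the file in the question (two rows, three columns).
CSV_TEXT = """Date,Hr 1,Hr 2,Hr 3
20070701,1128,1072,1173
20070702,1131,1092,1287
"""

df = pd.read_csv(io.StringIO(CSV_TEXT), parse_dates=["Date"], index_col="Date")

# A full date string matches one row exactly and returns it as a Series.
one_day = df.loc["2007-07-01"]

# A coarser string (year-month) selects every row in that month as a DataFrame.
whole_month = df.loc["2007-07"]

# Row label and column label together return a single cell.
cell = df.loc["2007-07-01", "Hr 2"]

# An explicit Timestamp avoids string parsing altogether.
same_day = df.loc[pd.Timestamp("2007-07-01")]

print(one_day)
print(whole_month)
print(cell)
```

Selecting the column first, as in `df['Hr 1']['2007-7-1']` from the question, works because the second lookup runs against a Series, whose `[]` indexes by row label.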