pandas - KeyError when trying to access a column of a Pandas DataFrame
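Question
This is my first time using Stack Overflow, so please forgive me if my question does not follow the proper conventions.
I am trying to write a function that finds the station with the most riders on the first day and then returns the mean number of riders per day for that station. It should also return the overall mean ridership. However, when I run the code below, it raises the KeyError shown further down. Please advise what is going wrong. Thank you very much!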
import pandas as pd

def mean_riders_for_max_station(ridership_df):
    overall_mean = ridership_df.mean()
    max_station = ridership_df.iloc[0].argmax()
    mean_for_max = ridership_df[max_station].mean()
    return (overall_mean, mean_for_max)

ridership_df = pd.DataFrame(
    data=[[   0,    0,    2,    5,    0],
          [1478, 3877, 3674, 2328, 2539],
          [1613, 4088, 3991, 6461, 2691],
          [1560, 3392, 3826, 4787, 2613],
          [1608, 4802, 3932, 4477, 2705],
          [1576, 3933, 3909, 4979, 2685],
          [  95,  229,  255,  496,  201],
          [   2,    0,    1,   27,    0],
          [1438, 3785, 3589, 4174, 2215],
          [1342, 4043, 4009, 4665, 3033]],
    index=['05-01-11', '05-02-11', '05-03-11', '05-04-11', '05-05-11',
           '05-06-11', '05-07-11', '05-08-11', '05-09-11', '05-10-11'],
    columns=['R003', 'R004', 'R005', 'R006', 'R007']
)
print(mean_riders_for_max_station(ridership_df))
I get the following error message:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2894 try:
-> 2895 return self._engine.get_loc(casted_key)
2896 except KeyError as err:
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 3
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
<ipython-input-23-60b53dc0106e> in <module>
37 )
38
---> 39 mean_riders_for_max_station(ridership_df)
<ipython-input-23-60b53dc0106e> in mean_riders_for_max_station(ridership_df)
17
18 max_station = ridership_df.iloc[0].argmax() #difference between argmax() for an array (--returning a location)
---> 19 mean_for_max = ridership_df[max_station].mean() #and argmax() for a series: returning index (or column name of the dataframe)
20 return (overall_mean, mean_for_max)
21
~\anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
2900 if self.columns.nlevels > 1:
2901 return self._getitem_multilevel(key)
-> 2902 indexer = self.columns.get_loc(key)
2903 if is_integer(indexer):
2904 indexer = [indexer]
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2895 return self._engine.get_loc(casted_key)
2896 except KeyError as err:
-> 2897 raise KeyError(key) from err
2898
2899 if tolerance is not None:
KeyError: 3
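Answer
On a pandas Series, argmax() returns the integer position of the maximum value (like an integer index into an array), not its label. Here that position is 3, and ridership_df[3] looks for a column labeled 3, which does not exist, hence KeyError: 3.
What you want is max_station = ridership_df.iloc[0].idxmax(), which returns the column label instead.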
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.argmax.html
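The difference between argmax() and idxmax() described in the answer can be sketched as follows, reusing the data from the question. This is a minimal illustration rather than a definitive fix: it assumes a recent pandas (1.0 or later, as far as I know) where Series.argmax() returns a position, and the variable names first_day, position, and label are introduced here only for the example.

```python
import pandas as pd

ridership_df = pd.DataFrame(
    data=[[   0,    0,    2,    5,    0],
          [1478, 3877, 3674, 2328, 2539],
          [1613, 4088, 3991, 6461, 2691],
          [1560, 3392, 3826, 4787, 2613],
          [1608, 4802, 3932, 4477, 2705],
          [1576, 3933, 3909, 4979, 2685],
          [  95,  229,  255,  496,  201],
          [   2,    0,    1,   27,    0],
          [1438, 3785, 3589, 4174, 2215],
          [1342, 4043, 4009, 4665, 3033]],
    index=['05-01-11', '05-02-11', '05-03-11', '05-04-11', '05-05-11',
           '05-06-11', '05-07-11', '05-08-11', '05-09-11', '05-10-11'],
    columns=['R003', 'R004', 'R005', 'R006', 'R007'],
)

# Illustrative names (not from the question): first_day, position, label.
first_day = ridership_df.iloc[0]   # Series indexed by station name
position = first_day.argmax()      # integer position of the maximum
label = first_day.idxmax()         # column label of the maximum


def mean_riders_for_max_station(ridership_df):
    # Per-station means, exactly as in the question's code.
    overall_mean = ridership_df.mean()
    # idxmax() yields a column label, so df[label] is a valid lookup.
    max_station = ridership_df.iloc[0].idxmax()
    mean_for_max = ridership_df[max_station].mean()
    return (overall_mean, mean_for_max)


overall_mean, mean_for_max = mean_riders_for_max_station(ridership_df)

# Alternative: keep the integer from argmax() and index by position.
mean_by_position = ridership_df.iloc[:, position].mean()

print(position, label, mean_for_max)
```

Either approach selects the same column; idxmax() is the smaller change to the original function, while iloc[:, position] is useful if you specifically want to work with positions.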