KeyError when trying to access a column of a Pandas DataFrame

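Problem description

This is my first time using stackoverflow, so please forgive me if my question does not follow the proper conventions.

I am trying to write a function that finds the station with the most riders on the first day and then returns the mean number of riders per day for that station. It should also return the overall mean ridership. However, when I run the code below, it raises the KeyError shown further down. Could you advise what is going wrong? Thank you very much!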

import pandas as pd

def mean_riders_for_max_station(ridership_df):
    
    overall_mean = ridership_df.mean()

    max_station = ridership_df.iloc[0].argmax()  
    mean_for_max = ridership_df[max_station].mean() 
    return (overall_mean, mean_for_max)

ridership_df = pd.DataFrame(
    data=[[   0,    0,    2,    5,    0],
          [1478, 3877, 3674, 2328, 2539],
          [1613, 4088, 3991, 6461, 2691],
          [1560, 3392, 3826, 4787, 2613],
          [1608, 4802, 3932, 4477, 2705],
          [1576, 3933, 3909, 4979, 2685],
          [  95,  229,  255,  496,  201],
          [   2,    0,    1,   27,    0],
          [1438, 3785, 3589, 4174, 2215],
          [1342, 4043, 4009, 4665, 3033]],
    index=['05-01-11', '05-02-11', '05-03-11', '05-04-11', '05-05-11',
           '05-06-11', '05-07-11', '05-08-11', '05-09-11', '05-10-11'],
    columns=['R003', 'R004', 'R005', 'R006', 'R007']
)

print(mean_riders_for_max_station(ridership_df))

I get the following error message:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2894             try:
-> 2895                 return self._engine.get_loc(casted_key)
   2896             except KeyError as err:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 3

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
<ipython-input-23-60b53dc0106e> in <module>
     37 )
     38 
---> 39 mean_riders_for_max_station(ridership_df)

<ipython-input-23-60b53dc0106e> in mean_riders_for_max_station(ridership_df)
     17 
     18     max_station = ridership_df.iloc[0].argmax()   #difference between argmax() for an array (--returning a location)
---> 19     mean_for_max = ridership_df[max_station].mean() #and argmax() for a series: returning index (or column name of the dataframe)
     20     return (overall_mean, mean_for_max)
     21 

~\anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   2900             if self.columns.nlevels > 1:
   2901                 return self._getitem_multilevel(key)
-> 2902             indexer = self.columns.get_loc(key)
   2903             if is_integer(indexer):
   2904                 indexer = [indexer]

~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2895                 return self._engine.get_loc(casted_key)
   2896             except KeyError as err:
-> 2897                 raise KeyError(key) from err
   2898 
   2899         if tolerance is not None:

KeyError: 3

Tags: pandas, dataframe, keyerror, argmax

Solution


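The argmax() method of a pandas Series returns the position of the maximum value (an integer position, as in an array), not its label. Here it returns 3, so ridership_df[max_station] looks for a column labelled 3, which does not exist, hence KeyError: 3.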

What you want is max_station = ridership_df.iloc[0].idxmax(), which returns the label of the maximum value (here, the column name).

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.argmax.html
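As a rough sketch of the fix, the function from the question could look like the following with idxmax() swapped in. The DataFrame here is only the first three rows of the question's data, shortened for illustration, and the names first_day and by_position are hypothetical helpers that do not appear in the original post. The overall_mean line is kept as the question wrote it, which yields one mean per column.

```python
import pandas as pd

def mean_riders_for_max_station(ridership_df):
    # Kept as in the question: this is a Series with one mean per station.
    overall_mean = ridership_df.mean()

    # idxmax() returns the label (column name) of the first-day maximum,
    # which is what ridership_df[...] expects.
    max_station = ridership_df.iloc[0].idxmax()
    mean_for_max = ridership_df[max_station].mean()
    return (overall_mean, mean_for_max)

# First three rows of the question's data (shortened for illustration).
ridership_df = pd.DataFrame(
    data=[[   0,    0,    2,    5,    0],
          [1478, 3877, 3674, 2328, 2539],
          [1613, 4088, 3991, 6461, 2691]],
    index=['05-01-11', '05-02-11', '05-03-11'],
    columns=['R003', 'R004', 'R005', 'R006', 'R007']
)

first_day = ridership_df.iloc[0]
print(first_day.argmax())   # integer position of the maximum: 3
print(first_day.idxmax())   # label of the maximum: R006

overall_mean, mean_for_max = mean_riders_for_max_station(ridership_df)
print(mean_for_max)

# If you do want to keep argmax(), select the column by position instead.
by_position = ridership_df.iloc[:, first_day.argmax()].mean()
```

Either route should give the same station mean; idxmax() is probably the smaller change, since the rest of the function already indexes columns by label.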

