Python, pandas DataFrame: extracting values by condition raises an error

Problem description

I have the following DataFrame of NBA player statistics:

print(self.df)


                                                      Name   PTS   REB  AST  \
(updated to: , 2020-02-24 19:39:00)                                           
0                                             James Harden  35.2   6.4  7.4   
1                                    Giannis Antetokounmpo  30.0  13.6  5.8   
2                                               Trae Young  30.0   4.4  9.2   
3                                             Bradley Beal  29.6   4.4  6.0   
4                                           Damian Lillard  29.5   4.4  7.9   
...                                                    ...   ...   ...  ...   
261                                        Jerome Robinson   3.1   1.7  1.1   
262                                           Goga Bitadze   3.1   2.0  0.5   
263                                          Javonte Green   3.0   1.7  0.5   
264                                           Semi Ojeleye   2.9   1.9  0.5   
265                                    Matthew Dellavedova   2.5   1.1  2.6   

                                     STL  BLK   FGM   FGA   FG%  3PM   3PA  \
(updated to: , 2020-02-24 19:39:00)                                          
0                                    1.7  1.0  10.1  23.1  43.9  4.6  12.8   
1                                    1.1  1.1  11.1  20.1  55.2  1.5   4.8   
2                                    1.2  0.1   9.3  20.8  44.5  3.5   9.5   
3                                    1.1  0.4  10.1  22.2  45.3  2.6   8.0   
4                                    1.0  0.3   9.4  20.4  46.0  3.9  10.0   
...                                  ...  ...   ...   ...   ...  ...   ...   
261                                  0.3  0.2   1.2   3.5  34.1  0.5   1.7   
262                                  0.1  0.7   1.3   2.6  48.2  0.1   0.6   
263                                  0.5  0.1   1.2   2.3  51.1  0.1   0.6   
264                                  0.3  0.1   1.0   2.4  39.5  0.5   1.5   
265                                  0.3  0.0   0.9   2.7  32.3  0.2   1.4   

                                      3P%   FTM   FTA   FT%  
(updated to: , 2020-02-24 19:39:00)                          
0                                    35.9  10.4  12.0  86.8  
1                                    31.1   6.4  10.4  61.5  
2                                    37.4   7.9   9.3  85.5  
3                                    32.0   6.9   8.1  84.4  
4                                    39.3   6.8   7.7  88.9  
...                                   ...   ...   ...   ...  
261                                  29.5   0.3   0.4  57.1  
262                                  15.4   0.5   0.7  69.0  
263                                  26.1   0.6   0.9  63.9  
264                                  35.0   0.5   0.5  88.9  
265                                  15.9   0.5   0.6  89.3  

[266 rows x 15 columns]

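I am trying to narrow the df down to analyze some statistics and get every row whose values are above the mean in two columns. When I try to extract values based on a condition, I get the following error.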

    def get_stat(self):
        pts_fgm_df = self.df.head(n=120)
        rslt_df = pts_fgm_df.loc[pts_fgm_df['PTS'] > pts_fgm_df['PTS'].mean() & pts_fgm_df['FG%'] > pts_fgm_df.mean()]
        print(rslt_df)
TypeError: ufunc 'bitwise_and' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

Tags: python, pandas

Solution


One possible solution:

        top_df = self.df.head(n=120)
        mean_pts = top_df['PTS'].mean()
        mean_fgp = top_df['FG%'].mean()
        rslt_df = top_df[
            (top_df['PTS'] >= mean_pts) &
            (top_df['FG%'] >= mean_fgp)
            ]
        return rslt_df
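
As a minimal sketch of why the original line fails: in Python, `&` binds more tightly than `>`, so `a > m & b > n` is read as the chained comparison `a > (m & b) > n`. The first thing evaluated is then a float combined with a float column via `&`, which raises a `TypeError`. The small `df` below is an assumption for illustration, built from the first five rows of the printed table.

```python
import pandas as pd

# Illustrative sample: first five rows of the table above, two columns only.
df = pd.DataFrame({
    "PTS": [35.2, 30.0, 30.0, 29.6, 29.5],
    "FG%": [43.9, 55.2, 44.5, 45.3, 46.0],
})

mean_pts = df["PTS"].mean()
mean_fgp = df["FG%"].mean()

# Without parentheses, `mean_pts & df["FG%"]` is evaluated first
# (a float combined with a float Series), and that is what fails.
try:
    df["PTS"] > mean_pts & df["FG%"] > mean_fgp
    error = None
except TypeError as exc:
    error = exc

print(type(error).__name__)

# With each comparison parenthesised, `&` combines two boolean Series.
mask = (df["PTS"] > mean_pts) & (df["FG%"] > mean_fgp)
print(mask.dtype)
```

The exact wording of the error message may differ between pandas and NumPy versions; the cause is the same.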

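My problem was that the logic was hard to read the way I had originally written it, which hid both the missing parentheses and the fact that the second mean was taken over the whole frame rather than the `FG%` column.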

# So the solution is to first give every statement a variable name.
mean_pts = top_df['PTS'].mean()
mean_fgp = top_df['FG%'].mean()
pts = top_df['PTS']
fgp = top_df['FG%']

Then filter on them:

# Which makes this a lot clearer to see missing brackets and such.
rslt_df = top_df[
            (pts >= mean_pts) &
            (fgp >= mean_fgp)
            ]
return rslt_df
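
For a self-contained version that can be run outside the class, the same filter can be sketched as a standalone function. The name `above_average` and its parameters are hypothetical, not part of the original code, and the sample frame holds only the ten rows visible in the printed table. Instead of naming each mean separately, this variant compares all chosen columns with their means at once.

```python
import pandas as pd


def above_average(df, columns, n=120):
    """Hypothetical helper: among the first n rows, keep those that are
    at or above the mean of every column named in `columns`."""
    top_df = df.head(n)
    subset = top_df[columns]
    # subset.mean() is a Series indexed by column name, so the comparison
    # lines each column up with its own mean.
    mask = (subset >= subset.mean()).all(axis=1)
    return top_df[mask]


# Ten rows copied from the printed table above (three columns only).
sample = pd.DataFrame({
    "Name": [
        "James Harden", "Giannis Antetokounmpo", "Trae Young",
        "Bradley Beal", "Damian Lillard", "Jerome Robinson",
        "Goga Bitadze", "Javonte Green", "Semi Ojeleye",
        "Matthew Dellavedova",
    ],
    "PTS": [35.2, 30.0, 30.0, 29.6, 29.5, 3.1, 3.1, 3.0, 2.9, 2.5],
    "FG%": [43.9, 55.2, 44.5, 45.3, 46.0, 34.1, 48.2, 51.1, 39.5, 32.3],
})

result = above_average(sample, ["PTS", "FG%"])
print(result)
```

The means are taken over the `head(n)` subset, as in the answer above, not over the full frame.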
