DataFrame calculated column does not return numeric values

Problem description

I have a DataFrame (df) whose head looks like this:

  Quarter   Body    Total requests  Requests on-hold    Total requests received (excluding on-hold)
1 2019_Q3      A                93                 5    
2 2019_Q3      B               228                 2    
3 2019_Q3      C               180                 7    
4 2019_Q3      D                31                 3    
5 2019_Q3      E               555                 0    

The dtype of each column is:

df.dtypes  
Quarter                                                                                         object
Body                                                                                            object
Total requests                                                                                  object
Requests Processed                                                                              object
Requests on-hold                                                                                object
Total requests received (excluding on-hold)                                                    float64

I am trying to calculate Total requests - Requests on-hold and insert the result into the column Total requests received (excluding on-hold), so my desired output looks like this:

  Quarter   Body    Total requests  Requests on-hold    Total requests received (excluding on-hold)
1 2019_Q3      A                93                 5                                            88
2 2019_Q3      B               228                 2                                           226
3 2019_Q3      C               180                 7                                           173
4 2019_Q3      D                31                 3                                            28
5 2019_Q3      E               555                 0                                           555

I am trying to populate the Total requests received (excluding on-hold) column with the following:

df['Total requests received (excluding on-hold)'] = df['Total requests'] - df['Requests on-hold']

But I get NaN for every entry instead of a value:

  Quarter   Body    Total requests  Requests on-hold    Total requests received (excluding on-hold)
1 2019_Q3      A                93                 5                                           NaN
2 2019_Q3      B               228                 2                                           NaN
3 2019_Q3      C               180                 7                                           NaN
4 2019_Q3      D                31                 3                                           NaN
5 2019_Q3      E               555                 0                                           NaN

I noticed that Total requests and Requests on-hold have dtype object, so I tried converting them to numeric with:

df["Total requests"] = pd.to_numeric(df["Total requests"])
df["Requests on-hold"] = pd.to_numeric(df["Requests on-hold"])

That did not work. How can I fix this?

Note: when I add the following code to convert to numeric (before the calculation):

df["Total requests"] = pd.to_numeric(df["Total requests"])
df["Requests on-hold"] = pd.to_numeric(df["Requests on-hold"])

I get the error:

    df["Total requests"] = pd.to_numeric(df["Total requests"])

  File "C:\Anaconda_Python 3.7\2019.03\lib\site-packages\pandas\core\tools\numeric.py", line 122, in to_numeric
    raise TypeError('arg must be a list, tuple, 1-d array, or Series')

TypeError: arg must be a list, tuple, 1-d array, or Series

Tags: python, pandas

Solution

There appears to be some whitespace in the values, so try removing it with strip:

df["Total requests"] = pd.to_numeric(df["Total requests"].str.strip())
df["Requests on-hold"] = pd.to_numeric(df["Requests on-hold"].str.strip())
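As a minimal end-to-end sketch of the strip approach: the frame below is a hypothetical reconstruction of the question's data, and the padded strings are an assumption made purely to illustrate the whitespace case, not something confirmed by the question.

```python
import pandas as pd

# Hypothetical sample mirroring the question's table; the stray spaces
# around the digits are assumed for illustration.
df = pd.DataFrame({
    "Quarter": ["2019_Q3"] * 5,
    "Body": list("ABCDE"),
    "Total requests": [" 93", "228 ", " 180 ", "31", "555"],
    "Requests on-hold": ["5 ", " 2", "7", " 3 ", "0"],
})

# Strip surrounding whitespace, then convert each column to a numeric dtype.
for col in ["Total requests", "Requests on-hold"]:
    df[col] = pd.to_numeric(df[col].str.strip())

# With numeric dtypes on both sides, the subtraction is element-wise arithmetic.
df["Total requests received (excluding on-hold)"] = (
    df["Total requests"] - df["Requests on-hold"]
)

print(df.dtypes)
print(df)
```

Note that .str.strip() only works on string values; if a column mixes real numbers with strings, the non-string entries come back as NaN from the .str accessor.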

If some values may still be non-numeric after stripping the whitespace, use strip first and then add the parameter errors='coerce' to convert them to NaN:

df["Total requests"] = pd.to_numeric(df["Total requests"].str.strip(), errors='coerce')
df["Requests on-hold"] = pd.to_numeric(df["Requests on-hold"].str.strip(), errors='coerce')
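Because errors='coerce' silently replaces anything unparseable with NaN, it can help to look at which raw values were affected. The following is a small sketch on a hypothetical Series; the names raw, clean, and bad_rows and the sample values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical raw column containing two entries that are not numbers.
raw = pd.Series([" 93", "228 ", "unknown", "-", "555"])

# Strip whitespace first, then coerce unparseable entries to NaN.
clean = pd.to_numeric(raw.str.strip(), errors="coerce")

# Keep the original text of every row that could not be converted,
# so the bad entries can be reviewed instead of disappearing silently.
bad_rows = raw[clean.isna()]

print(clean)
print(bad_rows)
```

Since NaN is a floating-point value, a column that contains any coerced entries ends up as float64 rather than an integer dtype.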

If there is no trailing whitespace:

df["Total requests"] = pd.to_numeric(df["Total requests"], errors='coerce')
df["Requests on-hold"] = pd.to_numeric(df["Requests on-hold"], errors='coerce')
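A separate point about the TypeError in the question: pd.to_numeric raises "arg must be a list, tuple, 1-d array, or Series" when it is handed something with more than one dimension. One situation that produces this is a frame with duplicate column labels, where df["Total requests"] returns a DataFrame rather than a Series. Whether that is what happened in the question cannot be determined from the post, so the sketch below is only a diagnostic under that assumption, using a hypothetical frame.

```python
import pandas as pd

# Hypothetical frame in which the same label appears twice,
# for example after a messy import or concatenation.
df = pd.DataFrame(
    [["93", "93", "5"], ["228", "228", "2"]],
    columns=["Total requests", "Total requests", "Requests on-hold"],
)

# With duplicate labels, column selection returns a DataFrame, not a Series.
selected = df["Total requests"]
print(type(selected).__name__)

# List the labels that occur more than once.
duplicate_labels = df.columns[df.columns.duplicated()].tolist()
print(duplicate_labels)

# to_numeric rejects 2-D input with a TypeError.
raised = False
try:
    pd.to_numeric(selected)
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

# One possible fix (an assumption about what is wanted): keep only the
# first occurrence of each label, then convert as usual.
df = df.loc[:, ~df.columns.duplicated()].copy()
df["Total requests"] = pd.to_numeric(df["Total requests"])
```

Checking type(df["Total requests"]) and df.columns.duplicated() on the real data is a quick way to rule this cause in or out before applying the conversions above.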
