Trying to apply a function on a pandas DataFrame to compute a score?

Question

I wrote the user-defined function below and tried to apply it to a DataFrame, but I get this error: TypeError: ("scoreq() missing 3 required positional arguments: 'ADVTG_TRGT_INC', 'AGECD', and 'PPXPRI'", 'occurred at index ADVNTG_MARITAL_STAT')

def scoreq(PCT_NO_OPEN_TRDLN, ADVTG_TRGT_INC, AGECD, PPXPRI):
        scoreq += -0.3657
        scoreq += (ADVNTG_MARITAL_STAT in ('2'))*-0.039
        scoreq += (ADVTG_TRGT_INC in ('7','6','5','4'))*0.1311
        scoreq += (AGECD in ('7','2'))*-0.1254
        scoreq += (PPXPRI in (-1))*-0.1786
        return scoreq
        
df_3Var['scoreq'] = df_3Var.apply(scoreq)

"TypeError: ("scoreq() missing 3 required positional arguments: 'ADVTG_TRGT_INC', 'AGECD', and 'PPXPRI'", 'occurred at index ADVNTG_MARITAL_STAT')"
 


df_3Var:
    ADVNTG_MARITAL_STAT   ADVTG_TRGT_INC    AGECD   PPXPRI
0                     1                5        6       -1
1                     2                6        5       -1
2                     1                2        2       -1
3                     2                7        6      133
4                     2                1        3       75

Tags: python, pandas, jupyter-notebook, user-defined-functions

Answer


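The parameters of scoreq are named after the columns, but apply does not match parameter names to column names. It passes exactly one argument per call: by default (axis=0) that argument is a whole column as a Series, which is why the error reports three missing arguments at index ADVNTG_MARITAL_STAT, the first column. The function has to take ordinary arguments, and you have to hand it the values explicitly.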
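To see what apply actually hands to the function, here is a small sketch. The frame is copied from the question, assuming the columns hold integers; describe is a hypothetical helper that only reports what it received.

```python
import pandas as pd

# Sample frame copied from the question (integer dtype is an assumption).
df_3Var = pd.DataFrame({
    "ADVNTG_MARITAL_STAT": [1, 2, 1, 2, 2],
    "ADVTG_TRGT_INC": [5, 6, 2, 7, 1],
    "AGECD": [6, 5, 2, 6, 3],
    "PPXPRI": [-1, -1, -1, 133, 75],
})

def describe(arg):
    # Hypothetical helper: arg is a Series, and arg.name says which
    # column (axis=0) or which row label (axis=1) it came from.
    return f"{arg.name}: {len(arg)} values"

# Default axis=0: one call per column, each receiving the whole column.
per_column = df_3Var.apply(describe)

# axis=1: one call per row, each receiving that row as a Series.
per_row = df_3Var.apply(describe, axis=1)

print(per_column)
print(per_row)
```

With the default axis, the first call receives the ADVNTG_MARITAL_STAT column as the only argument, which matches the index named in the error message.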

You have two options: pass the whole row to scoreq, or pass only the relevant values:

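def scoreq(row):
    # row is one DataFrame row (a Series); read values by column name
    score = row["...."]
    ...
    return score

# axis=1 makes apply call scoreq once per row instead of once per column
df_3Var['scoreq'] = df_3Var.apply(scoreq, axis=1)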
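As a minimal sketch of the row-based option, the function below reuses the coefficients from the question. It assumes the columns are stored as integers (the question's code compares against strings, so adjust the literals if your data really holds strings), and it assumes the first term is meant to read ADVNTG_MARITAL_STAT, as the function body in the question does.

```python
import pandas as pd

# Sample frame copied from the question (integer dtype is an assumption).
df_3Var = pd.DataFrame({
    "ADVNTG_MARITAL_STAT": [1, 2, 1, 2, 2],
    "ADVTG_TRGT_INC": [5, 6, 2, 7, 1],
    "AGECD": [6, 5, 2, 6, 3],
    "PPXPRI": [-1, -1, -1, 133, 75],
})

def scoreq(row):
    # Start from the intercept, then add each coefficient only when
    # its condition is true (True counts as 1, False as 0).
    score = -0.3657
    score += (row["ADVNTG_MARITAL_STAT"] == 2) * -0.039
    score += (row["ADVTG_TRGT_INC"] in (4, 5, 6, 7)) * 0.1311
    score += (row["AGECD"] in (2, 7)) * -0.1254
    score += (row["PPXPRI"] == -1) * -0.1786
    return score

# axis=1 passes each row to scoreq as a Series.
df_3Var["scoreq"] = df_3Var.apply(scoreq, axis=1)
print(df_3Var)
```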

Or pass only the values directly:

df_3Var['scoreq'] = df_3Var.apply(lambda row: scoreq(row["..."], row["..."]), axis=1)
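The value-passing option might look like the sketch below. The parameter names (marital_stat, trgt_inc, agecd, ppxpri) are illustrative and not from the question, and the columns are again assumed to hold integers.

```python
import pandas as pd

# Sample frame copied from the question (integer dtype is an assumption).
df_3Var = pd.DataFrame({
    "ADVNTG_MARITAL_STAT": [1, 2, 1, 2, 2],
    "ADVTG_TRGT_INC": [5, 6, 2, 7, 1],
    "AGECD": [6, 5, 2, 6, 3],
    "PPXPRI": [-1, -1, -1, 133, 75],
})

def scoreq(marital_stat, trgt_inc, agecd, ppxpri):
    # Takes plain scalar values, so it knows nothing about the DataFrame.
    score = -0.3657
    score += (marital_stat == 2) * -0.039
    score += (trgt_inc in (4, 5, 6, 7)) * 0.1311
    score += (agecd in (2, 7)) * -0.1254
    score += (ppxpri == -1) * -0.1786
    return score

# The lambda receives each row and unpacks the values scoreq needs.
df_3Var["scoreq"] = df_3Var.apply(
    lambda row: scoreq(
        row["ADVNTG_MARITAL_STAT"],
        row["ADVTG_TRGT_INC"],
        row["AGECD"],
        row["PPXPRI"],
    ),
    axis=1,
)

# Because scoreq takes scalars, it can also be called without pandas.
first_row_score = scoreq(1, 5, 6, -1)
```

One reason to prefer this form is that the scoring function can be checked on plain numbers, independently of any DataFrame.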

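Also, inside scoreq you probably want to compare numbers as numbers rather than as strings, and to use == instead of in for single values: write the last term as (row["PPXPRI"] == -1) * -0.1786. Note that (-1) is just the integer -1, not a tuple, so PPXPRI in (-1) raises a TypeError; likewise ('2') is the string '2', so in performs a substring test on it.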

