Passing a row to a function gives an error (Pandas, Python)

Question

I am trying to create a new column whose values are determined by comparing two columns of the dataframe. Here is what I tried:

def determinecolor(row,column1,column2):
    if row[column1] == row[column2]:
        val = 'k'
    elif row[column1] > row[column2]:
        val = 'r'
    else:
        val = 'g'
    return val
datasetTest['color_original'] = datasetTest.apply(determinecolor(datasetTest,'openshifted','close'), axis=1)

The error I get:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-182-31188e414958> in <module>()
      2 # if(test_shifted['openshifted'][0] > test_pred_list[0]): print("red")
      3 datasetTest.loc[:,'predict_close'] = pd.Series(test_pred_list)
----> 4 datasetTest['color_original'] = datasetTest.apply(determinecolor(datasetTest,'openshifted','close'), axis=1)
      5 
      6 # datasetTest['color_predicted'] = datasetTest.apply(determinePredictedcolor, axis=1)

<ipython-input-178-d1f3e204fd17> in determinecolor(row, column1, column2)
      1 def determinecolor(row,column1,column2):
----> 2     if row[column1] == row[column2]:
      3         val = 'k'
      4     elif row[column1] > row[column2]:
      5         val = 'r'

c:\python35\lib\site-packages\pandas\core\generic.py in __nonzero__(self)
   1119         raise ValueError("The truth value of a {0} is ambiguous. "
   1120                          "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
-> 1121                          .format(self.__class__.__name__))
   1122 
   1123     __bool__ = __nonzero__

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Please help me solve this.

Edit
Here is a sample dataset:

open      high      low       close     closeTarget  openshifted  predict_close
0.104167  0.119048  0.117647  0.145833  0.104167     0.416667     0.881613
0.416667  0.285714  0         0.104167  0.4375       0.833333     0.684905
0.833333  0.761905  0.45098   0.4375    0.791667     0.8125       0.821244
0.8125    0.761905  0.784314  0.791667  0.770833     0.8125       0.920608
0.8125    0.761905  0.823529  0.770833  0.8125       0.916667     0.853668

Tags: python, python-3.x, pandas

Answer


You shouldn't use pd.DataFrame.apply for operations that can be vectorized.

Instead, you can use numpy.select, supplying a list of conditions and a matching list of values, plus a default value for all other cases:

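import numpy as np

# Conditions are checked in order; rows matching none of them get the default 'g'
conditions = [df['col1'] == df['col2'], df['col1'] > df['col2']]
values = ['k', 'r']

df['color_original'] = np.select(conditions, values, 'g')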
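As a minimal sketch of the same idea on the sample data from the question, the snippet below rebuilds four of the sample columns by hand and wraps the np.select call in a helper. The helper name color_by_comparison and the second output column are assumptions made for illustration; they are not part of the original code.

```python
import numpy as np
import pandas as pd

# Four columns copied from the sample dataset in the question.
df = pd.DataFrame({
    "open":        [0.104167, 0.416667, 0.833333, 0.8125, 0.8125],
    "close":       [0.145833, 0.104167, 0.4375, 0.791667, 0.770833],
    "closeTarget": [0.104167, 0.4375, 0.791667, 0.770833, 0.8125],
    "openshifted": [0.416667, 0.833333, 0.8125, 0.8125, 0.916667],
})


def color_by_comparison(frame, column1, column2):
    """Hypothetical helper: 'k' if equal, 'r' if column1 > column2, else 'g'."""
    conditions = [
        frame[column1] == frame[column2],
        frame[column1] > frame[column2],
    ]
    # Each row gets the value of the first condition that holds;
    # rows where no condition holds get the default.
    return np.select(conditions, ["k", "r"], default="g")


# The comparison asked about in the question.
df["color_original"] = color_by_comparison(df, "openshifted", "close")

# A second pair of columns, chosen only because it exercises all three outcomes.
df["color_open_vs_target"] = color_by_comparison(df, "open", "closeTarget")

print(df[["color_original", "color_open_vs_target"]])
```

On these five rows openshifted is always larger than close, so color_original is 'r' throughout; the open/closeTarget pair is included only to show the 'k' and 'g' branches as well.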

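The error occurs because pd.DataFrame.apply is being misused. With axis=1, apply passes each row to the function it is given, but determinecolor(datasetTest, 'openshifted', 'close') calls the function immediately on the whole dataframe, so row[column1] == row[column2] produces a Series and the if statement cannot reduce it to a single truth value. Pass the function itself and supply the extra arguments by keyword; you don't need to pass the dataframe explicitly as an argument: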

df['color_original'] = df.apply(determinecolor, column1='openshifted',
                                column2='close', axis=1)
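The following sketch puts the failing pattern and the corrected call side by side, using a hand-built copy of a few sample columns. The variable error_message and the color_args column are introduced here for illustration only. The args= form is an alternative way to pass the extra arguments positionally.

```python
import pandas as pd


def determinecolor(row, column1, column2):
    # Same function as in the question: compares two fields of one row.
    if row[column1] == row[column2]:
        val = 'k'
    elif row[column1] > row[column2]:
        val = 'r'
    else:
        val = 'g'
    return val


# A few columns copied from the sample dataset in the question.
df = pd.DataFrame({
    "open":        [0.104167, 0.416667, 0.833333, 0.8125, 0.8125],
    "close":       [0.145833, 0.104167, 0.4375, 0.791667, 0.770833],
    "closeTarget": [0.104167, 0.4375, 0.791667, 0.770833, 0.8125],
    "openshifted": [0.416667, 0.833333, 0.8125, 0.8125, 0.916667],
})

# Failing pattern: the function is called once, with the whole dataframe as
# `row`, so the comparison yields a Series and `if` raises ValueError.
try:
    determinecolor(df, "openshifted", "close")
except ValueError as exc:
    error_message = str(exc)
    print(error_message)

# Corrected call: pass the function object; apply forwards the keyword
# arguments to it for every row.
df["color_original"] = df.apply(
    determinecolor, column1="openshifted", column2="close", axis=1
)

# Alternative: pass the extra arguments positionally through `args`.
df["color_args"] = df.apply(
    determinecolor, args=("open", "closeTarget"), axis=1
)

print(df[["color_original", "color_args"]])
```

This row-wise version gives the same labels as the numpy.select approach, but it runs a Python function once per row, which is why the vectorized form is usually preferable on larger frames.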
