Select a column's value based on the value in another column, then apply it to each row of a pandas DataFrame

Problem description

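I have to calculate the value of S, which is given by the formula: S = (25400/CN) − 254

The CN value I have to use depends on the amc_active condition, which is 1, 2, or 3. If amc_active at index 0 (the first row) is 1, then I have to take the CN value from the cn1 column, i.e. 47.

If amc_active is 3, then I have to take the CN value from the cn3 column, i.e. 95 in the 4th row, and so on.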
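
As a quick check of the formula for the two rows mentioned above, here is a minimal sketch. The helper name `s_from_cn` is hypothetical and only used for illustration; the CN values 47 and 95 are the ones quoted in the text.

```python
def s_from_cn(cn):
    """Return S = (25400 / CN) - 254 for a single CN value."""
    return 25400 / cn - 254


# Row 0: amc_active == 1, so CN comes from cn1 (47)
s_row0 = s_from_cn(47)   # about 286.43

# Row 3: amc_active == 3, so CN comes from cn3 (95)
s_row3 = s_from_cn(95)   # about 13.37
```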

     cn1         cn2     cn3     amc_active
0     47         56       78         1
1     55         61       87         2
2     36         67       73         1
3     42         84       95         3

... ... ... ... ... ... ... ... ...
17410   42       84       96         3
17411   48       81       85         1
17412   55       59       82         1
17413   57       86       93         2
17414   36       87       91         2

For this, I am using an if/elif condition:

    if (df_col_all_merged['amc_active'] == 1):
        cn_for_s = df_col_all_merged['cn1']
    elif (df_col_all_merged['amc_active'] == 2):
        cn_for_s = df_col_all_merged['cn2']
    elif (df_col_all_merged['amc_active'] == 3):
        cn_for_s = df_col_all_merged['cn3']

but I get this error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-43-40c3b3817474> in <module>
----> 1 if (df_col_all_merged['amc_active'] == 1):
      2     cn_for_s = df_col_all_merged['cn1']
      3 elif (df_col_all_merged['amc_active'] == 2):
      4     cn_for_s = df_col_all_merged['cn2']
      5 elif (df_col_all_merged['amc_active'] == 3):

~\Anaconda3\envs\geocube\lib\site-packages\pandas\core\generic.py in __nonzero__(self)
   1327 
   1328     def __nonzero__(self):
-> 1329         raise ValueError(
   1330             f"The truth value of a {type(self).__name__} is ambiguous. "
   1331             "Use a.empty, a.bool(), a.item(), a.any() or a.all()."

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

and with this:

def select_cn(df_col_all_merged):
    result = cn_for_s
    if (df_col_all_merged['amc_active'] == 1):
        cn_for_s = df_col_all_merged['cn1']
    elif (df_col_all_merged['amc_active'] == 2):
        cn_for_s = df_col_all_merged['cn2']
    elif (df_col_all_merged['amc_active'] == 3):
        cn_for_s = df_col_all_merged['cn3']
    return result

df_col_all_merged['s_mm'] = (25400/select_cn(df_col_all_merged)) - 254

the error is:

---------------------------------------------------------------------------
UnboundLocalError                         Traceback (most recent call last)
<ipython-input-54-df43eddeac39> in <module>
----> 1 df_col_all_merged['s_mm'] = (num_const/select_cn(df_col_all_merged)) - dev_const

<ipython-input-51-7405a6dd24db> in select_cn(df_col_all_merged)
      1 def select_cn(df_col_all_merged):
----> 2     result = cn_for_s
      3     if (df_col_all_merged['amc_active'] == 1):
      4         cn_for_s = df_col_all_merged['cn1']
      5     elif (df_col_all_merged['amc_active'] == 2):

UnboundLocalError: local variable 'cn_for_s' referenced before assignment

How can I fix this?

Tags: python, python-3.x, pandas, dataframe, multiple-columns

Solution
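
A note on why both attempts fail, as a minimal sketch (the small Series below is made up for illustration). Comparing a Series with a scalar yields a whole boolean Series, one value per row, and `if` cannot reduce that to a single True or False, so pandas raises the ValueError. The UnboundLocalError in the second attempt comes from reading `cn_for_s` on the first line of the function, before any branch has assigned it.

```python
import pandas as pd

# Illustrative stand-in for df_col_all_merged['amc_active']
amc_active = pd.Series([1, 2, 1, 3])

# The comparison is element-wise: one boolean per row, not a single bool
mask = amc_active == 1

try:
    if mask:  # pandas refuses to collapse several booleans into one
        pass
except ValueError as exc:
    message = str(exc)  # "The truth value of a Series is ambiguous. ..."
```

This is why a row-wise selection needs a vectorized approach rather than a plain if/elif on whole columns.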


You can use NumPy's fancy indexing:

import numpy as np

# get the values of the `cn*` columns as a 2D array (rows x 3)
cn_123 = df_col_all_merged[["cn1", "cn2", "cn3"]].to_numpy()

# index into it as "(row_index, amc_active_value - 1)"
cn = cn_123[np.arange(len(df_col_all_merged)),
            df_col_all_merged["amc_active"].to_numpy() - 1]

# apply the formula
df_col_all_merged["s_mm"] = (25400/cn) - 254

For each row, we pick the column given by that row's amc_active value (minus 1, because indexing is 0-based),

to get

       cn1  cn2  cn3  amc_active        s_mm
0       47   56   78           1  286.425532
1       55   61   87           2  162.393443
2       36   67   73           1  451.555556
3       42   84   95           3   13.368421
17410   42   84   96           3   10.583333
17411   48   81   85           1  275.166667
17412   55   59   82           1  207.818182
17413   57   86   93           2   41.348837
17414   36   87   91           2   37.954023
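
To try the fancy-indexing approach end to end, here is a self-contained sketch. It rebuilds only the first four rows of the sample data from the question; the intermediate names `rows` and `cols` are introduced here for readability and are not part of the original answer.

```python
import numpy as np
import pandas as pd

# First four rows of the sample data shown in the question
df_col_all_merged = pd.DataFrame({
    "cn1": [47, 55, 36, 42],
    "cn2": [56, 61, 67, 84],
    "cn3": [78, 87, 73, 95],
    "amc_active": [1, 2, 1, 3],
})

# 2D array of candidate CN values, one row per DataFrame row
cn_123 = df_col_all_merged[["cn1", "cn2", "cn3"]].to_numpy()

# Pair each row position with the column position chosen by amc_active
rows = np.arange(len(df_col_all_merged))
cols = df_col_all_merged["amc_active"].to_numpy() - 1  # 1..3 -> 0..2

cn = cn_123[rows, cols]  # one CN per row: 47, 61, 36, 95
df_col_all_merged["s_mm"] = 25400 / cn - 254
```

Note that this relies on amc_active containing only 1, 2, or 3; any other value would index the wrong column or raise an IndexError.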

Alternatively, np.select can stand in for the multiple if-elif branches:

# form the conditions & corresponding choices
amc = df_col_all_merged["amc_active"]
conditions = [amc.eq(1), amc.eq(2), amc.eq(3)]
choices = [df_col_all_merged["cn1"], df_col_all_merged["cn2"], df_col_all_merged["cn3"]]

# for each row, take the choice whose condition is True
cn = np.select(conditions, choices)

# formula
df_col_all_merged["s_mm"] = (25400/cn) - 254

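But given the convenient structure of amc_active (its values map directly onto column positions), this might not be the best option. (You can also drop the last condition and choice and pass that choice as the default of np.select, i.e. as the "else".)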
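
The "else" variant mentioned in the parenthesis could look like the sketch below. It uses the first four sample rows from the question and assumes that an array-like `default` is acceptable to np.select; the NumPy documentation describes `default` as a scalar, so treat this as an illustration rather than a guaranteed contract.

```python
import numpy as np
import pandas as pd

# First four rows of the sample data shown in the question
df_col_all_merged = pd.DataFrame({
    "cn1": [47, 55, 36, 42],
    "cn2": [56, 61, 67, 84],
    "cn3": [78, 87, 73, 95],
    "amc_active": [1, 2, 1, 3],
})

amc = df_col_all_merged["amc_active"]

# Only two explicit conditions; cn3 acts as the "else" branch
conditions = [amc.eq(1), amc.eq(2)]
choices = [df_col_all_merged["cn1"], df_col_all_merged["cn2"]]
cn = np.select(conditions, choices, default=df_col_all_merged["cn3"])

df_col_all_merged["s_mm"] = 25400 / cn - 254
```

One design caveat: with cn3 as the default, any unexpected amc_active value (say 4) silently falls through to cn3, whereas the three-condition version would give 0 for such rows.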

