How to create a column that contains the column name of the row max across a list of columns

Problem description

I have a df that looks like this:

time   A   B   C
   0   0  18  19
   1   0   4   4
   2   0   0   0
   3   0   0   0
   4  10   4   4

I want to create a new column that holds the name of the column with the highest value in each row, considering only columns A, B, and C. If all values are 0, it should be NaN. If there is a tie, it should contain all of the tied column names. I am starting from a helpful answer to "name of column, that contains the max value", but that approach returns the first column name when all columns are 0 and doesn't handle ties.
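For reference, the usual answer to that linked question is most likely `DataFrame.idxmax(axis=1)`; that is an assumption, since the link itself did not survive. Assuming it is the starting point, this sketch (built on the values from the desired output) shows both complaints: `idxmax` reports only the first column label on a tie, and an all-zero row still yields the first column, `A`.

```python
import pandas as pd

# Sample data matching the desired output in the question
df = pd.DataFrame({'time': [0, 1, 2, 3, 4],
                   'A': [0, 0, 0, 0, 10],
                   'B': [18, 4, 0, 0, 4],
                   'C': [19, 4, 0, 0, 4]})

# idxmax returns the label of the FIRST occurrence of each row's maximum,
# so the B/C tie gives 'B' and the all-zero rows give 'A'
first_max = df[['A', 'B', 'C']].idxmax(axis=1)
print(first_max.tolist())
# ['C', 'B', 'A', 'A', 'A']
```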

What I want is this:

time   A   B   C    MAX
   0   0  18  19      C
   1   0   4   4  [B,C]
   2   0   0   0    NaN
   3   0   0   0    NaN
   4  10   4   4      A

Tags: python, pandas

Solution


You can use apply with axis=1 so the function runs once per row:

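import numpy as np
import pandas as pd

df = pd.DataFrame({'time': [0, 1, 2, 3, 4],
                   'A': [0, 0, 0, 0, 10],
                   'B': [18, 4, 0, 0, 4],
                   'C': [19, 4, 0, 0, 4]})


def ma(xs):
    # Compute the row maximum once instead of once per element
    row_max = xs.max()
    # Names whose value equals the row maximum; only positive values count
    lst = [name for name, x in xs.items() if x == row_max and x > 0]

    # A single winner is returned as a plain string
    if len(lst) == 1:
        return lst[0]

    # Ties come back as a list; an empty list (all zeros) becomes NaN
    return lst or np.nan


df['max'] = df[['A', 'B', 'C']].apply(ma, axis=1)

print(df)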

Output

   time   A   B   C     max
0     0   0  18  19       C
1     1   0   4   4  [B, C]
2     2   0   0   0     NaN
3     3   0   0   0     NaN
4     4  10   4   4       A
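A variant that is not part of the original answer: compute the row maxima and the comparisons for the whole frame at once with `DataFrame.eq(..., axis=0)` and `DataFrame.gt(0)`, then turn each row of the resulting boolean mask into column labels. This is a sketch on the same sample data; the helper name `names_from_mask` is hypothetical. It returns the same mix of strings, lists and NaN as `ma`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'time': [0, 1, 2, 3, 4],
                   'A': [0, 0, 0, 0, 10],
                   'B': [18, 4, 0, 0, 4],
                   'C': [19, 4, 0, 0, 4]})

vals = df[['A', 'B', 'C']]

# True where a cell equals its row maximum AND is positive,
# so an all-zero row is all False
is_max = vals.eq(vals.max(axis=1), axis=0) & vals.gt(0)


def names_from_mask(row):
    # Hypothetical helper: labels of the columns flagged True in this row
    hits = row[row].index.tolist()
    if not hits:
        return np.nan  # every value in the row was 0
    return hits[0] if len(hits) == 1 else hits


df['max'] = is_max.apply(names_from_mask, axis=1)
print(df)
```

If a consistent dtype is preferred (for example when writing to CSV), `names_from_mask` could return `','.join(hits)` for ties instead of a list; that is a design choice, not something the question asks for.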
