Pandas: create a new column with a computed result, applied to multiple columns

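Question

I need some help modifying my function and applying it so that the same if/else conditions are evaluated for several feature columns.

Suppose we have the following table t1: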

import pandas as pd
names = {'name': ['Jon','Bill','Maria','Emma']
         ,'feature1': [2,3,4,5]
         ,'feature2': [1,2,3,4]
         ,'feature3': [1,2,3,4]}
t1 = pd.DataFrame(names,columns=['name','feature1','feature2','feature3'])

I want to create 3 new columns based on if/else conditions. This is what I did for the first feature:

# Define the conditions
def ifelsefunction(row):
    if row['feature1'] >=3:
        return 1
    elif row['feature1'] ==2:
        return 2
    else:
        return 0

# Apply the condition
t1['ft1'] = t1.apply(ifelsefunction, axis=1)

I would like to write the function as something reusable that takes the column name, like this:

def ifelsefunction(row, feature):
    if row[feature] >=3:
        return 1
    elif row[feature] ==2:
        return 2
    else:
        return 0

t1['ft1_score'] = t1.apply(ifelsefunction(row, 'feature1'), axis=1)
t1['ft2_score'] = t1.apply(ifelsefunction(row, 'feature2'), axis=1)
t1['ft3_score'] = t1.apply(ifelsefunction(row, 'feature3'), axis=1)

--- Edit ---

Thanks for the answers; I may have oversimplified the actual problem.

How would I do the same thing in this case?

def ifelsefunction(var1, var2):
    mask1 = (var1 >=3) and (var1<var2)
    mask2 = var1 == 2
    return np.select([mask1,mask2], [var1*0.7, var1*var2], default=0)

Tags: python, pandas

Solution


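I think it is best to avoid loops here. Use numpy.select to test the masks and assign the matching values for all of the selected columns in the list at once, and pass the DataFrame to the function with DataFrame.pipe:

import numpy as np

# Define the conditions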
def ifelsefunction(df):
    m1 = df >= 3
    m2 = df == 2
    return np.select([m1, m2], [1, 2], default=0)


cols = ['feature1','feature2','feature3']
t1[cols] = t1[cols].pipe(ifelsefunction)
#alternative
#t1[cols] = ifelsefunction(t1[cols])

print (t1)
    name  feature1  feature2  feature3
0    Jon         2         0         0
1   Bill         1         2         2
2  Maria         1         1         1
3   Emma         1         1         1

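To write the results to new columns instead of overwriting the originals (starting again from the original t1), use: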

# Define the conditions
def ifelsefunction(df):
    m1 = df >= 3
    m2 = df == 2
    return np.select([m1, m2], [1, 2], default=0)


cols = ['feature1','feature2','feature3']
new = [f'{x}_score' for x in cols]

t1[new] = t1[cols].pipe(ifelsefunction)
#alternative
#t1[new] = ifelsefunction(t1[cols])

print (t1)
    name  feature1  feature2  feature3  feature1_score  feature2_score  \
0    Jon         2         1         1               2               0   
1   Bill         3         2         2               1               2   
2  Maria         4         3         3               1               1   
3   Emma         5         4         4               1               1   

   feature3_score  
0               0  
1               2  
2               1  
3               1  
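
For comparison, the row-wise function from the question can also be made to work. The calls there fail because `ifelsefunction(row, 'feature1')` is evaluated immediately, before `apply` has supplied any row, and `row` is not defined at that point. A minimal sketch, assuming the sample data from the question, passes the column name through the `args` parameter of `DataFrame.apply` instead. It loops over rows in Python, so it is likely to be slower than the `numpy.select` version on large frames:

```python
import pandas as pd

# Sample data from the question
t1 = pd.DataFrame({
    'name': ['Jon', 'Bill', 'Maria', 'Emma'],
    'feature1': [2, 3, 4, 5],
    'feature2': [1, 2, 3, 4],
    'feature3': [1, 2, 3, 4],
})

def ifelsefunction(row, feature):
    # Score a single row based on the value in the given column
    if row[feature] >= 3:
        return 1
    elif row[feature] == 2:
        return 2
    else:
        return 0

# Pass the function itself (not a call); extra positional
# arguments are forwarded to it via args
for col in ['feature1', 'feature2', 'feature3']:
    t1[f'{col}_score'] = t1.apply(ifelsefunction, axis=1, args=(col,))

print(t1)
```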

Edit:

You can change the function to take the DataFrame and the two column names, for example:

def ifelsefunction(df, var1, var2):
    mask1 = (df[var1] >=3) & (df[var1]<df[var2])
    mask2 = df[var1] == 2
    return np.select([mask1,mask2], [df[var1]*0.7, df[var1]*df[var2]], default=0)


t1['new'] = ifelsefunction(t1, 'feature3','feature1')
print (t1)
    name  feature1  feature2  feature3  new
0    Jon         2         1         1  0.0
1   Bill         3         2         2  6.0
2  Maria         4         3         3  2.1
3   Emma         5         4         4  2.8
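
The version in the question's edit fails because Python's `and` asks each Series for a single truth value, which pandas refuses with a `ValueError`. The element-wise `&` operator, with each comparison in parentheses, is what builds a row-by-row mask. A small sketch of the difference, using two illustrative Series `a` and `b` (hypothetical names) that hold the same values as `feature3` and `feature1`:

```python
import pandas as pd

a = pd.Series([1, 2, 3, 4])  # same values as feature3
b = pd.Series([2, 3, 4, 5])  # same values as feature1

# `and` needs one True/False per operand, so pandas raises
try:
    mask = (a >= 3) and (a < b)
    raised = False
except ValueError:
    raised = True

# `&` combines the two boolean Series element by element
mask = (a >= 3) & (a < b)

print(raised)
print(mask.tolist())
```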
