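python - Pandas: create new columns from a calculation applied to multiple columns
Question
I need some help modifying my function and applying it so that the same if/else condition is evaluated over several features.
Suppose we have the following table t1: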
import pandas as pd
names = {'name': ['Jon','Bill','Maria','Emma'],
         'feature1': [2,3,4,5],
         'feature2': [1,2,3,4],
         'feature3': [1,2,3,4]}
t1 = pd.DataFrame(names, columns=['name','feature1','feature2','feature3'])
I want to create 3 new columns based on an if/else condition. This is what I did for the first feature:
# Define the conditions
def ifelsefunction(row):
    if row['feature1'] >= 3:
        return 1
    elif row['feature1'] == 2:
        return 2
    else:
        return 0

# Apply the condition
t1['ft1'] = t1.apply(ifelsefunction, axis=1)
I would like to write the function so that it can be reused for each feature, something like this:
def ifelsefunction(row, feature):
    if row[feature] >= 3:
        return 1
    elif row[feature] == 2:
        return 2
    else:
        return 0

t1['ft1_score'] = t1.apply(ifelsefunction(row, 'feature1'), axis=1)
t1['ft2_score'] = t1.apply(ifelsefunction(row, 'feature2'), axis=1)
t1['ft3_score'] = t1.apply(ifelsefunction(row, 'feature3'), axis=1)
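The attempt above fails because `ifelsefunction(row, 'feature1')` is evaluated immediately, before `apply` runs, and `row` is not defined at that point. For reference, here is a minimal sketch of the row-wise version that does run: pass the function object itself to `DataFrame.apply` and supply the column name through its `args` parameter (a lambda works too). The loop and the `ft{i}_score` naming are illustrative choices, not part of the original question.

```python
import pandas as pd

t1 = pd.DataFrame({'name': ['Jon', 'Bill', 'Maria', 'Emma'],
                   'feature1': [2, 3, 4, 5],
                   'feature2': [1, 2, 3, 4],
                   'feature3': [1, 2, 3, 4]})

def ifelsefunction(row, feature):
    # Score one row based on the value in the given column
    if row[feature] >= 3:
        return 1
    elif row[feature] == 2:
        return 2
    else:
        return 0

# Pass the function (not a call) to apply; args= supplies the extra
# positional arguments that follow the row
for i, col in enumerate(['feature1', 'feature2', 'feature3'], start=1):
    t1[f'ft{i}_score'] = t1.apply(ifelsefunction, axis=1, args=(col,))

# Equivalent with a lambda:
# t1['ft1_score'] = t1.apply(lambda row: ifelsefunction(row, 'feature1'), axis=1)
print(t1)
```

This still calls Python code once per row, which is why the answer below prefers a vectorized `np.select`.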
--- Edit ---
Thanks for the answers; I may have oversimplified the actual problem.
How would I do the same thing in this case?
import numpy as np

def ifelsefunction(var1, var2):
    mask1 = (var1 >= 3) and (var1 < var2)
    mask2 = var1 == 2
    return np.select([mask1, mask2], [var1*0.7, var1*var2], default=0)
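Answer
I think it is better to avoid loops here. Use numpy.select to test only the columns selected in the list and assign the result back, passing the DataFrame to the function with DataFrame.pipe: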
import numpy as np

# Define the conditions
def ifelsefunction(df):
    m1 = df >= 3
    m2 = df == 2
    return np.select([m1, m2], [1, 2], default=0)

cols = ['feature1','feature2','feature3']
t1[cols] = t1[cols].pipe(ifelsefunction)
# alternative
# t1[cols] = ifelsefunction(t1[cols])
print(t1)
    name  feature1  feature2  feature3
0    Jon         2         0         0
1   Bill         1         2         2
2  Maria         1         1         1
3   Emma         1         1         1
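`np.select` checks the conditions in list order and, for each element, takes the choice that belongs to the first condition that is `True`, falling back to `default` when none match. A small standalone illustration on a 1-D array (the array `x` is made up for the example):

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])

# Same rules as ifelsefunction: >= 3 -> 1, == 2 -> 2, otherwise 0
scores = np.select([x >= 3, x == 2], [1, 2], default=0)
print(scores)  # [0 2 1 1 1]

# With overlapping conditions the first True condition wins,
# so the `x >= 4` branch is never reached here
first_match = np.select([x >= 2, x >= 4], [10, 20], default=0)
print(first_match)  # values: 0, 10, 10, 10, 10
```

In the answer the conditions are disjoint, so their order does not change the result, but it matters whenever they overlap.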
To create new columns instead, use:
import numpy as np

# Define the conditions
def ifelsefunction(df):
    m1 = df >= 3
    m2 = df == 2
    return np.select([m1, m2], [1, 2], default=0)

cols = ['feature1','feature2','feature3']
new = [f'{x}_score' for x in cols]
t1[new] = t1[cols].pipe(ifelsefunction)
# alternative
# t1[new] = ifelsefunction(t1[cols])
print(t1)
    name  feature1  feature2  feature3  feature1_score  feature2_score  feature3_score
0    Jon         2         1         1               2               0               0
1   Bill         3         2         2               1               2               2
2  Maria         4         3         3               1               1               1
3   Emma         5         4         4               1               1               1
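If you would rather not assign into `t1` in place, one alternative (a sketch, not part of the original answer) is to wrap the array returned by `np.select` in a DataFrame with the same index, rename its columns with `add_suffix`, and `join` it back. The name `t2` is illustrative.

```python
import numpy as np
import pandas as pd

t1 = pd.DataFrame({'name': ['Jon', 'Bill', 'Maria', 'Emma'],
                   'feature1': [2, 3, 4, 5],
                   'feature2': [1, 2, 3, 4],
                   'feature3': [1, 2, 3, 4]})

def ifelsefunction(df):
    m1 = df >= 3
    m2 = df == 2
    return np.select([m1, m2], [1, 2], default=0)

cols = ['feature1', 'feature2', 'feature3']

# np.select returns a plain ndarray, so restore the index and column
# labels, add the "_score" suffix, and join onto a new frame
scores = pd.DataFrame(ifelsefunction(t1[cols]), index=t1.index, columns=cols)
t2 = t1.join(scores.add_suffix('_score'))
print(t2)
```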
Edit:
You can change the function like this:
import numpy as np

def ifelsefunction(df, var1, var2):
    mask1 = (df[var1] >= 3) & (df[var1] < df[var2])
    mask2 = df[var1] == 2
    return np.select([mask1, mask2], [df[var1]*0.7, df[var1]*df[var2]], default=0)

t1['new'] = ifelsefunction(t1, 'feature3', 'feature1')
print(t1)
    name  feature1  feature2  feature3  new
0    Jon         2         1         1  0.0
1   Bill         3         2         2  6.0
2  Maria         4         3         3  2.1
3   Emma         5         4         4  2.8
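To apply this two-column version to several column pairs, you can loop over the pairs and call the function through `DataFrame.pipe`, which forwards extra positional arguments to the function. The `pairs` list and the `{var1}_vs_{var2}` column naming are assumptions for illustration.

```python
import numpy as np
import pandas as pd

t1 = pd.DataFrame({'name': ['Jon', 'Bill', 'Maria', 'Emma'],
                   'feature1': [2, 3, 4, 5],
                   'feature2': [1, 2, 3, 4],
                   'feature3': [1, 2, 3, 4]})

def ifelsefunction(df, var1, var2):
    # & (not `and`) combines the boolean Series element-wise
    mask1 = (df[var1] >= 3) & (df[var1] < df[var2])
    mask2 = df[var1] == 2
    return np.select([mask1, mask2], [df[var1] * 0.7, df[var1] * df[var2]], default=0)

# Hypothetical (var1, var2) pairs; each one produces a new column
pairs = [('feature3', 'feature1'), ('feature2', 'feature1')]
for var1, var2 in pairs:
    # pipe(func, *args) calls func(t1, *args)
    t1[f'{var1}_vs_{var2}'] = t1.pipe(ifelsefunction, var1, var2)

print(t1)
```

Each iteration is still fully vectorized over the rows; only the small list of column pairs is looped over in Python.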