python - Pandas: create a new column based on the values of other columns (row-wise)
Question
I want to write a custom function based on several columns:
`'TOTAL_HH_INCOME', 'HH_SIZE', 'Eligible Household Size', 'income_min1', 'income_max1', 'hh_size2', 'income_min2', 'income_max2', 'hh_size3', 'income_min3', 'income_max3', 'hh_size4', 'income_min4', 'income_max4', 'hh_size5', 'income_min5', 'income_max5', 'hh_size6', 'income_min6', 'income_max6'`
For every row of my DataFrame, I want to compare HH_SIZE with each hh_size# variable, and TOTAL_HH_INCOME with each income_min and income_max variable.
I wrote this function as an attempt:
def eligibility (row):
    if df['HH_SIZE']== df['Eligible Household Size'] & df['TOTAL_HH_INCOME'] >= df['income_min1'] & df['TOTAL_HH_INCOME'] <=row['income_max1'] :
        return 'Eligible'
    if df['HH_SIZE']== df['hh_size2'] & df['TOTAL_HH_INCOME'] >= df['income_min2'] & df['TOTAL_HH_INCOME'] <=row['income_max2'] :
        return 'Eligible'
    if df['HH_SIZE']== df['hh_size3'] & df['TOTAL_HH_INCOME'] >= df['income_min3'] & df['TOTAL_HH_INCOME'] <=row['income_max3'] :
        return 'Eligible'
    if df['HH_SIZE']== df['hh_size4'] & df['TOTAL_HH_INCOME'] >= df['income_min4'] & df['TOTAL_HH_INCOME'] <=row['income_max4'] :
        return 'Eligible'
    if df['HH_SIZE']== df['hh_size5'] & df['TOTAL_HH_INCOME'] >= df['income_min5'] & df['TOTAL_HH_INCOME'] <=row['income_max5'] :
        return 'Eligible'
    if df['HH_SIZE']== df['hh_size6'] & df['TOTAL_HH_INCOME'] >= df['income_min6'] & df['TOTAL_HH_INCOME'] <=row['income_max6'] :
        return 'Eligible'
    return 'Ineligible'
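As you can see, if a row meets any of the conditions I want it labeled 'Eligible'; otherwise it should be labeled 'Ineligible'.
I applied this function to my df: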
df['Eligibility']= df.apply(eligibility, axis=1)
However, I get an error:
ValueError: ('The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().', 'occurred at index 0')
Why? Is my function wrong?
Edit:
====================== DATAFRAME ===========================
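Answer
The problem lies in the comparisons inside the if statements: because you are comparing whole DataFrame columns, each comparison produces not a single True/False value but a boolean Series with as many values as the column has items, and an if statement cannot evaluate that.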
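To make the difference concrete, here is a minimal sketch. The two-row DataFrame and its values are made up for illustration; only the column names come from the question. It compares whole columns (what `df[...]` does inside the function) with values taken from a single row (what `row[...]` would do).

```python
import pandas as pd

# Hypothetical two-row frame; the values are invented for illustration
df = pd.DataFrame({"HH_SIZE": [1, 2], "Eligible Household Size": [1, 3]})

# Comparing whole columns yields one boolean per row, not a single bool
column_result = df["HH_SIZE"] == df["Eligible Household Size"]
print(type(column_result).__name__)  # Series

# Using that Series as an if-condition raises the ValueError from the question
error_message = None
try:
    if column_result:
        pass
except ValueError as exc:
    error_message = str(exc)
    print(error_message)

# Comparing values taken from a single row yields a single boolean
row = df.iloc[0]
row_result = row["HH_SIZE"] == row["Eligible Household Size"]
print(bool(row_result))
```

Since `df.apply(..., axis=1)` passes each row to the function as a Series, looking values up through `row` rather than `df` is what gives the if statement one value to test.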
If you want to check that all the elements are equal, try a.all(). See the following example:
import pandas as pd

dict1 = {'name1': ['tom', 'pedro'], 'name2': ['tom', 'pedro'],
         'name3': ['tome', 'maria'], 'name4': ['maria', 'marta']}
df1 = pd.DataFrame(dict1)

# This raises the same ValueError as yours:
# if df1['name1'] == df1['name2']:
#     pass

# To see why it raises an error, print the result of the comparison:
print('This is a Series of bool values and cannot be handled by an if statement:')
print(df1['name1'] == df1['name2'])

# This checks whether all the elements in 'name1' equal those in 'name2'
if (df1['name1'] == df1['name2']).all():
    print('\nEligible')

Output:
This is a Series of bool values and cannot be handled by an if statement:
0    True
1    True
dtype: bool

Eligible
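Note that `.all()` tests a whole column at once, whereas the question asks for a label per row. A possible row-wise version is sketched below, under some assumptions: the sample values are invented, only two of the six tiers are included to keep it short, and the `TIERS` list is a hypothetical helper, not part of the original code. Every lookup goes through `row`, and the conditions are joined with `and`. Keeping `&` would also need parentheses around each comparison, because `&` binds more tightly than `==` and `>=`.

```python
import pandas as pd

# Hypothetical data: invented values, only two of the six tiers
df = pd.DataFrame({
    "TOTAL_HH_INCOME": [15000, 50000, 22000],
    "HH_SIZE": [1, 1, 2],
    "Eligible Household Size": [1, 1, 1],
    "income_min1": [0, 0, 0],
    "income_max1": [20000, 20000, 20000],
    "hh_size2": [2, 2, 2],
    "income_min2": [0, 0, 0],
    "income_max2": [27000, 27000, 27000],
})

# (size column, min column, max column) for each tier present in df
TIERS = [
    ("Eligible Household Size", "income_min1", "income_max1"),
    ("hh_size2", "income_min2", "income_max2"),
]

def eligibility(row):
    """Label one row; every lookup goes through `row`, never through `df`."""
    for size_col, min_col, max_col in TIERS:
        if (row["HH_SIZE"] == row[size_col]
                and row[min_col] <= row["TOTAL_HH_INCOME"] <= row[max_col]):
            return "Eligible"
    return "Ineligible"

df["Eligibility"] = df.apply(eligibility, axis=1)
print(df["Eligibility"].tolist())

# Vectorized alternative: build one boolean mask over whole columns.
# Each comparison is parenthesized because & binds tighter than == and >=.
mask = pd.Series(False, index=df.index)
for size_col, min_col, max_col in TIERS:
    mask = mask | (
        (df["HH_SIZE"] == df[size_col])
        & (df["TOTAL_HH_INCOME"] >= df[min_col])
        & (df["TOTAL_HH_INCOME"] <= df[max_col])
    )
df["Eligibility_vectorized"] = mask.map({True: "Eligible", False: "Ineligible"})
```

With the full data, `TIERS` would be extended to cover `hh_size3` through `hh_size6`. The vectorized variant avoids calling a Python function per row, which tends to matter on larger frames.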