Splitting income into income groups by creating a new column (Python)

Problem description

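I am trying to create income buckets/groups from income groups that already exist in my data. I want to do this by adding a new column to my DataFrame.

The problem is that the existing income groups do not line up with each other, because they use different ranges and different currencies.

Initially I wanted to sort this out with regular expressions, but I gave up (I don't know how to do it, or whether it is even possible).

This is what I did instead: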

def Income_Groups(AnnualIncome): 

  Income = {
      'Under £5,000':'<25k','less than £25,000':'<25k','less than €25,000':'<25k','Between_0_5':'<25k','Between_0_25':'<25k','Between_5_15':'<25k','Between_15_30':'<25k', 
      '£25,001-£50,000':'25-50k','£30,000-£50,000':'25-50k','€25,001-€50,000':'25-50k','Between_25_50':'25-50k','Between_30_50':'25-50k', 
      '£50,001-£100,000':'50-100k','€50,001-€100,000':'50-100k','Between_50_75':'50-100k','Between_75_100':'50-100k','Between_50_100':'50-100k', 
      '£100,000+':'>100k','€100,000+':'>100k','Above_100':'>100k' 
  }
  
  try:
      return Income[AnnualIncome]
  except:
      return AnnualIncome

data_m['IncomeGroups'] = data_m.AnnualIncome.apply(Income_Groups)

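This code works, but it does not let me choose what happens to missing data: it automatically replaces missing cells with "0", which is not what I want. I would rather see "Na" or have the cell treated as empty.

I then tried another version (easier to read):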

def Income_Groups(AnnualIncome): 
    if AnnualIncome in 'Under £5,000'|'less than £25,000'|'less than €25,000'|'Between_0_5'|'Between_0_25'|'Between_5_15'|'Between_15_30': return '<25k' 
    elif AnnualIncome in '£25,001-£50,000'|'£30,000-£50,000'|'€25,001-€50,000'|'Between_25_50'|'Between_30_50': return '25-50k'
    elif AnnualIncome in '£50,001-£100,000'|'€50,001-€100,000'|'Between_50_75'|'Between_75_100'|'Between_50_100': return '50-100k' 
    elif AnnualIncome in '£100,000+'|'€100,000+'|'Above_100': return '>100k' 
    else: return ''

data_m['IncomeGroups'] = data_m.AnnualIncome.apply(Income_Groups)

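(I have not tried writing a separate "if/elif" and "return" for every single condition, because there are so many of them.)

However, with the second version I get the following error: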

TypeError                                 Traceback (most recent call last)
      8     else: return ''
      9 
---> 10 data_m['IncomeGroups'] = data_m.AnnualIncome.apply(Income_Groups)

~\Anaconda3\lib\site-packages\pandas\core\series.py in apply(self, func, convert_dtype, args, **kwds)
   3846             else:
   3847                 values = self.astype(object).values
-> 3848                 mapped = lib.map_infer(values, f, convert=convert_dtype)
   3849 
   3850             if len(mapped) and isinstance(mapped[0], Series):

pandas\_libs\lib.pyx in pandas._libs.lib.map_infer()

in Income_Groups(AnnualIncome)
      2 
      3 def Income_Groups(AnnualIncome):
----> 4     if AnnualIncome in 'Under £5,000'|'less than £25,000'|'less than €25,000'|'Between_0_5'|'Between_0_25'|'Between_5_15'|'Between_15_30': return '<25k'
      5     elif AnnualIncome in '£25,001-£50,000'|'£30,000-£50,000'|'€25,001-€50,000'|'Between_25_50'|'Between_30_50': return '25-50k'
      6     elif AnnualIncome in '£50,001-£100,000'|'€50,001-€100,000'|'Between_50_75'|'Between_75_100'|'Between_50_100': return '50-100k'

TypeError: unsupported operand type(s) for |: 'str' and 'str'

Thank you very much for your help!!

Tags: python, pandas, structure, rename

Solution


You are trying to apply a bitwise operator (|) to strings, which Python does not support.
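To make the cause concrete, here is a minimal sketch (not taken from the original post) showing where | is defined and where it is not. Because | binds more tightly than in, Python evaluates the chain of | operations between the string literals first, and that is where the TypeError is raised. The variable name error_message is only an illustrative choice.

```python
# `|` is defined for ints (bitwise OR) and sets (union), but not for str.
int_result = 5 | 2            # bitwise OR of two integers
set_result = {'a'} | {'b'}    # union of two sets

# The same operator between two strings raises TypeError.
try:
    'Under £5,000' | 'less than £25,000'
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(int_result)
print(set_result)
print(error_message)
```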

Replace every | with a comma and add square brackets so that the values form a list of strings, as in this example:

AnnualIncome = "Under £5,000"

if AnnualIncome in ['Under £5,000','less than £25,000','less than €25,000','Between_0_5','Between_0_25','Between_5_15','Between_15_30']:
    print("ok")

Output:

ok
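Applied to the whole function, the same idea could look like the sketch below. It is one possible arrangement rather than the asker's final code: the BUCKETS dictionary, the function name income_group and the five-row sample DataFrame are assumptions made for illustration, while the labels and bucket names are copied from the question. Returning NaN for anything that matches no list also keeps missing cells missing, which is what the question asks for.

```python
import numpy as np
import pandas as pd

# Raw labels from the question, grouped by target bucket (hypothetical layout).
BUCKETS = {
    '<25k': ['Under £5,000', 'less than £25,000', 'less than €25,000',
             'Between_0_5', 'Between_0_25', 'Between_5_15', 'Between_15_30'],
    '25-50k': ['£25,001-£50,000', '£30,000-£50,000', '€25,001-€50,000',
               'Between_25_50', 'Between_30_50'],
    '50-100k': ['£50,001-£100,000', '€50,001-€100,000',
                'Between_50_75', 'Between_75_100', 'Between_50_100'],
    '>100k': ['£100,000+', '€100,000+', 'Above_100'],
}


def income_group(annual_income):
    """Return the bucket for a raw label, or NaN if it is missing or unknown."""
    for bucket, labels in BUCKETS.items():
        if annual_income in labels:   # membership test against a list of strings
            return bucket
    return np.nan                     # missing or unmatched values stay missing


# Made-up sample data, only to exercise the function.
data_m = pd.DataFrame({
    'AnnualIncome': ['Under £5,000', '€25,001-€50,000', np.nan,
                     'Above_100', 'Between_75_100'],
})
data_m['IncomeGroups'] = data_m['AnnualIncome'].apply(income_group)
print(data_m)
```

If unknown labels should be kept as they are instead of becoming NaN, the final return could hand back annual_income, as the first version in the question does.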
