首页 > 解决方案 > Pandas:通过迭代现有列中的值来填充数据框中的特定列以计算重复值的数量

问题描述

我有一个样本数据,其中包含一些虚假的银行账户信息,其中包含一个名为“账户指示器”的列,我需要根据重复帐号的数量来填充该信息。

数据框如下所示:

  Customer code   Account name  Account number  Account balance  Account indicator
0         20041      Tom Brown       24911467        30155.00                  
1         20041      Tom Brown       25193860           45.00                  
2         20021     Kate Perry       24092030        20450.00                  
3         20021     Kate Perry       24092007           50.00                  
4         20153    Harry Smith       20641387         5000.00                  
5         20154    Bella Smith       20641387         5000.00                  
6         20149   John William       20753196         7000.00                  
7         20144  Emily William       20753196         5000.00                  
8         20146  Rosie William       20753196         2000.00   
  1. Account 指示符中的值必须以“00”和一个数字开头。它的工作方式是例如(请参阅下表),帐号“20641387”重复了两次,因为 Harry Smith 和 Bella Smith 拥有相同的帐号。因此,该索引的 Account 指标是“ 002 ”。这称为联名账户。

我期望在 Account 指标中看到的输出是:

  Customer code   Account name  Account number  Account balance  Account indicator
0         20041      Tom Brown       24911467       30155.00      001             
1         20041      Tom Brown       25193860          45.00      001             
2         20021     Kate Perry       24092030       20450.00      001             
3         20021     Kate Perry       24092007          50.00      001             
4         20153    Harry Smith       20641387        5000.00      002             
5         20154    Bella Smith       20641387        5000.00      002             
6         20149   John William       20753196        7000.00      003             
7         20144  Emily William       20753196        5000.00      003             
8         20146  Rosie William       20753196        2000.00      003

此外,对于指标中显示的任何联名账户(例如 002、003),我还需要检查其帐号的账户余额是否相等。从上面我们可以看到,数字“20641387”对于 Harry 和 Bella 的余额都是 5000.00。这是对的。但是,“20753196”的余额不正确,因为它们不相等。

我还需要有关检查是否有多个帐户的建议和帮助。这可以通过检查“帐户名称”列是否与两个不同的帐号具有相同的名称来识别。例如,汤姆布朗和凯特佩里。

以下是我尝试过的,任何建议将不胜感激!

import pandas as pd
import numpy as np

data = {
        'Customer code': ['20041', '20041', '20021', '20021', '20153', '20154', '20149', '20144', '20146'],
        'Account name': ['Tom Brown', 'Tom Brown', 'Kate Perry', 'Kate Perry', 'Harry Smith', 'Bella Smith', 'John William', 'Emily William', 'Rosie William'],
        'Account number': ['24911467', '25193860', '24092030', '24092007', '20641387', '20641387', '20753196', '20753196', '20753196'],
        'Account balance': ['30155.00', '45.00', '20450.00', '50.00', '5000.00', '5000.00', '7000.00', '5000.00', '2000.00'],
        'Account indicator': ''}


account_details = pd.DataFrame(data)


dups_account_number = account_details.pivot_table(columns=['Account number'], aggfunc='size')

#Populate Account_Indicator with the right code
#each value in the Account indicator column = '00' + the count of its duplicated Account number
for i in account_details['Account indicator']:
    for j in account_details['Account number']:
        if account_details['Account number'].duplicated().any() == True:
            account_details['Account indicator'] = account_details['Account indicator'].apply(lambda x: '00' + dups_account_number)
            print('There are joint accounts')

        else:
            account_details['Account indicator'] = '001' #001 means 1 account number

#Here --> Code to check if the Account balances for a joint account are equal
#Could use --> if account_details['Account indicator'] != 001 to start with? Because 001 indicators are definitely not the joint account
    
    
#Here --> Code to check for multiple accounts if one name has 2 or more different Account_numbers




print(account_details)

请分享你的想法!谢谢!

标签: pythonpandasdataframe

解决方案


以下是如何使用groupby/ transform+映射“帐户指标” nunique

请使用天平问题另一部分的示例更新您的输出:

account_details['Account indicator'] = (account_details.groupby('Account number')
                                                       ['Account name']
                                                       .transform('nunique')
                                                       .map('{:03d}'.format)
                                       )

输出:

  Customer code   Account name Account number Account balance Account indicator
0         20041      Tom Brown       24911467        30155.00               001
1         20041      Tom Brown       25193860           45.00               001
2         20021     Kate Perry       24092030        20450.00               001
3         20021     Kate Perry       24092007           50.00               001
4         20153    Harry Smith       20641387         5000.00               002
5         20154    Bella Smith       20641387         5000.00               002
6         20149   John William       20753196         7000.00               003
7         20144  Emily William       20753196         5000.00               003
8         20146  Rosie William       20753196         2000.00               003

平衡:

account_details['Balance indicator'] = (account_details.groupby('Account number')
                                                       ['Account balance']
                                                       .transform('nunique')
                                                       .eq(1)
                                                       .map({True: 'valid balance', False: 'invalid balance'})
                                       )

推荐阅读