首页 > 解决方案 > Creating a new column with concatenated values from another column

问题描述

I am trying to create a new column in this data frame. The data set has multiple records for each PERSON because each record is a different account. The new column values should be a combination of the values for each PERSON in the TYPE column. For example, if John Doe has four accounts the value next to his nae in the new column should be a concatenation of the values in TYPE. An example of the final data frame is below. Thanks in advance.

enter image description here

标签: pythonconcatenation

解决方案


You can do this in two lines (first code, then explanation):

Code:

in: name_types = df.pivot_table(index='Name', values='AccountType', aggfunc=set)
out:
                AccountType
Name    
Jane Doe        {D}
John Doe        {L, W, D}
Larry Wild      {L, D}
Patti Shortcake {L, W}

in: df['ClientType'] = df['Name'].apply(lambda x: name_types.loc[x]['AccountType'])

Explanation:

  1. The pivot table gets all the AccountTypes for each individual name, and removes all duplicates using the 'set' aggregate function.

  2. The apply function then iterates through each 'Name' in the main data frame, looks up the AccountType associated with each in name_typed, and adds it to the new column ClientType in the main dataframe.

And you're done!

Addendum: If you need the column to be a string instead of a set, use:

in: def to_string(the_set):
    string = ''
    for item in the_set:
        string += item
    return string

in: df['ClientType'] = df['ClientType'].apply(to_string)

in: df.head()
out: 

Name    AccountType ClientType
0   Jane Doe    D   D
1   John Doe    D   LDW
2   John Doe    D   LDW
3   John Doe    L   LDW
4   John Doe    D   LDW

推荐阅读