How can I "concat" rows by same value in a column in Pandas?

Problem description

I would like to concatenate the values of rows that share the same value in one column (here ID) into a single row of a DataFrame, and then get the edited DataFrame back.

Input data:

ID  F_Name  L_Name  Address SSN     Phone
123 Sam     Doe     123     12345   111-111-1111
123 Sam     Doe     123     12345   222-222-2222
123 Sam     Doe     abc345  12345   111-111-1111
123 Sam     Doe     abc345  12345   222-222-2222
456 Naveen  Gupta   456     45678   333-333-3333
456 Manish  Gupta   456     45678   333-333-3333

Expected output data:

myschema = [
    {
        "ID": "123",
        "F_Name": "Sam",
        "L_Name": "Doe",
        "Address": ["123", "abc345"],
        "Phone": ["111-111-1111", "222-222-2222"],
        "SSN": "12345"
    },
    {
        "ID": "456",
        "F_Name": ["Naveen", "Manish"],
        "L_Name": "Gupta",
        "Address": "456",
        "Phone": ["333-333-3333"],
        "SSN": "45678"
    }
]

Code tried:

import pandas as pd

df = pd.read_csv('data.csv')
print(df)
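To make the snippets on this page reproducible without a data.csv file, the sample input can also be built in memory. This is a minimal sketch that assumes the columns are separated by runs of whitespace exactly as in the table above (the real file may well be comma-separated, in which case the plain read_csv call is enough). The variable name raw is only for illustration.

```python
import io

import pandas as pd

# Sample rows copied from the question; columns are separated by runs of spaces.
raw = """\
ID  F_Name  L_Name  Address SSN     Phone
123 Sam     Doe     123     12345   111-111-1111
123 Sam     Doe     123     12345   222-222-2222
123 Sam     Doe     abc345  12345   111-111-1111
123 Sam     Doe     abc345  12345   222-222-2222
456 Naveen  Gupta   456     45678   333-333-3333
456 Manish  Gupta   456     45678   333-333-3333
"""

# sep=r"\s+" splits on any whitespace run instead of on commas.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")
print(df)
# Address stays a text column because "abc345" is not numeric; ID and SSN are integers.
print(df.dtypes)
```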

Tags: python, pandas, dataframe, pandas-groupby

Solution


Try groupby() + agg():

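# Collapse each column per ID: one distinct value -> scalar, several -> list (set order is arbitrary)
myschema = (df.groupby('ID', as_index=False)
              .agg(lambda x: list(set(x))[0] if len(set(x)) == 1 else list(set(x)))
              .to_dict('records'))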

OR

If order matters, aggregate with pd.unique() instead, since it returns the unique values in order of first appearance:

# pd.unique() keeps values in order of first appearance, unlike set()
myschema = (df.groupby('ID', as_index=False)
              .agg(lambda x: pd.unique(x)[0] if len(pd.unique(x)) == 1 else pd.unique(x).tolist())
              .to_dict('records'))
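For a quick end-to-end check, here is a self-contained sketch of the order-preserving version run on the sample rows from the question. The in-memory input, the whitespace separator and the helper name collapse_unique are assumptions for illustration; the aggregation rule is the same as in the lambda above, just computed once per group instead of three times.

```python
import io

import pandas as pd

# Sample rows from the question (assumed whitespace-separated).
raw = """\
ID  F_Name  L_Name  Address SSN     Phone
123 Sam     Doe     123     12345   111-111-1111
123 Sam     Doe     123     12345   222-222-2222
123 Sam     Doe     abc345  12345   111-111-1111
123 Sam     Doe     abc345  12345   222-222-2222
456 Naveen  Gupta   456     45678   333-333-3333
456 Manish  Gupta   456     45678   333-333-3333
"""
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")


def collapse_unique(values: pd.Series):
    """Hypothetical helper: the single distinct value as a scalar,
    otherwise all distinct values (first-seen order) as a list."""
    uniques = pd.unique(values)
    return uniques[0] if len(uniques) == 1 else uniques.tolist()


myschema = (
    df.groupby("ID", as_index=False)  # keep ID as a regular column
    .agg(collapse_unique)             # applied to every non-ID column, per group
    .to_dict("records")               # one dict per ID
)
print(myschema)
```

Because pd.unique() preserves order, Address for ID 123 comes out as ['123', 'abc345'] on every run, matching the order in the question's expected output.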

In the code above, the DataFrame is grouped by 'ID' only. Each remaining column is then aggregated per group to its unique values (with set() or pd.unique()): if a group has exactly one unique value, that value (the element at position 0) is kept as a scalar; otherwise the unique values are kept as a list. Finally, to_dict() converts the aggregated result into a list of dictionaries (records), one per ID.
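To see the collapsing rule on its own, the sketch below applies it to single Series that stand in for one column of one group. The helper name collapse_unique is hypothetical; set() would yield the same elements, but in no guaranteed order, which is why pd.unique() is used here.

```python
import pandas as pd


def collapse_unique(values: pd.Series):
    """Hypothetical helper: scalar if exactly one distinct value, else a list."""
    uniques = pd.unique(values)  # distinct values, in order of first appearance
    return uniques[0] if len(uniques) == 1 else uniques.tolist()


# F_Name within ID 123: one distinct value, so the scalar at position 0 is kept.
print(collapse_unique(pd.Series(["Sam", "Sam", "Sam", "Sam"])))  # Sam

# F_Name within ID 456: two distinct values, so a list is returned.
print(collapse_unique(pd.Series(["Naveen", "Manish"])))  # ['Naveen', 'Manish']

# Phone within ID 123: duplicates are dropped before the length check.
print(collapse_unique(pd.Series(["111-111-1111", "222-222-2222",
                                 "111-111-1111", "222-222-2222"])))
# ['111-111-1111', '222-222-2222']
```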

Output of myschema (from the set-based version, so the order of elements inside each list may differ between runs):

[{'ID': 123,
  'F_Name': 'Sam',
  'L_Name': 'Doe',
  'Address': ['abc345', '123'],
  'SSN': 12345,
  'Phone': ['222-222-2222', '111-111-1111']},
 {'ID': 456,
  'F_Name': ['Naveen', 'Manish'],
  'L_Name': 'Gupta',
  'Address': '456',
  'SSN': 45678,
  'Phone': '333-333-3333'}]
