pandas: update a column with another DataFrame column's cell values, with duplicated index

Problem description

There are two DataFrames:

df = pd.DataFrame({'A': ['a', 'b', 'a','d','e']},index=[1,2,3,4,5])

ndf = pd.DataFrame({'A': ['a', '2', '6','e'],
                   'B': ['apple', 'pen', 'sky','duck']},index=[7,8,9,19])

df's A column should be updated with values from ndf's B column as follows: if a value in df's A column does not match any value in ndf's A column, it is left unchanged; otherwise it is replaced by the B value from the matching row of ndf.

For example, after the update, df should look like this:

pd.DataFrame({'A': ['apple', 'b', 'apple','d','duck']},index=[1,2,3,4,5])

Tags: python, pandas, dataframe

Solution


You can create a dictionary with to_dict and then use replace:

df.replace(ndf.set_index('A').to_dict()['B'])

Output:

       A
1  apple
2      b
3  apple
4      d
5   duck
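
For reference, here is a self-contained sketch of the same approach that can be run as-is. It assumes the two DataFrames from the question and applies replace to column A only, so any other columns df might have would be left alone. The names mapping and result are illustrative and do not come from the original answer.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b', 'a', 'd', 'e']}, index=[1, 2, 3, 4, 5])
ndf = pd.DataFrame({'A': ['a', '2', '6', 'e'],
                    'B': ['apple', 'pen', 'sky', 'duck']}, index=[7, 8, 9, 19])

# Build a lookup from ndf's A values to ndf's B values.
mapping = ndf.set_index('A')['B'].to_dict()

# Replace only in column A; values with no key in mapping stay unchanged.
result = df.copy()
result['A'] = df['A'].replace(mapping)
print(result)
```

Note that replace returns a new object, so the result has to be assigned back if df itself should change.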

Details:

print(ndf.set_index('A'))
       B
A       
a  apple
2    pen
6    sky
e   duck


print(ndf.set_index('A').to_dict())
{'B': {'a': 'apple', '2': 'pen', '6': 'sky', 'e': 'duck'}}


print(ndf.set_index('A').to_dict()['B'])
{'a': 'apple', '2': 'pen', '6': 'sky', 'e': 'duck'}

print(df.replace(ndf.set_index('A').to_dict()['B']))
       A
1  apple
2      b
3  apple
4      d
5   duck
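
As a possible alternative, not part of the original answer, the same lookup can be sketched with Series.map followed by fillna. map yields NaN for values that have no key in the dictionary, and fillna then restores the original value for those rows. The names mapping and updated are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b', 'a', 'd', 'e']}, index=[1, 2, 3, 4, 5])
ndf = pd.DataFrame({'A': ['a', '2', '6', 'e'],
                    'B': ['apple', 'pen', 'sky', 'duck']}, index=[7, 8, 9, 19])

# Pair each A value in ndf with the B value from the same row.
mapping = dict(zip(ndf['A'], ndf['B']))

# map() gives NaN where df's value is not a key; fillna() puts the original back.
updated = df['A'].map(mapping).fillna(df['A'])
print(updated)
```

This variant assumes that column A of df contains no missing values of its own, since those would also be passed through unchanged by fillna.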
