Compare values from two different DataFrames and add values based on the result in pandas

Problem description

I need to compare two different DataFrames and, based on the result, add values to new columns.

import pandas as pd

country = {'Year':[2020,2021],'Host':['Mexico','Panama'],'Winners':['Canada','Japan']}

country_df = pd.DataFrame(country,columns=['Year','Host','Winners'])
    Year  Host      Winners
0   2020  Mexico    Canada
1   2021  Panama    Japan
all_country = {'Country': ['USA','Mexico','USA','Panama','Japan'],'Year':[2021,2020,2020,2021,2021]}
all_country_df = pd.DataFrame(all_country, columns=['Country','Year'])
    Country Year
0   USA     2021
1   Mexico  2020
2   USA     2020
3   Panama  2021
4   Japan   2021

I want to compare all_country_df with country_df to find the host country and the winner for each year, for example:

all_country= {'Country':['USA','Mexico','USA','Panama','Japan'],'Year':[2021,2020,2020,2021,2021],'Winner':[None,None,None,None,'Winner'],'Host':[None,'Host',None,'Host',None]}
all_country_df = pd.DataFrame(all_country, columns=['Country','Year','Winner','Host'])

so that the result looks like this:


    Country Year    Winner  Host
0   USA     2021    None    None
1   Mexico  2020    None    Host
2   USA     2020    None    None
3   Panama  2021    None    Host
4   Japan   2021    Winner  None
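The comparison being asked for is essentially a per-year lookup: for each row, fetch that year's host and winner from country_df and check whether Country matches either one. A plain-Python sketch of that logic, assuming each Year appears at most once in country_df (the names by_year, winner_col, host_col and result are illustrative only):

```python
import pandas as pd

country_df = pd.DataFrame({'Year': [2020, 2021],
                           'Host': ['Mexico', 'Panama'],
                           'Winners': ['Canada', 'Japan']})
all_country_df = pd.DataFrame({'Country': ['USA', 'Mexico', 'USA', 'Panama', 'Japan'],
                               'Year': [2021, 2020, 2020, 2021, 2021]})

# Lookup table: year -> (host, winner); assumes one row per Year in country_df
by_year = {row.Year: (row.Host, row.Winners) for row in country_df.itertuples(index=False)}

winner_col, host_col = [], []
for country, year in zip(all_country_df['Country'], all_country_df['Year']):
    # Years missing from country_df get (None, None), so neither label is set
    host, winner = by_year.get(year, (None, None))
    winner_col.append('Winner' if country == winner else None)
    host_col.append('Host' if country == host else None)

# Add the two label columns without modifying the original frame
result = all_country_df.assign(Winner=winner_col, Host=host_col)
print(result)
```

This loop is easy to follow but slow on large frames; the vectorized merge/np.where approach expresses the same lookup column-wise.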

Tags: python pandas dataframe

Solution


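Try merge combined with np.where. Merging on Year attaches that year's Host and Winners to every row, and np.where then labels the rows whose Country matches them: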

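import numpy as np

# Left-merge on Year so every row gets that year's Host and Winners (row order is kept)
newdf = all_country_df.merge(country_df, on='Year', how='left')
newdf['Winner'] = np.where(newdf['Country'].eq(newdf['Winners']), 'Winner', None)
newdf['Host'] = np.where(newdf['Country'].eq(newdf['Host']), 'Host', None)
newdf = newdf[['Country', 'Year', 'Winner', 'Host']]  # drop the helper Winners column
print(newdf)

Output:

  Country  Year  Winner  Host
0     USA  2021    None  None
1  Mexico  2020    None  Host
2     USA  2020    None  None
3  Panama  2021    None  Host
4   Japan  2021  Winner  None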
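If you would rather not carry the extra Host/Winners columns produced by the merge, a variant (a swapped-in technique, not the approach above) maps each Year directly to its host and winner with Series.map and compares. A sketch, assuming Year is unique in country_df; host_by_year, winner_by_year and out are illustrative names:

```python
import numpy as np
import pandas as pd

country_df = pd.DataFrame({'Year': [2020, 2021],
                           'Host': ['Mexico', 'Panama'],
                           'Winners': ['Canada', 'Japan']})
all_country_df = pd.DataFrame({'Country': ['USA', 'Mexico', 'USA', 'Panama', 'Japan'],
                               'Year': [2021, 2020, 2020, 2021, 2021]})

# Year-indexed Series used as lookup tables (assumes one row per Year)
host_by_year = country_df.set_index('Year')['Host']
winner_by_year = country_df.set_index('Year')['Winners']

out = all_country_df.copy()
# Series.map with a Series looks each Year up in its index; unknown years become NaN
out['Winner'] = np.where(out['Country'].eq(out['Year'].map(winner_by_year)), 'Winner', None)
out['Host'] = np.where(out['Country'].eq(out['Year'].map(host_by_year)), 'Host', None)
print(out)
```

Rows whose Year is missing from country_df map to NaN, so the comparison is False and both labels stay empty; with the default inner merge those rows would be dropped instead.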
