首页 > 解决方案 > Pandas increase efficiency in merging dataframes

问题描述

Here is the problem I am having.

I want to map team codes (contained in main df) with team names contained in dfTeam.

Here is a sample of main df:

              Game Code  Play Number  Period Number  Clock  Offense Team Code
0       690002820050901            1              1  900.0                690   
1       690002820050901            2              1    NaN                 28   
2       690002820050901            3              1    NaN                 28   
3       690002820050901            4              1    NaN                 28   
4       690002820050901            5              1    NaN                 28   
5       690002820050901            6              1    NaN                 28   
6       690002820050901            7              1  826.0                690   
7       690002820050901            8              1    NaN                690   
8       690002820050901            9              1    NaN                690 

I want to change the columns like Offense Team Code with the team names using this dataframe, named dfTeam.

     Team Code                      Name  Conference Code
0            5                     Akron              875
1            8                   Alabama              911
2            9                       UAB            24312
3           28             Arizona State              905
4           29                   Arizona              905

Here is what I currently do, to merge the dataframes, change the column name and delete the columns I don't need:

teamDict = {'Home Team Code':'homeTeamName','Visit Team Code':'visitTeamName','Offense Team Code':'offenseTeamName','Defense Team Code':'defenseTeamName'}
for oldName,newName in teamDict.items():
    dfFULL = pd.merge(dfFULL,dfTeam,how='left',left_on=oldName,right_on='Team Code')
    dfFULL.rename(columns={'Name':newName},inplace=True)
    dfFULL.drop(['Conference Code','Team Code',oldName],axis=1,inplace=True)

This code works, but it is quite slow. My main dataframe has 130k or so rows. Is there a more efficient way to do it ?

标签: pythonpandas

解决方案


You basically want to use dfTeam as a dictionary to map Team Codes to Names. For each column, you can just perform the mapping.

d = dfTeam.set_index('Team Code').Name
for oldName,newName in teamDict.items():
    dfFULL[newName] = dfFULL[oldName].map(d)

# Then get rid of all old names
dfFull = dfFull.drop(columns=list(teamDict.keys()))

推荐阅读