首页 > 解决方案 > Outer join two dataframes based on index without disrupting the original order of both dataframes

问题描述

I intend to create a table that compares the contents of two dataframes. The indices are part of a process and the order has been predefined. My dataframes look like as below:

df1:

Index Content
xxa 0
init 1
tyui 2
ddas 3
gsdj 4
hasf 5

df2:

Index Content
xxa 1
init 1
fafa 6
eafa 7
gsdj 6
hasf 5

Intended merged output:

Index1 Index2 Content1 Content2
xxa xxa 0 1
init init 1 1
tyui 2
ddas 3
fafa 6
eafa 7
gsdj gsdj 4 6
hasf hasf 5 5

I have tried taking the union of the two indices, i.e.

index_all = df1.index.union(df2.index, sort=False)

However, the indices are sorted as | Index1| Index2 | Content1 | Content2 | ------| ------- | -------- | -------- | xxa | xxa | 0 | 1 | init | init | 1 | 1 | tyui | |2 | | ddas | |3 | | gsdj | gsdj |4 | 6 | hasf | hasf |5 | 5 | | fafa | | 6 | | eafa | | 7

标签: pythonpandasdataframe

解决方案


像这样的东西怎么样:

df = pd.DataFrame(index = pd.concat([df1, df2])["Index"].unique(), columns = ["Index1", "Index2"])

df["Index1"] = df1.set_index("Index", drop = False)
df["Index2"] = df2.set_index("Index", drop = False)

df.sort_index().reset_index(drop = True)
#  Index1 Index2
#0      a      a
#1      b      b
#2      c    NaN
#3      d    NaN
#4    NaN      e
#5    NaN      f
#6      g      g
#7      h      h

推荐阅读