首页 > 解决方案 > 使用 pandas 将父表与子表组合/合并

问题描述

我有一个父表df0,有两个级别的代码Code1Code2Message。多个子表df1存储较低级别Code2Messages. 如您所见,"df1"在表中表示为索引df0

    import pandas as pd

    s1 = pd.Series([1, 0, 'A'])
    s2 = pd.Series([2, 'df1', 'B'])
    s3 = pd.Series([3, 0, 'C'])
    df0 = pd.DataFrame([list(s1), list(s2), list(s3)],  columns =  ["Code1", "Code2", "Message"])

    df0   

         Code1  Code2   Message
    0     1     0       A
    1     2     df1     B
    2     3     0       C

    s4 = pd.Series([0, 'B1'])
    s5 = pd.Series([1, 'B2'])
    s6 = pd.Series([2, 'B3'])
    df1 = pd.DataFrame([list(s4), list(s5), list(s6)],  columns =  ["Code2", "Message"])


    df1
                Code2   Message
        0       0       B1
        1       1       B2
        2       2       B3

我想把一个更大的桌子df1放回原处df0并制作一个扁平的合并表。谢谢您的帮助。


        Code1   Code2   Message
0       1       0       A
1       2               B
2       2       0       B1
3       2       1       B2
4       2       2       B3
5       3       0       C

标签: pythonpandas

解决方案


使用merge. 下面提供了一个中间结果,可以进一步处理以获得所需的输出。

import pandas as pd
import numpy as np
print(pd.__version__)

s1 = pd.Series([1, 0, 'A'])
s2 = pd.Series([2, 'df1', 'B'])
s3 = pd.Series([3, 0, 'C'])
df0 = pd.DataFrame([list(s1), list(s2), list(s3)],  columns =  ["Code1", "Code2", "Message"])

s4 = pd.Series([0, 'B1'])
s5 = pd.Series([1, 'B2'])
s6 = pd.Series([2, 'B3'])
df1 = pd.DataFrame([list(s4), list(s5), list(s6)],  columns =  ["Code2", "Message"])
join_key = 'df1'
df1[join_key] = join_key

df2 = pd.concat([df0[df0['Code2'] == join_key], df1], sort=True)

result = df0.merge(df2, left_on=['Code2'], right_on=['df1'], how='outer', suffixes=('_l', '_r'))
print (result)

结果

0.24.2
   Code1_l Code2_l Message_l  Code1_r Code2_r Message_r  df1
0      1.0       0         A      NaN     NaN       NaN  NaN
1      3.0       0         C      NaN     NaN       NaN  NaN
2      2.0     df1         B      NaN       0        B1  df1
3      2.0     df1         B      NaN       1        B2  df1
4      2.0     df1         B      NaN       2        B3  df1
5      NaN     NaN       NaN      2.0     df1         B  NaN

推荐阅读