Combining two DataFrames of unequal size based on a condition

Problem description

I have two DataFrames, df1 and df2. I want to take the values of a column in df2 and add them to df1.

df1:

import pandas as pd
from itertools import product

Title = ['Aeroplane', 'Ships', 'Houses']
Term = ['Computers', 'Flasks', 'Mouse']
counts_1 = [200, 30, 45, 66, 33, 450, 60, 100, 150]
# Build every (Term, Title) combination, then attach the counts.
product_terms = list(product(Term, Title))
df_1 = pd.DataFrame(product_terms, columns=['Term', 'Title'])
df_1['C1'] = counts_1

       Term        Title       C1
    0  Computers   Aeroplane   200
    1  Computers   Ships        30
    2  Computers   Houses       45
    3  Flasks      Aeroplane    66
    4  Flasks      Ships        33
    5  Flasks      Houses      450
    6  Mouse      Aeroplane     60
    7  Mouse      Ships        100
    8  Mouse      Houses       150

df2 (the smaller one):

terms = ['Computers', 'Flasks', 'Flasks', 'Mouse']
title = ['Aeroplane', 'Aeroplane', 'Ships', 'Houses']
count_2 = [3, 6, 13, 15]
df_2 = pd.DataFrame({'Term': terms, 'Title': title, 'C2': count_2})


       Term        Title       C2
    0  Computers   Aeroplane    3
    1  Flasks      Aeroplane    6
    2  Flasks      Ships       13
    3  Mouse      Houses       15

I want to combine the two DataFrames into one, as shown below: add column C2 from df_2 to df_1, matching on the Term and Title columns, and insert 0 wherever df_2 has no row with the same Term and Title.

   Term        Title       C1      C2
0  Computers   Aeroplane   200      3
1  Computers   Ships        30      0
2  Computers   Houses       45      0
3  Flasks      Aeroplane    66      6
4  Flasks      Ships        33     13
5  Flasks      Houses      450      0
6  Mouse      Aeroplane     60      0
7  Mouse      Ships        100      0
8  Mouse      Houses       150     15

The (Term, Title) pairs in df2 are always a subset of those in df1.

Here is what I tried:

df_1.set_index(['Term', 'Title'], inplace=True)
df_2.set_index(['Term', 'Title'], inplace=True)

Then I iterate over the rows and assign the values.

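for idx, row in df_1.iterrows():
    try:
        c2_value = df_2.loc[idx, 'C2']
    except KeyError:
        # No matching (Term, Title) pair in df_2.
        df_1.loc[idx, 'C2'] = 0
    else:
        # Assign with a single .loc call; chained indexing may write to a copy.
        df_1.loc[idx, 'C2'] = c2_value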


df_final = df_1.reset_index()

Is there a better way to achieve this? I suspect iterrows is not an efficient approach, and my DataFrames have millions of rows.

Tags: pandas

Solution
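
One possible approach, offered as a sketch and not as the answer originally posted here, is to skip the row loop and do a left merge on the two key columns, then fill the unmatched rows with 0. The example rebuilds the sample data from the question. It assumes each (Term, Title) pair occurs at most once in df_2; the `validate='many_to_one'` argument makes pandas raise an error if that assumption does not hold, since duplicate keys would otherwise duplicate rows of df_1.

```python
from itertools import product

import pandas as pd

# Sample data from the question.
Title = ['Aeroplane', 'Ships', 'Houses']
Term = ['Computers', 'Flasks', 'Mouse']
counts_1 = [200, 30, 45, 66, 33, 450, 60, 100, 150]
df_1 = pd.DataFrame(list(product(Term, Title)), columns=['Term', 'Title'])
df_1['C1'] = counts_1

df_2 = pd.DataFrame({
    'Term': ['Computers', 'Flasks', 'Flasks', 'Mouse'],
    'Title': ['Aeroplane', 'Aeroplane', 'Ships', 'Houses'],
    'C2': [3, 6, 13, 15],
})

# A left merge keeps every row of df_1 in its original order.
# Rows without a match in df_2 get NaN in C2.
df_final = df_1.merge(
    df_2,
    on=['Term', 'Title'],
    how='left',
    validate='many_to_one',  # assumption: keys are unique in df_2
)

# Replace NaN with 0 and convert back to integers (NaN forces a float column).
df_final['C2'] = df_final['C2'].fillna(0).astype(int)

print(df_final)
```

Because the merge is vectorized, it should scale to millions of rows far better than iterrows, although the actual speedup depends on the data. If the two frames are already indexed by Term and Title, as in the attempt above, `df_1.join(df_2)` followed by the same `fillna(0)` step is a comparable alternative.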

