Consolidating data in Pandas

Problem description

I have two datasets that I am combining with a left outer merge in Pandas. This is the first one:

                Name          Address
0  Joe Schmuckatelli  123 Main Street
1    Fred Putzarelli  456 Pine Street
2          Harry Cox  789 Vine Street

And the second:

           Address InvoiceNum
0  123 Main Street      51450
1  456 Pine Street      51389
2  789 Vine Street      58343
3  123 Main Street      52216
4  456 Pine Street  53124-001
5  789 Vine Street      61215
6  789 Vine Street  51215-001

After the merge, the data looks like this:

                Name          Address InvoiceNum
0  Joe Schmuckatelli  123 Main Street      51450
1  Joe Schmuckatelli  123 Main Street      52216
2    Fred Putzarelli  456 Pine Street      51389
3    Fred Putzarelli  456 Pine Street  53124-001
4          Harry Cox  789 Vine Street      58343
5          Harry Cox  789 Vine Street      61215
6          Harry Cox  789 Vine Street  51215-001

Ideally, I would like one row per address, with the third column holding all of the invoice numbers for that address, like this:

                Name          Address               InvoiceNum
0  Joe Schmuckatelli  123 Main Street             51450, 52216
1    Fred Putzarelli  456 Pine Street         51389, 53124-001
2          Harry Cox  789 Vine Street  58343, 61215, 51215-001

The code I use to merge the data looks like this:

mergedData = pd.merge(complaintData, invoiceData, on='Address', how='left')

Is there an easy way to do this, in Pandas or by some other means?

Tags: python, pandas

Solution


We can use groupby aggregate on df2 to join the InvoiceNum strings for each Address into a single value, and then join (or merge) the result onto df1:

new_df = df1.join(
    df2.groupby('Address')['InvoiceNum'].aggregate(', '.join),
    on='Address',
    how='left'
)

new_df

                Name          Address               InvoiceNum
0  Joe Schmuckatelli  123 Main Street             51450, 52216
1    Fred Putzarelli  456 Pine Street         51389, 53124-001
2          Harry Cox  789 Vine Street  58343, 61215, 51215-001

*Either join or merge works here, although join has slightly less overhead in this case, because the groupby result already has Address as its index.
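For comparison, here is a minimal sketch of the merge variant mentioned in the note above. It assumes the same df1 and df2 as in the setup; the names invoices and merged_df are only illustrative. Because merge with on= matches on columns, the Address index produced by groupby is first turned back into a column with reset_index().

```python
import pandas as pd

df1 = pd.DataFrame({
    'Name': ['Joe Schmuckatelli', 'Fred Putzarelli', 'Harry Cox'],
    'Address': ['123 Main Street', '456 Pine Street', '789 Vine Street']
})

df2 = pd.DataFrame({
    'Address': ['123 Main Street', '456 Pine Street', '789 Vine Street',
                '123 Main Street', '456 Pine Street', '789 Vine Street',
                '789 Vine Street'],
    'InvoiceNum': ['51450', '51389', '58343', '52216', '53124-001', '61215',
                   '51215-001']
})

# Collapse df2 to one row per Address, joining the invoice strings.
# reset_index() moves Address from the index back into a regular column.
invoices = df2.groupby('Address')['InvoiceNum'].agg(', '.join).reset_index()

# Left merge keeps every row of df1, in df1's original order.
merged_df = df1.merge(invoices, on='Address', how='left')

print(merged_df)
```

The result should match new_df above; the only extra work compared with join is the reset_index() step.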


Setup:

import pandas as pd

df1 = pd.DataFrame({
    'Name': ['Joe Schmuckatelli', 'Fred Putzarelli', 'Harry Cox'],
    'Address': ['123 Main Street', '456 Pine Street', '789 Vine Street']
})

df2 = pd.DataFrame({
    'Address': ['123 Main Street', '456 Pine Street', '789 Vine Street',
                '123 Main Street', '456 Pine Street', '789 Vine Street',
                '789 Vine Street'],
    'InvoiceNum': ['51450', '51389', '58343', '52216', '53124-001', '61215',
                   '51215-001']
})
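The setup above stores InvoiceNum as strings, which is what ', '.join requires. If the real data holds integers, or some addresses have no invoices at all, a slightly more defensive version may help. The sketch below uses hypothetical data (the customers and invoices frames and the 'Nobody Here' row are invented for illustration): it casts InvoiceNum to str before joining and replaces the missing value left by the left join with an empty string.

```python
import pandas as pd

# Hypothetical data: integer invoice numbers, and one address with no invoices.
customers = pd.DataFrame({
    'Name': ['Joe Schmuckatelli', 'Nobody Here'],
    'Address': ['123 Main Street', '000 Empty Street'],
})

invoices = pd.DataFrame({
    'Address': ['123 Main Street', '123 Main Street'],
    'InvoiceNum': [51450, 52216],  # integers, so str.join would fail on them directly
})

# Cast to str first, then build one comma-separated string per Address.
joined = (
    invoices.assign(InvoiceNum=invoices['InvoiceNum'].astype(str))
    .groupby('Address')['InvoiceNum']
    .agg(', '.join)
)

# Left join keeps customers without invoices; their InvoiceNum is missing (NaN).
result = customers.join(joined, on='Address', how='left')

# Optional: show an empty string instead of NaN for addresses with no invoices.
result['InvoiceNum'] = result['InvoiceNum'].fillna('')

print(result)
```

Whether an empty string is the right placeholder depends on how the column is used later; leaving the value as NaN is just as reasonable.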
