Merge multiple CSV files and drop duplicates by field

Problem description

I need to match up data from multiple CSV files. For example, suppose I have three CSV files:

Input 1 CSV

PANYNJ LGA WEST 1,available, LGA West GarageFlushing
PANYNJ LGA WEST 4,unavailable,LGA West Garage
iPark - Tesla,unavailable,530 E 80th St

Input 2 CSV

PANYNJ LGA WEST 4,unavailable,LGA West Garage
PANYNJ LGA WEST 5,available,LGA West Garage

Input 3 CSV

PANYNJ LGA WEST 5,available,LGA West Garage
imPark - Tesla,unavailable,611 E 83rd St

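The first column is name, the second is status, and the last is address. I would like to merge these three files into a single CSV file, so that rows with the same name appear only once. My desired output file looks like this:

Output CSV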

PANYNJ LGA WEST 1,available, LGA West GarageFlushing
PANYNJ LGA WEST 4,unavailable,LGA West Garage
iPark - Tesla,unavailable,530 E 80th St
PANYNJ LGA WEST 5,available,LGA West Garage
imPark - Tesla,unavailable,611 E 83rd St

I am trying to solve this with pandas or the csv module, but I am not sure how to go about it.

Any help is greatly appreciated!

Tags: python, python-3.x, pandas, csv

Solution

With pandas, you can use pd.concat followed by DataFrame.drop_duplicates:

import pandas as pd
from io import StringIO

str1 = StringIO("""PANYNJ LGA WEST 1,available, LGA West GarageFlushing
PANYNJ LGA WEST 4,unavailable,LGA West Garage
iPark - Tesla,unavailable,530 E 80th St""")

str2 = StringIO("""PANYNJ LGA WEST 4,unavailable,LGA West Garage
PANYNJ LGA WEST 5,available,LGA West Garage""")

str3 = StringIO("""PANYNJ LGA WEST 5,available,LGA West Garage
imPark - Tesla,unavailable,611 E 83rd St""")

# replace str1, str2, str3 with 'file1.csv', 'file2.csv', 'file3.csv'
df1 = pd.read_csv(str1, header=None)
df2 = pd.read_csv(str2, header=None)
df3 = pd.read_csv(str3, header=None)

# stack the three frames, then keep the first row for each name (column 0)
res = pd.concat([df1, df2, df3], ignore_index=True)\
        .drop_duplicates(subset=0)

print(res)

                   0            1                         2
0  PANYNJ LGA WEST 1    available   LGA West GarageFlushing
1  PANYNJ LGA WEST 4  unavailable           LGA West Garage
2      iPark - Tesla  unavailable             530 E 80th St
4  PANYNJ LGA WEST 5    available           LGA West Garage
6     imPark - Tesla  unavailable             611 E 83rd St
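
Since the question also mentions the csv module, here is a minimal sketch of the same idea using only the standard library. It is not part of the answer above. The function name merge_csv_files, its parameters, and the output file name are hypothetical, and it assumes the files have no header row and that the name is in the first column. It streams the rows in file order and writes each name only the first time it appears, which mirrors the default keep='first' behaviour of drop_duplicates.

```python
import csv


def merge_csv_files(paths, out_path, key_index=0):
    """Merge header-less CSV files, keeping the first row seen for each key.

    Hypothetical helper: `paths` is a list of input file names, `out_path`
    is the file to write, and `key_index` is the column holding the name.
    """
    seen = set()  # names that have already been written to the output
    with open(out_path, "w", newline="", encoding="utf-8") as out_file:
        writer = csv.writer(out_file)
        for path in paths:
            with open(path, newline="", encoding="utf-8") as in_file:
                for row in csv.reader(in_file):
                    if not row:  # skip completely blank lines
                        continue
                    key = row[key_index]
                    if key not in seen:
                        seen.add(key)
                        writer.writerow(row)
```

A call such as merge_csv_files(['file1.csv', 'file2.csv', 'file3.csv'], 'output.csv') would write the deduplicated rows to output.csv. If you prefer to stay in pandas, res.to_csv('output.csv', index=False, header=False) should write the frame above in the same header-less layout.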
