Compare the rows of 2 different CSV files in Python 3 and create a new CSV

Problem description

Here's the scenario. I have 2 CSV files, as follows:

CSV file 1 (previousmembers.csv):

john.doe@mydomain.com
suzy.smith@mydomain.com
test.person@mydomain.com
another.person@mydomain.com
cool.guy@mydomain.com

CSV file 2 (updatedmembers.csv):

1234,password1,John,Mike,Doe,2022,john.doe@mydomain.com
83762,password2,Suzy,Sally,Smith,2022,suzy.smith@mydomain.com
91209,password3,Test,Kid,Person,2023,test.person@mydomain.com
671653,password4,Cool,Tom,Guy,2027,cool.guy@mydomain.com
82637,password5,New,Billy,Kid,2026,new.kid@mydomain.com
956656,password6,Another,New,Newbie,2027,another.newbie@mydomain.com

Desired output (newfolks.csv):

82637,password5,New,Billy,Kid,2026,new.kid@mydomain.com
956656,password6,Another,New,Newbie,2027,another.newbie@mydomain.com

Here is what I have so far, which isn't even close to working:

with open('previousmembers.csv') as check_file:
    check_set = set([row[0] for row in check_file])


with open('updatedmembers.csv', 'r') as in_file, open('newfolks.csv', 'w') as out_file:
    check_set2 = set([row[6] for row in in_file])
    for line in check_set2:
        if line not in check_set:
            out_file.write(line)

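The idea is to create a CSV file containing every row of updatedmembers.csv whose email, row[6], does not appear in previousmembers.csv. (previousmembers.csv only ever lists one email per line, which is why I need to compare against row[6] of updatedmembers.csv.)

Any help is greatly appreciated!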

Tags: python, python-3.x, dataframe

Solution


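The main problem is that you aren't treating the comma-separated values as lists. Iterating over a file object yields each line as a single string, so row[0] and row[6] pick out individual characters rather than fields. You would normally use the csv module for this, since it handles edge cases well and makes some things simpler. If you're just learning, though, you can split each line with split(','). Once you've done that, you can index into the result to get a field. For example: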

with open('previousmembers.csv') as check_file:
    # no need to index here, it's just one string per line
    # strip whitespace to be sure there's no junk
    check_set = set(row.strip() for row in check_file)


with open('updatedmembers.csv', 'r') as in_file, open('newfolks.csv', 'w') as out_file:
    for line in in_file:
        # split on commas (or use csv module)
        fields = line.split(',')
        if fields[6].strip() not in check_set:
            out_file.write(line)

This writes these lines to the new file:

82637,password5,New,Billy,Kid,2026,new.kid@mydomain.com
956656,password6,Another,New,Newbie,2027,another.newbie@mydomain.com
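
For comparison, here is a rough sketch of the same filter using the csv module mentioned above. The function name write_new_members and the email_col parameter are made up for illustration, and the sketch assumes the email sits in column 6 and that previousmembers.csv holds one email per row, as in the sample data. Rows that are blank or too short to have an email column are skipped, which may or may not be what you want.

```python
import csv


def write_new_members(previous_path, updated_path, out_path, email_col=6):
    """Copy rows of updated_path whose email is not listed in previous_path.

    Hypothetical helper for illustration; returns the number of rows written.
    """
    # newline='' is the csv documentation's recommendation for file objects
    with open(previous_path, newline='') as prev_file:
        # one email per row; csv.reader yields [] for blank lines, so skip those
        known = {row[0].strip() for row in csv.reader(prev_file) if row}

    written = 0
    with open(updated_path, newline='') as in_file, \
            open(out_path, 'w', newline='') as out_file:
        writer = csv.writer(out_file)
        for row in csv.reader(in_file):
            # ignore blank or short rows that have no email column
            if len(row) <= email_col:
                continue
            if row[email_col].strip() not in known:
                writer.writerow(row)
                written += 1
    return written
```

Calling write_new_members('previousmembers.csv', 'updatedmembers.csv', 'newfolks.csv') on the sample files should write the same two rows as the split(',') version. The practical difference is that csv.reader copes with quoted fields that contain commas, which a plain split(',') would cut in the wrong place.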
