首页 > 解决方案 > 比较 2 个不同的 csv 文件并将所有更改输出到新的 csv

问题描述

我有 2 个 CSV,它们是 New.csv 和 Old.csv,如下所示:

旧的.csv

longName,shortName,eventType,number,severity
ACTAGENT201,ACAT201,RES,1,INFO
ACTAGENT202,ACAT202,RES,2,ALERT
ACODE801,AC801,ADMIN,1,MINOR
ACODE802,AC802,ADMIN,2,MINOR
ACODE102,AC102,COMM,2,CRITICAL
ACODE103,AC103,COMM,3,CRITICAL
ACODE104,AC104,COMM,4,CRITICAL
ACODE105,AC105,COMM,5,CRITICAL
ACODE106,AC106,COMM,6,CRITICAL

新建.csv

longName,shortName,eventType,number,severity
ACTAGENT201,ACAT201,RES,1,INFO
ACTAGENT202,ACAT202,RES,2,ALERT
ACODE801,AC801,ADMIN,1,MINOR
ACODE802,AC802,ThisHasBeenChanged,2,MINOR
ACODE102,AC102,COMM,2,CRITICAL
ACODE103,AC103,COMM,3,CRITICAL
ACODE104,AC104,COMM,4,THISHASBEENCHANGED
ACODE105,AC105,COMM,5,CRITICAL
ACODE106,AC106,COMM,6,CRITICAL

如果在 old.csv 和 new.csv 之间已修改/更改的行中的一列中有数据,则应将整行附加到 changes.csv 中,就像 old.csv 中的每一列和new.csv 并排:

在此处输入图像描述

我知道如何在 csv 中查找新项目和已删除项目,但不知道如何获取修改后的项目。下面的代码:

import csv

def DeletedItems(old_csv, new_csv, changes_csv):
    with open(new_csv, newline="", encoding="utf8") as new_fp:
        csv_reader = csv.reader(new_fp)
        csv_headings = next(csv_reader)
        new_long_names = {row[0] for row in csv.reader(new_fp)}
    with open(old_csv, newline="", encoding="utf8") as old_fp:
        with open(changes_csv, "a", newline="", encoding="utf8") as changes_fp:
            writer = csv.writer(changes_fp)
            writer.writerow("")
            for row in csv.reader(old_fp):
                if row[0] not in new_long_names:
                    writer.writerow(row)

def NewItems(old_csv, new_csv, changes_csv):
    with open(old_csv, newline="", encoding="utf8") as old_fp:
        csv_reader = csv.reader(old_fp)
        csv_headings = next(csv_reader)
        old_long_names = {row[0] for row in csv.reader(old_fp)}
    with open(new_csv, newline="", encoding="utf8") as new_fp:

        with open(changes_csv, "w", newline="", encoding="utf8") as changes_fp:
            writer = csv.writer(changes_fp)
            for row in csv.reader(new_fp):
                if row[0] not in old_long_names:
                    writer.writerow(row)

NewItems("old.csv", "new.csv", "changes.csv")
DeletedItems("old.csv", "new.csv", "changes.csv")

标签: pythonpython-3.xcsv

解决方案


首先,将两个 CSV 文件读入字典,使用longName值作为键。

import csv

with open(old_csv_file, "r") as fh:
    reader = csv.reader(fh)
    old_csv = {row[0]: row for row in reader}

with open(new_csv_file, "r") as fh:
    reader = csv.reader(fh)
    new_csv = {row[0]: row for row in reader}

然后,使用集合操作很容易找到新添加和删除的键。

old_longNames = set(old_csv.keys())
new_longNames = set(new_csv.keys())

# common: set intersection
common_longNames = old_longNames.intersection(new_longNames)
# removed: whatever's in old but not in new
removed_longNames = old_longNames - new_longNames
# added: whatever's in new but not in old
added_longNames = new_longNames - old_longNames

最后,遍历公共集以查找发生更改的位置:

changed_longNames = []
for key in common_longNames:
    old_row = old_csv[key]
    new_row = new_csv[key]
    # if any(o != n for o, n in zip(old_row, new_row)):
    if old_row != new_row:
        # this row has at least one column changed. Do whatever
        print(f"LongName {key} has changes")
        changed_longNames.append(key)

或者,作为列表理解:

changed_longNames = [key for key in common_longNames if old_csv[key] != new_csv[key]]

将所有内容写入新的 csv 文件也相当简单。请注意,集合不保留顺序,因此您可能无法以相同的顺序获得结果。

with open("deleted.csv", "w") as fh:
    writer = csv.writer(fh)
    for key in removed_longNames:
        writer.writerow(old_csv[key])

with open("inserted.csv", "w") as fh:
    writer = csv.writer(fh)
    for key in added_longNames:
        writer.writerow(new_csv[key])

with open("changed.csv", "w") as fh:
    writer = csv.writer(fh)
    for key in changed_longNames:
        old_row = old_csv[key]
        new_row = new_csv[key]
        merged_row = []
        for oi, ni in zip(old_row, new_row):
            merged_row.append(oi)
            merged_row.append(ni)
        writer.writerow(merged_row)


推荐阅读