Compare two CSV files

Problem description

import csv

with open('old.csv', 'r', newline='') as f1, open('new.csv', 'r') as f2:
    # skip the header line of each file
    next(f1), next(f2)
    r1 = csv.reader(f1)
    # build a set of "col0,col2" strings that match the line format of new.csv
    st = set("{},{}".format(row[0], row[2]) for row in r1)
    # iterate over every line in new.csv
    # and print the line if it appears in the set
    for line in f2:
        if line.rstrip() in st:
            print(line)

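This code does not do what I want. I want Python code that compares a column across two CSV files and, where the items are similar, creates a new CSV file containing the IDs associated with the similar items.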

Tags: python, csv

Solution


If you can use pandas:

import pandas as pd
# example data; for a real file, use pd.read_csv('path_to_your_csv1') instead
df = pd.DataFrame({'keywords': ['keyword1', 'keyword2'], 'sr': [1, 2]})
df
   keywords  sr
0  keyword1   1
1  keyword2   2

Load the other CSV, which contains the products:

# example data; for a real file, use pd.read_csv('path_to_your_csv2') instead
df2 = pd.DataFrame({'product': ['keyword1 is in this product', 'no key word present'], 'sr': [1, 2]})
df2
                       product  sr
0  keyword1 is in this product   1
1          no key word present   2

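Apply the search condition:

# str.contains treats the joined string as a regular expression, so "|" means "or"
df3 = df2[df2['product'].str.contains("|".join(df.keywords))]
df3
                       product  sr
0  keyword1 is in this product   1

You can export the resulting DataFrame to a new CSV:

df3.to_csv('your_path_to_new_csv', index=False)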
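
If pandas is not an option, the same idea can be sketched with only the standard `csv` module, and it can also record which keyword ID matched which product ID, as the question asks. This is a minimal sketch, not the answer's own method: the function name `match_keywords`, the output column names, and the in-memory sample data are all assumptions made for illustration, and the column names `sr`, `keywords`, and `product` are taken from the example above. It uses a plain substring test rather than a regular expression.

```python
import csv
import io

def match_keywords(keyword_rows, product_rows):
    """Return (product_id, keyword_id, keyword) for each keyword found in a product."""
    matches = []
    for product in product_rows:
        for keyword in keyword_rows:
            # Plain substring test, so special characters need no escaping.
            if keyword["keywords"] in product["product"]:
                matches.append((product["sr"], keyword["sr"], keyword["keywords"]))
    return matches

# Hypothetical in-memory stand-ins for the two CSV files.
keywords_csv = io.StringIO("sr,keywords\n1,keyword1\n2,keyword2\n")
products_csv = io.StringIO(
    "sr,product\n1,keyword1 is in this product\n2,no key word present\n"
)

keyword_rows = list(csv.DictReader(keywords_csv))
product_rows = list(csv.DictReader(products_csv))
matches = match_keywords(keyword_rows, product_rows)

# Write the associated IDs as CSV (to a string buffer here instead of a file).
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["product_sr", "keyword_sr", "keyword"])
writer.writerows(matches)
print(out.getvalue())
```

With real files, the `io.StringIO` objects would be replaced by `open(path, newline='')` handles. The nested loop compares every keyword with every product, which is fine for small files but may be slow for large ones.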
