Compare two text files by certain columns and return the whole line?

Question

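I've been working on this and I understand the concept of what I want to do, but I'm having trouble implementing it. Basically, I am comparing two text files (new1 and new2) that each have 4 columns. The last column is a date column. I want to see which entries were added in new2 and which were removed relative to new1 (additions and subtractions).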

So let's say new1 contains:

John 1234 AccountA 10/11/2019
Max 3456 AccountA 10/11/2019
Stuart 8769 AccountA 10/11/2019

new2 contains:

John 1234 AccountA 10/12/2019
Milton 0011 AccountB 10/12/2019

new3 or newoutput should be:

- Max 3456 AccountA 10/11/2019
- Stuart 8769 AccountA 10/11/2019
+ Milton 0011 AccountB 10/12/2019

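Note that the first entry of each file should not register as a difference, even though the dates differ. Basically, I want to compare the first three columns of each file and then print out the whole line. The code is below: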

#Open text1, read, make a set, read through the file and separate the lines by tabs, only target columns 0-3
 f1=open("new1.txt", "r")
 lines = f1.readlines()
 result=set()
 full_line = set()
 for x in lines:
     result.add(str(x.split("\t")[0:3])) #set of the lines first few columns
     full_line.add(str(x.split("\t")[0:4])) #set of lines all columns (full line)



 #Open text2, read, make a set, read through the file and separate the lines by tabs, only target columns 0-3
 f2=open("new2.txt", "r")
 lines2 = f2.readlines()
 result2=set()
 full_line2 = set()
 for x2 in lines2:
     result2.add(str(x2.split("\t")[0:3])) #set of the lines first few columns
     full_line2.add(str(x2.split("\t")[0:4])) #set of lines all columns (full line)

 newlines = set(result2).difference(set(result)) #set of new2 - set of new1 - additions to new2
 missinglines = set(result).difference(set(result2)) # set of new1 - set of new2 - subtractions from new1

 for diffs in newlines:
     print ("+ " + diffs + full_line[4])
 for missings in missinglines:
     print ("- " + missings + full[line2[4]])

I know the last part of this code doesn't work because I can't index into a set, but the main idea is there. Can anyone help?

Tags: python

Answer


import csv

new1, new2 = {}, {}  # map (name, number, account) -> date for each file

with open('new1.txt') as fin:
    infile = csv.reader(fin, delimiter='\t')  # the files are tab-separated
    # the sample files have no header row; call next(infile) here if yours do
    for *key, date in infile:  # use the first three columns as the key
        new1[tuple(key)] = date  # we'll need the date later

with open('new2.txt') as fin:
    infile = csv.reader(fin, delimiter='\t')
    for *key, date in infile:
        new2[tuple(key)] = date

with open('output', 'w') as outfile:
    for k in (k for k in new1 if k not in new2):  # the keys in new1, but not in new2
        outfile.write('- ' + '\t'.join(list(k) + [new1[k]]) + '\n')  # add the date, write out with tabs
    for k in (k for k in new2 if k not in new1):  # the keys in new2, but not in new1
        outfile.write('+ ' + '\t'.join(list(k) + [new2[k]]) + '\n')
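
To check the approach against the sample data from the question without touching any files, the same idea can be sketched as a small function that works on lists of lines. The name `diff_by_key` and the `key_cols` parameter are hypothetical, introduced only for illustration, and the sketch assumes tab-separated lines whose last column is the date.

```python
def diff_by_key(old_lines, new_lines, key_cols=3):
    """Return '- ' / '+ ' prefixed lines whose first key_cols columns
    appear in only one of the two inputs (hypothetical helper)."""

    def index(lines):
        # Map the key columns to the full line, without the trailing newline.
        table = {}
        for line in lines:
            line = line.rstrip("\n")
            if not line.strip():
                continue  # skip blank lines
            table[tuple(line.split("\t")[:key_cols])] = line
        return table

    old, new = index(old_lines), index(new_lines)
    removed = ["- " + line for key, line in old.items() if key not in new]
    added = ["+ " + line for key, line in new.items() if key not in old]
    return removed + added


# Sample data from the question, assumed to be tab-separated.
new1 = [
    "John\t1234\tAccountA\t10/11/2019\n",
    "Max\t3456\tAccountA\t10/11/2019\n",
    "Stuart\t8769\tAccountA\t10/11/2019\n",
]
new2 = [
    "John\t1234\tAccountA\t10/12/2019\n",
    "Milton\t0011\tAccountB\t10/12/2019\n",
]

result = diff_by_key(new1, new2)
for line in result:
    print(line)
```

Because dictionaries keep insertion order (Python 3.7+), the lines come out in file order, removals first. John appears in neither list, since only the first three columns are compared and the date is ignored.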
