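python - Compare two text files by certain columns and return the whole line?
Problem description
I've been trying to work this out. I understand the concept of what I want to do, but I'm having trouble implementing it. Basically, I'm comparing two text files (new1 and new2) that each have 4 columns. The last column is a date column. I want to see which entries are in new2 but not in new1, and which are in new1 but not in new2 (additions and subtractions).
So let's say new1 contains: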
John 1234 AccountA 10/11/2019
Max 3456 AccountA 10/11/2019
Stuart 8769 AccountA 10/11/2019
new2 contains:
John 1234 AccountA 10/12/2019
Milton 0011 AccountB 10/12/2019
new3 or newoutput should be:
- Max 3456 AccountA 10/11/2019
- Stuart 8769 AccountA 10/11/2019
+ Milton 0011 AccountB 10/12/2019
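Note that the first entry in each file should not register as a difference, even though the dates differ. I basically want to compare the first three columns of each file and then print out the whole line. Code below: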
#Open text1, read, make a set, read through the file and separate the lines by tabs, only target columns 0-3
f1=open("new1.txt", "r")
lines = f1.readlines()
result=set()
full_line = set()
for x in lines:
    result.add(str(x.split("\t")[0:3])) #set of the lines first few columns
    full_line.add(str(x.split("\t")[0:4])) #set of lines all columns (full line)
#Open text2, read, make a set, read through the file and separate the lines by tabs, only target columns 0-3
f2=open("new2.txt", "r")
lines2 = f2.readlines()
result2=set()
full_line2 = set()
for x2 in lines2:
    result2.add(str(x2.split("\t")[0:3])) #set of the lines first few columns
    full_line2.add(str(x2.split("\t")[0:4])) #set of lines all columns (full line)
newlines = set(result2).difference(set(result)) #set of new2 - set of new1 - additions to new2
missinglines = set(result).difference(set(result2)) # set of new1 - set of new2 - subtractions from new1
for diffs in newlines:
    print ("+ " + diffs + full_line[4])
for missings in missinglines:
    print ("- " + missings + full[line2[4]])
I know the last part of this code doesn't work because I can't index into a set, but the main idea is there. Can anyone help?
Solution
import csv

new1, new2 = {}, {}  # map (name, id, account) -> date for each file

with open('new1', newline='') as fin:
    infile = csv.reader(fin, delimiter='\t')
    # If the files start with a header row, skip it here with next(infile)
    for *key, date in infile:  # use the first three columns as the key
        new1[tuple(key)] = date  # we'll need the date later

with open('new2', newline='') as fin:
    infile = csv.reader(fin, delimiter='\t')
    for *key, date in infile:
        new2[tuple(key)] = date

with open('output', 'w') as outfile:
    for k in (k for k in new1 if k not in new2):  # the keys in new1, but not in new2
        outfile.write('- ' + '\t'.join(list(k) + [new1[k]]) + '\n')  # add the date, write out with tabs
    for k in (k for k in new2 if k not in new1):  # the keys in new2, but not in new1
        outfile.write('+ ' + '\t'.join(list(k) + [new2[k]]) + '\n')
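The same idea can be tried without touching the disk. The sketch below is only an illustration: the helper names read_keyed and diff_by_key are made up for this example, the sample rows from the question are assumed to be tab-separated, and blank lines are skipped as a precaution. A set cannot be indexed, so each file is loaded into a dict that maps the first three columns to the full row, which lets the whole line be looked up again after the keys are compared.

```python
import csv
import io


def read_keyed(stream, key_cols=3, delimiter="\t"):
    """Map the first `key_cols` fields of each row to the full row (hypothetical helper)."""
    rows = {}
    for row in csv.reader(stream, delimiter=delimiter):
        if not row:  # skip blank lines
            continue
        rows[tuple(row[:key_cols])] = row
    return rows


def diff_by_key(old, new):
    """Return '- ' lines for keys only in `old`, then '+ ' lines for keys only in `new`."""
    removed = ["- " + "\t".join(row) for key, row in old.items() if key not in new]
    added = ["+ " + "\t".join(row) for key, row in new.items() if key not in old]
    return removed + added


# Sample data from the question, held in memory and assumed to be tab-separated.
NEW1 = (
    "John\t1234\tAccountA\t10/11/2019\n"
    "Max\t3456\tAccountA\t10/11/2019\n"
    "Stuart\t8769\tAccountA\t10/11/2019\n"
)
NEW2 = (
    "John\t1234\tAccountA\t10/12/2019\n"
    "Milton\t0011\tAccountB\t10/12/2019\n"
)

old = read_keyed(io.StringIO(NEW1))
new = read_keyed(io.StringIO(NEW2))
for line in diff_by_key(old, new):
    print(line)
```

With the sample data this prints the Max and Stuart rows prefixed with "- " and the Milton row prefixed with "+ ". John is left out because his first three columns match in both files, even though the dates differ. Since dicts keep insertion order, the lines come out in the order they appear in each file.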