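python - How to properly clear lists of per-file matches between file read and file write
Question
I have a series of files from which I am extracting strings with the re.search() function.
The file I am writing to is a tab-delimited list with the columns name, hsa, matSEQID and matACC. matSEQID and matACC are each a comma-delimited list, which I build with ','.join().
I am struggling with how to write the new file correctly, so that the first row includes only the values extracted from the first file. At the moment the name and hsa columns are correct, but the other two columns contain either every match from all of the files or only the matches from the last file, depending on how I clear the lists. How can I make each row unique? I tried clearing the lists after each file, but that does not seem to work. Am I thinking about this correctly? Thanks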
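For reference, the intended row layout can be sketched as below. This is only an illustration of the format described above: format_row is a hypothetical helper that does not appear in the script, and the sample values are placeholders taken from the kind of IDs shown later in the output.

```python
def format_row(name, hsa, mat_seq_ids, mat_accs):
    """Build one tab-separated row; the two list columns are comma-joined."""
    return "\t".join([name, hsa, ",".join(mat_seq_ids), ",".join(mat_accs)])


# Placeholder values, used only to show the shape of a row.
row = format_row(
    "MI0023620",
    "hsa-mir-7159",
    ["hsa-miR-7159-5p", "hsa-miR-7159-3p"],
    ["MIMAT0028228", "MIMAT0028229"],
)
print(row)
```

Each row has exactly three tabs, and the commas appear only inside the last two columns.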
import re
import os
import sys

List_of_names = []
List_of_hsa = []
List_of_matSEQID = []
List_of_matACC = []
fileList = []

path = "/blah/blah/blah/stuff/"
dirs = os.listdir(path)
for file in dirs:
    fileList.append(path+file)

outname = sys.argv[1]
output_fhandle = open(outname, "w")

# The regex patterns (acc_pattern, stem_pattern, mature_seq_pattern,
# mature_acc_pattern) and mature_acc_func are defined elsewhere and not shown here.

def linesplitter(M_object):
    # Keep the text between the first ">" and the following "<"
    Temp_storage = M_object.group()
    new_storage1 = Temp_storage.split(">")
    new_storage2 = new_storage1[1].split("<")
    List_of_names.append(new_storage2[0])

def stem_patternID(M_object):
    string = M_object.group()
    List_of_hsa.append(string)

def mat_seqID(M_object):
    string = M_object.group()
    List_of_matSEQID.append(string)

for file in fileList:
    fh_html = open(file).readlines()
    for line in fh_html:
        temp_string = line
        match_object = re.search(acc_pattern, temp_string)
        match_obj2 = re.search(stem_pattern, temp_string)
        match_obj3 = re.search(mature_seq_pattern, temp_string)
        match_obj4 = re.search(mature_acc_pattern, temp_string)
        if (match_object):
            linesplitter(match_object)
        if(match_obj2):
            stem_patternID(match_obj2)
        if(match_obj3):
            mat_seqID(match_obj3)
        if(match_obj4):
            mature_acc_func(match_obj4)
    my_matseqid_string = ','.join(List_of_matSEQID)
    mat_acc_string = ','.join(List_of_matACC)
    with open(outname,"w") as f:
        for (name,hsa) in zip(List_of_names, List_of_hsa):
            f.write("{0}\t{1}\t{2}\t{3}\n".format(name,hsa,my_matseqid_string,mat_acc_string))
    List_of_matSEQID.clear()
    List_of_matACC.clear()
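If I remove the .clear() calls at the end, every match from every file ends up in those lists. If I keep them, I only get the matches from the last file. How can I get it to write, in columns 2 and 3, lists whose values are unique to the given file? Thanks
Here are the two incorrect output files: 1) With .clear() left in place.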
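One possible way to keep the values per file is to collect the matches into fresh lists for each file and build that file's row straight away, rather than sharing module-level lists. The following is a minimal sketch under stated assumptions, not the script's actual logic: PATTERNS, collect_matches and build_rows are hypothetical names, the two regexes are stand-ins because the real patterns are not shown, and the toy "files" are plain lists of lines with made-up IDs.

```python
import re

# Hypothetical stand-ins: the question does not show its real patterns.
PATTERNS = {
    "name": re.compile(r"\bMI\d{7}\b"),
    "mat_acc": re.compile(r"\bMIMAT\d{7}\b"),
}


def collect_matches(lines):
    """Return a new dict of match lists for a single file's lines."""
    found = {key: [] for key in PATTERNS}  # fresh lists on every call
    for line in lines:
        for key, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                found[key].append(match.group())
    return found


def build_rows(files):
    """Build one tab-separated row per file from that file's matches only."""
    rows = []
    for lines in files:
        found = collect_matches(lines)
        name = found["name"][0] if found["name"] else ""
        rows.append("{0}\t{1}".format(name, ",".join(found["mat_acc"])))
    return rows


# Two toy "files", each given as a list of lines (made-up IDs).
file_a = ["<a>MI0000001</a>", "acc MIMAT0000011", "acc MIMAT0000012"]
file_b = ["<a>MI0000002</a>", "acc MIMAT0000021"]
for row in build_rows([file_a, file_b]):
    print(row)
```

Because the lists are created inside collect_matches, nothing carries over from one file to the next and no .clear() call is needed. The rows can then be written to the output file once, after all files have been processed.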
MI0023620 Stem-loop sequence hsa-mir-7159 Mature sequence hsa-miR-7160-5p,Mature sequence hsa-miR-7160-3p MIMAT0028230,MIMAT0028231
MI0023613 Stem-loop sequence hsa-mir-7153 Mature sequence hsa-miR-7160-5p,Mature sequence hsa-miR-7160-3p MIMAT0028230,MIMAT0028231
MI0023562 Stem-loop sequence hsa-mir-6077-2 Mature sequence hsa-miR-7160-5p,Mature sequence hsa-miR-7160-3p MIMAT0028230,MIMAT0028231
MI0023619 Stem-loop sequence hsa-mir-7161 Mature sequence hsa-miR-7160-5p,Mature sequence hsa-miR-7160-3p MIMAT0028230,MIMAT0028231
MI0023616 Stem-loop sequence hsa-mir-7156 Mature sequence hsa-miR-7160-5p,Mature sequence hsa-miR-7160-3p MIMAT0028230,MIMAT0028231
MI0023612 Stem-loop sequence hsa-mir-7152 Mature sequence hsa-miR-7160-5p,Mature sequence hsa-miR-7160-3p MIMAT0028230,MIMAT0028231
MI0023565 Stem-loop sequence hsa-mir-6511a-3 Mature sequence hsa-miR-7160-5p,Mature sequence hsa-miR-7160-3p MIMAT0028230,MIMAT0028231
MI0023622 Stem-loop sequence hsa-mir-486-2 Mature sequence hsa-miR-7160-5p,Mature sequence hsa-miR-7160-3p MIMAT0028230,MIMAT0028231
MI0023431 Stem-loop sequence hsa-mir-6511b-2 Mature sequence hsa-miR-7160-5p,Mature sequence hsa-miR-7160-3p MIMAT0028230,MIMAT0028231
MI0023611 Stem-loop sequence hsa-mir-7151 Mature sequence hsa-miR-7160-5p,Mature sequence hsa-miR-7160-3p MIMAT0028230,MIMAT0028231
MI0023618 Stem-loop sequence hsa-mir-7158 Mature sequence hsa-miR-7160-5p,Mature sequence hsa-miR-7160-3p MIMAT0028230,MIMAT0028231
MI0023563 Stem-loop sequence hsa-mir-6089-2 Mature sequence hsa-miR-7160-5p,Mature sequence hsa-miR-7160-3p MIMAT0028230,MIMAT0028231
MI0023564 Stem-loop sequence hsa-mir-6511a-2 Mature sequence hsa-miR-7160-5p,Mature sequence hsa-miR-7160-3p MIMAT0028230,MIMAT0028231
MI0023561 Stem-loop sequence hsa-mir-3690-2 Mature sequence hsa-miR-7160-5p,Mature sequence hsa-miR-7160-3p MIMAT0028230,MIMAT0028231
MI0023623 Stem-loop sequence hsa-mir-7162 Mature sequence hsa-miR-7160-5p,Mature sequence hsa-miR-7160-3p MIMAT0028230,MIMAT0028231
MI0023610 Stem-loop sequence hsa-mir-7150 Mature sequence hsa-miR-7160-5p,Mature sequence hsa-miR-7160-3p MIMAT0028230,MIMAT0028231
MI0023614 Stem-loop sequence hsa-mir-7154 Mature sequence hsa-miR-7160-5p,Mature sequence hsa-miR-7160-3p MIMAT0028230,MIMAT0028231
MI0023615 Stem-loop sequence hsa-mir-7155 Mature sequence hsa-miR-7160-5p,Mature sequence hsa-miR-7160-3p MIMAT0028230,MIMAT0028231
MI0023617 Stem-loop sequence hsa-mir-7157 Mature sequence hsa-miR-7160-5p,Mature sequence hsa-miR-7160-3p MIMAT0028230,MIMAT0028231
MI0023566 Stem-loop sequence hsa-mir-6511a-4 Mature sequence hsa-miR-7160-5p,Mature sequence hsa-miR-7160-3p MIMAT0028230,MIMAT0028231
MI0023621 Stem-loop sequence hsa-mir-7160 Mature sequence hsa-miR-7160-5p,Mature sequence hsa-miR-7160-3p MIMAT0028230,MIMAT0028231
2) The first 3 lines with .clear() removed.
MI0023620 Stem-loop sequence hsa-mir-7159 Mature sequence hsa-miR-7159-5p,Mature sequence hsa-miR-7159-3p,Mature sequence hsa-miR-7153-5p,Mature sequence hsa-miR-7153-3p,Mature sequence hsa-miR-6077,Mature sequence hsa-miR-7161-5p,Mature sequence hsa-miR-7161-3p,Mature sequence hsa-miR-7156-5p,Mature sequence hsa-miR-7156-3p,Mature sequence hsa-miR-7152-5p,Mature sequence hsa-miR-7152-3p,Mature sequence hsa-miR-6511a-5p,Mature sequence hsa-miR-6511a-3p,Mature sequence hsa-miR-486-5p,Mature sequence hsa-miR-486-3p,Mature sequence hsa-miR-6511b-5p,Mature sequence hsa-miR-6511b-3p,Mature sequence hsa-miR-7151-5p,Mature sequence hsa-miR-7151-3p,Mature sequence hsa-miR-7158-5p,Mature sequence hsa-miR-7158-3p,Mature sequence hsa-miR-6089,Mature sequence hsa-miR-6511a-5p,Mature sequence hsa-miR-6511a-3p,Mature sequence hsa-miR-3690,Mature sequence hsa-miR-7162-5p,Mature sequence hsa-miR-7162-3p,Mature sequence hsa-miR-7150,Mature sequence hsa-miR-7154-5p,Mature sequence hsa-miR-7154-3p,Mature sequence hsa-miR-7155-5p,Mature sequence hsa-miR-7155-3p,Mature sequence hsa-miR-7157-5p,Mature sequence hsa-miR-7157-3p,Mature sequence hsa-miR-6511a-5p,Mature sequence hsa-miR-6511a-3p,Mature sequence hsa-miR-7160-5p,Mature sequence hsa-miR-7160-3p MIMAT0028228,MIMAT0028229,MIMAT0028216,MIMAT0028217,MIMAT0023702,MIMAT0028232,MIMAT0028233,MIMAT0028222,MIMAT0028223,MIMAT0028214,MIMAT0028215,MIMAT0025478,MIMAT0025479,MIMAT0002177,MIMAT0004762,MIMAT0025847,MIMAT0025848,MIMAT0028212,MIMAT0028213,MIMAT0028226,MIMAT0028227,MIMAT0023714,MIMAT0025478,MIMAT0025479,MIMAT0018119,MIMAT0028234,MIMAT0028235,MIMAT0028211,MIMAT0028218,MIMAT0028219,MIMAT0028220,MIMAT0028221,MIMAT0028224,MIMAT0028225,MIMAT0025478,MIMAT0025479,MIMAT0028230,MIMAT0028231
MI0023613 Stem-loop sequence hsa-mir-7153 Mature sequence hsa-miR-7159-5p,Mature sequence hsa-miR-7159-3p,Mature sequence hsa-miR-7153-5p,Mature sequence hsa-miR-7153-3p,Mature sequence hsa-miR-6077,Mature sequence hsa-miR-7161-5p,Mature sequence hsa-miR-7161-3p,Mature sequence hsa-miR-7156-5p,Mature sequence hsa-miR-7156-3p,Mature sequence hsa-miR-7152-5p,Mature sequence hsa-miR-7152-3p,Mature sequence hsa-miR-6511a-5p,Mature sequence hsa-miR-6511a-3p,Mature sequence hsa-miR-486-5p,Mature sequence hsa-miR-486-3p,Mature sequence hsa-miR-6511b-5p,Mature sequence hsa-miR-6511b-3p,Mature sequence hsa-miR-7151-5p,Mature sequence hsa-miR-7151-3p,Mature sequence hsa-miR-7158-5p,Mature sequence hsa-miR-7158-3p,Mature sequence hsa-miR-6089,Mature sequence hsa-miR-6511a-5p,Mature sequence hsa-miR-6511a-3p,Mature sequence hsa-miR-3690,Mature sequence hsa-miR-7162-5p,Mature sequence hsa-miR-7162-3p,Mature sequence hsa-miR-7150,Mature sequence hsa-miR-7154-5p,Mature sequence hsa-miR-7154-3p,Mature sequence hsa-miR-7155-5p,Mature sequence hsa-miR-7155-3p,Mature sequence hsa-miR-7157-5p,Mature sequence hsa-miR-7157-3p,Mature sequence hsa-miR-6511a-5p,Mature sequence hsa-miR-6511a-3p,Mature sequence hsa-miR-7160-5p,Mature sequence hsa-miR-7160-3p MIMAT0028228,MIMAT0028229,MIMAT0028216,MIMAT0028217,MIMAT0023702,MIMAT0028232,MIMAT0028233,MIMAT0028222,MIMAT0028223,MIMAT0028214,MIMAT0028215,MIMAT0025478,MIMAT0025479,MIMAT0002177,MIMAT0004762,MIMAT0025847,MIMAT0025848,MIMAT0028212,MIMAT0028213,MIMAT0028226,MIMAT0028227,MIMAT0023714,MIMAT0025478,MIMAT0025479,MIMAT0018119,MIMAT0028234,MIMAT0028235,MIMAT0028211,MIMAT0028218,MIMAT0028219,MIMAT0028220,MIMAT0028221,MIMAT0028224,MIMAT0028225,MIMAT0025478,MIMAT0025479,MIMAT0028230,MIMAT0028231
MI0023562 Stem-loop sequence hsa-mir-6077-2 Mature sequence hsa-miR-7159-5p,Mature sequence hsa-miR-7159-3p,Mature sequence hsa-miR-7153-5p,Mature sequence hsa-miR-7153-3p,Mature sequence hsa-miR-6077,Mature sequence hsa-miR-7161-5p,Mature sequence hsa-miR-7161-3p,Mature sequence hsa-miR-7156-5p,Mature sequence hsa-miR-7156-3p,Mature sequence hsa-miR-7152-5p,Mature sequence hsa-miR-7152-3p,Mature sequence hsa-miR-6511a-5p,Mature sequence hsa-miR-6511a-3p,Mature sequence hsa-miR-486-5p,Mature sequence hsa-miR-486-3p,Mature sequence hsa-miR-6511b-5p,Mature sequence hsa-miR-6511b-3p,Mature sequence hsa-miR-7151-5p,Mature sequence hsa-miR-7151-3p,Mature sequence hsa-miR-7158-5p,Mature sequence hsa-miR-7158-3p,Mature sequence hsa-miR-6089,Mature sequence hsa-miR-6511a-5p,Mature sequence hsa-miR-6511a-3p,Mature sequence hsa-miR-3690,Mature sequence hsa-miR-7162-5p,Mature sequence hsa-miR-7162-3p,Mature sequence hsa-miR-7150,Mature sequence hsa-miR-7154-5p,Mature sequence hsa-miR-7154-3p,Mature sequence hsa-miR-7155-5p,Mature sequence hsa-miR-7155-3p,Mature sequence hsa-miR-7157-5p,Mature sequence hsa-miR-7157-3p,Mature sequence hsa-miR-6511a-5p,Mature sequence hsa-miR-6511a-3p,Mature sequence hsa-miR-7160-5p,Mature sequence hsa-miR-7160-3p MIMAT0028228,MIMAT0028229,MIMAT0028216,MIMAT0028217,MIMAT0023702,MIMAT0028232,MIMAT0028233,MIMAT0028222,MIMAT0028223,MIMAT0028214,MIMAT0028215,MIMAT0025478,MIMAT0025479,MIMAT0002177,MIMAT0004762,MIMAT0025847,MIMAT0025848,MIMAT0028212,MIMAT0028213,MIMAT0028226,MIMAT0028227,MIMAT0023714,MIMAT0025478,MIMAT0025479,MIMAT0018119,MIMAT0028234,MIMAT0028235,MIMAT0028211,MIMAT0028218,MIMAT0028219,MIMAT0028220,MIMAT0028221,MIMAT0028224,MIMAT0028225,MIMAT0025478,MIMAT0025479,MIMAT0028230,MIMAT0028231
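Both outputs are consistent with a single joined string being reused for every row. The toy reproduction below assumes that the join, the write and the .clear() calls all run inside the per-file loop, and that the output is rebuilt from scratch on each pass (as reopening the file in "w" mode would do). The function run and its data are hypothetical and only mimic the shape of the script.

```python
def run(clear_each_file):
    """Mimic the shared-list pattern with in-memory data instead of files."""
    names, mats = [], []  # lists shared across all files
    rows = []
    files = [("n1", ["a1", "a2"]), ("n2", ["b1"])]
    for name, matches in files:
        names.append(name)
        mats.extend(matches)
        joined = ",".join(mats)
        # Rebuilt on every pass, like reopening the output file with "w".
        rows = ["{0}\t{1}".format(n, joined) for n in names]
        if clear_each_file:
            mats.clear()
    return rows


print(run(clear_each_file=True))   # every row carries only the last file's matches
print(run(clear_each_file=False))  # every row carries the matches from all files
```

In both cases the names accumulate across files while each row is formatted with the same joined string, so clearing the match lists changes only which matches are repeated, not the fact that they are repeated.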