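How to remove SeqRecord objects from a fastq file

Problem description

I have a parsed fastq file and I am performing some operations on the reads. Specifically, I am trying to determine whether my fastq file contains reads that come from microbial contamination rather than from my human sample. If a read is contamination, I need to remove it from the fastq file so that only reads belonging to the human sample remain. I tried this: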

for record_seq in SeqIO.parse("file.fastq","fastq"):
    if condition==T:
        record_seq=""


But this code did not delete the record: the final file still had the same number of records.
I then thought about the pop method, but I cannot use it because it only works on lists, and record_seq is a SeqRecord object. Any ideas? Thanks!

Tags: python, parsing, biopython, fasta, fastq

Solution


Test fastq file:

@seq1
GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
+
!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65
@seq2
GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
+
!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65
@seq3
GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
+
!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65

Code:

from Bio import SeqIO

if __name__ == "__main__":
  # Read sequence and store, based on condition
  seqs = [seq for seq in SeqIO.parse("test.fastq", "fastq") if seq.name != "seq2"]
  # Overwrite file
  with open("test.fastq", "w") as fh:
    SeqIO.write(seqs, fh, "fastq")

Resulting file:

@seq1
GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
+
!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65
@seq3
GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
+
!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65
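
The attempt in the question fails because assigning record_seq="" only rebinds the loop variable; nothing on disk changes unless the records that should be kept are written out again. For large fastq files, building a full list in memory may be undesirable, so the following is a minimal sketch of a streaming variant that passes a generator to SeqIO.write. The is_contaminant function, the filter_fastq helper, and the shortened in-memory test data are hypothetical placeholders introduced only for illustration; the real contamination check has to be supplied.

```python
from io import StringIO

from Bio import SeqIO

# Hypothetical, shortened in-memory FASTQ data standing in for test.fastq
FASTQ_TEXT = (
    "@seq1\nGATTTGGGG\n+\n!''*((((*\n"
    "@seq2\nGATTTGGGG\n+\n!''*((((*\n"
    "@seq3\nGATTTGGGG\n+\n!''*((((*\n"
)


def is_contaminant(record):
    """Hypothetical placeholder; replace with the real contamination check."""
    return record.id == "seq2"


def filter_fastq(in_handle, out_handle):
    """Stream records from in_handle to out_handle, skipping contaminants.

    Returns the number of records written.
    """
    # Generator expression: records are parsed and written one at a time
    kept = (rec for rec in SeqIO.parse(in_handle, "fastq")
            if not is_contaminant(rec))
    return SeqIO.write(kept, out_handle, "fastq")


out = StringIO()
n_written = filter_fastq(StringIO(FASTQ_TEXT), out)
filtered_text = out.getvalue()
print(n_written)
print(filtered_text)
```

With real files, file names can be passed in place of the StringIO handles. Because the generator reads lazily, the output should go to a different file than the input; opening the input file for writing while it is still being parsed would truncate it.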
