python - 在 fasta 文件中更改反向序列的方向不起作用
问题描述
我正在尝试在文件中正确定位反向序列。这是代码:
import os
import sys import pysam
from Bio import SeqIO, Seq, SeqRecord
def main(in_file):
out_file = "%s.fa" % os.path.splitext(in_file)[0]
with open(out_file, "w") as out_handle:
# Write records from the BAM file one at a time to the output file.
# Works lazily as BAM sequences are read so will handle large files.
SeqIO.write(bam_to_rec(in_file), out_handle, "fasta")
def bam_to_rec(in_file):
"""Generator to convert BAM files into Biopython SeqRecords.
"""
bam_file = pysam.Samfile(in_file, "rb")
for read in bam_file:
seq = Seq.Seq(read.seq)
if read.is_reverse:
seq = seq.reverse_complement()
rec = SeqRecord.SeqRecord(seq, read.qname, "", "")
yield rec
if __name__ == "__main__":
main(*sys.argv[1:])`
当我打印出反向序列时,代码有效。但是当在文件中时,它会以相反的顺序打印出来。谁能帮我找出问题所在?这是我的 infile 的链接: https ://www.dropbox.com/sh/68ui8l7nh5fxatm/AABUr82l01qT1nL8I_XgJaeTa?dl=0
解决方案
请注意,丑陋的计数器只是打印 10000 个序列,而不是更多。
比较一个没有反转的和一个反转的如果需要这是几个seq的输出,随意测试它,我认为你的问题是yield返回一个迭代器但你没有迭代它,除非我误解了你是什么正在做:
原来的:
SOLEXA-1GA-2:2:93:1281:961#0 GGGTTAGGTTAGGGTTAGGGTTAGGGTTAGGGTTAG
变成:
SOLEXA-1GA-2:2:93:1281:961#0 CTAACCCTAACCCTAACCCTAACCCTAACCTAACCC
如果不反转:
原来的:
SOLEXA-1GA-2:2:12:96:1547#0 ACACACAAACACACACACACACACACACACACCCCC
变成:
SOLEXA-1GA-2:2:12:96:1547#0 ACACACAAACACACACACACACACACACACACCCCC 这是我的代码:
import os
import sys
import pysam
from Bio import SeqIO, Seq, SeqRecord
def main(in_file):
out_file = "%s.fa" % os.path.splitext(in_file)[0]
with open('test_non_reverse.txt', 'w') as non_reverse:
with open(out_file, "w") as out_handle:
# Write records from the BAM file one at a time to the output file.
# Works lazily as BAM sequences are read so will handle large files.
i = 0
for s in bam_to_rec(in_file):
if i == 10000:
break
i +=1
SeqIO.write(s, out_handle, "fasta")
i = 0
for s in convert_to_seq(in_file):
if i == 10000:
break
i +=1
SeqIO.write(s, non_reverse, 'fasta')
def convert_to_seq(in_file):
bam_file = pysam.Samfile(in_file, "rb")
for read in bam_file:
seq = Seq.Seq(read.seq)
rec = SeqRecord.SeqRecord(seq, read.qname, "", "")
yield rec
def bam_to_rec(in_file):
"""Generator to convert BAM files into Biopython SeqRecords.
"""
bam_file = pysam.Samfile(in_file, "rb")
for read in bam_file:
seq = Seq.Seq(read.seq)
if read.is_reverse:
seq = seq.reverse_complement()
rec = SeqRecord.SeqRecord(seq, read.qname, "", "")
yield rec
if __name__ == "__main__":
main(*sys.argv[1:])
推荐阅读
- spring-batch - Spring Batch 步骤/作业依赖
- scala - 未找到 SQL Server 依赖项
- c++ - c ++ string::size中的CharT元素是什么?
- join - 在数据集上使用什么是正确的连接?
- php - 使用 Curl 调用 Slim CORS Web 服务
- c# - 将 varchar 数据类型转换为 datetime 数据类型导致 c# 中的值超出范围
- python - Pandas Dataframe 将数据拼接成 2 列,并用逗号和整数组成一个数字
- python - 在python中搜索具有搜索词的文件
- mysql - mySQL - 比较两个表中的值是否相等
- c# - 在 C# 中处理 WM_PAINT 时防止闪烁