首页 > 技术文章 > 求fasta文件中互补序列

nklzj 2017-05-13 19:26 原文

一个名为read_1.fa 的fasta文件,里面有若干序列,如:

>@r1
TGAATGCGAACTCCGGGACGCTCAGTAATGTGACGATAGCTGAAAACTGTACGATAAACNGTACGCTGAGGGCAGAAAAAATCGTCGGGGACATTNTAAAGGCGGCGAGCGCGGCTTTTCCG
>@r2
NTTNTGATGCGGGCTTGTGGAGTTCAGCCGATCTGACTTATGTCATTACCTATGAAATGTGAGGACGCTATGCCTGTACCAAATCCTACAATGCCGGTGAAAGGTGCCGGGATCACCCTGTGGGTTTAT
>@r3
ATCGCCCGCAGACACCTTCACGCTGGACTGTTTCGGCTTTTACAGCGTCGCTTCATAATCCTTTTTCGCCGCCGCCATCAGCGTGTTGTAATCCGCCTGCAGGATTTTCCCGTCTTTCNGTGCCTTGNT
>@r4
GGGCCAATGCGCTTACTGATGCGGAATTACGCCGTAAGGCCGCAGATGAGCTTGTCCATATGACTGCGAGAATTAACNGTGGTGAGGCGATCCCTGAACCAGTAAAACAACTTCCTGTCATGGGCGGTA
>@r5
GTCAGGAAAGTGGTAAAACTGCAACTCAATTACTGCAATGCCCTCGTAATTAAGTGAATTTACAATATCGTCCTGTTCGGAGGGAAGAACGCGGGATGTTCATTCTTCATCACTTTTAATTGATGTATA
>@r6
AGCGACATTCTTCCTCGGTACATAATCTCCTTTGGCGTTTCCCGATGNCCGTCACGCACATGGNATCCCGTGATGACCTCATTAAAAACACGCTGCAATCCCTCCTCATCTTTGCAGGCGTCCGATTTT
>@r7
CCCCGCCACCATCCCGCCGGGCNTGTCCATATCGAGCAGAATGCTGTCCACCATCGGATCGCTGGCAGCCTGTTGCAGACGGGCGATAATGCCGTTGTAACCGGTCATCCCCGAGTACGGCTGCAGCGC
>@r8
NTGAACAGTAAACGTCTGTTGAGCACATCCTTTAATAAGCAGGGCCAGCGCAGTATCNAGTAGCATATTTTTCATGGTGTTATTCCCGATGCTTTTTG
>@r9
CCCGATGCTTTTTGAAGTTCGCAGAATCGTATGTGTAGANAATTAAACAAANCCT
..........等等

complement_seq.py代码如下:

#encoding = utf-8

"""
简介:求fasta文件中每个序列的互补序列
作者:刘自军
date:2017年5月18:54
"""
import sys
from collections import OrderedDict

args = sys.argv

seq = OrderedDict()
tmp_dit = {'A':'T','G':'C','C':'G','T':'A','N':'N'}

with open(args[1]) as f:

    for line in f:
        
        line = line.strip('\n')
        if line.startswith('>'):
            seq_id = line
            seq[seq_id] = ''
        else:
            for i in line:
                seq[seq_id] += tmp_dit[i]

for id,com_seq in seq.items():
    print ('%s\n%s' %(id,com_seq))

python complement_seq.py read_1.fa

或者python complement_seq.py read_1.fa > com_read.fa

 

推荐阅读