Problem splitting a string and translating characters with a dictionary - bioinformatics OOP

Problem description

My program has a problem. This is the part of the code that fails:

    def revcmpl(self):
        
        # TODO:convert sequence contained in the object
        #      to a list called seq
        
        seq = list(self.seq)
        
        # TODO: reverse the list in-place
        
        seq.reverse()
        
        # TODO: using string method join(), the class dictionary ALPH and a
        #       list comprehension, translate the reversed sequence and
        #       convert into a string
        
        seq = list(seq)
        seq_revcmpl = ''.join(DNASeq.ALPH[key] for key in self.seq.split())
        seq_revcmpl = str(seq_revcmpl)
        
        # TODO: create seqid variable and assign to it the object's seqid
        #       and the suffix '_revcmpl'
        
        seqid = f'{self.seqid}_revcmpl'
        
        # TODO: create a new object of DNASeq type using the new seqid,
        #       title contained in the object and
        #       reversed and translated sequence,
        #       return the new object
        
        obj1 = DNASeq(seqid, title, seq_revcmpl)

        return obj1

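I tried to use the string method join(), the class dictionary ALPH and a list comprehension to translate the reversed sequence and convert it into a string. I tried running this: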

# reload the sequences to have a collection of objects
# that are instances of the up-to-date DNASeq class

seqs = DNASeq.from_file('input/Staphylococcus_MLST_genes.fasta')

# select one of the sequences by its sequence id (seqid)
seq = seqs['yqiL']

new_seq = seq.revcmpl()

print( new_seq )

But I get an error:

KeyError                                  Traceback (most recent call last)
<ipython-input-57-a28b468b9cfe> in <module>
      7 seq = seqs['yqiL']
      8 
----> 9 new_seq = seq.revcmpl()
     10 
     11 print( new_seq )

<ipython-input-43-07d175957482> in revcmpl(self)
    211 
    212         seq = list(seq)
--> 213         seq_revcmpl = ''.join(DNASeq.ALPH[key] for key in self.seq.split())
    214         seq_revcmpl = str(seq_revcmpl)
    215 

<ipython-input-43-07d175957482> in <genexpr>(.0)
    211 
    212         seq = list(seq)
--> 213         seq_revcmpl = ''.join(DNASeq.ALPH[key] for key in self.seq.split())
    214         seq_revcmpl = str(seq_revcmpl)
    215 

KeyError: 'GCGTTTAAAGACGTGCCAGCCTATGATTTAGGTGCGACTTTAATAGAACATATTATTAAAGAGACGGGTTTGAATCCAAGTGAGATTGATGAAGTTATCATCGGTAACGTACTACAAGCAGGACAAGGACAAAATCCAGCACGAATTGCTGCTATGAAAGGTGGCTTGCCAGAAACAGTACCTGCATTTACAGTGAATAAAGTATGTGGTTCTGGGTTAAAGTCGATTCAATTAGCATATCAATCTATTGTGACTGGTGAAAATGACATCGTGCTAGCTGGCGGTATGGAGAATATGTCTCAGTCACCAATGCTTGTCAACAACAGTCGCTTCGGTTTTAAAATGGGACATCAATCAATGGTTGATAGCATGGTATATGATGGTTTAACAGATGTATTTAATCAATATCATATGGGTATTACTGCTGAAAATTTAGTGGAGCAATATGGTATTTCAAGAGAAGAACAAGATACATTTGCTGTAAACTCACAACAAAAAGCAGTACGTGCACAGCAA'

But why? I did split the sequence: seq_revcmpl = ''.join(DNASeq.ALPH[key] for key in self.seq.split())

Tags: python, oop, split, jupyter-notebook, bioinformatics

Solution

The problem is here:

seq_revcmpl = ''.join(DNASeq.ALPH[key] for key in self.seq.split())

self.seq contains no whitespace, so self.seq.split() returns a list with a single item: the entire sequence.

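The generator expression therefore runs for only one iteration (the list holds just one item, one long string), and key is the whole sequence. That string is not a key in ALPH, which is why the KeyError message shows the full sequence.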
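A quick way to see the difference is to try it on a short string. This is only a sketch: the sequence is made up, and the ALPH dictionary here is a hypothetical stand-in, since the real DNASeq.ALPH is not shown in the question.

```python
# Hypothetical stand-in for DNASeq.ALPH (the real class dictionary is not shown)
ALPH = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}

seq = 'GCGTTT'  # made-up short sequence with no whitespace

# split() with no arguments splits on whitespace; when there is none,
# the whole string comes back as a single list item
parts = seq.split()
print(parts)   # ['GCGTTT']

# Iterating over the string itself yields one character at a time
chars = list(seq)
print(chars)   # ['G', 'C', 'G', 'T', 'T', 'T']

# Looking up the whole string as a key fails, just like in the traceback
try:
    ALPH[parts[0]]
except KeyError as err:
    print('KeyError:', err)   # KeyError: 'GCGTTT'
```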

I think what you want is:

seq_revcmpl = ''.join(DNASeq.ALPH[key] for key in self.seq)
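Two further points may be worth checking in the method. Iterating over self.seq translates the sequence in its original order, so to get the reverse complement the lookup probably needs to run over the reversed list seq instead. Also, title is not defined inside the method, so self.title is presumably what was intended. A minimal sketch under those assumptions follows; the constructor, the ALPH table and the example data are hypothetical, since the real class is not shown in the question.

```python
class DNASeq:
    # Assumed complement table; the question does not show the real ALPH
    ALPH = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}

    def __init__(self, seqid, title, seq):
        # Assumed constructor signature, inferred from DNASeq(seqid, title, seq_revcmpl)
        self.seqid = seqid
        self.title = title
        self.seq = seq

    def revcmpl(self):
        # Convert the sequence to a list and reverse it in place
        seq = list(self.seq)
        seq.reverse()
        # Translate each base of the REVERSED list, then join into a string
        seq_revcmpl = ''.join([DNASeq.ALPH[base] for base in seq])
        seqid = f'{self.seqid}_revcmpl'
        # Use self.title: a bare `title` is not defined inside the method
        return DNASeq(seqid, self.title, seq_revcmpl)


original = DNASeq('demo', 'made-up example', 'GCGTTT')
rc = original.revcmpl()
print(rc.seqid, rc.seq)   # demo_revcmpl AAACGC
```

If the real ALPH also contains entries such as N or lowercase bases, the same loop works unchanged; any character missing from the dictionary would still raise a KeyError.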
