KeyError 'c' when running a script on a FASTA file?

Problem description

I have a script that works on DNA sequences, but when I load a line-wrapped FASTA file, I get the following error:

KeyError                                  Traceback (most recent call last)
<ipython-input-44-af22d7835e7a> in <module>()
     30     pal = True
     31     for j in range(2):
---> 32         if pairs[ Mysequence[i+j] ] != Mysequence[i+17-j]:
     33             pal = False
     34             break

KeyError: 'c'

My script is as follows:

import itertools

def isheader(line):
    return line[0] == '>'

def aspairs(f):
    seq_id = ''
    sequence = ''
    for header,group in itertools.groupby(f, isheader):
        if header:
            line = next(group)
            seq_id = line[1:].split()[0]
        else:
            sequence = ''.join(line.strip() for line in group)
            yield seq_id, sequence

with open(file,"rt") as fh:
    seqs = aspairs(fh)
    for seqinfo in seqs:
        Mysequence = seqinfo[1].lower()
print(len(Mysequence))
pairs = {"A":"T", "T":"A", "G":"C", "C":"G"}
ans = []
for i in range(len(Mysequence) - 21 + 1):
    pal = True
    for j in range(2):
        if pairs[ Mysequence[i+j] ] != Mysequence[i+17-j]:
            pal = False
            break
    if not pal:
        continue

    if (Mysequence[i+19] == Mysequence[i+20]) and (Mysequence[i+19] in ('C', 'G')):
        print(Mysequence[i : i+21])
        ans.append(Mysequence[i : i+21])
    else:
        print(Mysequence[i : i+18] + " (X)")
print("Count of answer: %d" % len(ans))

Do you know what causes this error? Thanks.

Tags: python-3.x

Solution


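Mysequence contains only lowercase letters because you build it with seqinfo[1].lower(), but the pairs dictionary only has uppercase letters as keys. So when you look up a lowercase letter from Mysequence in pairs, you get a KeyError.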
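
One way to fix this is to normalize the sequence to uppercase so that it matches the keys of pairs. The following is only a minimal sketch: the helper name ends_pair_up and the sample string raw are made up for illustration and do not come from the original script.

```python
pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}

def ends_pair_up(seq, i):
    """Same check as the inner loop in the question: the bases at i and i+1
    must be the complements of the bases at i+17 and i+16."""
    for j in range(2):
        if pairs[seq[i + j]] != seq[i + 17 - j]:
            return False
    return True

# Hypothetical 21-base sample, lowercase as it might appear in a FASTA file.
raw = "acgtgtgtgtgtgtgtgtacc"

# Use .upper() instead of .lower() so every base is a valid key in pairs.
my_sequence = raw.upper()

print(ends_pair_up(my_sequence, 0))
```

Uppercasing also keeps the later test Mysequence[i+19] in ('C', 'G') consistent, since that comparison uses uppercase letters too. If the file may contain other symbols such as N, pairs.get(...) could be used instead of pairs[...] to avoid a KeyError on those characters.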

