How do I find the longest run of A in the lines of a DNA string and return its length?

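Problem description

I ran into the following problem and don't know how to solve it. The task is to find the length of the longest substring of repeated A's, and to return that length for each string in this list: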

['>KF735813.1 HIV-1 isolate Cameroon1(ViroSeq) HIV DR 02 from Cameroon pol protein (pol) gene, partial cds', 'CCTCAAATCACTCTTTGGCAACGACCCTTAGTCACAGTTAGGATAGAGGGACAGTTAATAGAAGCCCTATTAGACACAGG', 'GGCAGATGATACAGTATTAGAAGAGATAAATTTACCAGGAAAATGGAAACCAAAAATGATAGGGGGAATTGGAGGTTTTA', 'TCAAAGTAAGACAGTATGATCAGATACTTATAGAAATTTGTGGAAAAAGGGCCATAGGTACAGTATTAGTAGGACCTACA', 'CCTGTCAACATAATTGGACGAAACATGTTGACTCAGATTGGTTGTACTTTAAATTTTCCAATTAGTCCTATTGAAACTGT', 'GCCAGTAAAATTAAAGCCAGGTATGGATGGCCCAAAGGTAAAACAATGGCCATTGACAGAAGAAAAAATAAAAGCATTAA', 'CAGAAATTTGTACAGATATGGAAAAAGGAGGGAAAAATTTCAATAATTGGGCCTGAAAATCCATATAATACTCCAGTATTT', 'GCCATAAAGAAAAAAGATAGTACTAAATGGAGAAAATTAGTAGATTTTAGAGAACTTAATAAGAGAACTCAAGACTTCTG', 'GGAGATCCAATTAGGAATACCTCATCCCGCGGGATTAAAAAAGAACAAATCAGTAACAGTACTAGATGTGGGGGATGCAT', 'ATTTTTCAGTTCCCTTAGATTAAGACTTTAGAAAGTACACTGCATTCACTATACCTAGTTTAAATAATGCAACACCAGGT', 'ATTAGATATCAGTACAATGTGCTTCCACAGGGATGGAAAGGATCACCAGCAATATTTCAGGCAAGCATGACAAAAATCTT', 'AGAGCCCTTTAGGACAAAATATCCAGAAATAGTGATCTACCAATATATGGATGATTTATATGTAGGATCAGACTTAGAGA', 'TAGGGCAGCATAGAGCAAAAATAGAGGAGTTGAGAGTACATCTATTGAAGTGGGGATTTACCACACCAGACAAAAAACAT', 'CAGAAAGAACCTCCATTTCTTTGGATGGGATATGAACTCCATCCTGACAAATGGACAGTCCAGCCTATACAGCTGCCAGA', 'AAAAGACAGCTGGACTGTCAATGATATACAGAAATTAGTGGGAAAACTAAATTGGGCAAGTCAGATTTATGCAGGAATTA', 'AAGTAAAGCAACTGTGTAGACTCCTCAGGGGAGCCAAAGCACTAACAGAGGTAGTACCACTAACTGAGGAAGCAGAATTA', 'GAATTGGCAGATAACAGGGAGATTCTAAAAGAACCTGTACATGGAGTATATTATGACCCAACAAAAGACTTAGTAGCAGA', 'AATACAGAAGCAAGGGCAAGAC']

This is the function I tried to write, but I know it is the wrong approach:

 for c in range(len(fastarec_Lines)):
        if fastarec_Lines[c].count('A') == current:
            count += 1
        else:
            count = 1
            current = fastarec_Lines[c]
    maximum = max(count,maximum)
    return maximum

Can anyone help me?

Tags: python-3.x, string, list, dna-sequence

Solution

One approach is a regular-expression find-all search for the pattern A+. Then sort the resulting strings by length and print the last element:

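import re

seq = "AATTGGCCAAAAATTGCA"
# Collect every run of one or more consecutive A's
matches = re.findall(r'A+', seq)
# Sort by length so the longest run ends up last (Python 3 takes key=, not a cmp function)
matches.sort(key=len)
print("longest string is " + matches[-1] + " with a length of " + str(len(matches[-1])))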

This prints:

longest string is AAAAA with a length of 5
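
The question asks for one length per string in the list, so the same idea can be wrapped in a function and applied to every line. The following is a minimal sketch, not a definitive implementation: the names longest_a_run and longest_a_runs and the short sample list are made up for illustration, and it assumes that lines starting with ">" are FASTA headers to skip and that a run of A's continuing across a line break is not joined.

```python
import re

def longest_a_run(line):
    """Return the length of the longest run of consecutive 'A' in line (0 if none)."""
    # default=0 covers lines that contain no 'A' at all
    return max((len(run) for run in re.findall(r"A+", line)), default=0)

def longest_a_runs(lines):
    """Return one length per sequence line, skipping header lines that start with '>'."""
    return [longest_a_run(line) for line in lines if not line.startswith(">")]

# Hypothetical sample data, shortened for illustration
fastarec_lines = [
    ">example header",
    "CCTCAAATCACTCTTTGG",
    "GGAAAAATGGAAACCAAAAATG",
    "CCGGTT",
]
print(longest_a_runs(fastarec_lines))  # [3, 5, 0]
```

Using max with a generator avoids sorting the matches, and default=0 (available since Python 3.4) keeps the function from raising on a line with no A.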
