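python-3.x - How do I find the longest run of A's in each line of a DNA string and return its length?
Problem description
I've run into the following problem and don't know how to solve it. The task is to find the length of the longest substring of repeated A's in each string of this list, and return that length for every string: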
['>KF735813.1 HIV-1 isolate Cameroon1(ViroSeq) HIV DR 02 from Cameroon pol protein (pol) gene, partial cds', 'CCTCAAATCACTCTTTGGCAACGACCCTTAGTCACAGTTAGGATAGAGGGACAGTTAATAGAAGCCCTATTAGACACAGG', 'GGCAGATGATACAGTATTAGAAGAGATAAATTTACCAGGAAAATGGAAACCAAAAATGATAGGGGGAATTGGAGGTTTTA', 'TCAAAGTAAGACAGTATGATCAGATACTTATAGAAATTTGTGGAAAAAGGGCCATAGGTACAGTATTAGTAGGACCTACA', 'CCTGTCAACATAATTGGACGAAACATGTTGACTCAGATTGGTTGTACTTTAAATTTTCCAATTAGTCCTATTGAAACTGT', 'GCCAGTAAAATTAAAGCCAGGTATGGATGGCCCAAAGGTAAAACAATGGCCATTGACAGAAGAAAAAATAAAAGCATTAA', 'CAGAAATTTGTACAGATATGGAAAAAGGAGGGAAAAATTTCAATAATTGGGCCTGAAAATCCATATAATACTCCAGTATTT', 'GCCATAAAGAAAAAAGATAGTACTAAATGGAGAAAATTAGTAGATTTTAGAGAACTTAATAAGAGAACTCAAGACTTCTG', 'GGAGATCCAATTAGGAATACCTCATCCCGCGGGATTAAAAAAGAACAAATCAGTAACAGTACTAGATGTGGGGGATGCAT', 'ATTTTTCAGTTCCCTTAGATTAAGACTTTAGAAAGTACACTGCATTCACTATACCTAGTTTAAATAATGCAACACCAGGT', 'ATTAGATATCAGTACAATGTGCTTCCACAGGGATGGAAAGGATCACCAGCAATATTTCAGGCAAGCATGACAAAAATCTT', 'AGAGCCCTTTAGGACAAAATATCCAGAAATAGTGATCTACCAATATATGGATGATTTATATGTAGGATCAGACTTAGAGA', 'TAGGGCAGCATAGAGCAAAAATAGAGGAGTTGAGAGTACATCTATTGAAGTGGGGATTTACCACACCAGACAAAAAACAT', 'CAGAAAGAACCTCCATTTCTTTGGATGGGATATGAACTCCATCCTGACAAATGGACAGTCCAGCCTATACAGCTGCCAGA', 'AAAAGACAGCTGGACTGTCAATGATATACAGAAATTAGTGGGAAAACTAAATTGGGCAAGTCAGATTTATGCAGGAATTA', 'AAGTAAAGCAACTGTGTAGACTCCTCAGGGGAGCCAAAGCACTAACAGAGGTAGTACCACTAACTGAGGAAGCAGAATTA', 'GAATTGGCAGATAACAGGGAGATTCTAAAAGAACCTGTACATGGAGTATATTATGACCCAACAAAAGACTTAGTAGCAGA', 'AATACAGAAGCAAGGGCAAGAC']
This is the function I tried to write, but I know it's the wrong approach:
for c in range(len(fastarec_Lines)):
    if fastarec_Lines[c].count('A') == current:
        count += 1
    else:
        count = 1
        current = fastarec_Lines[c]
    maximum = max(count,maximum)
return maximum
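Can anyone help me?
Solution
One approach is a regex findall search for the pattern A+, which matches every run of one or more consecutive A's. Then sort the matched strings by length and print the last element, which is the longest: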
import re

seq = "AATTGGCCAAAAATTGCA"
matches = re.findall(r'A+', seq)  # every run of consecutive A's
matches.sort(key=len)  # Python 3: sort() takes a key function, not a cmp function
print("longest string is " + matches[-1] + " with a length of " + str(len(matches[-1])))
This prints:
longest string is AAAAA with a length of 5
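The question asks for one length per string in the list, so the same regex idea can be applied line by line. The following is a minimal sketch, not the answerer's code: the helper names longest_a_run and longest_a_runs are hypothetical, the sample list is shortened and its header line is made up, and it assumes that FASTA header lines (those starting with ">") should be skipped and that a line with no A at all should report 0.

```python
import re


def longest_a_run(line):
    """Return the length of the longest run of consecutive 'A' in line (0 if none)."""
    # default=0 avoids a ValueError when the line contains no 'A'.
    return max((len(run) for run in re.findall(r"A+", line)), default=0)


def longest_a_runs(lines):
    """Return one length per sequence line, skipping FASTA header lines (assumption)."""
    return [longest_a_run(line) for line in lines if not line.startswith(">")]


# Shortened, illustrative input; the header text is hypothetical.
fastarec_Lines = [
    ">example header (hypothetical)",
    "CCTCAAATCACTCTTTGGCAACGACC",
    "GGGTTTCCC",
    "AATACAGAAGCAAGGGCAAGAC",
]
print(longest_a_runs(fastarec_Lines))  # [3, 0, 2]
```

Using max() with a generator avoids sorting the matches, since only the longest one is needed. Each line is treated on its own, so a run of A's that continues across a line break is not joined; if that matters, the sequence lines would need to be concatenated first.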