python - Trying (and failing) to compare 2 lists of protein fragment sequences
Problem description
So I wrote this script:
from Bio import SeqIO
from Bio import SeqUtils

# Read the full protein: keep its ID, sequence and (circular) molecular weight
protein = SeqIO.parse('short_protein.fasta', 'fasta')
id_protein = []
sequence_protein = 0
weight_protein = 0
for i in protein:
    sequence_protein = (f'{i.seq}')
    id_protein.append(f'{i.id}')
    weight = SeqUtils.molecular_weight(i.seq, seq_type="protein", circular=True)
    weight_protein += (round(weight, 1))
print(f'{str(id_protein)[2:-2]} {weight_protein:6n} {sequence_protein}')

# Read the fragments and compute the same values for each one
protein_fragments = SeqIO.parse('short_protein_fragments.fasta', 'fasta')
id_fragments = []
sequence_fragments = []
weight_fragments = []
for i in protein_fragments:
    sequence_fragments.append(f'{i.seq}')
    id_fragments.append(f'{i.id}')
    weight = SeqUtils.molecular_weight(i.seq, seq_type="protein", circular=True)
    weight_fragments.append(round(weight, 1))
for item_a, item_b, item_c in zip(id_fragments, weight_fragments, sequence_fragments):
    print(f'{str(item_a)} {item_b:6n} {item_c}')

import itertools
from math import isclose

# Every combination of fragments (of any size, including all of them)
# whose summed weight matches the protein weight
combinations = []
for i in range(len(weight_fragments) + 1):
    for weight_subset in itertools.combinations(weight_fragments, i):
        if isclose(sum(weight_subset), weight_protein):
            combinations.append(weight_subset)
print(combinations)

# Map weights back to sequences (assumes no two fragments share a weight)
data_fragments = dict(zip(weight_fragments, sequence_fragments))
print(data_fragments)
weight_combinations = [tuple(data_fragments[i] for i in c) for c in combinations]
print(weight_combinations)

# Every ordering of the fragments found in the protein that spells it exactly
used_fragments = []
for el in sequence_fragments:
    if el in sequence_protein:
        used_fragments.append(el)
sequence_combinations = []
for i in range(0, len(used_fragments)+1):
    for seq_subset in itertools.permutations(used_fragments, i):
        if (''.join(seq_subset)) == (sequence_protein):
            sequence_combinations.append(seq_subset)
print(sequence_combinations)

if sorted(weight_combinations) == sorted(sequence_combinations):
    print(f'The sequence "{sequence_protein}" with molecular weight {weight_protein}\ncan be covered by the fragments {str(weight_combinations)[1:-1]}\nwith molecular weights {str(combinations)[1:-1]}')
else:
    print(f'The computed weight combinations do not cover the protein sequence')
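The weight search in this script is a brute-force subset sum over the rounded fragment weights. With its default relative tolerance (1e-9), `math.isclose` only matches when the fragment weights, each rounded to one decimal, add up to the rounded protein weight almost exactly. A sum of individually rounded values can drift by a few tenths from the rounded total. A minimal sketch of the same search as a standalone function with an explicit absolute tolerance follows. The name `weight_subsets` and the value `abs_tol=0.5` are hypothetical choices for illustration, not part of the original script.

```python
import itertools
from math import isclose


def weight_subsets(weights, target, abs_tol=0.5):
    """Return every combination of `weights` whose sum is within `abs_tol` of `target`.

    Hypothetical helper: abs_tol=0.5 is an illustrative tolerance, and
    rel_tol=0.0 disables the relative check so only abs_tol applies.
    Sizes run from 1 to len(weights) inclusive, so the combination that
    uses every weight is also tried.
    """
    matches = []
    for size in range(1, len(weights) + 1):
        for subset in itertools.combinations(weights, size):
            if isclose(sum(subset), target, rel_tol=0.0, abs_tol=abs_tol):
                matches.append(subset)
    return matches


# A few fragment weights from the question, searched against the protein weight
print(weight_subsets([2267.6, 484.5, 1036.3, 811.9, 1455.7], 3788.4))
```

A tolerance that is too large can admit spurious combinations, so it is worth choosing it relative to the rounding applied to each fragment.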
For the following protein sequence and fragments:
seq_compl 3788.4 IEEATHMTPCYELHGLRWVQIQDYAINVMQCL
seq_0000 3125.4 SKEPFKTRIDKKPCDHNTEPYMSGGNY
seq_0001 1963.4 KMITKARPGCMHQMGEY
seq_0002 397.5 AINV
seq_0003 484.5 QIQD
seq_0004 1036.3 YAINVMQCL
seq_0005 2267.6 IEEATHMTPCYELHGLRWV
seq_0006 475.6 MQCL
seq_0007 1724 HMTPCYELHGLRWV
seq_0008 2000.2 DHTAQPCRSWPMDYPLT
seq_0009 811.9 IEEATHM
seq_0010 1397.7 MVGKMDMLEQYA
seq_0011 681.8 GWPDII
seq_0012 647.7 QIQDY
seq_0013 2174.4 TPCYELHGLRWVQIQDYA
seq_0014 1794 HGLRWVQIQDYAINV
seq_0015 1040.3 KKKNARKW
seq_0016 1455.7 TPCYELHGLRWV
This gives me two lists,
sequence_combinations:
[('IEEATHMTPCYELHGLRWV', 'QIQD', 'YAINVMQCL'),
('IEEATHMTPCYELHGLRWV', 'QIQDY', 'AINV', 'MQCL'),
('IEEATHM', 'TPCYELHGLRWV', 'QIQD', 'YAINVMQCL'),
('IEEATHM', 'TPCYELHGLRWV', 'QIQDY', 'AINV', 'MQCL')]
and
weight_combinations:
[('QIQD', 'YAINVMQCL', 'IEEATHMTPCYELHGLRWV'),
('AINV', 'IEEATHMTPCYELHGLRWV', 'MQCL', 'QIQDY'),
('QIQD', 'YAINVMQCL', 'IEEATHM', 'TPCYELHGLRWV'),
('AINV', 'MQCL', 'IEEATHM', 'QIQDY', 'TPCYELHGLRWV')]
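They basically contain the same sets of fragments, but in a different order. One is computed from the possible weight combinations, the other from the possible sequence permutations that reproduce the given protein sequence. I tried sorting both lists so I could compare the elements, but the two lists seem to end up sorted differently?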
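The sorted outputs differ because `sorted()` on a list of tuples orders the tuples lexicographically (first element, then second, and so on) but never reorders the items inside each tuple. Here the two lists hold the same fragments in a different order within each tuple, so their sorted outer lists still disagree. A small demonstration using one tuple from each list above:

```python
# One combination from each list: same fragments, different internal order
a = [('IEEATHMTPCYELHGLRWV', 'QIQD', 'YAINVMQCL')]
b = [('QIQD', 'YAINVMQCL', 'IEEATHMTPCYELHGLRWV')]

# Sorting the outer lists leaves each tuple's internal order untouched
print(sorted(a) == sorted(b))  # False

# Sorting the fragments inside the tuple makes them comparable
print(sorted(a[0]) == sorted(b[0]))  # True
```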
sorted(sequence_combinations)
[('IEEATHM', 'TPCYELHGLRWV', 'QIQD', 'YAINVMQCL'),
('IEEATHM', 'TPCYELHGLRWV', 'QIQDY', 'AINV', 'MQCL'),
('IEEATHMTPCYELHGLRWV', 'QIQD', 'YAINVMQCL'),
('IEEATHMTPCYELHGLRWV', 'QIQDY', 'AINV', 'MQCL')]
sorted(weight_combinations)
[('AINV', 'IEEATHMTPCYELHGLRWV', 'MQCL', 'QIQDY'),
('AINV', 'MQCL', 'IEEATHM', 'QIQDY', 'TPCYELHGLRWV'),
('QIQD', 'YAINVMQCL', 'IEEATHM', 'TPCYELHGLRWV'),
('QIQD', 'YAINVMQCL', 'IEEATHMTPCYELHGLRWV')]
Is there any way I can verify that list_a and list_b contain the same fragment combinations?
Solution
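One way to make the comparison order-insensitive at both levels is to sort the fragments inside each tuple, then compare the lists as multisets with `collections.Counter`. Below is a minimal sketch of that approach, applied to the two lists printed above. The helper names `normalise_combinations` and `same_fragment_sets` are hypothetical.

```python
from collections import Counter


def normalise_combinations(combos):
    # Sort fragments inside each tuple, then count the tuples as a multiset,
    # so neither the tuple order nor the list order matters.
    return Counter(tuple(sorted(c)) for c in combos)


def same_fragment_sets(list_a, list_b):
    return normalise_combinations(list_a) == normalise_combinations(list_b)


sequence_combinations = [
    ('IEEATHMTPCYELHGLRWV', 'QIQD', 'YAINVMQCL'),
    ('IEEATHMTPCYELHGLRWV', 'QIQDY', 'AINV', 'MQCL'),
    ('IEEATHM', 'TPCYELHGLRWV', 'QIQD', 'YAINVMQCL'),
    ('IEEATHM', 'TPCYELHGLRWV', 'QIQDY', 'AINV', 'MQCL'),
]
weight_combinations = [
    ('QIQD', 'YAINVMQCL', 'IEEATHMTPCYELHGLRWV'),
    ('AINV', 'IEEATHMTPCYELHGLRWV', 'MQCL', 'QIQDY'),
    ('QIQD', 'YAINVMQCL', 'IEEATHM', 'TPCYELHGLRWV'),
    ('AINV', 'MQCL', 'IEEATHM', 'QIQDY', 'TPCYELHGLRWV'),
]

print(same_fragment_sets(sequence_combinations, weight_combinations))  # True
```

When the result is `False`, subtracting the two counters (`normalise_combinations(list_a) - normalise_combinations(list_b)`) lists the combinations that appear in `list_a` but not in `list_b`. Using `Counter` instead of `set` also catches a combination that appears a different number of times in the two lists.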