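python - Search for the k-mers of one file among the k-mers of another file and count the occurrences in Python
Problem description
I have this function, which generates all possible k-mers over the four bases in Python: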
import itertools

def generate_kmers(k):
    # Task (a) only asks for a function that generates all k-mers of the four bases.
    bases = ['A', 'C', 'T', 'G']
    # itertools.product returns the Cartesian product of its input iterables; here it
    # iterates over bases k times, and each resulting tuple is joined into one string.
    kmer = [''.join(p) for p in itertools.product(bases, repeat=k)]
    return kmer
Now I want to go through a list of sequences from a FASTQ file (e.g. ['GTATACACTAGTCCAGGATGTGCTTCTTGTAGAAAAGTAAAACAATGGTTAAAAGATCACAATCTTGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN', 'CCTGTAGAGTCATAAAGACCTCTTGGGTCCATCCTAGAAATTTTTCAGCTGAGAATAACGGGTCTGTTTCAGTTATTGCTTCTACTATNNNNNNNNNNNNNNNNNNNNNNNNNNN']), count the occurrences of every k-mer produced by generate_kmers in that list of sequences, and save the result in a dictionary (e.g. {'AAAA': 2, 'AAAC': 1, ...}). First I tried to modify generate_kmers so that it returns all k-mers of the sequence file, and then to loop over kmerSequences and kmerBases, but that did not work.
Does anyone have an idea how I could do this?
Solution
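You could try this with str.count, which gives one dictionary per sequence (note that str.count only counts non-overlapping occurrences):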
import itertools

def generate_kmers(k):
    # Task (a) only asks for a function that generates all k-mers of the four bases.
    bases = ['A', 'C', 'T', 'G']
    # itertools.product returns the Cartesian product of its input iterables; here it
    # iterates over bases k times, and each resulting tuple is joined into one string.
    kmer = [''.join(p) for p in itertools.product(bases, repeat=k)]
    return kmer

seqs = ['GTATACACTAGTCCAGGATGTGCTTCTTGTAGAAAAGTAAAACAATGGTTAAAAGATCACAATCTTGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN', 'CCTGTAGAGTCATAAAGACCTCTTGGGTCCATCCTAGAAATTTTTCAGCTGAGAATAACGGGTCTGTTTCAGTTATTGCTTCTACTATNNNNNNNNNNNNNNNNNNNNNNNNNNN']
k = 4
mers4 = generate_kmers(k)
# One dictionary per sequence, mapping each k-mer to its count in that sequence
dcts = [{kmer: seq.count(kmer) for kmer in mers4} for seq in seqs]
print(dcts)
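If a single dictionary over the whole list is wanted, as the example output in the question suggests, the per-sequence counts can be summed. The following is only a minimal sketch: the helper name total_kmer_counts and the short test sequences are assumptions made for illustration, not part of the original answer, and it keeps the non-overlapping behaviour of str.count.

```python
import itertools

def generate_kmers(k):
    # All 4**k strings of length k over the four bases
    return [''.join(p) for p in itertools.product('ACTG', repeat=k)]

def total_kmer_counts(seqs, k):
    # Hypothetical helper: sum the str.count result of every k-mer over all sequences
    return {kmer: sum(seq.count(kmer) for seq in seqs) for kmer in generate_kmers(k)}

# Short made-up sequences, used only to keep the example readable
totals = total_kmer_counts(['AAAACGT', 'CGTAAAA'], 4)
print(totals['AAAA'], totals['ACGT'])
```

Every k-mer is searched for in every sequence, so the work grows with 4**k; that is fine for k=4 but becomes slow for larger k.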
Edit:
import itertools
import re

def generate_kmers(k):
    # Task (a) only asks for a function that generates all k-mers of the four bases.
    bases = ['A', 'C', 'T', 'G']
    # itertools.product returns the Cartesian product of its input iterables; here it
    # iterates over bases k times, and each resulting tuple is joined into one string.
    kmer = [''.join(p) for p in itertools.product(bases, repeat=k)]
    return kmer

k = 4
mers4 = generate_kmers(k)
# given sequence
s = 'GTATACACTAGTCCAGGATGTGCTTCTTGTAGAAAAGTAAAACAATGGTTAAAAGATCACAATCTTGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN'

# function that returns the dictionary with the occurrences
def dct_count(seq):
    # Use the seq argument here (the original searched the global s by mistake)
    return {mer: len(re.findall(mer, seq)) for mer in mers4}

dc = dct_count(s)
print(dc)
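Both str.count and re.findall count non-overlapping matches, so 'AAAA' is found only once in 'AAAAA'. If overlapping occurrences should be counted too, which is the usual convention for k-mer counting, one option is to slide a window of length k over each sequence. This is a sketch under that assumption; the function name count_kmers_overlapping is hypothetical and not from the original answer.

```python
import itertools

def count_kmers_overlapping(seqs, k, alphabet='ACGT'):
    # Start every possible k-mer at 0 so absent k-mers still appear in the result
    counts = {''.join(p): 0 for p in itertools.product(alphabet, repeat=k)}
    for seq in seqs:
        # Slide a window of length k across the sequence, one base at a time
        for i in range(len(seq) - k + 1):
            window = seq[i:i + k]
            # Windows containing other characters (e.g. 'N') are not keys and are skipped
            if window in counts:
                counts[window] += 1
    return counts

print(count_kmers_overlapping(['AAAAA'], 4)['AAAA'])  # 2 (positions 0 and 1)
print('AAAAA'.count('AAAA'))                          # 1 (non-overlapping)
```

This visits each sequence position once rather than searching for each of the 4**k k-mers separately, so it also scales better for larger k.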