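rdkit - How to interpret the features obtained from Chem.RDKFingerprint(mol)
Question
I did the following to get a fingerprint from a mol file. Converting it with fp.ToBitString() gives me a vector of length 2048. When I count them, the number of 1s is the same as the number of atoms in the molecule. How do we interpret this vector? Any suggestions for links that explain it would be great.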
from rdkit import Chem
mol = Chem.MolFromSmiles(ms)  # ms holds the molecule's SMILES string
fp = Chem.RDKFingerprint(mol)
fp.ToBitString()
This is the vector I get:
'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000010000000000000000'
Answer
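As far as I know, RDKFingerprint is a "Daylight-like" substructure fingerprint: a bit vector in which each bit is set by the presence of a particular substructure in the molecule. With the default settings (maxPath, default=7), substructures up to 7 bonds long are considered. Because there is no predefined set of substructures, it is not possible to reserve one bit for every possible pattern. Instead, each substructure is hashed, and the hash is used as the seed of a pseudo-random number generator. The generator's output is a set of bit indices (nBitsPerHash, default=2) in the range 0 to fpSize - 1 (fpSize, default=2048), and the corresponding bits are set in the fingerprint. Different substructures can therefore end up setting the same bit.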
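To make the hashing step concrete, here is a minimal pure-Python sketch. It is a toy stand-in and not RDKit's actual hash: the function names, the use of `zlib.crc32`, and the path labels such as "C-C-O" are all assumptions made for illustration. It only shows the general idea of turning a pattern into a seed and drawing `n_bits_per_hash` positions below `fp_size`.

```python
import random
import zlib


def toy_path_bits(path, fp_size=2048, n_bits_per_hash=2):
    """Map one substructure label to bit positions (toy scheme, not RDKit's)."""
    # Deterministic hash of the pattern; RDKit uses its own hashing internally.
    seed = zlib.crc32(path.encode("utf-8"))
    # The hash seeds a pseudo-random generator ...
    rng = random.Random(seed)
    # ... which yields n_bits_per_hash indices in the range 0 .. fp_size - 1.
    return [rng.randrange(fp_size) for _ in range(n_bits_per_hash)]


def toy_fingerprint(paths, fp_size=2048, n_bits_per_hash=2):
    """Set the bits for every path in a list of hypothetical path labels."""
    bits = [0] * fp_size
    for path in paths:
        for idx in toy_path_bits(path, fp_size, n_bits_per_hash):
            bits[idx] = 1
    return bits


# Hypothetical path labels standing in for enumerated substructures.
paths = ["C-C", "C-O", "C-C-O"]
fp = toy_fingerprint(paths)
print("bits set:", sum(fp), "of", len(fp))
```

Because the mapping goes through a hash, a set bit cannot be decoded back into a substructure on its own, which is why RDKit offers the bitInfo bookkeeping used further down.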
RDKit has a nice tool for explaining the bits that are set:
from rdkit.Chem import Draw
from rdkit import Chem
smiles = 'OC(CN1C=NC=N1)(CN1C=NC=N1)C1=C(F)C=C(F)C=C1'
mol = Chem.MolFromSmiles(smiles)
bit_info = {}
fp = Chem.RDKFingerprint(mol, maxPath=5, bitInfo=bit_info)
print(list(fp.GetOnBits())[:10]) # print the first 10 bits set to 1
# using the bit_info dictionary populated by RDKit prepare a visualisation
Draw.DrawRDKitBit(mol, 60, bit_info)
# draw multiple bits (12)
tpls = [(mol, x, bit_info) for x in bit_info]
Draw.DrawRDKitBits(tpls[:12], molsPerRow=4, legends=[str(x) for x in bit_info][:12])
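If a drawing is not convenient, the same bit_info dictionary can be inspected as text. The sketch below assumes that each entry of bit_info maps a bit id to a list of bond paths (lists of bond indices), and that Chem.PathToSubmol accepts such a path; treat it as an illustration of the idea, not a definitive recipe.

```python
from rdkit import Chem

smiles = 'OC(CN1C=NC=N1)(CN1C=NC=N1)C1=C(F)C=C(F)C=C1'
mol = Chem.MolFromSmiles(smiles)

bit_info = {}
fp = Chem.RDKFingerprint(mol, maxPath=5, bitInfo=bit_info)

# Overall density of the fingerprint
print(fp.GetNumOnBits(), "bits set out of", fp.GetNumBits())

# For a few bits, show how many paths set the bit and one example fragment
for bit in sorted(bit_info)[:5]:
    paths = bit_info[bit]                          # assumed: list of bond-index paths
    submol = Chem.PathToSubmol(mol, paths[0])      # fragment for the first path
    print(bit, len(paths), Chem.MolToSmiles(submol))
```

A bit with several paths listed was set by more than one occurrence of a substructure, or by different substructures that hash to the same position.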