python - Python: Converting a list of lists into a hierarchical dictionary
Problem description
I have some gene sequencing data that looks like this:
data = [{'sequence': 'gene1__gene2__gene3', 'occurrence': 10},
{'sequence': 'gene2__gene3', 'occurrence': 5},
{'sequence': 'gene2', 'occurrence': 2},
{'sequence': 'gene4', 'occurrence': 4}
]
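I want to convert it into the following (tree-like) dictionary data structure, in which any sub-path tells me the co-occurrence count for that group of genes: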
tree_dict = {
    'gene1': {'occurrence': 10, 'self': 0, 'children': {
        'gene2': {'occurrence': 10, 'self': 0, 'children': {'gene3': {'occurrence': 10, 'self': 10, 'children': {}}}},
        'gene3': {'occurrence': 10, 'self': 0, 'children': {'gene2': {'occurrence': 10, 'self': 10, 'children': {}}}},
    }},
    'gene2': {'occurrence': 17, 'self': 2, 'children': {
        'gene1': {'occurrence': 10, 'self': 0, 'children': {'gene3': {'occurrence': 10, 'self': 10, 'children': {}}}},
        'gene3': {'occurrence': 15, 'self': 5, 'children': {'gene1': {'occurrence': 10, 'self': 10, 'children': {}}}},
    }},
    'gene3': {'occurrence': 15, 'self': 0, 'children': {
        'gene1': {'occurrence': 10, 'self': 0, 'children': {'gene2': {'occurrence': 10, 'self': 10, 'children': {}}}},
        'gene2': {'occurrence': 15, 'self': 5, 'children': {'gene1': {'occurrence': 10, 'self': 10, 'children': {}}}},
    }},
    'gene4': {'occurrence': 4, 'self': 4, 'children': {}}
}
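In the tree_dict above:
self counts the occurrences where the node appears alone in the (sub)path. For example, gene3 never occurs alone, so its self value is 0, while gene2 occurs alone 2 times, so its self value is 2.
occurrence counts the occurrences of the node in the (sub)path, both as part of a longer sequence and as the whole sequence.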
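The two definitions can be checked by hand for the top level of the tree. The following is only a minimal sketch, not part of the original question: it assumes each gene appears at most once per sequence, and the names occurrence and self_count are chosen here for illustration.

```python
from collections import Counter

data = [{'sequence': 'gene1__gene2__gene3', 'occurrence': 10},
        {'sequence': 'gene2__gene3', 'occurrence': 5},
        {'sequence': 'gene2', 'occurrence': 2},
        {'sequence': 'gene4', 'occurrence': 4}]

occurrence = Counter()   # total count of every sequence that contains the gene
self_count = Counter()   # count of sequences that consist of the gene alone

for item in data:
    genes = item['sequence'].split('__')
    # Assumption: a gene is not repeated within one sequence
    for gene in genes:
        occurrence[gene] += item['occurrence']
    if len(genes) == 1:
        self_count[genes[0]] += item['occurrence']

for gene in sorted(occurrence):
    print(gene, occurrence[gene], self_count[gene])
```

These totals correspond to the top-level occurrence and self values in tree_dict; the nested levels apply the same counting to whatever genes remain after the parent gene is removed.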
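Code I have tried?
I know the solution has to be a recursive function, but so far I have only been trying iterative approaches, without success. It is something similar to this question: How to convert a list into a hierarchical dict. But I could not make any progress in that direction.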
Solution
Try this:
data = [{'sequence': 'gene1__gene2__gene3', 'occurrence': 10},
        {'sequence': 'gene2__gene3', 'occurrence': 5},
        {'sequence': 'gene2', 'occurrence': 2},
        {'sequence': 'gene4', 'occurrence': 4}]

tree_dict = {}

def generate_tree(sequence, occurrence, curr_dict):
    gene_list = sequence.split('__')
    for gene in gene_list:
        # Count this sequence towards the gene at the current level
        if gene in curr_dict:
            curr_dict[gene]['occurrence'] += occurrence
        else:
            curr_dict[gene] = {'occurrence': occurrence, 'self': 0, 'children': {}}
        # Recurse on the remaining genes, one level down
        updated_list = gene_list.copy()
        updated_list.remove(gene)
        updated_sequence = '__'.join(updated_list)
        if updated_sequence != '':
            generate_tree(updated_sequence, occurrence, curr_dict[gene]['children'])
        else:
            # No genes left: the path ends at this node
            curr_dict[gene]['self'] += occurrence

for item in data:
    generate_tree(item['sequence'], item['occurrence'], tree_dict)

print(tree_dict)
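Once the tree is built, reading the count for a group of genes is a walk down the children dictionaries. The sketch below is an illustration rather than part of the original answer: build_tree is a compact restatement of the recursion above that works on lists instead of joined strings, and lookup is a hypothetical helper name.

```python
def build_tree(data):
    """Build the nested dict using the same recursion as generate_tree."""
    tree = {}

    def add(genes, count, level):
        for i, gene in enumerate(genes):
            node = level.setdefault(
                gene, {'occurrence': 0, 'self': 0, 'children': {}})
            node['occurrence'] += count
            rest = genes[:i] + genes[i + 1:]   # every gene except this one
            if rest:
                add(rest, count, node['children'])
            else:
                node['self'] += count          # the path ends at this node

    for item in data:
        add(item['sequence'].split('__'), item['occurrence'], tree)
    return tree


def lookup(tree, path):
    """Return the node reached by following path, or None if it is absent."""
    level, node = tree, None
    for gene in path:
        node = level.get(gene)
        if node is None:
            return None
        level = node['children']
    return node


data = [{'sequence': 'gene1__gene2__gene3', 'occurrence': 10},
        {'sequence': 'gene2__gene3', 'occurrence': 5},
        {'sequence': 'gene2', 'occurrence': 2},
        {'sequence': 'gene4', 'occurrence': 4}]

tree = build_tree(data)
node = lookup(tree, ['gene3', 'gene2'])
print(node['occurrence'], node['self'])   # 15 5
print(lookup(tree, ['gene4', 'gene1']))   # None
```

Because every ordering of a sequence gets its own path, the number of nodes grows factorially with the length of the sequence, so this layout is only practical when sequences are short.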