python - How do I group strings by one of their substrings?
Problem description
I have the following list, jargs:
jargs = ['10192393\t15\t26\tskin tumour\tDiseaseClass\tD012878',
'10192393\t443\t449\tcancer\tDiseaseClass\tD009369',
'10192393\t483\t496\tcolon cancers\tDiseaseClass\tD003110',
'10194428\t30\t45\themochromatosis\tModifier\tD016399',
'10194428\t102\t117\themochromatosis\tSpecificDisease\tD006432',
'10194428\t119\t145\tHereditary hemochromatosis\tSpecificDisease\tD006432',
'10194428\t147\t149\tHH\tDiseaseClass\tD006432']
I want to write a program that outputs the following:
ents =
[
'10192393', {"entities":[(15, 26,"DiseaseClass"), (443, 449, "DiseaseClass"), (483, 496, "DiseaseClass")]},
'10194428', {"entities": [(30, 45, "Modifier"), (102, 117, "SpecificDisease"), (119, 145, "SpecificDisease"), (147, 149, "DiseaseClass")]}
]
I tried the following:
ents = [list(set([jargs[i].split('\t')[0] for i in range(len(jargs))]))[0],\
{"entities": [(int(jargs[i].split('\t')[1]), int(jargs[i].split('\t')[2]),\
jargs[i].split('\t')[-2]) for i in range(len(jargs))]}]
Unfortunately, this code outputs the following:
['10194428',
 {'entities': [('15', '26', 'DiseaseClass'),
               ('443', '449', 'DiseaseClass'),
               ('483', '496', 'DiseaseClass'),
               ('30', '45', 'Modifier'),
               ('102', '117', 'SpecificDisease'),
               ('119', '145', 'SpecificDisease'),
               ('147', '149', 'DiseaseClass')]}]
This is not the expected output.
Solution
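from pprint import pprint

# Group the (start, end, label) fields by the ID in the first column.
tmp = {}
for item in jargs:
    id_, v1, v2, _, v3, *_ = item.split("\t")
    tmp.setdefault(id_, []).append((v1, v2, v3))

# Flatten the groups into the alternating [id, {"entities": [...]}, ...] list.
ents = []
for k, v in tmp.items():
    ents.append(k)
    ents.append({"entities": v})

pprint(ents)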
Prints:
['10192393',
 {'entities': [('15', '26', 'DiseaseClass'),
               ('443', '449', 'DiseaseClass'),
               ('483', '496', 'DiseaseClass')]},
 '10194428',
 {'entities': [('30', '45', 'Modifier'),
               ('102', '117', 'SpecificDisease'),
               ('119', '145', 'SpecificDisease'),
               ('147', '149', 'DiseaseClass')]}]
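The attempt in the question collapses everything into one group because it takes a single ID from the set and then builds one entity list over all rows. Grouping by the first tab-separated field avoids that. The output above keeps the offsets as strings, while the output requested in the question uses integers. A minimal sketch of a variant that also converts the offsets, assuming every row has at least five tab-separated fields and that the two offset fields are always numeric (the function name group_entities and the shortened sample list are illustrative, not from the original post):

```python
from collections import defaultdict

# Shortened sample in the same tab-separated layout as the question's jargs.
jargs = [
    "10192393\t15\t26\tskin tumour\tDiseaseClass\tD012878",
    "10192393\t443\t449\tcancer\tDiseaseClass\tD009369",
    "10194428\t30\t45\themochromatosis\tModifier\tD016399",
]


def group_entities(lines):
    """Group (start, end, label) tuples by the ID in the first field."""
    grouped = defaultdict(list)
    for line in lines:
        # Fields: id, start, end, mention text, label, then anything else.
        doc_id, start, end, _text, label, *_rest = line.split("\t")
        grouped[doc_id].append((int(start), int(end), label))

    # Flatten to [id, {"entities": [...]}, id, {...}, ...] as in the question.
    ents = []
    for doc_id, spans in grouped.items():
        ents.extend([doc_id, {"entities": spans}])
    return ents


ents = group_entities(jargs)
print(ents)
```

Because dictionaries preserve insertion order in Python 3.7 and later, the IDs come out in the order they first appear in the input. If the goal is a list of (id, annotations) pairs rather than a flat alternating list, appending a tuple (doc_id, {"entities": spans}) in the last loop would be the only change needed.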