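python - Creating lists of character offsets in a dictionary from multiple columns of multiple csv files
Question
I have two similar csv files that look like this: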
{http://www.omg.org/XMI}id,begin,end,Character
45440,34,45,Miss Parker
45455,137,147,Farrington
48976,295,298,Mr Alleyne
45533,890,900,Mr Alleyne
49020,2147,2154,Mr Alleyne
49020,2147,2154,Mr Alleyne
48606,2689,2696,Farrington
46858,3690,3693,Farrington
48680,5280,5291,clients
46880,5373,5376,Farrington
46728,5396,5407,clients
49057,5673,5683,clients
48734,6145,6155,Mr Alleyne
48734,6145,6155,Mr Alleyne
46699,6661,6664,Miss Delacour
49094,6969,6972,Farrington
48841,8451,8461,Mr Alleyne
48849,8466,8479,Miss Delacour
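I would like to create a dictionary whose keys are the unique character mentions, and to add the 'begin' and 'end' offsets of each mention to the corresponding unique character (i.e. the key), ignoring the column '{http://www.omg.org/XMI}id'.
My desired output should look like this: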
print(dict_of_mentions)
Output:
{'Farrington': [(137,147),(2689,2696) #etc...],
'Mr Alleyne': [(295,298), (890,900) #etc...], #rest of characters... }
So far my code looks like this:
import csv
import tkinter
from tkinter import filedialog

def character_mentions():
    filenames = filedialog.askopenfilenames()
    for filename in filenames:
        reader = csv.DictReader(open(filename))
        dict_of_mentions = {}
        for row in reader:
            key = row.pop('Character')
            if key in dict_of_mentions:
                #implement duplicate row handling here
                pass
            # this assignment replaces any row already stored under the same key
            dict_of_mentions[key] = row
        print(dict_of_mentions)
The output looks like this:
{'Miss Parker': OrderedDict([('{http://www.omg.org/XMI}id', '45440'), ('begin', '34'), ('end', '45')]) 'Farrington': OrderedDict([('{http://www.omg.org/XMI}id', '46645'), ('begin', '22012'), ('end', '22014')]), 'Mr Alleyne': OrderedDict([('{http://www.omg.org/XMI}id', '47297'), ('begin', '13952'), ('end', '13962')]), 'clients': OrderedDict([('{http://www.omg.org/XMI}id', '49057'), ('begin', '5673'), ('end', '5683')]), 'Miss Delacour': OrderedDict([('{http://www.omg.org/XMI}id', '45867'), ('begin', '9101'), ('end', '9109')]), 'Everyone': OrderedDict([('{http://www.omg.org/XMI}id', '45836'), ('begin', '11896'), ('end', '11900')]), "Terry Kelly's clerk": OrderedDict([('{http://www.omg.org/XMI}id', '49278'), ('begin', '11971'), ('end', '11980')]), 'crowd': OrderedDict([('{http://www.omg.org/XMI}id', '49337'), ('begin', '12458'), ('end', '12471')]), 'office-girls': OrderedDict([('{http://www.omg.org/XMI}id', '49359'), ('begin', '12537'), ('end', '12549')]), 'Higgins': OrderedDict([('{http://www.omg.org/XMI}id', '45936'), ('begin', '13925'), ('end', '13927')]), 'friends': OrderedDict([('{http://www.omg.org/XMI}id', '49592'), ('begin', '17499'), ('end', '17506')]), 'boys': OrderedDict([('{http://www.omg.org/XMI}id', '47949'), ('begin', '17638'), ('end', '17649')]), 'one of the young women': OrderedDict([('{http://www.omg.org/XMI}id', '46257'), ('begin', '19945'), ('end', '19954')]), 'Weathers': OrderedDict([('{http://www.omg.org/XMI}id', '49643'), ('begin', '19881'), ('end', '19891')]), 'curate': OrderedDict([('{http://www.omg.org/XMI}id', '46142'), ('begin', '19094'), ('end', '19101')]), 'Ada': OrderedDict([('{http://www.omg.org/XMI}id', '46364'), ('begin', '20313'), ('end', '20316')]), 'Tom': OrderedDict([('{http://www.omg.org/XMI}id', '49804'), ('begin', '21852'), ('end', '21855')])}
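Any kind of help is appreciated!
Answer
You can use itertools.groupby. It only groups consecutive items, so the rows are sorted by character name first: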
>>> import csv
>>> from itertools import groupby
>>> l = list(csv.reader(open('file.csv')))
>>> f = lambda x: x[-1]
>>> {k:[tuple(x[1:3]) for x in v] for k,v in groupby(sorted(l[1:], key=f), f)}
{'Farrington': [('137', '147'), ('2689', '2696'), ('3690', '3693'), ('5373', '5376'), ('6969', '6972')], 'Miss Delacour': [('6661', '6664'), ('8466', '8479')], 'Miss Parker': [('34', '45')], 'Mr Alleyne': [('295', '298'), ('890', '900'), ('2147', '2154'), ('2147', '2154'), ('6145', '6155'), ('6145', '6155'), ('8451', '8461')], 'clients': [('5280', '5291'), ('5396', '5407'), ('5673', '5683')]}
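The session above reads a single file and leaves the offsets as strings. As a rough sketch of how the same grouping might be extended to several files with integer offsets, the function below uses collections.defaultdict instead of groupby, so no sorting is needed. The name collect_mentions and the skip_duplicates flag are hypothetical, not part of the original answer, and the sketch assumes every file has the 'begin', 'end' and 'Character' columns shown in the question and is encoded as UTF-8.

```python
import csv
from collections import defaultdict


def collect_mentions(filenames, skip_duplicates=True):
    """Map each character name to a list of (begin, end) integer offsets.

    Hypothetical helper: assumes each CSV file has 'begin', 'end' and
    'Character' columns; any other column (such as the XMI id) is ignored.
    """
    mentions = defaultdict(list)
    for filename in filenames:
        # Track rows already seen in this file only, so identical offsets
        # coming from two different files are both kept.
        seen = set()
        # newline='' is what the csv module recommends for files given to a reader.
        with open(filename, newline='', encoding='utf-8') as handle:
            for row in csv.DictReader(handle):
                name = row['Character']
                span = (int(row['begin']), int(row['end']))
                if skip_duplicates and (name, span) in seen:
                    continue  # exact repeat within the same file
                seen.add((name, span))
                mentions[name].append(span)
    # Return a plain dict so it prints like the desired output.
    return dict(mentions)
```

With the file dialog from the question, this could be called as collect_mentions(filedialog.askopenfilenames()). Passing skip_duplicates=False keeps repeated rows, which matches what the groupby version does.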