python-3.x - Parsing strings that contain different fields in varying order in Python 3.x
Problem description
I have some records:
records = ['Event: Description of some sort of event, sometimes with a: colon 0 Date: 02/05/2008 Time: 9:30 am Location: Room A Result: Description of result 0',
           'Event: Description of event 1 ',
           'Event: Description of some sort of event 2 Date: 06/03/2010 Time: 1:30 pm Location: Room b Result: Description of result 2',
           'Date: 06/03/2010 Time: 2:30 pm Event: Description of some sort of event 2 Result: Description of result 2 Location: Room b',
           'Date: 06/03/2010 Result: Description of result 3']
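I (eventually) want to load them into a pandas DataFrame, but I can't even figure out how to parse them into a useful list or dictionary. Here's what I'm doing: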
import re
import pandas as pd
delimiters = ['Event:', 'Date:', 'Time:', 'Location:', 'Result:']
delimiters = '|'.join(delimiters)
print('Without parentheses, I lose my delimiters:')
for record in records:
    print(re.split(delimiters, record))
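I'm curious why this produces an empty item at the start of each list. More importantly, though, I want to keep the delimiters.
I've seen examples that put parentheses around a single delimiter to keep it in the list of split strings, but here that produces strange results, with a concatenated list of the possible delimiters. For example, I don't understand why adding the parentheses produces None; I'd love to understand this!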
print('With parentheses, things get weird:')
delimiters = ['(Event:)', '(Date:)', '(Time:)', '(Location:)', '(Result:)']
delimiters = '|'.join(delimiters)
for record in records:
    print(re.split(delimiters, record))
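Both puzzles come down to how re.split treats matches and capturing groups. Every record begins with a delimiter, so the text before the first match is the empty string, and that '' becomes the first item of each list. When the pattern contains capturing groups, re.split also inserts the text of every group after each split point; with five separate groups, only one of them takes part in any given match, and the other four are reported as None. Wrapping the whole alternation in a single group keeps the delimiters without the None values. A minimal sketch on one of the question's records (the variable names are illustrative):

```python
import re

record = 'Date: 06/03/2010 Result: Description of result 3'
fields = ['Event:', 'Date:', 'Time:', 'Location:', 'Result:']

# Five separate capturing groups, as in the question: every match reports
# all five groups, and the four that did not take part come back as None
five_groups = '|'.join('(' + f + ')' for f in fields)
five_result = re.split(five_groups, record)
print(five_result)
# ['', None, 'Date:', None, None, None, ' 06/03/2010 ', None, None, None, None, 'Result:', ' Description of result 3']

# A single capturing group around the whole alternation: each match adds
# exactly one item (the delimiter itself), so no None values appear
one_group = '(' + '|'.join(fields) + ')'
one_result = re.split(one_group, record)
print(one_result)
# ['', 'Date:', ' 06/03/2010 ', 'Result:', ' Description of result 3']
```

In both results the leading '' is the (empty) text before the 'Date:' match at position 0.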
Ideally, I would extract the following as the output of parsing a record:
{'Event': ['Description of some sort of event, sometimes with a: colon'],
'Date': ['02/05/2008'],
'Time': ['1:30 pm'],
'Location': ['Room b'],
'Result': ['Some description of the result, sometimes with a : colon']} # etc
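One way to get from the single-group split to a dictionary per record is to pair each delimiter with the text that follows it. This is a regex alternative, not the approach used in the answer below, and it assumes the field names always appear exactly as Event:, Date:, Time:, Location: and Result:. Because only those words are delimiters, a stray colon such as "a: colon" inside a value is left alone. parse_record, FIELDS and SPLITTER are hypothetical names chosen for this sketch, and the values come out as plain strings rather than single-item lists:

```python
import re

# Field names taken from the question; the pattern assumes each one is
# written exactly like this, followed by a colon
FIELDS = ['Event', 'Date', 'Time', 'Location', 'Result']

# One capturing group around the alternation keeps each delimiter in the
# split output; \b stops a field name matching at the end of a longer word
SPLITTER = re.compile(r'\b(' + '|'.join(f + ':' for f in FIELDS) + r')')


def parse_record(record):
    """Return a {field: value} dict for one record string (hypothetical helper)."""
    parts = SPLITTER.split(record)
    # parts[0] is the text before the first delimiter ('' for these records);
    # after it, delimiters and values alternate: delim, value, delim, value, ...
    return {delim[:-1]: value.strip()
            for delim, value in zip(parts[1::2], parts[2::2])}


record = ('Event: Description of some sort of event, sometimes with a: colon 0 '
          'Date: 02/05/2008 Time: 9:30 am Location: Room A Result: Description of result 0')
print(parse_record(record))
# {'Event': 'Description of some sort of event, sometimes with a: colon 0', 'Date': '02/05/2008', 'Time': '9:30 am', 'Location': 'Room A', 'Result': 'Description of result 0'}
```

Field order in the record does not matter, and a record with only some fields simply yields a smaller dict. If a field appeared twice in one record, the dict comprehension would keep the last value.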
This would let me pass the result directly to a DataFrame:
pd.DataFrame({'Event': ['Description of some sort of event, sometimes with a: colon'],
'Date': ['02/05/2008'],
'Time': ['1:30 pm'],
'Location': ['Room b'],
'Result': ['Some description of the result, sometimes with a : colon']}
)
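The dictionary above holds one list per column, with one entry per record. If the records are first parsed into one dictionary each, they can be reshaped into that column layout; this sketch assumes a field missing from a record should become None, and parsed, FIELDS and columns are illustrative names with the data trimmed to two records:

```python
# Per-record dicts of the kind a parser would produce (two records only)
parsed = [
    {'Event': 'Description of event 1'},
    {'Date': '06/03/2010', 'Result': 'Description of result 3'},
]

FIELDS = ['Event', 'Date', 'Time', 'Location', 'Result']

# One list per column, one entry per record; fields a record lacks become None
columns = {field: [rec.get(field) for rec in parsed] for field in FIELDS}
print(columns)
# {'Event': ['Description of event 1', None], 'Date': [None, '06/03/2010'], 'Time': [None, None], 'Location': [None, None], 'Result': [None, 'Description of result 3']}
```

columns has the same shape as the dictionary in the question, so pd.DataFrame(columns) should give one row per record. The DataFrame constructor also accepts the list of dicts directly (pd.DataFrame(parsed)), filling missing keys with NaN, so this reshaping step is optional.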
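Any pointers or help with any of these steps would be much appreciated.
Solution
Here is a solution that doesn't use regular expressions, although it does involve nested loops: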
records = ['Event: Description of some sort of event, sometimes with a: colon 0 Date: 02/05/2008 Time: 9:30 am Location: Room A Result: Description of result 0',
           'Event: Description of event 1 ',
           'Event: Description of some sort of event 2 Date: 06/03/2010 Time: 1:30 pm Location: Room b Result: Description of result 2',
           'Date: 06/03/2010 Time: 2:30 pm Event: Description of some sort of event 2 Result: Description of result 2 Location: Room b',
           'Date: 06/03/2010 Result: Description of result 3']
delims = ('Event:', 'Date:', 'Time:', 'Location:', 'Result:')
parsed = []
# Iterate records
for record in records:
    # An empty dictionary for this record's fields
    d = {}
    # Split the record into separate words by spaces
    words = record.split(' ')
    # Walk the words with an explicit index; reassigning i inside a
    # "for i in range(...)" loop would not skip anything
    i = 0
    while i < len(words):
        # If this word is one of the delimiters
        if words[i] in delims:
            # Set the key to the delimiter (without the trailing colon)
            key = words[i][:-1]
            # Move past the delimiter itself
            i += 1
            # Start with an empty list of value words
            val = []
            # While we are inside the list bounds and the word is not a delimiter
            while i < len(words) and words[i] not in delims:
                # Add this word to the value and move on
                val.append(words[i])
                i += 1
            # Add the key/value pair to the record dictionary
            d[key] = ' '.join(val)
        else:
            # Any text before the first delimiter is skipped
            i += 1
    # Append the record dictionary to the results
    parsed.append(d)
print(repr(parsed))
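The idea is to split each record into a list of words and check whether each word is a delimiter: if it is, it becomes the key; if not, the word is added to the current value.
Output (pretty-printed):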
[{'Date': '02/05/2008',
'Event': 'Description of some sort of event, sometimes with a: colon 0',
'Location': 'Room A',
'Result': 'Description of result 0',
'Time': '9:30 am'},
{'Event': 'Description of event 1 '},
{'Date': '06/03/2010',
'Event': 'Description of some sort of event 2',
'Location': 'Room b',
'Result': 'Description of result 2',
'Time': '1:30 pm'},
{'Date': '06/03/2010',
'Event': 'Description of some sort of event 2',
'Location': 'Room b',
'Result': 'Description of result 2',
'Time': '2:30 pm '},
{'Date': '06/03/2010', 'Result': 'Description of result 3'}]
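The second record's value keeps a trailing space ('Description of event 1 ') because the words come from record.split(' '). A small check of the difference; swapping in record.split() with no argument in the answer's loop would trim such values, and would also collapse repeated spaces between words (the variable names here are illustrative):

```python
record = 'Event: Description of event 1 '

# split(' ') splits on every single space, so the trailing space leaves an
# empty string at the end, which ' '.join() turns back into a trailing space
with_sep = record.split(' ')
print(with_sep)                      # ['Event:', 'Description', 'of', 'event', '1', '']

# split() with no argument splits on runs of whitespace and drops empty strings
no_sep = record.split()
print(no_sep)                        # ['Event:', 'Description', 'of', 'event', '1']

print(repr(' '.join(with_sep[1:])))  # 'Description of event 1 '
print(repr(' '.join(no_sep[1:])))    # 'Description of event 1'
```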