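python - How to convert plain-text headings and lists into a Python dictionary object?
Problem description
My question:
I want to parse plain text containing headings and lists into a single Python object, with each heading as a dict key and the list under it as a list value. The text looks like this: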
Playing cricket is my hobby:
(a) true.
(b) false.
Furthermore, the heading does not include:
(a) Singlets.
(b) fabrics.
(c) Smocks.
The output I want is:
{"Playing cricket is my hobby:":["(a) true.","(b) false."],"Furthermore, the heading does not include:":["(a) Singlets.","(b) fabrics.","(c) Smocks."]}
What I did
I first converted the text into a list of strings:
plaintxtlist=['Playing cricket is my hobby:','(a) true.','(b) false.','Furthermore, the heading does not include:','(a) Singlets.',' (b) fabrics.','(c) Smocks.']
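I then tried to convert the list above into a dictionary whose keys are the index of each heading and whose values are lists holding the heading and its list items. Here is the code: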
import re

data = {}  # dictionary: group index (as a string) -> list of lines
regalter = r"^\s*\(([^\)]+)\).*|^\s*\-.*"  # matches lines starting with "(a)", "(A)" or "-"
j = 0      # index of the current group
sub = []   # lines collected for the current group
plaintxtlist=['Playing cricket is my hobby:','(a) true.','(b) false.','Furthermore, the heading does not include:','(a) Singlets.',' (b) fabrics.','(c) Smocks.']

for i in plaintxtlist:  # the text file's lines, already converted to a list of strings
    if sub:
        match = re.match(regalter, i)  # is this line a list item?
        if match:
            sub.append(i)  # list item: add it to the current group
        else:
            j = j + 1      # new heading: move on to the next group index
            sub = [i]      # start a new group with the heading as its first element
        data[str(j)] = sub  # store the group under its index, converted to a string
    else:
        # Only reached for the very first line, while sub is still empty
        if sub:
            data[str(j)] = sub  # never runs, because sub is empty in this branch
        sub = [i]           # start the first group with the first line
        data[str(j)] = i    # temporarily store the bare string; replaced by the list on the next line

print(data)  # print the resulting dictionary
The output of the code above is:
{'0': ['Playing cricket is my hobby:', '(a) true.', '(b) false.'], '1': ['Furthermore, the heading does not include:', '(a) Singlets.', ' (b) fabrics.', '(c) Smocks.']}
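Solution
You don't need to convert every line into a list first. For simplicity, you can group the raw text content with a regex, and then parse each group into the dictionary you want.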
You can find where each group ends by looking for a period that is not followed by a "(" at the start of the next line.
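To make that grouping rule concrete, here is a minimal sketch that applies only the "period not followed by a newline and `(`" test. It assumes the sample text from the question is held in a string; the names `text`, `group_end` and `closing_lines` are made up for this illustration.

```python
import re

# Sample text copied from the question, held in a string instead of a file.
text = (
    "Playing cricket is my hobby:\n"
    "(a) true.\n"
    "(b) false.\n"
    "Furthermore, the heading does not include:\n"
    "(a) Singlets.\n"
    "(b) fabrics.\n"
    "(c) Smocks.\n"
)

# A period ends a group only when it is NOT followed by a newline and "(".
# (?!...) is a negative lookahead: it checks what follows without consuming it.
group_end = re.compile(r"\.(?!\n\()")

# Collect the line on which each group-ending period sits.
closing_lines = [
    text[:m.start() + 1].splitlines()[-1]
    for m in group_end.finditer(text)
]
print(closing_lines)  # ['(b) false.', '(c) Smocks.']
```

Only the last list item under each heading qualifies, so those two periods mark the group boundaries. The rule depends on list items starting exactly at the beginning of a line: an item with leading whitespace would end the group early.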
Assume the text content is saved in a file named a_text_file.txt. The complete code is here:
import re

with open('a_text_file.txt') as f:
    s = f.read()

# A group runs up to a period that is not followed by a newline and "("
pattern = re.compile(r'[\w\s\().:,]+?\.(?!\n\()')

data = dict()
for m in re.findall(pattern, s):
    # Group the raw content with the regex,
    # then split each group into lines
    group = m.strip()
    lst = group.split('\n')
    # Strip surrounding whitespace from the key and the values
    key = lst[0].strip()
    value = [i.strip() for i in lst[1:]]
    # Add the heading and its list to the final output
    data.update({key: value})

print(data)
Final output:
{'Playing cricket is my hobby:': ['(a) true.', '(b) false.'], 'Furthermore, the heading does not include:': ['(a) Singlets.', '(b) fabrics.', '(c) Smocks.']}
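If the text is already available as a list of lines, as with `plaintxtlist` in the question, a line-by-line pass can build the same dictionary without reading a file. The following is only a sketch of that alternative, not part of the answer above; `OPTION_RE` and `group_by_heading` are hypothetical names, and the pattern is a simplified version of the question's `regalter`.

```python
import re

# List items start with "(x)" or "-" (a simplified form of the question's `regalter`).
OPTION_RE = re.compile(r"^(\([^)]+\)|-)")

def group_by_heading(lines):
    """Map each heading line to the list of item lines that follow it."""
    data = {}
    current = None  # heading currently being filled
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if current is not None and OPTION_RE.match(line):
            data[current].append(line)  # list item under the current heading
        else:
            current = line              # any other line starts a new heading
            data[current] = []
    return data

plaintxtlist = ['Playing cricket is my hobby:', '(a) true.', '(b) false.',
                'Furthermore, the heading does not include:',
                '(a) Singlets.', ' (b) fabrics.', '(c) Smocks.']
result = group_by_heading(plaintxtlist)
print(result)
```

Because every line is stripped before matching, the stray leading space in `' (b) fabrics.'` does not affect the grouping. Note that a heading appearing twice would overwrite its earlier entry, since dictionary keys are unique.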