Updating a text file from a Python dictionary

Question

Hello community members,

Suppose I have a dictionary in Python:

dict = {'fresh air', 'entertainment system', 'ice cream', 'milk', 'dog', 'blood pressure'}

and a list of texts as follows:

text_file = ['is vitamin d in milk enough', 'try to improve quality level by automatic intake of fresh air', 'turn on the tv or entertainment system based on that individual preferences', 'blood pressure monitor', 'I buy more ice cream', 'proper method to add frozen wild blueberries in ice cream']

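I want to mark every occurrence of a dictionary phrase in the text file. A multi-word phrase (say, fresh air) should appear as #fresh_air#, and a single word (say, milk) should appear as #milk#, i.e. with the special character added at the beginning and the end, for all occurrences in text_file.

The output I want should have the following form (a list of lists):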

[[is vitamin d in #milk# enough], [try to improve quality level by automatic intake of #fresh_air#], [turn on the tv or #entertainment_system# based on the individual preferences], [#blood_pressure# monitor], [I buy more #ice_cream#], [proper method to add frozen wild blueberries in #ice_cream# with #milk#]]

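Is there a standard way to achieve this in a time-efficient manner?

I am new to list and text processing in Python. I tried using a list comprehension but could not get the expected result. Any help is greatly appreciated.

Tags: python, python-3.x, list, dictionary, nltk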

Solution

Use a regular expression.

Example:

import re
data = {'fresh air', 'entertainment system', 'ice cream', 'milk', 'dog', 'blood pressure'}
pattern = re.compile("("+"|".join(data)+")")
text_file = ['is vitamin d in milk enough', 'try to improve quality level by automatic intake of fresh air', 'turn on the tv or entertainment system based on that individual preferences', 'blood pressure monitor', 'I buy more ice cream', 'proper method to add frozen wild blueberries in ice cream']

result = [pattern.sub(r"#\1#", i) for i in text_file]
print(result)

Output:

['is vitamin d in #milk# enough',
 'try to improve quality level by automatic intake of #fresh air#',
 'turn on the tv or #entertainment system# based on that individual preferences',
 '#blood pressure# monitor',
 'I buy more #ice cream#',
 'proper method to add frozen wild blueberries in #ice cream#']

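Note that your dict variable is actually a set object, not a dictionary. The name dict also shadows Python's built-in dict type, which is why the snippets here call it data instead.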
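To illustrate the point about set versus dict, here is a small sketch. The names terms and mapping are made up for this illustration and do not come from the question.

```python
# Curly braces holding bare values (no key: value pairs) build a set.
terms = {'fresh air', 'milk'}
print(type(terms).__name__)    # set

# A dictionary needs key: value pairs.
mapping = {'fresh air': 'fresh_air', 'milk': 'milk'}
print(type(mapping).__name__)  # dict
```

A set is enough for the first snippet, because it only needs to know which phrases to look for.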


Updated the snippet as requested in the comments.

Demo:

import re
data = {'fresh air', 'entertainment system', 'ice cream', 'milk', 'dog', 'blood pressure'}
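# Map each phrase to its underscore-joined form, e.g. 'fresh air' -> 'fresh_air'
data = {i: i.replace(" ", "_") for i in data}
# \b word boundaries keep 'dog' from matching inside a longer word such as 'dogma'
pattern = re.compile(r"\b("+"|".join(data)+r")\b")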
text_file = ['is vitamin d in milk enough', 'try to improve quality level by automatic intake of fresh air', 'turn on the tv or entertainment system based on that individual preferences', 'blood pressure monitor', 'I buy more ice cream', 'proper method to add frozen wild blueberries in ice cream']

result = [pattern.sub(lambda x: "#{}#".format(data[x.group()]), i) for i in text_file]
print(result)

Output:

['is vitamin d in #milk# enough',
 'try to improve quality level by automatic intake of #fresh_air#',
 'turn on the tv or #entertainment_system# based on that individual preferences',
 '#blood_pressure# monitor',
 'I buy more #ice_cream#',
 'proper method to add frozen wild blueberries in #ice_cream#']
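
If the phrase list might later contain regex metacharacters or overlapping phrases, a slightly more defensive variant could look like the sketch below. This is not part of the original answer. The helper name tag_terms, the variable names, and the shortened sample input are assumptions made for illustration. It escapes each phrase with re.escape, tries longer phrases first, and wraps each line in its own list to match the "list of lists" shape asked for in the question.

```python
import re

terms = {'fresh air', 'entertainment system', 'ice cream', 'milk', 'dog', 'blood pressure'}

# Longest phrases first, so a longer phrase wins over a shorter one it starts with.
ordered = sorted(terms, key=len, reverse=True)

# re.escape guards against phrases containing regex metacharacters such as '.' or '+'.
pattern = re.compile(r"\b(" + "|".join(re.escape(t) for t in ordered) + r")\b")

def tag_terms(line):
    """Wrap each matched phrase in '#', joining its words with underscores."""
    return pattern.sub(lambda m: "#" + m.group(1).replace(" ", "_") + "#", line)

# Shortened sample input (hypothetical, for illustration only).
text_file = ['is vitamin d in milk enough', 'I buy more ice cream', 'hot dogs and dogma']

# One single-element list per input line.
result = [[tag_terms(line)] for line in text_file]
print(result)
```

Compiling the pattern once and reusing it for every line keeps the work to a single pass per line, which should be reasonable for the "time-efficient" requirement on moderately sized phrase lists.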
