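python - Find dictionary values in paragraphs, and return NA when a paragraph contains none
Problem description
Suppose I have these random paragraphs of words as a list: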
t = ['protein and carbohydrates Its is a little heavier pulsus widely used and is a versatile ingredient',
'Tea contains the goodness of Natural Ingredients Cardamom Ginger Tea bags Disclaimers As per Ayurvedic texts',
'almonds are all natural supreme sized nuts they are highly nutritious and extremely healthy',
'Camel milk can be consumed by lactose intolerant people and those allergic to cows milk',
'Healthy Crunch Almond with honey is an extra crunchy breakfast cereal for a delightful start to your mornings']
The dictionary is:
d = {'First': ['Tea','Coffee'],
'Second': ['Noodles','Pasta'],
'Third': ['sandwich','honey'],
'Fourth': ['Almond','apricot','blueberry']
}
The code I wrote is slow, and I also want to show "NA" for paragraphs that do not match any dictionary value.
Code
import pandas as pd

get_labels = []
get_text = []
for txt in t:
    for dictrow in d.values():
        for i in dictrow:
            # compare every dictionary value against every word of the paragraph
            for j in txt.split():
                if i == j:
                    print(j)
                    print(txt)
                    get_labels.append(j)
                    get_text.append(txt)
pd.DataFrame(list(zip(get_text, get_labels)), columns=["whole_text", "matched_text"])
The resulting data frame is:
whole_text matched_text
0 Tea contains the goodness of Natural Ingredie... Tea
1 Tea contains the goodness of Natural Ingredie... Tea
2 Healthy Crunch Almond with honey is an extra ... honey
3 Healthy Crunch Almond with honey is an extra ... Almond
But the output I want is:
whole_text matched_text
0 protein and carbohydrates Its is a little .... NA
1 Tea contains the goodness of Natural Ingredie... Tea
2 Tea contains the goodness of Natural Ingredie... Tea
3 almonds are all natural supreme sized nuts th... NA
4 Camel milk can be consumed by lactose intoler... NA
5 Healthy Crunch Almond with honey is an extra ... honey
6 Healthy Crunch Almond with honey is an extra ... Almond
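I have 2 questions:
a) I want to add "NA" for paragraphs that do not match any dictionary value, as in the table above.
b) How can I optimize this code to run faster, since I am using it on a large dataset?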
Solution
Using set intersection:
import pandas as pd

paragraphs = ['protein and carbohydrates Its is a little heavier pulsus widely used and is a versatile ingredient',
              'Tea contains the goodness of Natural Ingredients Cardamom Ginger Tea bags Disclaimers As per Ayurvedic texts',
              'almonds are all natural supreme sized nuts they are highly nutritious and extremely healthy',
              'Camel milk can be consumed by lactose intolerant people and those allergic to cows milk',
              'Healthy Crunch Almond with honey is an extra crunchy breakfast cereal for a delightful start to your mornings']
d = {'First': ['Tea', 'Coffee'],
     'Second': ['Noodles', 'Pasta'],
     'Third': ['sandwich', 'honey'],
     'Fourth': ['Almond', 'apricot', 'blueberry']
     }

# flatten all dictionary values into one set for fast lookups
words = set(w for lst in d.values() for w in lst)
match_stats = {'whole_text': [], 'matched_text': []}
for p in paragraphs:
    # words that occur both in the paragraph and in the dictionary values
    common_words = set(p.split()) & words
    if not common_words:
        match_stats['whole_text'].append(p)
        match_stats['matched_text'].append('NA')
    else:
        for w in common_words:
            match_stats['whole_text'].append(p)
            match_stats['matched_text'].append(w)
df = pd.DataFrame(match_stats)
print(df)
Output:
whole_text matched_text
0 protein and carbohydrates Its is a little heav... NA
1 Tea contains the goodness of Natural Ingredie... Tea
2 almonds are all natural supreme sized nuts the... NA
3 Camel milk can be consumed by lactose intolera... NA
4 Healthy Crunch Almond with honey is an extra ... honey
5 Healthy Crunch Almond with honey is an extra ... Almond
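Note that the set intersection reports each matched word once per paragraph, and the order of several matches within one paragraph is not guaranteed. The desired output in the question lists Tea twice, once per occurrence. If that behaviour is wanted, a possible variant is sketched below. It keeps the set only for membership tests and walks the tokens in order. The function name match_paragraphs and its missing parameter are hypothetical, and the paragraphs are shortened here for readability.

```python
def match_paragraphs(paragraphs, d, missing="NA"):
    """Return (paragraph, matched_word) rows, one row per matching token.

    Paragraphs without any match get a single row with `missing`.
    """
    # flatten all dictionary values into a set: O(1) membership tests
    words = {w for lst in d.values() for w in lst}
    rows = []
    for p in paragraphs:
        # keep tokens in their original order, duplicates included
        hits = [tok for tok in p.split() if tok in words]
        if hits:
            rows.extend((p, tok) for tok in hits)
        else:
            rows.append((p, missing))
    return rows


# shortened sample data (an assumption, not the full paragraphs from the question)
paragraphs = [
    "protein and carbohydrates",
    "Tea contains Ginger Tea bags",
    "Healthy Crunch Almond with honey",
]
d = {
    "First": ["Tea", "Coffee"],
    "Third": ["sandwich", "honey"],
    "Fourth": ["Almond", "apricot", "blueberry"],
}

rows = match_paragraphs(paragraphs, d)
for row in rows:
    print(row)
```

The rows can then be passed to pd.DataFrame(rows, columns=["whole_text", "matched_text"]) to get the same two-column frame as above. Matching stays case-sensitive and exact, as in the original code, so "almonds" still does not match "Almond".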