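python - Printing abbreviations and hyphenated words
Problem description
As a first step I need to identify all abbreviations and hyphenated words in a sentence, and print them once they are found. My code does not seem to identify them correctly.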
import re
# df1 is my DataFrame; 'Open End Text' holds one sentence per row
sentence_stream2 = df1['Open End Text']
for sent in sentence_stream2:
    abbs_ = re.findall(r'(?:[A-Z]\.)+', sent)  # abbreviations
    hypns_ = re.findall(r'\w+(?:-\w+)*', sent)  # hyphenated words
    print("new sentence:")
    print(sent)
    print(abbs_)
    print(hypns_)
One sentence from my corpus is: DevOps with APIs & event-driven architecture using cloud Data Analytics environment Self-service BI
The output is:
new sentence:
DevOps with APIs & event-driven architecture using cloud Data Analytics environment Self-service BI
[]
['DevOps', 'with', 'APIs', 'event-driven', 'architecture', 'using', 'cloud', 'Data', 'Analytics', 'environment', 'Self-service', 'BI']
The expected output is:
new sentence:
DevOps with APIs & event-driven architecture using cloud Data Analytics environment Self-service BI
['APIs','BI']
['event-driven','Self-service']
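Solution
Your abbreviation rule does not match anything in this sentence: it only finds capital letters that are each followed by a period. What you want is any word containing two or more consecutive capital letters, and a rule you can use for that is: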
abbs_ = re.findall(r'(?:[A-Z]{2,}s?\.?)', sent) #abbreviations
This matches APIs and BI.
t = "DevOps with APIs & event-driven architecture using cloud Data Analytics environment Self-service BI"
import re
abbs_ = re.findall(r'(?:[A-Z]\.)+', t) #abbreviations
cap_ = re.findall(r'(?:[A-Z]{2,}s?\.?)', t) #abbreviations
hypns_= re.findall(r'\w+-\w+', t) #hyphenated words fixed
print("new sentence:")
print(t)
print(abbs_)
print(cap_)
print(hypns_)
Output:
new sentence:
DevOps with APIs & event-driven architecture using cloud Data Analytics environment Self-service BI
[] # your abbreviation rule - does not find any capital letter followed by .
['APIs', 'BI'] # cap_ rule
['event-driven', 'Self-service'] # fixed hyphen rule
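The hyphen rule was changed above without comment, so here is a minimal sketch of why the original one returned every word. In `\w+(?:-\w+)*` the hyphen group may repeat zero times, so any plain word already matches. The sample string and the third pattern (one or more hyphen groups, which keeps words like state-of-the-art in one piece) are assumptions added for illustration and are not part of the answer.

```python
import re

# Sample text is an assumption for illustration
t = "event-driven and state-of-the-art Self-service BI"

# Original rule: the hyphen group may repeat zero times, so plain words match too
zero_or_more = re.findall(r'\w+(?:-\w+)*', t)

# Rule from the answer: exactly one hyphen per match
single = re.findall(r'\w+-\w+', t)

# Variant (assumption): one or more hyphen groups per match
one_or_more = re.findall(r'\w+(?:-\w+)+', t)

print(zero_or_more)
print(single)
print(one_or_more)
```

With `\w+-\w+`, a word with several hyphens is split into separate matches, which may or may not be what you want for your corpus.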
These rules will most likely not find every abbreviation, for example
t = "Prof. Dr. S. Quakernack"
so you will probably need to tune them against more data, for example with http://www.regex101.com
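As a rough sketch of what such tuning could look like, the snippet below runs the dotted-capitals rule, a word-bounded version of the capitals rule, and a hypothetical third rule for short titles such as Prof. or Dr. on an extended sample. The sample sentence and the `titles` pattern are assumptions, not part of the answer; the `titles` rule is only a heuristic and would also match a short capitalized word at the end of a sentence (for example "Yes.").

```python
import re

# Sample text is an assumption for illustration
t = "Prof. Dr. S. Quakernack works on APIs in the U.S.A."

# Single capital letters each followed by a period (the rule from the question)
dotted = re.findall(r'(?:[A-Z]\.)+', t)

# Runs of two or more capitals with an optional plural s, bounded by word edges
caps = re.findall(r'\b[A-Z]{2,}s?\b', t)

# Hypothetical rule: a capital, one to three lowercase letters, then a period
titles = re.findall(r'\b[A-Z][a-z]{1,3}\.', t)

print(dotted)
print(caps)
print(titles)
```

Keeping the rules separate makes it easier to see which one is responsible for a wrong match before combining them.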