python - spaCy: given a string in a doc, how to find the start and end character indices of the string in the doc
Problem description
import spacy
nlp = spacy.load('en')
doc = nlp('An example sentence in the city of london')
str1 = 'in the city'
str2 = 'example sentence'
I want to find the start and end character indices of each of the strings
(from a list) in the doc. How can I do this using spaCy?
What I have done so far: a complex for loop matching each character, which obviously doesn't scale well.
Solution
If you have a spaCy Doc instance, the original string is available as the doc's text attribute (doc.text). You can then search it with regular expressions:
import re
doc = nlp('An example sentence in the city of london')
listOfStrings = ['in the city', 'example sentence']
for s in listOfStrings:
    # re.escape makes regex metacharacters in s match literally
    res = re.search(re.escape(s), doc.text)
    if res:
        print(s, res.start(), res.end())
# in the city 20 31
# example sentence 3 19
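Note that re.search reports only the first occurrence of each string. If a string can appear more than once, a variant along the following lines may be useful. This is a minimal sketch rather than part of the original answer: the helper name find_all_offsets is hypothetical, and it works on a plain string, so with spaCy you would pass doc.text.

```python
import re

def find_all_offsets(text, phrases):
    """Return {phrase: [(start, end), ...]} for every literal occurrence in text."""
    offsets = {}
    for phrase in phrases:
        # Escape the phrase so characters like '.' or '(' are matched literally.
        pattern = re.escape(phrase)
        offsets[phrase] = [(m.start(), m.end()) for m in re.finditer(pattern, text)]
    return offsets

# With spaCy, `text` would be doc.text.
text = 'An example sentence in the city of london'
for phrase, spans in find_all_offsets(text, ['in the city', 'example sentence', 'paris']).items():
    print(phrase, spans)
# in the city [(20, 31)]
# example sentence [(3, 19)]
# paris []
```

If spaCy Span objects are needed rather than raw offsets, doc.char_span(start, end) can convert a character range into a Span; it returns None when the offsets do not line up with token boundaries.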