spaCy: given a string in a doc, how to find the start and end character indices of the string in the doc

Problem description

import spacy
nlp = spacy.load('en')  # 'en' is a spaCy v2 shortcut; on v3+ load an installed model package such as 'en_core_web_sm'
doc = nlp('An example sentence in the city of london')
str1 = 'in the city'
str2 = 'example sentence'

I want to find the start and end character indices of each string (from a list) in the doc. How can I do this using spaCy?

What I have done so far: a complex for loop that matches character by character, which obviously doesn't scale well.

Tags: python, python-3.x, spacy

Solution


If you have a spaCy Doc instance, the original string is available as an attribute of the doc, doc.text (see the spaCy documentation for Doc). You can then search that string with regular expressions:

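import re

# nlp is the spaCy pipeline loaded in the question
doc = nlp('An example sentence in the city of london')
listOfStrings = ['in the city', 'example sentence']

for s in listOfStrings:
    # re.escape makes s match literally, even if it contains regex metacharacters
    res = re.search(re.escape(s), doc.text)
    if res:
        # start() and end() are character offsets into doc.text (end is exclusive)
        print(s, res.start(), res.end())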

# in the city 20 31
# example sentence 3 19
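
Note that re.search only reports the first occurrence of each string. If a string can appear more than once, a variant along the following lines may help. This is only a sketch: find_all_offsets is a hypothetical helper name, not part of spaCy, and the plain string text stands in for doc.text so the snippet runs without a spaCy model installed.

```python
import re

def find_all_offsets(text, phrases):
    """Map each phrase to a list of (start, end) character offsets in text.

    Hypothetical helper for illustration; offsets are for non-overlapping
    matches, with end exclusive, as returned by the re module.
    """
    offsets = {}
    for phrase in phrases:
        pattern = re.escape(phrase)  # match the phrase literally
        offsets[phrase] = [(m.start(), m.end()) for m in re.finditer(pattern, text)]
    return offsets

# `text` plays the role of doc.text here (assumption: same string as the Doc)
text = 'An example sentence in the city of london, in the city centre'
result = find_all_offsets(text, ['in the city', 'example sentence', 'paris'])

for phrase, spans in result.items():
    print(phrase, spans)
```

A phrase that never occurs simply maps to an empty list. If token-aligned spans are needed rather than raw offsets, the offsets can be passed to doc.char_span(start, end), which returns None when they do not line up with token boundaries.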

