How to enumerate the sentences of a paragraph split by SpaCy

Problem description

I want to read the sentences of a paragraph that has been segmented with SpaCy. However, when I try to enumerate the sentences, I end up enumerating words instead of sentences. Indeed,

text = predicted.iloc[0,5]
sentences = spacy_nlp(text)
print(sentences)
for i,sent in enumerate(sentences):
    print("---",i,"---")
    print(sent)

first gives the SpaCy sentences:

['Beyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ bee-YON-say) (born September 4, 1981) is an American singer, songwriter, record producer and actress.', "Born and raised in Houston, Texas, she performed in various singing and dancing competitions as a child, and rose to fame in the late 1990s as lead singer of R&B girl-group Destiny's Child.", "Managed by her father, Mathew Knowles, the group became one of the world's best-selling girl groups of all time.", 'Their hiatus saw the release of Beyoncé\'s debut album, Dangerously in Love (2003), which established her as a solo artist worldwide, earned five Grammy Awards and featured the Billboard Hot 100 number-one singles "Crazy in Love" and "Baby Boy".']

but then it enumerates words rather than sentences:

--- 0 ---
[
--- 1 ---
'
--- 2 ---
Beyoncé
--- 3 ---
Giselle
--- 4 ---
Knowles
--- 5 ---
-
--- 6 ---
Carter
...
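
Iterating directly over the Doc object returned by spacy_nlp(text) yields Token objects, which is why single words are printed. The sentences are exposed through the Doc's sents attribute instead. A minimal sketch of that difference, assuming spacy_nlp is a loaded pipeline such as en_core_web_sm (the original pipeline setup is not shown above):

import spacy

# stand-in for the spacy_nlp pipeline used above (assumption: a small English model)
spacy_nlp = spacy.load("en_core_web_sm")

text = "Beyoncé Giselle Knowles-Carter is an American singer. She was born and raised in Houston, Texas."
doc = spacy_nlp(text)

# doc.sents yields sentence Spans, so each item here is a full sentence
for i, sent in enumerate(doc.sents):
    print("---", i, "---")
    print(sent.text)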

PS:

Thanks to X for the idea of converting it into a list, which allows me to go through it sentence by sentence.

However, the whole point is to make it work with nltk_spacy_tree(), which seems to only accept objects of type spacy.tokens.doc.Doc, so I did the following, which does not seem well suited. It looks too complicated:

text = predicted.iloc[0,5]
sentences = list(spacy_nlp(text))
sentences = en_nlp(predicted["context"][0].lower()).sents
#print(type(en_nlp(sentences)))
for i,sent in enumerate(sentences):
    print("---",i,"---")
    print(en_nlp(str(sent)))
    sent = en_nlp(str(sent))
    tree = nltk_spacy_tree(sent)
    print(tree)
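
Since each item yielded by .sents is a Span rather than a Doc, one way to avoid re-parsing every sentence with en_nlp(str(sent)) is Span.as_doc(), which returns a Doc covering just that sentence. A sketch of the same loop under that assumption (nltk_spacy_tree is the question's own helper and is assumed to accept a Doc):

sentences = en_nlp(predicted["context"][0].lower()).sents
for i, sent in enumerate(sentences):
    print("---", i, "---")
    # sent is a Span; as_doc() gives a Doc for just this sentence,
    # so the text does not have to be run through the pipeline again
    sent_doc = sent.as_doc()
    tree = nltk_spacy_tree(sent_doc)
    print(tree)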

Tags: python, spacy

Solution

