SUMY Text Summarizer does not summarize and returns the original text

Problem description

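from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.luhn import LuhnSummarizer
from sumy.nlp.stemmers import Stemmer
from sumy.utils import get_stop_words

LANGUAGE = "english"
stemmer = Stemmer(LANGUAGE)

def get_luhn_summary(text):
    summ = list()

    parser = PlaintextParser.from_string(text, Tokenizer(LANGUAGE))
    summarizer = LuhnSummarizer()
    summarizer.stop_words = get_stop_words(LANGUAGE)

    # The second argument is the number of sentences requested (10 here).
    for sentence in summarizer(parser.document, 10):
        summ.append(str(sentence))
    return summ

# textA is the input string (not shown in the question).
summaryA_luhn = get_luhn_summary(textA)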

This always returns the original string. I'm confused, because I am following the documentation.

Tags: python, text, summarization

Solution


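Summarization is driven by a sentence count: the second argument passed to the summarizer is the number of sentences to return. If the text contains no more sentences than that number, every sentence comes back, so the "summary" is the original text.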

import nltk
from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.luhn import LuhnSummarizer as Summarizer
from sumy.nlp.stemmers import Stemmer
from sumy.utils import get_stop_words

LANGUAGE = "english"
SENTENCES_COUNT = 2
nltk.download('punkt')

parser = PlaintextParser.from_file("document.txt", Tokenizer(LANGUAGE))

stemmer = Stemmer(LANGUAGE)

summarizer = Summarizer(stemmer)
summarizer.stop_words = get_stop_words(LANGUAGE)

for sentence in summarizer(parser.document, SENTENCES_COUNT):
    print(sentence)

The code above reads the text from the file document.txt and reduces it to the number of sentences you specify in SENTENCES_COUNT.

So if document.txt contains 10 sentences and you set SENTENCES_COUNT = 2, you get a two-sentence summary.
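The effect of the count can be seen without sumy at all. The following is a toy sketch of count-based extractive selection, not sumy's implementation: the word-count score and the names `split_sentences` and `top_sentences` are made up for illustration. Only the selection step matters here: rate every sentence, keep the best `count`, and restore document order. Asking for more sentences than the text contains simply returns all of them, which is presumably what happens in the question when 10 is requested for a short `textA`.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def top_sentences(text, count):
    """Keep the `count` highest-scoring sentences, in document order."""
    sentences = split_sentences(text)
    # Toy score: number of words in the sentence (a stand-in for a real rating).
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: len(sentences[i].split()),
        reverse=True,
    )
    # Slicing past the end of the list just returns every index.
    keep = sorted(ranked[:count])
    return [sentences[i] for i in keep]


text = "This is the string to parse. Hopefully it will be more than one sentence. Like so!"

short = top_sentences(text, 2)         # fewer sentences than the text contains
everything = top_sentences(text, 10)   # more sentences than the text contains

print(len(short), len(everything))
```

With a count of 2 the shortest sentence is dropped; with a count of 10 nothing is dropped, because the text only has three sentences.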

You can also simply swap out:

parser = PlaintextParser.from_file("document.txt", Tokenizer(LANGUAGE))

for:

text = "This is the string to parse. Hopefully it will be more than one sentence. Like so!"    
parser = PlaintextParser.from_string(text, Tokenizer(LANGUAGE))

if you want to parse a string rather than a file.
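Applied to the function from the question, one option is to derive the count from the length of the input instead of hard-coding 10. This is a hedged sketch, not the answer author's code: the helper `pick_sentence_count`, the `ratio` parameter, and its default of 0.3 are assumptions chosen for illustration. The sumy calls are the ones used in the answer, plus `parser.document.sentences` to count the parsed sentences.

```python
def pick_sentence_count(total_sentences, ratio=0.3):
    """Hypothetical helper: choose a count below the input length.

    Texts with zero or one sentence cannot be shortened, so their
    length is returned unchanged.
    """
    if total_sentences <= 1:
        return total_sentences
    wanted = max(1, round(total_sentences * ratio))
    # Never ask for every sentence, or the "summary" is the original text.
    return min(wanted, total_sentences - 1)


def get_luhn_summary(text, ratio=0.3, language="english"):
    """Summarize a string with sumy's LuhnSummarizer (assumes sumy is installed)."""
    # Imported here so that pick_sentence_count can be used without sumy.
    from sumy.parsers.plaintext import PlaintextParser
    from sumy.nlp.tokenizers import Tokenizer
    from sumy.summarizers.luhn import LuhnSummarizer
    from sumy.nlp.stemmers import Stemmer
    from sumy.utils import get_stop_words

    parser = PlaintextParser.from_string(text, Tokenizer(language))
    summarizer = LuhnSummarizer(Stemmer(language))
    summarizer.stop_words = get_stop_words(language)

    count = pick_sentence_count(len(parser.document.sentences), ratio)
    return [str(sentence) for sentence in summarizer(parser.document, count)]
```

With these assumed defaults, a ten-sentence input is reduced to three sentences and a three-sentence input to one. The tokenizer still needs the NLTK data downloaded as in the answer.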

