Using BERT to summarize articles when there are no labels or expected output summaries

Question

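I am working on a project with some constraints: I cannot use an extractive method to summarize an article, and I have to use BERT. If this were a labeling problem (summarizing tweets, reviews, questions) where I had the corresponding labels for the training data, I would feed vectors from BERT into a Keras embedding layer followed by an LSTM, and build a model from the inputs and output labels. The problem is that I have to summarize text, not labeled tweets and reviews. Given that I have vectors corresponding to the vocabulary, is there any way to use BERT here? (I am sure BERT is required, because I was specifically asked to use it.)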

Tags: python, machine-learning, nlp, artificial-intelligence

Answer


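You have many documents you want to summarize automatically, but no training data. I assume your documents are in English. Fortunately, BERT is a pretrained model, and there are even libraries built specifically for summarization that are very easy to use. Have you tried any of them to see whether one meets your needs? For example, bert-extractive-summarizer: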

# pip install bert-extractive-summarizer
from summarizer import Summarizer

body = ''' Indian Bank is an Indian state-owned financial services company established in 1907 and headquartered in Chennai, India. 
It has 20,924 employees, 2900 branches with 2861 ATMs and 1014 cash deposit machines and is one of the top performing public sector banks in India. 
Total business of the bank has touched ₹430,000 crore (US$60 billion) as on 31 March 2019. Bank's Information Systems & Security processes certified with ISO27001:2013 standard and is among very few Banks certified worldwide. 
It has overseas branches in Colombo and Singapore including a Foreign Currency Banking Unit at Colombo and Jaffna. It has 227 Overseas Correspondent banks in 75 countries.
Since 1969, the Government of India has owned the bank. As per the announcement made by the Indian Finance Minister Nirmala Sitharaman on 30 August 2019, Indian Bank will be anchor bank for the Kolkata-based Allahabad Bank, and this merger is expected to come on force from 1 April 2020, making it the seventh largest bank in the country. '''


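model = Summarizer()
# min_length=60: skip candidate sentences shorter than 60 characters
result = model(body, min_length=60)
# result is already one string of the selected sentences, so no join is needed
print(result)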

Output:

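Indian Bank is an Indian state-owned financial services company established in 1907 and headquartered in Chennai, India. Total business of the bank has touched ₹430,000 crore (US$60 billion) as on 31 March 2019.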
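Why this works without labels: as far as I understand it, bert-extractive-summarizer splits the text into sentences, embeds each sentence with a pretrained BERT model, clusters those embeddings with k-means, and keeps the sentence nearest each cluster centre, in original order. No training labels are needed at any step. The sketch below shows only that selection step in plain Python, assuming the sentence embeddings have already been computed. The toy 2-D vectors stand in for BERT embeddings, and `kmeans` and `pick_summary` are hypothetical helpers, not the library's API.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means (Lloyd's algorithm); returns k centroids as lists."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # assign each point to its nearest centroid
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids


def pick_summary(sentences, embeddings, k):
    """Keep the sentence nearest each centroid, returned in document order."""
    chosen = set()
    for centroid in kmeans(embeddings, k):
        candidates = [i for i in range(len(sentences)) if i not in chosen]
        best = min(candidates, key=lambda i: math.dist(embeddings[i], centroid))
        chosen.add(best)
    return [sentences[i] for i in sorted(chosen)]


# Toy data: two obvious groups of "sentence embeddings" (hypothetical values).
sentences = [f"sentence {i}" for i in range(6)]
embeddings = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 0.2), (5.0, 5.2)]

summary = pick_summary(sentences, embeddings, k=2)
print(summary)  # ['sentence 0', 'sentence 2']
```

Sentences 0 and 2 are picked because each sits closest to the centre of its group. In this scheme the number of clusters controls how many sentences the summary keeps.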

