How to compare each line in Python and keep the last complete sentence

Problem description

I have a file with the following contents.

BEFORE
BEFORE THE
BEFORE THE PARLIAMENT
BEFORE THE PARLIAMENT ON
BEFORE THE PARLIAMENT ON BRITAIN'S
BEFORE THE PARLIAMENT ON BRITAIN'S RELATIONS
BEFORE THE PARLIAMENT ON BRITAIN'S RELATIONS WITH
BEFORE THE PARLIAMENT ON BRITAIN'S RELATIONS WITH SCOTLAND
BRITAIN'S RELATIONS WITH SCOTLAND FOLLOWING
BRITAIN'S RELATIONS WITH SCOTLAND FOLLOWING THE
BRITAIN'S RELATIONS WITH SCOTLAND FOLLOWING THE REFERENDUM
SCOTLAND FOLLOWING THE REFERENDUM VOTE.
SCOTLAND FOLLOWING THE REFERENDUM VOTE. LAST
SCOTLAND FOLLOWING THE REFERENDUM VOTE. LAST MONTH
SCOTLAND FOLLOWING THE REFERENDUM VOTE. LAST MONTH SCOTLAND
REFERENDUM VOTE. LAST MONTH SCOTLAND VOTED
REFERENDUM VOTE. LAST MONTH SCOTLAND VOTED IN
REFERENDUM VOTE. LAST MONTH SCOTLAND VOTED IN FAVOR
REFERENDUM VOTE. LAST MONTH SCOTLAND VOTED IN FAVOR OF
REFERENDUM VOTE. LAST MONTH SCOTLAND VOTED IN FAVOR OF STAYING
REFERENDUM VOTE. LAST MONTH SCOTLAND VOTED IN FAVOR OF STAYING WITH
LAST MONTH SCOTLAND VOTED IN FAVOR OF STAYING WITH THE
LAST MONTH SCOTLAND VOTED IN FAVOR OF STAYING WITH THE UNITED
LAST MONTH SCOTLAND VOTED IN FAVOR OF STAYING WITH THE UNITED KINGDOM
LAST MONTH SCOTLAND VOTED IN FAVOR OF STAYING WITH THE UNITED KINGDOM AFTER
LAST MONTH SCOTLAND VOTED IN FAVOR OF STAYING WITH THE UNITED KINGDOM AFTER THE

I am trying to ignore the repeated lines and keep only the last complete sentence of each run, so the result looks like this:

BEFORE THE PARLIAMENT ON BRITAIN'S RELATIONS WITH SCOTLAND
BRITAIN'S RELATIONS WITH SCOTLAND FOLLOWING THE REFERENDUM
SCOTLAND FOLLOWING THE REFERENDUM VOTE. LAST MONTH SCOTLAND
REFERENDUM VOTE. LAST MONTH SCOTLAND VOTED IN FAVOR OF STAYING WITH
LAST MONTH SCOTLAND VOTED IN FAVOR OF STAYING WITH THE UNITED KINGDOM AFTER THE

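I check whether the previous line is contained in the next line. If it is, I want to keep iterating; if it is not, I want to add the last sentence to a list. However, my logic below does not work.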

with open("data.txt", 'r') as f:
    data = f.read()
    data_list = []
    comp_word = "BEFORE"
    for line in data:
        if comp_word in line:
            comp_word == line
        elif comp_word not in line:
            data_list.append(line)

print(data_list)

What is an alternative way to solve this?

Tags: python, python-3.x, python-2.7

Solution
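Before the fix, it may help to see why the attempt in the question produces nothing useful. Two things stand out: `f.read()` returns one string, so `for line in data` walks over single characters rather than lines, and `comp_word == line` is a comparison whose result is thrown away, so `comp_word` never changes. The sketch below illustrates both points; it uses `io.StringIO` with a made-up two-line sample as a stand-in for the real file, which is an assumption for illustration only.

```python
import io

# Stand-in for open("data.txt"); the two-line content is a hypothetical sample.
f = io.StringIO("BEFORE\nBEFORE THE\n")

# Iterating over the string returned by read() yields characters, not lines.
data = f.read()
chars = [item for item in data][:3]
print(chars)

# Iterating over the file object itself yields one line at a time.
f.seek(0)
lines = [line.rstrip("\n") for line in f]
print(lines)

# "==" only compares; the result is discarded and comp_word is unchanged.
comp_word = "BEFORE"
comp_word == "BEFORE THE"
print(comp_word)
```

So the loop needs to iterate over the file object (or over `data.splitlines()`), and the update has to be an assignment with `=`.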


data = []
with open("data.txt") as infile:
    cache = ''
    for line in infile:
        line = line.strip()
        if not line:
            # skip blank lines so they do not reset the cache
            continue
        # if the current line is an extension of the last line, update and ignore
        if line.startswith(cache):
            cache = line
        else:
            # we see a brand new content line. Write out the cache and reset it to the current line's contents
            data.append(cache)
            cache = line
    # write out the final cached line (skipped when the file had no content)
    if cache:
        data.append(cache)

print(data)
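The same prefix-tracking idea can also be wrapped in a function, which makes it easy to try without a file. This is only a sketch: the name `last_complete_lines` and the shortened `sample` list are made up for illustration and are not part of the original answer.

```python
def last_complete_lines(lines):
    """Return each line that is not extended by the line that follows it."""
    result = []
    cache = ""
    for raw in lines:
        line = raw.strip()
        if not line:
            # Ignore blank lines entirely.
            continue
        # A line that does not start with the cached text begins a new run,
        # so the cached text was the last (longest) line of the previous run.
        if cache and not line.startswith(cache):
            result.append(cache)
        cache = line
    # The final run has no successor, so flush it explicitly.
    if cache:
        result.append(cache)
    return result


# Abbreviated, hypothetical sample in the same shape as the question's file.
sample = [
    "BEFORE",
    "BEFORE THE",
    "BEFORE THE PARLIAMENT",
    "BRITAIN'S RELATIONS",
    "BRITAIN'S RELATIONS WITH",
    "SCOTLAND FOLLOWING",
]
print(last_complete_lines(sample))
```

Because a file object is itself an iterable of lines, the function should work the same way when called as `last_complete_lines(infile)` inside a `with open("data.txt") as infile:` block. Note that `startswith` compares characters, not words, so a line such as `BEFORE THEN` would count as an extension of `BEFORE THE`.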
