Extracting a specific column from a UD corpus

Problem description

I have a text file:

1 This D
2 is   V
3 one  A
4 example
5 .    P

1 This D
2 is   V
3 another 
4 example

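I want to extract the second column and collect it into a list with one entry per block, where blocks are separated by blank lines.

Expected output: ["this is one example", "this is another example"]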

with open("data.txt","r") as f:
    print(f.read().split()[1])

But the only output I get is This. How can I extract a specific column (in this case, the sentences) from a UD corpus?

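Tags: python

Solution

f.read() reads the entire file as a single string, so f.read().split()[1] returns only the second whitespace-separated token of the whole file, which is This. To get the second column of every line, the file has to be processed line by line.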
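To see why the original attempt prints only This, here is a minimal sketch that uses a string literal in place of the file. The `text` value is an assumption that mirrors the first three lines of data.txt.

```python
# Sample text mirroring the first lines of data.txt (an assumption for illustration).
text = "1 This D\n2 is   V\n3 one  A\n"

# str.split() with no arguments splits on any whitespace, including newlines,
# so the line structure of the file is lost.
tokens = text.split()

print(tokens[:4])  # ['1', 'This', 'D', '2']
print(tokens[1])   # This
```

Index 1 therefore refers to the second token of the whole file, not to the second column of each line.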

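from itertools import groupby

# Read the file and split it into lines without the trailing newline characters.
with open("data.txt", "r") as f:
    lines = f.read().splitlines()

# Take the second column of each line; mark blank lines with "\n" as a separator.
second_column = [line.split()[1] if line.strip() else "\n" for line in lines]

# Group consecutive entries and keep only the groups that are not separators.
words_list = [list(group) for k, group in groupby(second_column, lambda x: x == "\n") if not k]

# Join each group of words into one sentence.
sentences = [" ".join(words) for words in words_list]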

  • f.read().splitlines() removes the newline characters and splits the text into a list of lines.
  • groupby() splits the list at the "\n" separator entries.
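
As an alternative to groupby(), the same grouping can be sketched with a plain loop. This is only an illustration under stated assumptions: the function name `extract_sentences` and its `column` parameter are hypothetical, lines starting with "#" are assumed to be CoNLL-U style comments and are skipped, and lines with too few fields are ignored rather than raising an error.

```python
def extract_sentences(lines, column=1):
    """Collect one column per line into sentences separated by blank lines."""
    sentences, words = [], []
    for line in lines:
        if line.startswith("#"):      # assumption: CoNLL-U style comment line
            continue
        fields = line.split()
        if not fields:                # blank or whitespace-only line ends a sentence
            if words:
                sentences.append(" ".join(words))
                words = []
        elif len(fields) > column:    # skip lines too short to have the column
            words.append(fields[column])
    if words:                         # flush the last sentence if no blank line follows it
        sentences.append(" ".join(words))
    return sentences


# Sample input mirroring data.txt from the question.
sample = """1 This D
2 is   V
3 one  A
4 example
5 .    P

1 This D
2 is   V
3 another
4 example
""".splitlines()

print(extract_sentences(sample))
# ['This is one example .', 'This is another example']
```

An open file object can be passed directly, for example extract_sentences(open("data.txt")), because the function only iterates over lines. Note that the result keeps the original capitalization and the punctuation token, so matching the expected output in the question exactly would additionally require lowercasing and dropping the "." token.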
