Converting list data to CoNLL 2003 NER format and saving it to a text file

Problem description

I have NER data in list format.

Sample data:

[[('Silica', 'NN', '_', 'B-Material'),
  ('nanoparticles', 'NNS', '_', 'I-Material'),
  ('possessing', 'VBG', '_', 'O'),
  ('three', 'CD', '_', 'B-Data'),
  ('different', 'JJ', '_', 'I-Data'),
  ('diameters', 'NNS', '_', 'I-Data'),
  ('(', '(', '_', 'I-Data'),
  ('23', 'CD', '_', 'I-Data'),
  (',', ',', '_', 'I-Data'),
  ('74', 'CD', '_', 'I-Data'),
  ('and', 'CC', '_', 'I-Data'),
  ('170', 'CD', '_', 'I-Data'),
  ('nm', 'NN', '_', 'I-Data'),
  (')', ')', '_', 'I-Data'),
  ('were', 'VBD', '_', 'O'),
  ('used', 'VBN', '_', 'O'),
  ('to', 'TO', '_', 'O'),
  ('modify', 'NN', '_', 'B-Process'),
  ('a', 'DT', '_', 'B-Material'),
  ('piperidine', 'NN', '_', 'I-Material'),
  ('-', ':', '_', 'I-Material'),
  ('cured', 'VBN', '_', 'I-Material'),
  ('epoxy', 'NN', '_', 'I-Material'),
  ('polymer', 'NN', '_', 'I-Material'),
  ('.', '.', '_', 'O')],
 [('Fracture', 'NN', '_', 'B-Process'),
  ('tests', 'NNS', '_', 'I-Process'),
  ('were', 'VBD', '_', 'O'),
  ('performed', 'VBN', '_', 'B-Process'),
  ('and', 'CC', '_', 'O'),
  ('values', 'NNS', '_', 'B-Data'),
  ('of', 'IN', '_', 'I-Data'),
  ('the', 'DT', '_', 'I-Data'),
  ('toughness', 'NN', '_', 'I-Data'),
  ('increased', 'VBN', '_', 'B-Process'),
  ('steadily', 'RB', '_', 'I-Process'),
  ('as', 'IN', '_', 'O'),
  ('the', 'DT', '_', 'B-Data'),
  ('concentration', 'NN', '_', 'I-Data'),
  ('of', 'IN', '_', 'O'),
  ('silica', 'NN', '_', 'B-Material'),
  ('nanoparticles', 'NNS', '_', 'I-Material'),
  ('was', 'VBD', '_', 'O'),
  ('increased', 'VBN', '_', 'B-Process'),
  ('.', '.', '_', 'O')]]

I need to convert it to the CoNLL-2003 NER data format and save it in a text file. The code I implemented does not work as expected. My implementation:

name= 'coll2003_train_com.txt'
def data_format(name, seq):
    test = []
    for i in seq:
        for j in i:
            test.append(j)
    with open(name, 'w', encoding="utf-8") as f1:
        for i in test:
            ii='\t'.join(i)
            f1.writelines(ii + '/n')
            #f1.writelines('/n')
    return test

m=data_format(name, cc1)

The result is saved in the text file as a single line rather than as one token per line.

Tags: python, tagging, named-entity-recognition

Solution


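The code in the question appends the two-character string '/n' (a forward slash followed by n) instead of the newline escape '\n', so no line breaks are ever written and everything ends up on one line. Try this: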

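# `name` is the output file name and `data` is the nested list
# from the question (called cc1 there).
with open(name, 'w', encoding='utf-8') as fp:
    for sentence in data:
        for token in sentence:
            # One token per line, columns separated by tabs.
            fp.write('\t'.join(token) + '\n')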
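
One thing the snippet above does not do is mark sentence boundaries. The original CoNLL-2003 files separate columns with a single space and put an empty line between sentences, so the following is a minimal sketch of that layout. The function name `write_conll` and its `sep` parameter are made up for illustration, and the right separator depends on what your downstream NER tool expects, so treat this as a starting point rather than a definitive converter.

```python
import os
import tempfile


def write_conll(path, sentences, sep=" "):
    """Write one token per line and a blank line after each sentence.

    `sentences` is a list of sentences; each sentence is a list of
    tuples of strings such as (word, pos, chunk, ner_tag).
    `sep` is an assumption: pass "\t" if your tool expects tabs.
    """
    with open(path, "w", encoding="utf-8") as f:
        for sentence in sentences:
            for token in sentence:
                f.write(sep.join(token) + "\n")
            # An empty line marks the end of the sentence.
            f.write("\n")


# Small excerpt of the sample data from the question.
sample = [
    [("Fracture", "NN", "_", "B-Process"), ("tests", "NNS", "_", "I-Process")],
    [("values", "NNS", "_", "B-Data")],
]

out_path = os.path.join(tempfile.mkdtemp(), "conll2003_sample.txt")
write_conll(out_path, sample)

with open(out_path, encoding="utf-8") as f:
    written = f.read()
print(written)
```

With the full list from the question you would call `write_conll(name, cc1)`. Without the blank lines, a reader of the file cannot tell where one sentence ends and the next begins, and all tokens may be treated as a single long sentence.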
