Parsing a TSV file with lines that start with a quotation mark

Problem description

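I am trying to parse a two-column TSV file in which the first field of some rows is just a quotation mark. Is there a way to parse these as separate rows using Python, without adding a '\' before the quotes?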

presents    O
it    O
in    O
"   O
classical   O
"   O
principles  O
on  O
which   O
'   O
the O
operation   O
was O
'   O
conceived   O
.   O

I tried code like this:

import csv

with open("sample.tsv") as tsvfile:
    tsvreader = csv.reader(tsvfile, delimiter="\t")
    for line in tsvreader:
        print(line)

The result is wrong for these three lines:

"   O
classical   O
"   O

The current result is a single row:

['\tO\nclassical\tO\n', 'O']

The result I want is:

['"', 'O']
['classical', 'O']
['"', 'O']

Tags: python, pandas, csv

Solution


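By default, csv.reader treats " as a quote character, so a field that begins with " runs until the next ", even across line breaks. You can tell csv.reader to ignore quote characters by passing quoting=csv.QUOTE_NONE when you create the reader: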

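import csv

# newline="" lets the csv module handle line endings itself, as its docs recommend
with open("sample.tsv", newline="") as tsvfile:
    # QUOTE_NONE makes the reader treat " as an ordinary character
    tsvreader = csv.reader(tsvfile, delimiter="\t", quoting=csv.QUOTE_NONE)
    for line in tsvreader:
        print(line)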

Output (for the lines containing quotes):

['"', 'O']
['classical', 'O']
['"', 'O']

The csv module documentation explains that csv.QUOTE_NONE "instructs reader to perform no special processing of quote characters".
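To see the difference without touching a file, here is a minimal sketch that parses the same text under both quoting modes. The SAMPLE string and the parse_tsv helper are hypothetical names made up for this illustration, and the sample is only a short excerpt in the spirit of the question's data.

```python
import csv
import io

# Hypothetical in-memory stand-in for sample.tsv (fields separated by tabs).
SAMPLE = 'in\tO\n"\tO\nclassical\tO\n"\tO\nprinciples\tO\n'


def parse_tsv(text, quoting):
    """Parse tab-separated text with the given csv quoting mode."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter="\t", quoting=quoting)
    return list(reader)


# Default mode (QUOTE_MINIMAL): the first " opens a quoted field that
# only ends at the next ", so three input lines collapse into one row.
default_rows = parse_tsv(SAMPLE, csv.QUOTE_MINIMAL)

# QUOTE_NONE: " is an ordinary character, so every input line is its own row.
raw_rows = parse_tsv(SAMPLE, csv.QUOTE_NONE)

print(len(default_rows), len(raw_rows))
for row in raw_rows:
    print(row)
```

With the default mode the three quote-related lines come back as the single row ['\tO\nclassical\tO\n', 'O'], which matches the result reported in the question. With csv.QUOTE_NONE each line is returned as its own two-field row.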

