Pandas reads the wrong columns from a TSV file

Problem description

I read these two TSV files with Pandas:

train = pd.read_csv('https://raw.githubusercontent.com/loretoparisi/bert_text_classifier/master/data/imdb_kaggle/train.tsv', sep="\t", delimiter="\n", quoting=csv.QUOTE_ALL, engine="python", quotechar='"', encoding="utf-8")
test = pd.read_csv('https://raw.githubusercontent.com/loretoparisi/bert_text_classifier/master/data/imdb_kaggle/test.tsv', sep="\t", delimiter="\n", quoting=csv.QUOTE_ALL, engine="python", quotechar='"', encoding="utf-8")

But I get a shape of (N, 1) instead of (N, 3):

(156060, 1) (66292, 1)
PhraseId\tSentenceId\tPhrase
0   156061\t8545\tAn intermittently pleasing but m...
1   156062\t8545\tAn intermittently pleasing but m...
2   156063\t8545\tAn
3   156064\t8545\tintermittently pleasing but most...
4   156065\t8545\tintermittently pleasing but most...

The original file looks like this:

PhraseId    SentenceId  Phrase
156061  8545    An intermittently pleasing but mostly routine effort .
156062  8545    An intermittently pleasing but mostly routine effort
156063  8545    An
156064  8545    intermittently pleasing but mostly routine effort
156065  8545    intermittently pleasing but mostly routine
156066  8545    intermittently pleasing but
156067  8545    intermittently pleasing
156068  8545    intermittently
156069  8545    pleasing

Given that I passed the separator sep='\t', why does read_csv fail to split the columns?

Tags: python, pandas, csv

Solution
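
The likely cause is that both sep="\t" and delimiter="\n" are passed in the same call. The pandas documentation describes delimiter as an alias for sep, so the two arguments compete for the same setting. The output in the question, where every row is a single string still containing \t, suggests that delimiter="\n" took precedence and each line was treated as one field. More recent pandas releases, as far as I know, reject a call that sets both arguments rather than silently picking one.

A minimal sketch of the fix is to drop delimiter and keep only sep="\t". To stay runnable offline it parses a few rows copied from the question through io.StringIO instead of downloading the files; sample_tsv is an illustrative name, not part of the question. Using quoting=csv.QUOTE_NONE is an assumption on my part: it treats any quote characters inside a phrase as literal text, which may or may not be what the data needs.

```python
import csv
import io

import pandas as pd

# A few rows copied from the question, joined with real tab characters.
# (sample_tsv is a stand-in for the remote train.tsv / test.tsv files.)
sample_tsv = (
    "PhraseId\tSentenceId\tPhrase\n"
    "156061\t8545\tAn intermittently pleasing but mostly routine effort .\n"
    "156062\t8545\tAn intermittently pleasing but mostly routine effort\n"
    "156063\t8545\tAn\n"
)

# Pass only sep="\t" and leave delimiter out entirely.
# quoting=csv.QUOTE_NONE is an assumption: quotes in phrases stay literal.
df = pd.read_csv(io.StringIO(sample_tsv), sep="\t", quoting=csv.QUOTE_NONE)

print(df.shape)
print(list(df.columns))
```

The same keyword arguments should carry over to the two URLs in the question: replace io.StringIO(sample_tsv) with the URL string and keep sep="\t" as the only separator argument.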

