python - Pandas reads a TSV file with the wrong number of columns
Problem description
I am reading these two TSV files with Pandas:
train = pd.read_csv('https://raw.githubusercontent.com/loretoparisi/bert_text_classifier/master/data/imdb_kaggle/train.tsv', sep="\t", delimiter="\n", quoting=csv.QUOTE_ALL, engine="python", quotechar='"', encoding="utf-8")
test = pd.read_csv('https://raw.githubusercontent.com/loretoparisi/bert_text_classifier/master/data/imdb_kaggle/test.tsv', sep="\t", delimiter="\n", quoting=csv.QUOTE_ALL, engine="python", quotechar='"', encoding="utf-8")
But I get a shape of (N, 1) instead of (N, 3):
(156060, 1) (66292, 1)
PhraseId\tSentenceId\tPhrase
0 156061\t8545\tAn intermittently pleasing but m...
1 156062\t8545\tAn intermittently pleasing but m...
2 156063\t8545\tAn
3 156064\t8545\tintermittently pleasing but most...
4 156065\t8545\tintermittently pleasing but most...
The original file looks like this:
PhraseId SentenceId Phrase
156061 8545 An intermittently pleasing but mostly routine effort .
156062 8545 An intermittently pleasing but mostly routine effort
156063 8545 An
156064 8545 intermittently pleasing but mostly routine effort
156065 8545 intermittently pleasing but mostly routine
156066 8545 intermittently pleasing but
156067 8545 intermittently pleasing
156068 8545 intermittently
156069 8545 pleasing
Given that I have already passed the separator sep='\t', why does read_csv fail to split the columns?
Solution
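A likely cause, going by the pandas documentation: delimiter is an alias for sep, so the call above supplies two conflicting separators. When delimiter="\n" takes effect, no tab is ever treated as a field boundary, and every line ends up in a single column. That matches the (N, 1) shape and the literal \t visible in the output. Newer pandas releases are expected to reject the sep/delimiter combination with a ValueError rather than silently pick one.

A minimal sketch of the fix, assuming the files are plain tab-separated text with a header row. It uses a small in-memory sample instead of the remote URLs so it runs offline, and quoting=csv.QUOTE_NONE is an assumption (not something the question confirms) so that any quote characters inside Phrase are read literally:

```python
import csv
import io

import pandas as pd

# In-memory sample with the same three tab-separated columns as the question's file.
sample = (
    "PhraseId\tSentenceId\tPhrase\n"
    "156061\t8545\tAn intermittently pleasing but mostly routine effort .\n"
    "156062\t8545\tAn intermittently pleasing but mostly routine effort\n"
    "156063\t8545\tAn\n"
)

# Pass only sep="\t" and omit delimiter, since delimiter is an alias for sep.
# quoting=csv.QUOTE_NONE is an assumption: quote characters are kept as plain text.
df = pd.read_csv(io.StringIO(sample), sep="\t", quoting=csv.QUOTE_NONE)

print(df.shape)             # (3, 3)
print(df.columns.tolist())  # ['PhraseId', 'SentenceId', 'Phrase']
```

For the real files, the same idea should carry over by replacing the StringIO object with the train.tsv or test.tsv URL and dropping the delimiter="\n" argument from the original call.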