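python - Preprocessing data from a text file with the split method
Question
I have included a text sample below. I want to load this text into a Python list, using '<EOS>' as the delimiter, so that each element returned by split becomes one entry in the list.
The problem is that the text ends up being split on '\n' as well as on the '<EOS>' delimiter. As a result, each list entry holds a single line rather than a complete section.
Please look at the code after the sample text below and let me know what I am doing wrong.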
Old Major, the old boar on the Manor Farm, summons the animals on the farm together for a meeting, during which he refers to humans as "enemies" and teaches the animals a revolutionary song called "Beasts of England".
When Major dies, two young pigs, Snowball and Napoleon, assume command and consider it a duty to prepare for the Rebellion.<EOS>
Alex is a 15-year-old living in near-future dystopian England who leads his gang on a night of opportunistic, random "ultra-violence".
Alex's friends ("droogs" in the novel's Anglo-Russian slang, 'Nadsat') are Dim, a slow-witted bruiser who is the gang's muscle; Georgie, an ambitious second-in-command; and Pete, who mostly plays along as the droogs indulge their taste for ultra-violence.
Characterised as a sociopath and a hardened juvenile delinquent, Alex also displays intelligence, quick wit, and a predilection for classical music; he is particularly fond of Beethoven, referred to as "Lovely Ludwig Van".
Python code that reads the documents into a list:
f = open('./plots')
documents = []
for x in f:
    documents.append(x.split('<EOS>'))
print(documents[0])
# documents[0] should start at 'Old Major' and end at 'Rebellion'.
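Answer
Iterating over f yields the file one line at a time, so the content has already been broken up at every newline before split('<EOS>') is applied. Read the whole file as a single string and split that instead:
with open('./plots') as f:
    documents = f.read().split('<EOS>')
print(documents[0])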
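One thing to watch for after splitting this way: every section except the first starts with the newline that followed '<EOS>', and a file ending in '<EOS>' produces an empty final entry. A minimal sketch of one way to tidy that up is below. The helper name split_documents and the inline sample string are made up for illustration and are not part of the original question.

```python
def split_documents(text, sep='<EOS>'):
    """Split text on sep, trim surrounding whitespace, drop empty pieces."""
    return [part.strip() for part in text.split(sep) if part.strip()]


# Hypothetical stand-in for the contents of './plots'.
sample = "First plot line one.\nFirst plot line two.<EOS>\nSecond plot.\n"

documents = split_documents(sample)
print(documents[0])
```

Newlines inside a section are kept, so a multi-line plot stays together as one list entry. To use it on the real file, pass the result of f.read() in place of sample.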