python - Tokenize my CSV into one list rather than separate lists using Python
Problem description
How can I tokenize my CSV into one list, rather than getting a separate list for each line?
from nltk.tokenize import sent_tokenize

with open('train.csv') as file_object:
    for trainline in file_object:
        # sent_tokenize returns a new list for every line
        tokens_train = sent_tokenize(trainline)
        print(tokens_train)
This is the output I am getting:
['2.1 Separated of trains']
['Principle: The method to make the signal is different.']
['2.2 Context']
I want all of them in one list:
['2.1 Separated of trains','Principle: The method to make the signal is different.','2.2 Context']
Solution
Since sent_tokenize() returns a list, you can simply extend a starting list on each iteration.
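from nltk.tokenize import sent_tokenize

alltokens = []
with open('train.csv') as file_object:
    for trainline in file_object:
        tokens_train = sent_tokenize(trainline)
        # extend() adds each sentence individually; append() would nest the list
        alltokens.extend(tokens_train)
print(alltokens)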
Or use a list comprehension:
with open('train.csv') as file_object:
    alltokens = [token for trainline in file_object for token in sent_tokenize(trainline)]
print(alltokens)
Both solutions work even if sent_tokenize() returns a list with more than one element.
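A third option, not part of the original answer, is itertools.chain.from_iterable, which flattens the per-line lists lazily. The sketch below is only an illustration: split_sentences is a hypothetical regex-based stand-in for sent_tokenize so that the example runs without NLTK or its data files, and the in-memory sample (including the made-up second sentence "It varies by line.") replaces train.csv.

```python
import io
import re
from itertools import chain


def split_sentences(text):
    """Hypothetical stand-in for sent_tokenize: split after '.', '!' or '?'
    when followed by whitespace. Far cruder than NLTK's tokenizer."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


# In-memory stand-in for train.csv; the second line holds two sentences
# (the second one is invented sample text) to show multi-element results.
sample = io.StringIO(
    "2.1 Separated of trains\n"
    "Principle: The method to make the signal is different. It varies by line.\n"
    "2.2 Context\n"
)

# chain.from_iterable joins the per-line lists into one flat sequence.
alltokens = list(chain.from_iterable(split_sentences(line) for line in sample))
print(alltokens)
```

With the real tokenizer, the same pattern would presumably be list(chain.from_iterable(sent_tokenize(line) for line in file_object)) inside the with block.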