Tokenize my CSV in one list rather than separate using Python

Question

I want to tokenize my CSV into a single list rather than a separate list for each line. How can I do that?

from nltk.tokenize import sent_tokenize

with open('train.csv') as file_object:
    for trainline in file_object:
        # sent_tokenize() returns a new list for every line
        tokens_train = sent_tokenize(trainline)
        print(tokens_train)

This is the output I currently get:

['2.1 Separated of trains']
['Principle: The method to make the signal is different.']
['2.2 Context']

I want all of them in one list:

['2.1 Separated of trains', 'Principle: The method to make the signal is different.', '2.2 Context']

Tags: python, tokenize

Solution

Since sent_tokenize() returns a list, you can simply extend a starting list on each iteration.

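from nltk.tokenize import sent_tokenize

alltokens = []

with open('train.csv') as file_object:
    for trainline in file_object:
        tokens_train = sent_tokenize(trainline)
        # extend() adds each sentence individually; append() would nest the list
        alltokens.extend(tokens_train)
    # print once, after every line has been processed
    print(alltokens)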

Or use a list comprehension:

with open('train.csv') as file_object:
    alltokens = [token for trainline in file_object for token in sent_tokenize(trainline)]
print(alltokens)

Both solutions work even when sent_tokenize() returns a list with more than one element.
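To see the multi-sentence case in action without installing NLTK, here is a minimal sketch. The split_sentences() helper is a hypothetical, regex-based stand-in for sent_tokenize() and is much cruder than the real tokenizer. The sample lines are the ones from the question, with an extra sentence ("It has two parts.") added as an assumption so that one line yields two tokens. The itertools.chain.from_iterable variant is a third option that was not part of the original answer.

```python
import re
from itertools import chain


def split_sentences(text):
    """Hypothetical stand-in for sent_tokenize(): split after '.', '!' or '?'
    when followed by whitespace. Not equivalent to NLTK's tokenizer."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


# In-memory stand-in for the lines of train.csv (second line has two sentences)
lines = [
    "2.1 Separated of trains\n",
    "Principle: The method to make the signal is different. It has two parts.\n",
    "2.2 Context\n",
]

# Approach 1: extend a starting list with each line's tokens
all_tokens = []
for line in lines:
    all_tokens.extend(split_sentences(line))

# Approach 2: nested list comprehension
all_tokens_lc = [tok for line in lines for tok in split_sentences(line)]

# Alternative (not in the original answer): flatten with itertools
all_tokens_chain = list(chain.from_iterable(split_sentences(line) for line in lines))

print(all_tokens)
```

All three variants build the same flat list of four strings. Using append() instead of extend() in the first approach would produce a list of lists, which is the nesting the question is trying to avoid.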

