Selecting sublists from a list of lists after using a sentence tokenizer

Question

I have some sentences in a list, for example:

some_list = ['Joe is travelling via train.',
             'Joe waited for the train, but the train was late.',
             "Even after an hour, there was no sign of the train. "
             "Joe then went to talk to station master about the train's situation."]

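I then ran nltk's sentence tokenizer over it, because I want to analyze each sentence within a full entry separately. The output is now a list of lists that looks like this: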

sent_tokenize_list = [['Joe is travelling via train.'],
                      ['Joe waited for the train,',
                       'but the train was late.'],
                      ['Even after an hour,',
                       'there was no sign of the train.',
                       "Joe then went to talk to station master about the train's situation."]]

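From this list of lists, how do I select only the sublists that contain more than one sentence (the 2nd and 3rd lists in my example) and get each of them back as a separate list?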

That is, the output should be:

['Joe waited for the train,', 'but the train was late.']
['Even after an hour,', 'there was no sign of the train.',
 "Joe then went to talk to station master about the train's situation."]

Tags: python, nltk, tokenize, sentence

Answer


You can use len to check the number of sentences in each sublist and keep only those with two or more.

Example:

sent_tokenize_list = [['Joe is travelling via train.'],
                      ['Joe waited for the train,',
                       'but the train was late.'],
                      ['Even after an hour,','there was no sign of the train.',"Joe then went to talk to station master about the train's situation."]]


print([i for i in sent_tokenize_list if len(i) >= 2]) 

Output:

[['Joe waited for the train,', 'but the train was late.'], ['Even after an hour,', 'there was no sign of the train.', "Joe then went to talk to station master about the train's situation."]]
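The result above is still a single list of lists, while the question asks for each qualifying sublist on its own. A minimal sketch of one way to get that, assuming the same sample data; the names multi_sentence and sents are illustrative and not part of the original answer:

```python
sent_tokenize_list = [
    ['Joe is travelling via train.'],
    ['Joe waited for the train,', 'but the train was late.'],
    ['Even after an hour,', 'there was no sign of the train.',
     "Joe then went to talk to station master about the train's situation."],
]

# Keep only the sublists that hold more than one sentence.
multi_sentence = [sents for sents in sent_tokenize_list if len(sents) > 1]

# Print each qualifying sublist separately instead of as one nested list.
for sents in multi_sentence:
    print(sents)
```

If the sublists need to be processed rather than printed, the same loop body can pass each sents list to whatever analysis comes next.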
