Tokenize a sentence in Python and rejoin the result

Problem description

I've run into a problem and am looking for help. I have the following code:

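import nltk
import pandas as pd
from nltk.stem import WordNetLemmatizer

# The NLTK tokenizer and WordNet data must be downloaded first (see nltk.download)
wordnet_lemmatizer = WordNetLemmatizer()

d = {'col1': ['AI is our friend and it has been friendly', 'AI and human have always been friendly']}
df = pd.DataFrame(data=d)

# A single list shared by every sentence, so tokens from all rows end up mixed together
sample_lst = []
for q in df['col1']:
    nltk_tokens = nltk.word_tokenize(q)
    for w in nltk_tokens:
        # Lemmatize each token as a verb (pos='v')
        sample_lst.append(wordnet_lemmatizer.lemmatize(w, pos='v'))
        print(sample_lst)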

The code works and appends the output of wordnet_lemmatizer.lemmatize to a list. However, I would like to save the result in a CSV file next to the original input, like this:

Col1                                        Col2
AI is our friend and it has been friendly   AI be our friend and it have be friendly
AI and human have always been friendly      AI and human have always be friendly

I tried ''.join(), but the result was not what I expected. Any ideas on how to rejoin the sentence and add it to a new column? Thanks in advance.

Tags: python, pandas, nltk

Solution


Use:

# Collect one rejoined sentence per row
out = []
for q in df['col1']:
    # Start a fresh token list for each sentence
    sample_lst = []
    nltk_tokens = nltk.word_tokenize(q)
    for w in nltk_tokens:
        sample_lst.append(wordnet_lemmatizer.lemmatize(w, pos='v'))
    # Rejoin the tokens with a single space
    out.append(' '.join(sample_lst))

df['Col2'] = out
print(df)
                                        col1  \
0  AI is our friend and it has been friendly   
1     AI and human have always been friendly   

                                       Col2  
0  AI be our friend and it have be friendly  
1      AI and human have always be friendly  
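
As for why ''.join() did not give the expected result: the string that join is called on is the separator, so an empty string glues the tokens together with nothing between them. A minimal sketch in plain Python, using a hand-written token list (an assumption for illustration, not actual tokenizer output) so that it runs without NLTK:

```python
# Hand-written tokens standing in for the tokenizer output (illustrative assumption)
tokens = ['AI', 'be', 'our', 'friend']

# An empty separator concatenates the tokens with nothing between them
glued = ''.join(tokens)

# A single-space separator rebuilds a readable sentence
spaced = ' '.join(tokens)

print(glued)   # AIbeourfriend
print(spaced)  # AI be our friend
```

Note that nltk.word_tokenize splits punctuation into separate tokens, so joining with a space will leave a space before commas and periods in sentences that contain them.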

Another solution, using a nested list comprehension:

df['Col2'] = [' '.join(wordnet_lemmatizer.lemmatize(w, pos='v') 
              for w in nltk.word_tokenize(q)) 
              for q in df['col1']]
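
The question also asks for the result to be saved to a CSV file next to the original input. One way to do that is pandas' DataFrame.to_csv. The sketch below is only an illustration: the file name lemmatized.csv and the use of the temporary directory are hypothetical choices, and the Col2 values are hard-coded from the output shown above so that the example runs without NLTK.

```python
import os
import tempfile

import pandas as pd

# Col2 is hard-coded from the output above (assumption: stands in for the lemmatizer result)
df = pd.DataFrame({
    'col1': ['AI is our friend and it has been friendly',
             'AI and human have always been friendly'],
    'Col2': ['AI be our friend and it have be friendly',
             'AI and human have always be friendly'],
})

# Hypothetical output path; replace with wherever the file should go
csv_path = os.path.join(tempfile.gettempdir(), 'lemmatized.csv')

# index=False leaves out the 0, 1, ... row labels so only the two columns are written
df.to_csv(csv_path, index=False)

# Read the file back to check that both columns survived the round trip
restored = pd.read_csv(csv_path)
print(restored)
```

Without index=False, the row index would be written as an extra unnamed first column.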
