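python - Joining lists of words in a column
Problem description
Is it possible to join words in pandas? I have a column in which each row holds a list of words, and I am trying to turn those lists back into phrases.
Data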
0 [hello, she, can, seem, to, form, something, like, a, coherent,...
1 [not, any, more,...
2 [it, is, unclear, if, any, better, deal,...
3 [but, few, in, her, party, seem, inclined ...
4 [it, is, unclear, if, the, basic, conditions, for, any,...
Name: Data, dtype: object
from nltk.corpus import stopwords
stop_words = set(stopwords.words('english'))
# new words to add to the stop list
new_stopwords = {'hello'}
new_list = stop_words.union(new_stopwords)
# words to remove from the NLTK stop list
not_stopwords = {'no', 'not', 'any'}
stopwords_list = set([word for word in new_list if word not in not_stopwords])
df['Data'] = df['Data'].' '.join([wrd for wrd in Data if wrd not in stopwords_list])
Output:
File "<ipython-input-281-498b9daa386f>", line 1
df['Description_pretraites'] = df['Description_pretraites'].' '.join([wrd for wrd in replace_hour_token if wrd not in stopwords_list])
^
SyntaxError: invalid syntax
Desired output
0 [can seem form something like coherent...
1 [not any more...
2 [is unclear any better deal...
3 [few party seem inclined ...
4 [is unclear basic conditions any...
Name: Data, dtype: object
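As far as I can tell, join in pandas is used to combine columns. Is it also possible to join the words within a single column?
Solution
Use .apply with a generator expression: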
df['Data']=df['Data'].apply(lambda x: ' '.join(wrd for wrd in x if wrd not in stopwords_list))
Or use a list comprehension with a nested generator expression:
df['Data'] = [' '.join(wrd for wrd in x if wrd not in stopwords_list) for x in df['Data']]
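Both forms work for the same reason: each cell holds an ordinary Python list, so `' '.join` has to run once per row rather than on the Series itself. Writing `df['Data'].' '.join(...)` is a SyntaxError because a dot must be followed by an attribute name, not a string literal. The same idea can be sketched with a named helper, which may be easier to read and test. The function name `join_without_stopwords`, the small stop-word set, and the sample rows below are illustrative assumptions, not part of the original question.

```python
import pandas as pd

# Stand-in stop-word set for illustration (assumption, not the NLTK list).
stopwords_list = {"hello", "to", "a"}

def join_without_stopwords(words, stopwords=stopwords_list):
    """Drop stop words from one row's word list and join the rest with spaces."""
    return " ".join(w for w in words if w not in stopwords)

df = pd.DataFrame({"Data": [["hello", "she", "can"],
                            ["not", "any", "more", "to"]]})

# Series.apply calls the helper once per cell (one list per row).
df["Data"] = df["Data"].apply(join_without_stopwords)

result = df["Data"].tolist()
print(result)
```

After this runs, the column holds plain strings instead of lists, so later string methods such as `df['Data'].str.split()` operate on phrases.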
Sample:
import pandas as pd
d = {'Data': [['hello', 'she', 'can'],
              ['not', 'no', 'more', 'to']]}
df = pd.DataFrame(data=d)
print(df)
Data
0 [hello, she, can]
1 [not, no, more, to]
stopwords_list = set(['no','not'])
df['Data'] = [' '.join(wrd for wrd in x if wrd not in stopwords_list) for x in df['Data']]
print (df)
Data
0 hello she can
1 more to
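The question builds its stop-word set with a union followed by a filtering comprehension. The same set can be written with set operators, as in the rough sketch below. To stay self-contained it uses a small hardcoded `base_stopwords` set as a stand-in for `stopwords.words('english')`; that set and the sample word list are assumptions made for illustration.

```python
# Stand-in for set(stopwords.words('english')); assumption for illustration only.
base_stopwords = {"she", "to", "a", "no", "not", "any"}

new_stopwords = {"hello"}             # extra words to treat as stop words
not_stopwords = {"no", "not", "any"}  # words to keep even though they are in the base list

# Union adds the new words; set difference removes the ones to keep.
stopwords_list = (base_stopwords | new_stopwords) - not_stopwords

words = ["hello", "she", "can", "not", "seem", "to"]
phrase = " ".join(w for w in words if w not in stopwords_list)
print(phrase)
```

Keeping `stopwords_list` as a set matters for speed on large columns, since membership tests on a set are constant time on average.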