Creating a wordcloud from a pandas Series

Problem description

import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import spacy
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator
nlp = spacy.load('en_core_web_sm')

Here I have a Series named tokens_lemma. Stopwords have already been removed from it, and the tokens have been lowercased with .lower() and lemmatized.

tokens_lemma
0        [laptop, sit, 4, star, similarly, price, compa...
1        [order, monitor, want, makeshift, area, powerf...
2        [monitor, great, deal, price, size, ., use, of...
3        [buy, height, adjustment, ., swivel, ability, ...
4        [work, month, die, ., 5, call, hp, support, nu...
                               ...                        
30618                                        [great, deal]
30619                                  [pour, le, travail]
30620                                      [business, use]
30621                                         [good, size]
30622    [pour, mon, ordinateur.plus, grande, image.vra...
Name: text_body, Length: 30623, dtype: object

I want to create a wordcloud from the Series above.

comment_words = ''
for w in tokens_lemma:
    w = str(w)
    comment_words += " ".join(w)

wordcloud = WordCloud(width = 800, height = 800,
                      background_color ='white',
                      min_font_size = 10).generate(comment_words)

# plot the WordCloud image
plt.figure(figsize = (8, 8), facecolor = None)
plt.imshow(wordcloud)
plt.axis("off")
plt.tight_layout(pad = 0)

plt.show()

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-33-9faee80d909b> in <module>
----> 1 wordcloud = WordCloud(width = 800, height = 800, 
      2 background_color ='white',
      3 min_font_size = 10).generate(comment_words)

~\anaconda3\envs\datasci\lib\site-packages\wordcloud\wordcloud.py in generate(self, text)
    629         self
    630         """
--> 631         return self.generate_from_text(text)
    632 
    633     def _check_generated(self):

~\anaconda3\envs\datasci\lib\site-packages\wordcloud\wordcloud.py in generate_from_text(self, text)
    611         """
    612         words = self.process_text(text)
--> 613         self.generate_from_frequencies(words)
    614         return self
    615 

~\anaconda3\envs\datasci\lib\site-packages\wordcloud\wordcloud.py in generate_from_frequencies(self, frequencies, max_font_size)
    401         frequencies = sorted(frequencies.items(), key=itemgetter(1), reverse=True)
    402         if len(frequencies) <= 0:
--> 403             raise ValueError("We need at least 1 word to plot a word cloud, "
    404                              "got %d." % len(frequencies))
    405         frequencies = frequencies[:self.max_words]

ValueError: We need at least 1 word to plot a word cloud, got 0.

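I suspect the problem lies in the comment_words object: its data is in the wrong format for wordcloud. But I don't know what to change so that it matches the format wordcloud expects.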
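That suspicion is on the right track. Running the loop body by hand on a single row shows what comment_words actually contains. The sketch below is plain Python: the row [great, deal] from the output above stands in for one element of tokens_lemma, and ASSUMED_PATTERN is an assumption about how wordcloud splits text into words (a word character followed by at least one more), so check it against the installed wordcloud version.

```python
import re

# Stand-in for one element of tokens_lemma (row 30618 in the output above).
w = ["great", "deal"]

# What the loop body does: str() turns the list into its printed form...
w = str(w)  # now the string "['great', 'deal']"
# ...and " ".join() iterates over that string one character at a time.
comment_words = " ".join(w)
print(repr(comment_words))

# ASSUMPTION: wordcloud-style tokenizing that needs at least two word characters.
ASSUMED_PATTERN = r"\w[\w']+"
print(re.findall(ASSUMED_PATTERN, comment_words))  # → []

# Joining the tokens themselves keeps whole words.
fixed = " ".join(["great", "deal"])
print(re.findall(ASSUMED_PATTERN, fixed))  # → ['great', 'deal']
```

Each token ends up split into single characters separated by spaces, so under a two-character minimum no words survive, which would be consistent with the "got 0" in the traceback.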

Tags: python nlp data-mining word-cloud

Solution


I accidentally solved the problem after writing the question, so I'm posting it anyway. Simply passing str(tokens_lemma) instead of comment_words seems to work fine.

wordcloud = WordCloud(width = 800, height = 800,
                      background_color ='white',
                      min_font_size = 10).generate(str(tokens_lemma))
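One caveat: str(tokens_lemma) returns the same printed form shown in the question, which includes only rows 0 to 4 and 30618 to 30622, cuts long rows off with "...", and appends the "Name: text_body, Length: 30623, dtype: object" footer. The cloud is therefore built from roughly ten rows, and footer words such as text_body or dtype can end up in it. A sketch that uses every row is below. It is plain Python: a list of lists stands in for tokens_lemma, because iterating over a pandas Series of lists yields the same lists. The helper names keep, build_text and count_tokens are hypothetical, and dropping tokens that contain no letters (such as "." or "4") is an added choice, not something the original code did.

```python
from collections import Counter

# Hypothetical stand-in for tokens_lemma; a Series of lists iterates the same way.
sample_rows = [
    ["great", "deal"],
    ["business", "use"],
    ["good", "size"],
    ["monitor", "great", "deal", "price", "size", "."],
]


def keep(token):
    """Assumed filter: drop tokens with no letters, such as '.' or '4'."""
    return any(ch.isalpha() for ch in token)


def build_text(rows):
    """Join every kept token of every row into one space-separated string."""
    return " ".join(token for tokens in rows for token in tokens if keep(token))


def count_tokens(rows):
    """Count kept tokens across all rows without re-tokenizing any text."""
    return Counter(token for tokens in rows for token in tokens if keep(token))


text = build_text(sample_rows)
freqs = count_tokens(sample_rows)
print(text)
print(freqs.most_common(3))
```

With the real data, WordCloud(...).generate(build_text(tokens_lemma)) should feed every row through the same pipeline as before. Alternatively, .generate_from_frequencies(count_tokens(tokens_lemma)) passes the counts directly to the method seen in the traceback; that path skips wordcloud's own tokenizing and stopword removal, which is why the sketch filters tokens itself.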
