Wordcloud shows only letters, not words

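Problem description

I am currently analyzing text data and extracting nouns from a corpus.

Yes, I am a newbie, and I am here to learn from my mistakes and improve.

When I build a word cloud from the column of extracted nouns, the cloud shows only letters and symbols rather than whole words.

The word cloud itself is not my main concern. However, since I am going on to further text analysis and topic modeling, and aim to develop a predictive model, I want to make sure this column will not cause problems in later analysis.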

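from collections import Counter

import matplotlib.pyplot as plt
from textblob import TextBlob
from wordcloud import WordCloud

def get_nouns(text):
    # keep only tokens tagged "NN" (noun, singular or mass)
    blob = TextBlob(text)
    return [word for (word, tag) in blob.tags if tag == "NN"]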

df_unique['nouns'] = df_unique['tokenized'].apply(get_nouns)

#nouns wordcloud
all_words_xn = []
for line in df_unique['nouns']: 
    all_words_xn.extend(line)

# create a word frequency dictionary
wordfreq = Counter(all_words_xn)
# draw a Word Cloud with word frequencies
wordcloud = WordCloud(width=900,
                  height=500,
                  max_words=50,
                  max_font_size=100,
                  relative_scaling=0.5,
                  colormap='Blues',
                  normalize_plurals=True).generate_from_frequencies(wordfreq)
plt.figure(figsize=(17,14))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()

Current word cloud output

The nouns column from the data frame

0                                                 ['lot']
1                           ['weapon', 'gun', 'instance']
2                               ['drive', 'drive', 'car']
3                                ['felt', 'guy', 'stage']
4       ['price', 'launch', 'ryse', 'son', 'ip', 'cryt...
5       ['drivatar', 'crash', 'guy', 'track', 'use', '...
6                                      ['spark', 'thing']
7       ['stream', 'player', 'linux', 'start', 'stream...
8                    ['kill', 'game', 'absolute', 'shit']
9                   ['breed', 'stealth', 'horse', 'duck']
10                                      ['beach', 'duty']
11                                                     []
12                                    ['europe', 'guess']
13                              ['power', 'cloud', 'god']
14                        ['gameplay', 'footage', 'zoom']
15                                                     []
16      ['stream', 'play', 'game', 'week', 'gdex', 'co...
17                                               ['edit']
19                     ['halo', 'clip', 'lot', 'journey']
21      ['thing', 'master', 'chief', 'shawl', 'help', ...
22      ['respect', 'respawn', 'trailer', 'gameplay', ...

Name: nouns, Length: 7523, dtype: object

Tags: python, text, nlp, preprocessor, word-cloud

Solution

Your code is fine. There must be a bug in your preprocessing pipeline, which you have not shown here.
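One quick way to look for such a bug is to check what type each cell of the nouns column actually holds. The sketch below uses a hypothetical plain-Python stand-in for `df_unique['nouns']` (the `cells` list and its contents are made up for illustration) and assumes, without confirmation from the question, that some cells might be strings that only look like lists. It also shows why that would matter: `list.extend` iterates over its argument, so a string contributes single characters.

```python
from collections import Counter

# Hypothetical stand-in for df_unique['nouns']: the second cell is a string
# that merely looks like a list (for example after a CSV round trip).
cells = [["weapon", "gun"], "['drive', 'car']"]

# Count the Python type of each cell; a healthy column holds only lists.
type_counts = Counter(type(cell).__name__ for cell in cells)
print(type_counts)

# list.extend iterates over its argument, so a string adds its characters.
tokens = []
for cell in cells:
    tokens.extend(cell)
print(tokens[:5])
```

On the real data frame, `df_unique['nouns'].map(type).value_counts()` should give the same kind of tally.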

See the complete working example below, based on your code:

from collections import Counter

import matplotlib.pyplot as plt
import pandas as pd
from textblob import TextBlob
from wordcloud import WordCloud

texts = ["This is some text about thing", "This is another text about gun", "This is a text about car"]
df_unique = pd.DataFrame({"tokenized":texts})

def get_nouns(text):
    blob = TextBlob(text)
    return [ word for (word,tag) in blob.tags if tag == "NN"]

df_unique['nouns'] = df_unique['tokenized'].apply(get_nouns)

#nouns wordcloud
all_words_xn = []
for line in df_unique['nouns']: 
    all_words_xn.extend(line)


# create a word frequency dictionary
wordfreq = Counter(all_words_xn)
# draw a Word Cloud with word frequencies
wordcloud = WordCloud(width=900,
                  height=500,
                  max_words=50,
                  max_font_size=100,
                  relative_scaling=0.5,
                  colormap='Blues',
                  normalize_plurals=True).generate_from_frequencies(wordfreq)
plt.figure(figsize=(17,14))
plt.imshow(wordcloud, cmap="gray_r")
plt.axis("off")
plt.show()
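If the real column turns out to hold strings such as "['weapon', 'gun']" instead of lists, which is only a guess since the preprocessing is not shown, the cells can be parsed back into lists before counting. The following is a minimal sketch under that assumption; `to_list` and the sample `cells` are hypothetical names and data used for illustration.

```python
import ast
from collections import Counter

# Hypothetical cells as they might look after reloading the column from CSV:
# every value is a string containing a Python list literal.
cells = ["['lot']", "['weapon', 'gun', 'instance']", "[]"]

def to_list(cell):
    """Return the cell as a list, parsing it first if it is a string."""
    if isinstance(cell, str):
        # literal_eval only evaluates Python literals, not arbitrary code
        return ast.literal_eval(cell)
    return list(cell)

# Flatten the parsed cells into one list of words and count them.
all_words = []
for cell in cells:
    all_words.extend(to_list(cell))
word_freq = Counter(all_words)
print(word_freq)
```

With the data frame from the question, the same helper could be applied as `df_unique['nouns'] = df_unique['nouns'].apply(to_list)` before building `wordfreq`.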
