python - Wordcloud shows only letters, not words
Problem description
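I am currently analyzing text data and extracting nouns from a corpus.
Yes, I am a beginner, and I am here to learn from my mistakes and improve.
When I create a word cloud from the column of extracted nouns, the word cloud shows only letters and symbols instead of whole words.
The word cloud itself is not my main concern. Because I will go on to further text analysis and topic modeling, and aim to build a predictive model, I want to make sure the column has no problems that would affect that later work.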
from collections import Counter

import matplotlib.pyplot as plt
from textblob import TextBlob
from wordcloud import WordCloud

def get_nouns(text):
    # Keep only tokens tagged "NN" (singular common nouns)
    blob = TextBlob(text)
    return [word for (word, tag) in blob.tags if tag == "NN"]

df_unique['nouns'] = df_unique['tokenized'].apply(get_nouns)

# nouns wordcloud
all_words_xn = []
for line in df_unique['nouns']:
    all_words_xn.extend(line)

# create a word frequency dictionary
wordfreq = Counter(all_words_xn)

# draw a Word Cloud with word frequencies
wordcloud = WordCloud(width=900,
                      height=500,
                      max_words=50,
                      max_font_size=100,
                      relative_scaling=0.5,
                      colormap='Blues',
                      normalize_plurals=True).generate_from_frequencies(wordfreq)
plt.figure(figsize=(17,14))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()
The column with the nouns from the DataFrame:
0 ['lot']
1 ['weapon', 'gun', 'instance']
2 ['drive', 'drive', 'car']
3 ['felt', 'guy', 'stage']
4 ['price', 'launch', 'ryse', 'son', 'ip', 'cryt...
5 ['drivatar', 'crash', 'guy', 'track', 'use', '...
6 ['spark', 'thing']
7 ['stream', 'player', 'linux', 'start', 'stream...
8 ['kill', 'game', 'absolute', 'shit']
9 ['breed', 'stealth', 'horse', 'duck']
10 ['beach', 'duty']
11 []
12 ['europe', 'guess']
13 ['power', 'cloud', 'god']
14 ['gameplay', 'footage', 'zoom']
15 []
16 ['stream', 'play', 'game', 'week', 'gdex', 'co...
17 ['edit']
19 ['halo', 'clip', 'lot', 'journey']
21 ['thing', 'master', 'chief', 'shawl', 'help', ...
22 ['respect', 'respawn', 'trailer', 'gameplay', ...
Name: nouns, Length: 7523, dtype: object
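Solution
Your code is fine. There must be an error in the preprocessing pipeline that you have not shown here.
See the complete working example below, based on your code: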
from collections import Counter

import matplotlib.pyplot as plt
import pandas as pd
from textblob import TextBlob
from wordcloud import WordCloud

texts = ["This is some text about thing", "This is another text about gun", "This is a text about car"]
df_unique = pd.DataFrame({"tokenized": texts})

def get_nouns(text):
    # Keep only tokens tagged "NN" (singular common nouns)
    blob = TextBlob(text)
    return [word for (word, tag) in blob.tags if tag == "NN"]

df_unique['nouns'] = df_unique['tokenized'].apply(get_nouns)

# nouns wordcloud
all_words_xn = []
for line in df_unique['nouns']:
    all_words_xn.extend(line)

# create a word frequency dictionary
wordfreq = Counter(all_words_xn)

# draw a Word Cloud with word frequencies
wordcloud = WordCloud(width=900,
                      height=500,
                      max_words=50,
                      max_font_size=100,
                      relative_scaling=0.5,
                      colormap='Blues',
                      normalize_plurals=True).generate_from_frequencies(wordfreq)
plt.figure(figsize=(17,14))
# The word cloud renders as an RGB image, so a cmap argument would be ignored here
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.show()
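One possible cause that would match the symptom is that each entry of the nouns column is the string representation of a list (for example after saving the DataFrame to CSV and reading it back) rather than an actual list. This is an assumption, since the question does not show the pipeline. In that case `extend` walks the string character by character, so the Counter is filled with letters, quotes, and brackets. The sketch below reproduces that on made-up rows and parses the strings back with `ast.literal_eval`. The helper names `flatten` and `parse_nouns` and the sample rows are hypothetical.

```python
import ast
from collections import Counter

# Hypothetical rows: lists that were stored as strings (assumed cause, not confirmed)
nouns_as_strings = ["['weapon', 'gun', 'instance']", "['drive', 'drive', 'car']", "[]"]

def flatten(rows):
    # Same loop as in the question: extend one flat list with every row
    words = []
    for line in rows:
        words.extend(line)
    return words

# Extending with a string adds its characters, so every key is a single character
broken = Counter(flatten(nouns_as_strings))

def parse_nouns(value):
    # Turn "['a', 'b']" back into ['a', 'b']; leave real lists untouched
    return ast.literal_eval(value) if isinstance(value, str) else value

nouns_as_lists = [parse_nouns(row) for row in nouns_as_strings]
fixed = Counter(flatten(nouns_as_lists))

print(broken.most_common(3))
print(fixed.most_common(3))
```

To check a real column, `type(df_unique['nouns'].iloc[0])` shows whether the entries are `str` or `list`. If they are strings, `df_unique['nouns'] = df_unique['nouns'].apply(parse_nouns)` would convert them before the frequency count.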