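python - PRAW Bot: trying to return word-count values along with percentages in Python
Problem description
I'm using the PRAW library in Python to scan a subreddit for the most commonly used words in its comment sections. I output the result to a pie chart that shows percentage values, but I'd also like more data on how many times each word is used. I don't know what to add to get this data, though I'm sure it isn't hard.
Source code so far: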
```python
import praw
import matplotlib.pyplot as plt

reddit = praw.Reddit(
    client_id='',
    client_secret='',
    user_agent='',
    password=''
)

sub = ''
subreddit = reddit.subreddit(sub)
hot_subreddit = subreddit.hot()
count = 0
max_count = 10000  # renamed from `max` so the built-in max() is not shadowed
print('Success.')
words = []
wordCount = {}
commonWords = {'the','of','to','and','a','in','is','it','you','that','he','was','for','on','are','with','as','be',
    'word','we',"there's",'use',"how's",'each','which',"they're",'time','If','way','many','then','write',
    'these','long','make','thing','see','him','two','has','look','more','day','could','go','come','did',
    'number','sound','no','most','people','my','over','know','water','than','call','first','who','may','down','side','been','now',
    'find','this','I','it','has','but','have','they','','be','an','or','at','do','if','your','not','can','my','their','them','at','about','would','like','there','You',
    'from','get','just','more','so','me','more','out','up','some','will','how','one','what',"don't",'should','could','did','no',
    'know','were','did',"it's",'This','The','all','when','had','see','his','him','who','by','her','she','our','thing','-',
    'now','going','been',"I'm",'than','any','because','We','even','said','only','want','other','into','He','what','i',
    'That','thought','think',"that's",'Is','much',"I'm",'I`m','go','still','just','me','This','into'}

for submission in subreddit.hot(limit=1000):
    submission.comments.replace_more(limit=0)
    for hot_level_comment in submission.comments:
        count += 1
        if count == max_count:
            break
        word = ''
        # Append a space so the last word of each comment is also flushed.
        for letter in hot_level_comment.body + ' ':
            # Treat spaces, tabs and newlines alike as word separators.
            if letter.isspace():
                # Drop one trailing punctuation character, e.g. "word," -> "word".
                if word and not word[-1].isalnum():
                    word = word[:-1]
                if word not in commonWords:
                    words.append(word)
                word = ''
            else:
                word += letter
    if count == max_count:
        break

# Tally how often each remaining word occurs.
for word in words:
    if word in wordCount:
        wordCount[word] += 1
    else:
        wordCount[word] = 1

# Keep the 20 most frequent words and their counts.
sortedList = sorted(wordCount, key=wordCount.get, reverse=True)
keyWords = []
keyCount = []
amount = 0
for entry in sortedList:
    keyWords.append(entry)
    keyCount.append(wordCount[entry])
    amount += 1
    if amount == 20:
        break

labels = keyWords
sizes = keyCount

for word in words:
    print("Word: " + word)
    # `count` is the number of top-level comments scanned, not the count of `word`;
    # str() is required because a str and an int cannot be concatenated.
    print("Count: " + str(count))
    print("******************")

plt.title('20 Most Popular Words On: r/' + sub)
plt.pie(sizes, labels=labels, autopct='%1.1f%%', shadow=False, startangle=90)
plt.axis('equal')
plt.show()
```
Any ideas?
Solution
You can use numpy.unique with return_counts=True:

```python
import numpy as np

a = np.array(['a', 'b', 'b', 'c', 'a'])
values, counts = np.unique(a, return_counts=True)
# values, counts
# ['a', 'b', 'c'], [2, 2, 1]
```
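This way you can see how many times each word was used. If you want, you can sort the two arrays to get, for example, the top 20 words and how many times each one appears.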
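A minimal sketch of that sorting step, assuming the flat `words` list built in the question's script. The helpers `top_words` and `make_autopct` are hypothetical names chosen for illustration. `np.argsort(-counts, kind="stable")` orders the words by descending count (ties keep the alphabetical order that `np.unique` returns), and the callable passed to matplotlib's `autopct` parameter receives each wedge's percentage, from which the absolute count can be recovered.

```python
import numpy as np


def top_words(words, n=20):
    """Return the n most frequent words, their counts, and their share in percent.

    Hypothetical helper: assumes `words` is a flat list of strings, like the
    `words` list built in the question's script.
    """
    values, counts = np.unique(np.array(words), return_counts=True)
    # Sort by descending count; the stable sort keeps ties alphabetical.
    order = np.argsort(-counts, kind="stable")[:n]
    top_values = values[order]
    top_counts = counts[order]
    # Percentages relative to ALL counted words, not just the top n.
    percents = top_counts / counts.sum() * 100
    return top_values, top_counts, percents


def make_autopct(sizes):
    """Build a callback for plt.pie(autopct=...) that shows percent and count.

    Hypothetical helper: matplotlib calls the callback with each wedge's
    percentage of the pie total; the count is recovered from that percentage.
    """
    total = sum(sizes)

    def autopct(pct):
        count = int(round(pct * total / 100.0))
        return f"{pct:.1f}%\n({count})"

    return autopct


if __name__ == "__main__":
    sample = ["a", "b", "b", "c", "a", "b"]
    top, counts, percents = top_words(sample, n=2)
    for word, count, pct in zip(top, counts, percents):
        print(f"Word: {word}  Count: {count}  ({pct:.1f}%)")
```

In the question's script this could replace the manual counting and top-20 loop: `labels, sizes, _ = top_words(words)` followed by `plt.pie(sizes, labels=labels, autopct=make_autopct(sizes), startangle=90)`. Note that the percentages drawn on the pie are relative to the 20 wedges shown, whereas the `percents` returned by `top_words` are relative to all counted words.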