PRAW Bot: returning word counts along with percentages, in Python

Problem description

I'm using the PRAW library in Python to scan a subreddit for the most frequently used words in its comment sections. I output the results to a pie chart that shows percentage values, but I'd also like to get more data about how much each word is used. I don't know what to add to get that data; I believe it shouldn't be hard.

Source code so far:

import praw
import matplotlib.pyplot as plt

reddit = praw.Reddit(
        client_id='',
        client_secret='',
        user_agent='',
        password=''                        
)

sub = ''

subreddit = reddit.subreddit(sub)

hot_subreddit = subreddit.hot()
count = 0
max = 10000
print('Success.')
words = []
wordCount = {}
commonWords = {'the','of','to','and','a','in','is','it','you','that','he','was','for','on','are','with','as','be',
                'word','we',"there's",'use',"how's",'each','which',"they're",'time','If','way','many','then','write',
                'these','long','make','thing','see','him','two','has','look','more','day','could','go','come','did',
                'number','sound','no','most','people','my','over','know','water','than','call','first','who','may','down','side','been','now',
                'find','this','I','it','has','but','have','they','','be','an','or','at','do','if','your','not','can','my','their','them','at','about','would','like','there','You',
                'from','get','just','more','so','me','more','out','up','some','will','how','one','what',"don't",'should','could','did','no',
                'know','were','did',"it's",'This','The','all','when','had','see','his','him','who','by','her','she','our','thing','-',
                'now','going','been',"I'm",'than','any','because','We','even','said','only','want','other','into','He','what','i',
                'That','thought','think',"that's",'Is','much',"I'm",'I`m','go','still','just','me','This','into'}

for submission in subreddit.hot(limit=1000):
    submission.comments.replace_more(limit=0)
    for hot_level_comment in submission.comments:
        count += 1
        if(count == max):
            break
        word = ''
        for letter in hot_level_comment.body:
            if(letter == ' '):
                if(word and not word[-1].isalnum()):
                    word = word[:-1]
                if not word in commonWords:
                    words.append(word)
                word = ''
            else:
                word += letter
    if(count == max):
        break

for word in words:
    if word in wordCount:
        wordCount[word] += 1
    else:
        wordCount[word] = 1

sortedList = sorted(wordCount, key = wordCount.get, reverse = True)

keyWords = []
keyCount = []
amount = 0

for entry in sortedList:
    keyWords.append(entry)
    keyCount.append(wordCount[entry])
    amount += 1
    if (amount == 20):
        break

labels = keyWords
sizes = keyCount

for word in keyWords:
    print("Word: " + word)
    print("Count: " + str(wordCount[word]))
    print("******************")


plt.title('20 Most Popular Words On: r/' + sub)
plt.pie(sizes, labels=labels, autopct='%1.1f%%', shadow=False, startangle=90)
plt.axis('equal')

plt.show()

Any ideas here?

Tags: python, bots, counter, reddit, praw

Solution


You can use numpy.unique with return_counts=True.

import numpy as np

a = np.array(['a', 'b', 'b', 'c', 'a'])
values, counts = np.unique(a, return_counts=True)
# values, counts
# ['a', 'b', 'c'], [2, 2, 1]

That way you can see how many times each word was used. If you like, you can sort the two arrays together to get, for example, the top 20 words and the number of occurrences of each.
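Combining numpy.unique with numpy.argsort gives you the labels, the raw counts, and the percentages in one pass. A minimal sketch, using a small hypothetical `words` list in place of the one built from the subreddit comments:

```python
import numpy as np

# Stand-in for the `words` list collected from the comments.
words = ['cat', 'dog', 'cat', 'bird', 'dog', 'cat']

values, counts = np.unique(words, return_counts=True)

# Sort both arrays by count, descending, and keep the top N.
top_n = 2
order = np.argsort(counts)[::-1][:top_n]
top_words = values[order]
top_counts = counts[order]

# Percentages relative to all counted words.
percentages = 100 * top_counts / counts.sum()

for w, c, p in zip(top_words, top_counts, percentages):
    print(f"Word: {w}  Count: {c}  ({p:.1f}%)")
```

The same `top_words` and `top_counts` arrays can be passed straight to plt.pie as labels and sizes, so the chart and the printed counts stay in sync.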
