Scraping a list of subreddits with praw: "TypeError: 'Subreddit' object is not iterable"

Question

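I'm using praw and Python 3 to scrape posts and comments from a list of subreddits. The code previously worked for a single subreddit, and also for a list of [j] search terms across a list of [i] subreddits. I removed the list of search terms and just want it to iterate through the list of subreddits, but I keep getting "TypeError: 'Subreddit' object is not iterable". I don't understand what's going on.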

subs= ["sub1","sub2", "sub3", "sub4"]

commentsDict = {"comment_user": [], "comment_text":[], "comment_score":[], "comment_date":[] }
postsDict = {"post_title" : [], "post_score" : [], "post_comments_num":[], "post_date":[], \
                "post_user":[], "post_text":[], "post_id":[]}

for i in range(len(subs)):
    for submission in reddit.subreddit(subs[i]):
        submission.comment_sort = 'new'
        comments = list(submission.comments)
        for comments in submission.comments:
            postsDict["post_title"].append(submission.title)#title of post with comment
            postsDict["post_score"].append(submission.score)#upvotes-downvotes
            postsDict["post_text"].append(submission.selftext)#get body of post
            postsDict["post_id"].append(submission.id)#unique id address for post
            postsDict["post_user"].append(submission.author)  #user name of poster
            postsDict["post_comments_num"].append(submission.num_comments) #number of comments on post
            date = submission.created_utc                                  #create variable for date
            timestamp = datetime.datetime.fromtimestamp(date)              #create variable to translate unix date 
            postsDict["post_date"].append(timestamp.strftime('%Y-%m-%D %H:%M:%S')) #extract date and add to dict
            for top_level_comment in submission.comments:                   #create loop for extracting comments
                if isinstance(top_level_comment, MoreComments):
                    continue
            submission.comments.replace_more(limit=None)                   #tell Praw to click more comments and get those too
            commentsDict["comment_user"].append(comments.author)              #get comment username
            commentsDict["comment_score"].append(comments.score)            #comment upvotes-downvotes
            date = comments.created                                         #same date as above but for comments
            timestamp = datetime.datetime.fromtimestamp(date)
            commentsDict["comment_date"].append(timestamp.strftime('%Y-%m-%D %H:%M:%S')) #add translated unix date to dict
            commentsDict["comment_text"].append(comments.body)      #get comment text 

Thanks in advance for your help.

Tags: python-3.x, praw

Answer


First (unrelated to your problem), this loop iterates over the indices of the list subs and then uses each index to fetch the item:

for i in range(len(subs)):
    for submission in reddit.subreddit(subs[i]):

Change it to iterate over the subreddit names directly:

for subreddit in subs:
    for submission in reddit.subreddit(subreddit):
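
As a minimal sketch of the difference, using only the subs list from the question (the names by_index, direct and numbered are made up for illustration), both loops below visit the same items. If you ever do need the position as well, enumerate() provides it without indexing by hand.

```python
subs = ["sub1", "sub2", "sub3", "sub4"]

# Index-based loop: works, but the index is only used to look the item up again.
by_index = []
for i in range(len(subs)):
    by_index.append(subs[i])

# Direct iteration: same items, no index bookkeeping.
direct = []
for subreddit in subs:
    direct.append(subreddit)

# If the position is needed too, enumerate() yields (index, item) pairs.
numbered = [(position, name) for position, name in enumerate(subs)]
```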

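Now to fix your PRAW error: you can't iterate over a subreddit itself (for submission in reddit.subreddit(subreddit)). You have to specify which listing you want to iterate over (for example new, hot, or top). You can see the available listings in the PRAW documentation for Subreddit. They correspond to the tabs you see when viewing a subreddit on the web: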

[Image: Reddit tabs: hot, new, rising, controversial, top, gilded]

For example, to use the hot listing:

for subreddit in subs:
    for submission in reddit.subreddit(subreddit).hot():

If you want to specify the number of posts returned, you can use the limit parameter:

for subreddit in subs:
    for submission in reddit.subreddit(subreddit).hot(limit=5):

The code above will give you at most 5 submissions per subreddit.
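
To see why Python raises this particular TypeError, here is a small sketch that runs without PRAW or network access. FakeSubreddit is a hypothetical stand-in written for this illustration, not the PRAW class: it only mimics the relevant shape, namely an object that defines no __iter__ but offers a listing method accepting limit. PRAW's real listing methods fetch results from the Reddit API instead of slicing a local list.

```python
from itertools import islice


class FakeSubreddit:
    """Hypothetical stand-in for a subreddit object; NOT the PRAW class."""

    def __init__(self, name, titles):
        self.name = name
        self._titles = titles

    # No __iter__ is defined, so looping over the object itself fails.

    def hot(self, limit=None):
        """Return an iterator over at most `limit` titles (all if None)."""
        return islice(iter(self._titles), limit)


sub = FakeSubreddit("sub1", ["a", "b", "c", "d", "e", "f", "g"])

message = None
try:
    for post in sub:  # iterating the object itself
        pass
except TypeError as error:
    message = str(error)

# Iterating a listing works, and limit caps the number of items.
first_five = list(sub.hot(limit=5))
```

The same reasoning applies to the real object: the loop needs something iterable, and the listing call is what supplies it.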

The rest of your code does some unorthodox things. I commented on one of them in your previous post, namely:

comments = list(submission.comments)
for comments in submission.comments:

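You set comments equal to something and then never use it, because the name is redefined on the very next line. I would remove the comments = line, since it does nothing.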
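
A tiny illustration of that rebinding, with a plain list of strings (items, a made-up name) standing in for submission.comments:

```python
items = ["first", "second", "third"]

comments = list(items)   # the name is bound to a list here ...
for comments in items:   # ... and rebound to each element by the loop
    pass

# After the loop the name refers to the last element, not to the list.
last_value = comments
```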

Also, for every comment in the post, you iterate over all of the comments in the post and do nothing with them:

for top_level_comment in submission.comments:                   #create loop for extracting comments
    if isinstance(top_level_comment, MoreComments):
        continue

I don't know what you intended this code to do, but right now it does nothing except waste time, so I would remove it as well.
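
Putting these suggestions together, one possible restructuring is sketched below. It is not the only way to do it and it rests on several assumptions: collect is a hypothetical helper name, reddit is assumed to be an already authenticated praw.Reddit instance, only a subset of the question's fields is gathered, and posts and comments are stored as separate rows linked by post_id rather than repeating the post fields for every comment. The date format uses %d for the day of the month, on the assumption that %D in the question was a typo (%D is not among the format codes Python documents).

```python
import datetime


def collect(reddit, subs, limit=5):
    """Gather post and comment fields from the hot listing of each subreddit.

    `reddit` is assumed to be an authenticated praw.Reddit instance.
    """
    posts, comments = [], []
    for name in subs:
        for submission in reddit.subreddit(name).hot(limit=limit):
            created = datetime.datetime.fromtimestamp(submission.created_utc)
            posts.append({
                "post_id": submission.id,
                "post_title": submission.title,
                "post_score": submission.score,
                "post_date": created.strftime("%Y-%m-%d %H:%M:%S"),
            })
            # Expand the "more comments" placeholders once per submission,
            # before iterating, so no isinstance() check is needed afterwards.
            # limit=None fetches all of them and can be slow on large threads.
            submission.comments.replace_more(limit=None)
            for comment in submission.comments:  # top-level comments only
                comments.append({
                    "post_id": submission.id,
                    # str() of the author object; "None" for deleted accounts
                    "comment_user": str(comment.author),
                    "comment_score": comment.score,
                    "comment_text": comment.body,
                })
    return posts, comments
```

Calling collect(reddit, subs, limit=5) would then return two lists of dictionaries, one entry per post and one per top-level comment.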

