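python-3.x - Scraping a list of subreddits with praw: "TypeError: 'Subreddit' object is not iterable"
Problem description
I am using praw and Python 3 to scrape posts and comments from a list of subreddits. The code previously worked for 1 subreddit, and also for a list of [j] search terms across a list of [i] subreddits. I removed the search-term list and just want it to loop over the list of subreddits, but I keep getting "TypeError: 'Subreddit' object is not iterable". I don't understand what is going on.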
subs= ["sub1","sub2", "sub3", "sub4"]
commentsDict = {"comment_user": [], "comment_text":[], "comment_score":[], "comment_date":[] }
postsDict = {"post_title" : [], "post_score" : [], "post_comments_num":[], "post_date":[], \
             "post_user":[], "post_text":[], "post_id":[]}
for i in range(len(subs)):
    for submission in reddit.subreddit(subs[i]):
        submission.comment_sort = 'new'
        comments = list(submission.comments)
        for comments in submission.comments:
            postsDict["post_title"].append(submission.title)#title of post with comment
            postsDict["post_score"].append(submission.score)#upvotes-downvotes
            postsDict["post_text"].append(submission.selftext)#get body of post
            postsDict["post_id"].append(submission.id)#unique id address for post
            postsDict["post_user"].append(submission.author) #user name of poster
            postsDict["post_comments_num"].append(submission.num_comments) #number of comments on post
            date = submission.created_utc #create variable for date
            timestamp = datetime.datetime.fromtimestamp(date) #create variable to translate unix date
            postsDict["post_date"].append(timestamp.strftime('%Y-%m-%D %H:%M:%S')) #extract date and add to dict
            for top_level_comment in submission.comments: #create loop for extracting comments
                if isinstance(top_level_comment, MoreComments):
                    continue
            submission.comments.replace_more(limit=None) #tell Praw to click more comments and get those too
            commentsDict["comment_user"].append(comments.author) #get comment username
            commentsDict["comment_score"].append(comments.score) #comment upvotes-downvotes
            date = comments.created #same date as above but for comments
            timestamp = datetime.datetime.fromtimestamp(date)
            commentsDict["comment_date"].append(timestamp.strftime('%Y-%m-%D %H:%M:%S')) #add translated unix date to dict
            commentsDict["comment_text"].append(comments.body) #get comment text
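Thanks in advance for your help.
Solution
First (and unrelated to your problem), this loop iterates over the indices of the list subs and then uses each index to fetch the item: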
for i in range(len(subs)):
    for submission in reddit.subreddit(subs[i]):
Change it to iterate over the subreddit names directly:
for subreddit in subs:
    for submission in reddit.subreddit(subreddit):
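Now to fix your PRAW error: you can't iterate over a subreddit itself (for submission in reddit.subreddit(subreddit)). You have to specify which listing you want to iterate over (for example new, hot, or top). The available listings are described in the PRAW documentation for Subreddit. They correspond to the tabs you see when viewing a subreddit on the web.
For example, to use the hot listing: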
for subreddit in subs:
    for submission in reddit.subreddit(subreddit).hot():
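To see why the error message says "not iterable", here is a minimal sketch that reproduces it without touching Reddit at all. FakeSubreddit and try_iterate are hypothetical names made up for this illustration; FakeSubreddit only imitates the relevant behaviour (the object itself defines no iteration, while its listing method returns a generator) and is not PRAW's real class.

```python
class FakeSubreddit:
    """Hypothetical stand-in for a PRAW Subreddit; NOT the real class."""

    def __init__(self, name, titles):
        self.display_name = name
        self._titles = titles

    def hot(self, limit=None):
        # A listing method returns something iterable (here, a generator).
        # The FakeSubreddit object itself defines no __iter__.
        yield from self._titles[:limit]


def try_iterate(obj):
    """Return the items of obj as a list, or the TypeError text if it cannot be iterated."""
    try:
        return list(obj)
    except TypeError as exc:
        return str(exc)


sub = FakeSubreddit("sub1", ["post a", "post b", "post c"])
print(try_iterate(sub))               # the "not iterable" error text
print(try_iterate(sub.hot(limit=2)))  # items from the listing
```

A for loop asks the object for an iterator first, so the failure happens on the for line itself, before any submission is fetched.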
If you want to specify the number of posts returned, you can use the limit parameter:
for subreddit in subs:
    for submission in reddit.subreddit(subreddit).hot(limit=5):
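The code above will give you at most 5 submissions per subreddit.
The rest of your code does some unorthodox things. I commented on one of them in your previous post, namely: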
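If you expect to switch between listings, one option is to pick the listing by name. The following is only a sketch: iter_listing and ALLOWED_LISTINGS are hypothetical helpers, not part of PRAW, and the whitelist covers just the three listings mentioned in this answer.

```python
ALLOWED_LISTINGS = ("new", "hot", "top")  # assumption: only the listings named in this answer


def iter_listing(subreddit, listing="hot", limit=5):
    """Return an iterator over at most `limit` submissions from the named listing.

    `subreddit` is expected to be what reddit.subreddit(name) returns, i.e. an
    object with new(), hot() and top() methods that accept a `limit` keyword.
    """
    if listing not in ALLOWED_LISTINGS:
        raise ValueError(f"unknown listing: {listing!r}")
    # Look the method up by name, then call it with the requested limit.
    return getattr(subreddit, listing)(limit=limit)
```

It would be used as for submission in iter_listing(reddit.subreddit(subreddit), "new", limit=10): in place of the hard-coded .hot(limit=5) call.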
comments = list(submission.comments)
for comments in submission.comments:
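You set comments to something and then never use it, because it is redefined on the very next line. I would delete the comments = line, since it does nothing.
Also, for every comment in the post, you loop over all the comments in the post and do nothing: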
for top_level_comment in submission.comments: #create loop for extracting comments
    if isinstance(top_level_comment, MoreComments):
        continue
I don't know what you want this code to do, but right now it does nothing except waste time, so I would delete it too.
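Putting the suggestions together, the loop could look roughly like the sketch below. It makes several assumptions that you should check against what you actually want: scrape and format_utc are hypothetical helper names, each post is recorded once (your code appended the post fields once per comment), only top-level comments are collected, timestamps are formatted in UTC, and %d (day of month) is used where your format string had %D. The reddit argument is expected to be your existing praw.Reddit instance.

```python
import datetime


def format_utc(epoch_seconds):
    """Format a Unix timestamp as 'YYYY-MM-DD HH:MM:SS' in UTC."""
    moment = datetime.datetime.fromtimestamp(epoch_seconds, tz=datetime.timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S")  # %d assumed; the question had %D


def scrape(reddit, subs, limit=5):
    """Collect posts and their top-level comments from the hot listing of each subreddit."""
    posts = {"post_title": [], "post_score": [], "post_comments_num": [],
             "post_date": [], "post_user": [], "post_text": [], "post_id": []}
    comments = {"comment_user": [], "comment_text": [],
                "comment_score": [], "comment_date": []}

    for name in subs:
        # Iterate a listing, not the Subreddit object itself.
        for submission in reddit.subreddit(name).hot(limit=limit):
            # Assumption: one row per post, not one per comment.
            posts["post_title"].append(submission.title)
            posts["post_score"].append(submission.score)
            posts["post_text"].append(submission.selftext)
            posts["post_id"].append(submission.id)
            posts["post_user"].append(submission.author)
            posts["post_comments_num"].append(submission.num_comments)
            posts["post_date"].append(format_utc(submission.created_utc))

            # Set the sort order before the comments are first accessed.
            submission.comment_sort = "new"
            # Expand the "load more comments" placeholders once, up front,
            # so no MoreComments objects are left in the loop below.
            submission.comments.replace_more(limit=None)
            for comment in submission.comments:  # top-level comments only
                comments["comment_user"].append(comment.author)
                comments["comment_score"].append(comment.score)
                comments["comment_date"].append(format_utc(comment.created_utc))
                comments["comment_text"].append(comment.body)

    return posts, comments
```

With your own reddit object this would be called as postsDict, commentsDict = scrape(reddit, subs, limit=5). Because the post and comment dictionaries no longer grow in lockstep, keep post_id alongside each comment if you need to join them later.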