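python - Collecting tweets by screen name with Tweepy
Problem description
I have a list of Twitter screen names (one hundred of them) and want to collect 3,200 tweets for each one. However, when I pass in all 100 screen names, the code below only collects 3,200 tweets in total because it hits the tweet collection limit. Can anyone suggest how to collect 3,200 tweets per screen name? Any advice would be greatly appreciated. Thanks in advance!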
import tweepy
import csv

def get_all_tweets(screen_name):
    # credentials (replace the placeholders with your own keys)
    consumer_key = "****"
    consumer_secret = "****"
    access_key = "****"
    access_secret = "****"

    # authorize twitter, initialize tweepy
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_key, access_secret)
    api = tweepy.API(auth, wait_on_rate_limit=True)

    # initialize a list to hold all the tweepy Tweets & a list with no retweets
    alltweets = []
    noRT = []

    # make initial request for most recent tweets with extended mode enabled to get full tweets
    # (user_timeline's parameter for excluding retweets is include_rts, not include_retweets)
    new_tweets = api.user_timeline(screen_name=screen_name, tweet_mode='extended', count=200, include_rts=False)
    # save most recent tweets
    alltweets.extend(new_tweets)

    # keep grabbing tweets until a request comes back empty or 3200 tweets are collected
    while new_tweets and len(alltweets) < 3200:
        # save the id of the oldest tweet less one
        oldest = alltweets[-1].id - 1
        print("getting tweets before {}".format(oldest))
        # all subsequent requests use the max_id param to prevent duplicates
        new_tweets = api.user_timeline(screen_name=screen_name, tweet_mode='extended', count=200, max_id=oldest, include_rts=False)
        alltweets.extend(new_tweets)
        print("...{} tweets downloaded so far".format(len(alltweets)))

    # skip retweets; a substring test like 'RT' in text would also drop tweets containing e.g. "ART"
    for tweet in alltweets:
        if hasattr(tweet, 'retweeted_status') or tweet.full_text.startswith('RT @'):
            continue
        noRT.append([tweet.id_str, tweet.created_at, tweet.full_text])

    # write to csv (newline='' avoids blank rows on Windows; utf-8 handles emoji)
    with open('{}_tweets.csv'.format(screen_name), 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(["id", "created_at", "text"])
        writer.writerows(noRT)
    print('{}_tweets.csv was successfully created.'.format(screen_name))

if __name__ == '__main__':
    # pass in the usernames of the accounts you want to download (the real list has a hundred)
    usernames = ["JLo", "ABC", 'Trump']
    for x in usernames:
        get_all_tweets(x)
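The retweet filter in the question, `if 'RT' in tweet.full_text`, is a substring test, so it also discards ordinary tweets that merely contain the letters "RT" (for example "ART"). Below is a minimal sketch of a stricter check, assuming Tweepy-style Status objects where a native retweet carries a `retweeted_status` attribute. `is_retweet` is a hypothetical helper, and `SimpleNamespace` objects stand in for real statuses so the example runs offline.

```python
from types import SimpleNamespace


def is_retweet(tweet):
    """Return True if a Tweepy-style status looks like a retweet (assumed attributes)."""
    # Assumption: native retweets from the v1.1 API carry the original tweet here
    if hasattr(tweet, "retweeted_status"):
        return True
    # Fallback for text-only data: classic retweets start with "RT @user:"
    return tweet.full_text.startswith("RT @")


# Hypothetical stand-ins for tweepy Status objects
tweets = [
    SimpleNamespace(id_str="1", full_text="Loved the ART exhibit today"),
    SimpleNamespace(id_str="2", full_text="RT @someone: great news"),
    SimpleNamespace(id_str="3", full_text="Back at the studio"),
    SimpleNamespace(id_str="4", full_text="RT @fan: see you soon",
                    retweeted_status=SimpleNamespace(id_str="99")),
]

naive = [t.id_str for t in tweets if "RT" not in t.full_text]
strict = [t.id_str for t in tweets if not is_retweet(t)]
print(naive)   # ['3']  (tweet 1 is dropped only because "ART" contains "RT")
print(strict)  # ['1', '3']
```

With `include_rts=False` the API should already strip native retweets, so a check like this mainly acts as a safety net.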
Solution
First, to walk through a timeline you need pagination. I recommend using Tweepy's Cursor, since it is much easier than managing max_id and the like yourself.
for page in tweepy.Cursor(api.user_timeline,
                          screen_name=screen_name,
                          tweet_mode="extended",
                          include_rts=False,
                          count=100).pages(32):
    for status in page:
        # do your processing on each status here, e.g.:
        print(status.id_str, status.full_text)
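Cursor hides the max_id bookkeeping. For comparison, here is a minimal sketch of the same backwards paging with an explicit stop when a request comes back empty, which the question's `while len(alltweets) <= 3200` loop does not have, plus a per-user cap of 3,200 (the most the v1.1 user_timeline endpoint returns for one account). `collect_timeline` and `make_fake_timeline` are hypothetical helpers; the fake timeline lets the logic run without credentials.

```python
from types import SimpleNamespace


def collect_timeline(fetch_page, limit=3200, page_size=200):
    """Page backwards through one user's timeline with max_id.

    fetch_page(count, max_id) is a hypothetical wrapper around a call such as
    api.user_timeline(...); it must return statuses (objects with an int .id),
    newest first, all with id <= max_id when max_id is not None.
    """
    collected = []
    max_id = None
    while len(collected) < limit:
        page = fetch_page(count=page_size, max_id=max_id)
        if not page:  # timeline exhausted: stop instead of looping forever
            break
        collected.extend(page)
        # Next request starts just below the oldest status seen so far
        max_id = page[-1].id - 1
    return collected[:limit]


def make_fake_timeline(n):
    """Offline stand-in for a user with n statuses, ids n..1 (newest first)."""
    statuses = [SimpleNamespace(id=i) for i in range(n, 0, -1)]

    def fetch_page(count, max_id=None):
        eligible = [s for s in statuses if max_id is None or s.id <= max_id]
        return eligible[:count]

    return fetch_page


small = collect_timeline(make_fake_timeline(450))
large = collect_timeline(make_fake_timeline(5000))
print(len(small), len(large))  # 450 3200
```

A real `fetch_page` would wrap `api.user_timeline(screen_name=..., tweet_mode='extended', include_rts=False, count=count, ...)` and pass `max_id` only when it is not `None`; Cursor's id-based pagination does roughly this for you.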
Second, there is indeed a rate limit, documented here, so it is not unusual to get a warning that you have reached it: https://developer.twitter.com/en/docs/twitter-api/v1/tweets/timelines/faq
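Because of the rate limit, collecting 100 full timelines needs a predictable minimum number of requests. Here is a rough back-of-the-envelope sketch, assuming 200 tweets per request and a limit of 900 user_timeline requests per 15-minute window with user authentication (both are assumptions to verify against the current documentation). `rate_limit_budget` is a hypothetical helper.

```python
import math


def rate_limit_budget(num_users, tweets_per_user=3200, page_size=200,
                      requests_per_window=900, window_minutes=15):
    """Return (total_requests, minimum minutes spent waiting on rate-limit resets).

    Assumed defaults: 200 statuses per request, 900 requests per 15-minute
    window. This is a lower bound: it ignores request latency, and pages
    shortened by include_rts=False mean extra requests in practice.
    """
    requests_per_user = math.ceil(tweets_per_user / page_size)
    total_requests = num_users * requests_per_user
    # Every window used up before the last batch forces a wait for the reset
    windows = math.ceil(total_requests / requests_per_window)
    waits = max(0, windows - 1)
    return total_requests, waits * window_minutes


print(rate_limit_budget(100))  # (1600, 15)
```

With `wait_on_rate_limit=True`, as in the question's code, Tweepy sleeps through these resets itself, so a long pause partway through a run of 100 users is expected rather than a failure.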