python - Converting Twitter engine search results to a pandas DataFrame
Problem description
I created a loop to capture unique tweets using the code below.
engine = Twitter(language='en')
idindex = set()
tweets = []
prev = None
# Create loop to capture 200 unique tweets for 'Winter snow storm'
for i in range(5):
    print(i)
    for tweet in engine.search('Winter snow storm', start=prev, count=200, cached=False):
        print(f'ID = {tweet.id}, hashtag = {hashtags(tweet.text)}, text = {tweet.text}, author = {tweet.author}, date = {tweet.date}')
        if len(tweet.text) > 0 and tweet.id not in idindex:
            tweets.append(tweet.text)
            idindex.add(tweet.id)
        prev = tweet.id
print(f'Found {len(tweets)} tweets.')
print('')
First few lines of the output
0
ID = 1411779493195182080, hashtag = ['#Siveen', '#AgniSiragugal'], text = RT @NaveenFilmmaker: I’m a proud father today. My daughter #Siveen completes dubbing for #AgniSiragugal. For a 5yr old kid she boldly performed in that harsh European winter and snow storm. Today she made me cry when she dubbed an emotional scene. சிவீன் தந்தை நவீன் என்று சொல்லிக்கொள்வதில் பெருமை!!, author = YathumOorey, date = Sun Jul 04 20:10:35 +0000 2021
ID = 1411770366570106880, hashtag = ['#Siveen', '#AgniSiragugal'], text = RT @NaveenFilmmaker: I’m a proud father today. My daughter #Siveen completes dubbing for #AgniSiragugal. For a 5yr old kid she boldly performed in that harsh European winter and snow storm. Today she made me cry when she dubbed an emotional scene. சிவீன் தந்தை நவீன் என்று சொல்லிக்கொள்வதில் பெருமை!!, author = itzabiiz, date = Sun Jul 04 19:34:19 +0000 2021
ID = 1411769785554063360, hashtag = ['#Siveen', '#AgniSiragugal'], text = RT @NaveenFilmmaker: I’m a proud father today. My daughter #Siveen completes dubbing for #AgniSiragugal. For a 5yr old kid she boldly performed in that harsh European winter and snow storm. Today she made me cry when she dubbed an emotional scene. சிவீன் தந்தை நவீன் என்று சொல்லிக்கொள்வதில் பெருமை!!, author = kalaimaniraj1, date = Sun Jul 04 19:32:00 +0000 2021
...
Is there a way to store the output above in a pandas DataFrame? I would like the DataFrame to contain the following: ID = {tweet.id}, hashtag = {hashtags(tweet.text)}, text = {tweet.text}, author = {tweet.author}, date = {tweet.date}.
Solution
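This should do the job. Each unique tweet is collected as a dict and the DataFrame is built once after the loop, because DataFrame.append returns a new frame instead of modifying df in place, and it was removed in pandas 2.0:
import pandas as pd
# Assumption: Twitter and hashtags come from the Pattern library, as the question's calls suggest
from pattern.web import Twitter, hashtags

engine = Twitter(language='en')
idindex = set()
rows = []  # one dict per unique tweet
prev = None
# Create loop to capture 200 unique tweets for 'Winter snow storm'
for i in range(5):
    print(i)
    for tweet in engine.search('Winter snow storm', start=prev, count=200, cached=False):
        print(f'ID = {tweet.id}, hashtag = {hashtags(tweet.text)}, text = {tweet.text}, author = {tweet.author}, date = {tweet.date}')
        # Store only non-empty tweets whose ID has not been seen yet
        if len(tweet.text) > 0 and tweet.id not in idindex:
            rows.append({'ID': tweet.id, 'hashtag': hashtags(tweet.text), 'text': tweet.text, 'author': tweet.author, 'date': tweet.date})
            idindex.add(tweet.id)
        prev = tweet.id
# Build the DataFrame once from the collected rows
df = pd.DataFrame(rows)
print(f'Found {len(df)} tweets.')
print('')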
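The row-collection pattern can be tried without network access. The following is a minimal sketch, not the Twitter engine itself: `FakeTweet`, `find_hashtags`, and `tweets_to_frame` are hypothetical names introduced here for illustration, and the sample tweets are made up. It only assumes that each search result exposes `id`, `text`, `author`, and `date`, as in the question.

```python
from collections import namedtuple
import re

import pandas as pd

# Hypothetical stand-in for the objects yielded by engine.search();
# only the four attributes used in the question are modeled.
FakeTweet = namedtuple('FakeTweet', ['id', 'text', 'author', 'date'])


def find_hashtags(text):
    """Simplified stand-in for hashtags(): return all '#word' tokens."""
    return re.findall(r'#\w+', text)


def tweets_to_frame(tweets):
    """Build a DataFrame with one row per unique, non-empty tweet."""
    seen = set()
    rows = []
    for tweet in tweets:
        if len(tweet.text) > 0 and tweet.id not in seen:
            seen.add(tweet.id)
            rows.append({
                'ID': tweet.id,
                'hashtag': find_hashtags(tweet.text),
                'text': tweet.text,
                'author': tweet.author,
                'date': tweet.date,
            })
    # Passing columns= keeps the column order fixed even when rows is empty
    return pd.DataFrame(rows, columns=['ID', 'hashtag', 'text', 'author', 'date'])


# Made-up sample data: one duplicate ID and one empty text, both skipped
sample = [
    FakeTweet(1, 'Heavy #snow tonight', 'alice', 'Sun Jul 04 20:10:35 +0000 2021'),
    FakeTweet(2, 'Winter storm warning #winter #storm', 'bob', 'Sun Jul 04 19:34:19 +0000 2021'),
    FakeTweet(1, 'Heavy #snow tonight', 'alice', 'Sun Jul 04 20:10:35 +0000 2021'),
    FakeTweet(3, '', 'carol', 'Sun Jul 04 19:32:00 +0000 2021'),
]
df = tweets_to_frame(sample)
print(df)
```

Building the frame once from a list of dicts avoids copying the whole DataFrame on every iteration, which is what growing it row by row would do.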