Converting a Twitter engine search into a pandas DataFrame

Problem description

I used the code below to create a loop that captures unique tweets.

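from pattern.web import Twitter, hashtags  # assumed: Twitter and hashtags come from the Pattern library

engine = Twitter(language='en')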
idindex = set()
tweets = []

prev = None

# Run 5 paged searches for 'Winter snow storm', requesting up to 200 tweets each
for i in range(5):
    print(i)
    for tweet in engine.search('Winter snow storm', start=prev, count=200, cached=False):
        print(f'ID = {tweet.id}, hashtag = {hashtags(tweet.text)}, text = {tweet.text}, author = {tweet.author}, date = {tweet.date}')
        if len(tweet.text) > 0 and tweet.id not in idindex:
            tweets.append(tweet.text)
            idindex.add(tweet.id)
        prev = tweet.id
print(f'Found {len(tweets)} tweets.')
print('')

The first few lines of output:

0
ID = 1411779493195182080, hashtag = ['#Siveen', '#AgniSiragugal'], text = RT @NaveenFilmmaker: I’m a proud father today. My daughter #Siveen completes dubbing for #AgniSiragugal. For a 5yr old kid she boldly performed in that harsh European winter and snow storm. Today she made me cry when she dubbed an emotional scene. சிவீன் தந்தை நவீன் என்று சொல்லிக்கொள்வதில் பெருமை!!, author = YathumOorey, date = Sun Jul 04 20:10:35 +0000 2021 
ID = 1411770366570106880, hashtag = ['#Siveen', '#AgniSiragugal'], text = RT @NaveenFilmmaker: I’m a proud father today. My daughter #Siveen completes dubbing for #AgniSiragugal. For a 5yr old kid she boldly performed in that harsh European winter and snow storm. Today she made me cry when she dubbed an emotional scene. சிவீன் தந்தை நவீன் என்று சொல்லிக்கொள்வதில் பெருமை!!, author = itzabiiz, date = Sun Jul 04 19:34:19 +0000 2021 
ID = 1411769785554063360, hashtag = ['#Siveen', '#AgniSiragugal'], text = RT @NaveenFilmmaker: I’m a proud father today. My daughter #Siveen completes dubbing for #AgniSiragugal. For a 5yr old kid she boldly performed in that harsh European winter and snow storm. Today she made me cry when she dubbed an emotional scene. சிவீன் தந்தை நவீன் என்று சொல்லிக்கொள்வதில் பெருமை!!, author = kalaimaniraj1, date = Sun Jul 04 19:32:00 +0000 2021
...

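Is there a way to store the output above in a pandas DataFrame? I would like the DataFrame to have one row per tweet with these columns: ID = {tweet.id}, hashtag = {hashtags(tweet.text)}, text = {tweet.text}, author = {tweet.author}, date = {tweet.date}.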

Tags: python, pandas

Solution


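Collect one dict per tweet in a list inside the loop, then build the DataFrame once after the loop. Calling df.append(...) on every iteration does not work: DataFrame.append returns a new DataFrame instead of modifying df in place (so df stays empty), and the method was removed entirely in pandas 2.0.

import pandas as pd
from pattern.web import Twitter, hashtags  # assumed: Twitter and hashtags come from the Pattern library

engine = Twitter(language='en')
idindex = set()
rows = []  # one dict per unique tweet; each dict becomes one DataFrame row

prev = None

# Run 5 paged searches for 'Winter snow storm', requesting up to 200 tweets each
for i in range(5):
    print(i)
    for tweet in engine.search('Winter snow storm', start=prev, count=200, cached=False):
        # Keep only non-empty tweets whose ID has not been seen yet
        if len(tweet.text) > 0 and tweet.id not in idindex:
            rows.append({'ID': tweet.id,
                         'hashtag': hashtags(tweet.text),
                         'text': tweet.text,
                         'author': tweet.author,
                         'date': tweet.date})
            idindex.add(tweet.id)
        prev = tweet.id  # continue the next search from the last tweet seen

# Build the DataFrame once, after all searches have finished
df = pd.DataFrame(rows, columns=['ID', 'hashtag', 'text', 'author', 'date'])
print(f'Found {len(df)} tweets.')
print(df.head())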
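Because the live Twitter search needs network access and API credentials, the approach of collecting one dict per tweet and building the DataFrame once can be tried offline. The minimal sketch below uses hypothetical stand-in tweet objects (`types.SimpleNamespace` with the `id`, `text`, `author` and `date` attributes printed in the question) and a hypothetical regex-based `extract_hashtags` that only approximates `pattern.web.hashtags`. Duplicate IDs and empty texts are skipped, as in the question's loop.

```python
import re
from types import SimpleNamespace

import pandas as pd

COLUMNS = ['ID', 'hashtag', 'text', 'author', 'date']


def extract_hashtags(text):
    """Hypothetical stand-in for pattern.web.hashtags: return words starting with '#'."""
    return re.findall(r'#\w+', text)


def tweets_to_dataframe(tweets):
    """Build one DataFrame row per non-empty tweet, keeping the first occurrence of each ID."""
    seen = set()
    rows = []
    for tweet in tweets:
        if tweet.text and tweet.id not in seen:
            seen.add(tweet.id)
            rows.append({'ID': tweet.id,
                         'hashtag': extract_hashtags(tweet.text),
                         'text': tweet.text,
                         'author': tweet.author,
                         'date': tweet.date})
    # columns= fixes the column order and gives an empty result the right columns
    return pd.DataFrame(rows, columns=COLUMNS)


# Hypothetical stand-in tweets shaped like the fields printed in the question
sample = [
    SimpleNamespace(id=1411779493195182080,
                    text='My daughter #Siveen completes dubbing for #AgniSiragugal.',
                    author='YathumOorey', date='Sun Jul 04 20:10:35 +0000 2021'),
    SimpleNamespace(id=1411770366570106880,
                    text='My daughter #Siveen completes dubbing for #AgniSiragugal.',
                    author='itzabiiz', date='Sun Jul 04 19:34:19 +0000 2021'),
    # Same ID as the first tweet, so it is skipped
    SimpleNamespace(id=1411779493195182080, text='duplicate',
                    author='YathumOorey', date='Sun Jul 04 20:10:35 +0000 2021'),
]

df = tweets_to_dataframe(sample)
print(df[['ID', 'author', 'hashtag']])
```

Building the frame from a list in a single `pd.DataFrame(...)` call also avoids copying the whole DataFrame on every iteration, which is what repeated row-by-row appends would do.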
