Counting the frequency of tweets containing a specific word over a year

Problem description

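I am trying to count the tweets containing a single word for every day of a year, record each day together with its number of tweets, and then store the results in a CSV file with "Date" and "Frequency" columns. This is my code, but after it runs for a while I keep getting an error.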

import pandas as pd
import twint
import nest_asyncio
from datetime import datetime,timedelta

bugun = '2020-01-01'
yarin = '2020-01-02'

df = pd.DataFrame(columns=("Data","Frequency"))

for i in range(365):
    file = open("Test.csv","w")
    file.close()
    
    bugun = (datetime.strptime(bugun, '%Y-%m-%d') + timedelta(days=1)).strftime('%Y-%m-%d')

    yarin =(datetime.strptime(yarin, '%Y-%m-%d') + timedelta(days=1)).strftime('%Y-%m-%d')

    nest_asyncio.apply()
    
    c = twint.Config()
    c.Search = "Chainlink"

    #c.Hide_output=True
    c.Since= bugun
    c.Until= yarin

    c.Store_csv = True
    c.Output = "Test.csv"
    c.Count = True 

    twint.run.Search(c)


    data = pd.read_csv("Test.csv")
    frequency = str(len(data))
    
    #d = {"Data": [bugun], "Frequency": [frequency]}

    #d_f = pd.DataFrame(data=d)
    
    #df = df.append(d_f, ignore_index=True)
    

    df.loc[i] = [bugun] + [frequency]
    df.to_csv (r'C:\Users\serap\Desktop\CRYPTO 100\Chainlink.csv',index = False, header=False)

The error I get is this:

  File "C:\Users\serap\Desktop\CRYPTO 100\CODES\Binance_Coin\Binance Coin.py", line 47, in <module>
    data = pd.read_csv("Test.csv")
  File "C:\Users\serap\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\io\parsers.py", line 605, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "C:\Users\serap\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\io\parsers.py", line 457, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "C:\Users\serap\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\io\parsers.py", line 814, in __init__
    self._engine = self._make_engine(self.engine)
  File "C:\Users\serap\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\io\parsers.py", line 1045, in _make_engine
    return mapping[engine](self.f, **self.options)  # type: ignore[call-arg]
  File "C:\Users\serap\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\io\parsers.py", line 1893, in __init__
    self._reader = parsers.TextReader(self.handles.handle, **kwds)
  File "pandas\_libs\parsers.pyx", line 521, in pandas._libs.parsers.TextReader.__cinit__

EmptyDataError: No columns to parse from file

Thanks for your help :)

Tags: python, pandas, dataframe, twint

Solution


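After reading the tutorial "How to Scrape Tweets From Twitter with Python Twint" by Andika Pratama (Analytics Vidhya, Medium), I think you would be better off letting Twint do the iteration over the dates: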

import twint

c = twint.Config()
c.Search = "Chainlink"
c.Since = "2020-01-01"  # dates must use plain ASCII hyphens (YYYY-MM-DD), not en dashes
c.Until = "2021-01-01"
c.Store_csv = True
c.Output = "Test.csv"
c.Count = True
twint.run.Search(c)

Now you can iterate over the CSV output:

import pandas as pd

data = pd.read_csv("Test.csv")
# ...
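Since the whole year now ends up in a single file, the per-day frequency has to be derived from it afterwards. A minimal sketch with pandas, assuming Twint's CSV has a `date` column in `YYYY-MM-DD` form (check the header of your Test.csv). The helper name `daily_counts` is hypothetical; days without any tweets are filled in with 0 so every day of the range appears in the result.

```python
import pandas as pd


def daily_counts(csv_source, since, until):
    """Return a DataFrame with one row per day in [since, until).

    Assumes the Twint CSV has a 'date' column formatted as YYYY-MM-DD.
    `until` is treated as exclusive, like c.Until in the config above.
    """
    data = pd.read_csv(csv_source)
    # Number of tweets per distinct date string
    counts = data["date"].value_counts()

    # Every calendar day from `since` up to (but not including) `until`
    last_day = pd.Timestamp(until) - pd.Timedelta(days=1)
    day_labels = pd.date_range(since, last_day, freq="D").strftime("%Y-%m-%d")

    # Days that never appear in the CSV had no tweets, so fill them with 0
    frequency = counts.reindex(day_labels, fill_value=0)
    return pd.DataFrame({"Date": list(day_labels), "Frequency": frequency.to_numpy()})
```

For the question's use case this would be something like `daily_counts("Test.csv", "2020-01-01", "2021-01-01").to_csv(r'C:\Users\serap\Desktop\CRYPTO 100\Chainlink.csv', index=False)`, which writes the "Date" and "Frequency" columns in one go instead of rewriting the file on every loop iteration.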

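So far I have not found any detailed documentation on the CSV output, but the twint source code (master/twint/storage/write.py, line 58 ff.) shows that for CSV the output is appended if the file already exists. So you may have to truncate the file or delete it before each run. One working option might be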

open("Test.csv", "w").close()

...which is essentially what you already do, just without introducing another variable.
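If you prefer to keep the per-day loop, here is a sketch of the "delete the existing file" option. It also treats a missing or empty output file as zero tweets. That part rests on an assumption: `pd.read_csv` raises `EmptyDataError` when Test.csv is empty, which presumably happens when Twint writes nothing for a day (for example because no matching tweets were found). The helper names `reset_output` and `count_tweets` are hypothetical, and only the standard library is used.

```python
import csv
from pathlib import Path


def reset_output(path):
    """Delete a previous Twint CSV so the next search does not append to it."""
    # missing_ok=True (Python 3.8+) makes this a no-op if the file is absent
    Path(path).unlink(missing_ok=True)


def count_tweets(path):
    """Count data rows in a CSV with a header row; missing/empty file -> 0."""
    p = Path(path)
    if not p.exists() or p.stat().st_size == 0:
        # Nothing was written, so there is nothing to parse: zero tweets
        return 0
    # newline="" lets csv handle tweets that contain line breaks in quoted fields
    with p.open(newline="", encoding="utf-8") as f:
        rows = [row for row in csv.reader(f) if row]
    # Subtract the header row
    return max(len(rows) - 1, 0)
```

Inside the loop you would call `reset_output("Test.csv")` before `twint.run.Search(c)` and use `count_tweets("Test.csv")` in place of `len(pd.read_csv("Test.csv"))`, so a day without tweets yields 0 instead of an exception.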

