Twint scraping: ClientPayloadError: Response payload is not completed

Problem description

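When I use twint to scrape tweets for a hashtag, I get the error below. Can anyone explain why it happens and how to fix it? Tweets back to a certain date do get scraped, but then this error stops the run, so none of the tweets older than that date are collected.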

Thanks for your help!

---------------------------------------------------------------------------
ClientPayloadError                        Traceback (most recent call last)
<ipython-input-8-f28f8e9aab1e> in <module>
----> 1 twint.run.Search(c)

~/.local/lib/python3.8/site-packages/twint/run.py in Search(config, callback)
    408     config.Followers = False
    409     config.Profile = False
--> 410     run(config, callback)
    411     if config.Pandas_au:
    412         storage.panda._autoget("tweet")

~/.local/lib/python3.8/site-packages/twint/run.py in run(config, callback)
    327         raise
    328 
--> 329     get_event_loop().run_until_complete(Twint(config).main(callback))
    330 
    331 

~/opt/anaconda3/lib/python3.8/asyncio/base_events.py in run_until_complete(self, future)
    614             raise RuntimeError('Event loop stopped before Future completed.')
    615 
--> 616         return future.result()
    617 
    618     def stop(self):

~/.local/lib/python3.8/site-packages/twint/run.py in main(self, callback)
    233             task.add_done_callback(callback)
    234 
--> 235         await task
    236 
    237     async def run(self):

~/.local/lib/python3.8/site-packages/twint/run.py in run(self)
    284                     elif self.config.TwitterSearch:
    285                         logme.debug(__name__ + ':Twint:main:twitter-search')
--> 286                         await self.tweets()
    287                 else:
    288                     logme.debug(__name__ + ':Twint:main:no-more-tweets')

~/.local/lib/python3.8/site-packages/twint/run.py in tweets(self)
    215 
    216     async def tweets(self):
--> 217         await self.Feed()
    218         # TODO : need to take care of this later
    219         if self.config.Location:

~/.local/lib/python3.8/site-packages/twint/run.py in Feed(self)
     60             # this will receive a JSON string, parse it into a `dict` and do the required stuff
     61             try:
---> 62                 response = await get.RequestUrl(self.config, self.init)
     63             except TokenExpiryException as e:
     64                 logme.debug(__name__ + 'Twint:Feed:' + str(e))

~/.local/lib/python3.8/site-packages/twint/get.py in RequestUrl(config, init)
    133         _serialQuery = _url
    134 
--> 135     response = await Request(_url, params=params, connector=_connector, headers=_headers)
    136 
    137     if config.Debug:

~/.local/lib/python3.8/site-packages/twint/get.py in Request(_url, connector, params, headers)
    159     logme.debug(__name__ + ':Request:Connector')
    160     async with aiohttp.ClientSession(connector=connector, headers=headers) as session:
--> 161         return await Response(session, _url, params)
    162 
    163 

~/.local/lib/python3.8/site-packages/twint/get.py in Response(session, _url, params)
    166     with timeout(120):
    167         async with session.get(_url, ssl=True, params=params, proxy=httpproxy) as response:
--> 168             resp = await response.text()
    169             if response.status == 429:  # 429 implies Too many requests i.e. Rate Limit Exceeded
    170                 raise TokenExpiryException(loads(resp)['errors'][0]['message'])

~/opt/anaconda3/lib/python3.8/site-packages/aiohttp/client_reqrep.py in text(self, encoding, errors)
   1074         """Read response payload and decode."""
   1075         if self._body is None:
-> 1076             await self.read()
   1077 
   1078         if encoding is None:

~/opt/anaconda3/lib/python3.8/site-packages/aiohttp/client_reqrep.py in read(self)
   1030         if self._body is None:
   1031             try:
-> 1032                 self._body = await self.content.read()
   1033                 for trace in self._traces:
   1034                     await trace.send_response_chunk_received(

~/opt/anaconda3/lib/python3.8/site-packages/aiohttp/streams.py in read(self, n)
    368             blocks = []
    369             while True:
--> 370                 block = await self.readany()
    371                 if not block:
    372                     break

~/opt/anaconda3/lib/python3.8/site-packages/aiohttp/streams.py in readany(self)
    390         # without feeding any data
    391         while not self._buffer and not self._eof:
--> 392             await self._wait("readany")
    393 
    394         return self._read_nowait(-1)

~/opt/anaconda3/lib/python3.8/site-packages/aiohttp/streams.py in _wait(self, func_name)
    304             if self._timer:
    305                 with self._timer:
--> 306                     await waiter
    307             else:
    308                 await waiter

ClientPayloadError: Response payload is not completed

Tags: python, web-scraping, twitter, twint

Solution


Not sure whether you've already found an answer, but I thought I'd add this here for anyone looking in the future:

https://github.com/twintproject/twint/issues/1099

Essentially, the link above suggests wrapping the call in a try/except block that catches the error and retries, if that fits your code.
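A minimal sketch of that suggestion follows; it is not code from the linked issue. The hypothetical helper `retry` calls a function and, if it raises one of the given exceptions, waits and tries again. A second hypothetical helper, `date_windows`, is an extra choice on top of the issue's advice: it splits the date range into one-day windows so that a failure costs one window instead of everything older than the failing date. The `scrape_hashtag` driver assumes twint's `Config` fields `Search`, `Since`, `Until`, `Store_csv` and `Output`, that `Since`/`Until` accept `YYYY-MM-DD` strings, and that the failure surfaces from `twint.run.Search` as `aiohttp.ClientPayloadError`, as in the traceback above.

```python
import functools
import time
from datetime import date, timedelta
from typing import Callable, Iterator, Tuple, Type, TypeVar

T = TypeVar("T")


def retry(func: Callable[[], T],
          exceptions: Tuple[Type[BaseException], ...],
          attempts: int = 5,
          delay: float = 10.0) -> T:
    """Call func(); if it raises one of `exceptions`, wait `delay` seconds
    and try again, up to `attempts` calls in total (hypothetical helper)."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts):
        try:
            return func()
        except exceptions as exc:
            print(f"attempt {attempt} failed ({exc!r}); retrying in {delay}s")
            time.sleep(delay)
    return func()  # final attempt: let any error propagate to the caller


def date_windows(since: date, until: date, days: int = 1) -> Iterator[Tuple[str, str]]:
    """Yield (since, until) ISO-date string pairs covering [since, until)
    in steps of `days` days (hypothetical helper)."""
    if days < 1:
        raise ValueError("days must be at least 1")
    start = since
    while start < until:
        end = min(start + timedelta(days=days), until)
        yield start.isoformat(), end.isoformat()
        start = end


def scrape_hashtag(hashtag: str, since: date, until: date,
                   output: str = "tweets.csv") -> None:
    """Hypothetical driver: one twint search per one-day window, each window
    retried on ClientPayloadError. Requires twint and aiohttp to be installed."""
    import aiohttp  # imported here so the helpers above work without these packages
    import twint

    for window_since, window_until in date_windows(since, until):
        c = twint.Config()
        c.Search = hashtag
        c.Since = window_since
        c.Until = window_until
        c.Store_csv = True
        c.Output = output
        retry(functools.partial(twint.run.Search, c), (aiohttp.ClientPayloadError,))
```

For example, `scrape_hashtag("#example", date(2020, 1, 1), date(2020, 2, 1))` would run 31 one-day searches (the hashtag is a placeholder). A retried window may append rows that were already written before the failure, so it may be worth de-duplicating the CSV by tweet ID afterwards.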

I've also found that twint works better with Python 3.6, which might help!

Good luck :)

