Why does my Scrapy spider always report "TCP connection timed out" on Scrapinghub but work fine on my local machine?

Problem description

I get the following error on app.scrapinghub.com, but the same spider works fine on my local machine.

Note: inside my Python (Scrapy framework) spider I use the requests module to send requests and BeautifulSoup to parse the responses.

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/twisted/internet/defer.py", line 1297, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/usr/local/lib/python3.6/site-packages/twisted/python/failure.py", line 389, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/usr/local/lib/python3.6/site-packages/scrapy/core/downloader/middleware.py", line 43, in process_request
    defer.returnValue((yield download_func(request=request,spider=spider)))
twisted.internet.error.TCPTimedOutError: TCP connection timed out: 110: Connection timed out.
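
The traceback shows the timeout being raised inside Scrapy's own downloader (twisted and scrapy/core/downloader/middleware.py), i.e. while Scrapy itself fetches a request such as the start URL, not inside the requests calls made in the callback. Purely as a hedged sketch, not a confirmed fix: the download timeout and retry behaviour can be adjusted per spider through the standard Scrapy settings DOWNLOAD_TIMEOUT, RETRY_ENABLED and RETRY_TIMES; the values below are illustrative only.

# Inside the spider class (hedged sketch; values are illustrative only):
custom_settings = {
    'DOWNLOAD_TIMEOUT': 300,   # seconds before the downloader gives up (Scrapy default: 180)
    'RETRY_ENABLED': True,     # RetryMiddleware retries TCP timeouts by default
    'RETRY_TIMES': 5,          # number of extra attempts (Scrapy default: 2)
}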

Sample code:

from scrapy.spiders import Spider
import requests
from bs4 import BeautifulSoup
from datetime import datetime, timedelta
from Scrapy_Project.pipelines import MySQLPipeline

class exampleSpider(Spider):
    name = 'test'
    start_urls = ['http://www.example.com']
    custom_settings = {
        'ITEM_PIPELINES': {
            # A value of None disables this pipeline for this spider.
            'Scrapy_Project.pipelines.MySQLPipeline': None
        }
    }

    def parse(self, response):
        # Build date strings for today and tomorrow.
        current_date = datetime.today()
        today = current_date.strftime('%m/%d/%Y')

        tomorrow_date = current_date + timedelta(days=1)
        tomorrow = tomorrow_date.strftime('%m/%d/%Y')

        date_list = [today, tomorrow]
        for date in date_list:
            # The per-date page is fetched with requests, bypassing Scrapy's downloader.
            url = 'http://www.example.com?id=123'
            page = requests.get(url)
            soup = BeautifulSoup(page.content, 'html.parser')
            movie_list = soup.find_all('example_info')
            for item in movie_list:
                name = item.find('title').get_text()
                x_time = item.find('starttime').get_text()
                result = {'name': name, 'show_date': date}
                yield result
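
Note that requests.get runs entirely outside Scrapy's downloader, so none of the download timeout, retry or proxy configuration that applies to the spider on Scrapinghub covers those calls. For comparison only, a minimal sketch of the same per-date fetch issued through scrapy.Request instead; the URL, the example_info tag and the item fields are the placeholders from the question, and the spider and callback names are hypothetical:

import scrapy
from bs4 import BeautifulSoup
from datetime import datetime, timedelta

class ExampleRequestSpider(scrapy.Spider):
    # Hypothetical variant of the spider above: the per-date page is fetched
    # with scrapy.Request, so it goes through Scrapy's downloader, settings
    # and middlewares instead of a separate requests session.
    name = 'test_scrapy_request'
    start_urls = ['http://www.example.com']

    def parse(self, response):
        today = datetime.today()
        for offset in (0, 1):  # today and tomorrow
            date = (today + timedelta(days=offset)).strftime('%m/%d/%Y')
            yield scrapy.Request(
                'http://www.example.com?id=123',
                callback=self.parse_day,
                meta={'show_date': date},
                dont_filter=True,  # the same URL is requested once per date
            )

    def parse_day(self, response):
        soup = BeautifulSoup(response.text, 'html.parser')
        for item in soup.find_all('example_info'):
            yield {
                'name': item.find('title').get_text(),
                'show_date': response.meta['show_date'],
            }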

Tags: python, python-3.x, web-scraping, scrapy, python-requests

Solution

