scrapy splash: Connection was refused by other side: 61: Connection refused

Problem description

I have been trying to run Scrapy with Splash to extract JavaScript-rendered data. Splash is started with the following command and is up and running:

docker run -d -p 8050:8050 scrapinghub/splash --max-timeout 600

Splash is reachable at both http://127.0.0.1:8050 and http://localhost:8050.
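Splash answering in a browser does not guarantee that the Python process running Scrapy reaches it the same way. A minimal check from Python is sketched below, assuming Splash's `_ping` endpoint from its HTTP API; the function name `splash_is_up` is hypothetical. It deliberately ignores proxy environment variables such as `HTTP_PROXY`, so it tests only the direct connection.

```python
import urllib.error
import urllib.request


def splash_is_up(base_url: str = "http://localhost:8050", timeout: float = 5.0) -> bool:
    """Return True if GET {base_url}/_ping answers with HTTP 200 (hypothetical helper)."""
    # An empty ProxyHandler disables proxies picked up from environment
    # variables, so the request goes straight to base_url.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    url = base_url.rstrip("/") + "/_ping"
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Covers "Connection refused", timeouts, DNS failures and HTTP errors
        # (HTTPError is a subclass of URLError).
        return False
```

Calling `splash_is_up("http://127.0.0.1:8050")` and `splash_is_up("http://localhost:8050")` from the project's virtualenv should both return `True` if Splash accepts direct connections under both names.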

My hosts file looks like this:

##
# Host Database
#
# localhost is used to configure the loopback interface
# when the system is booting.  Do not change this entry.
##
127.0.0.1   localhost
127.0.0.1       feedworker.host
255.255.255.255 broadcasthost
::1             localhost
# Added by Docker Desktop
# To allow the same kube context to work on the host and the container:
127.0.0.1 kubernetes.docker.internal
# End of section
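The hosts file maps `localhost` to both `127.0.0.1` and `::1`, so a client may try an IPv4 or an IPv6 address for the same name. To see which addresses `localhost` actually resolves to from Python on this machine, and in what order, a short diagnostic sketch follows (the helper name `resolve_all` is hypothetical, and `typing.List` keeps it compatible with the Python 3.8 seen in the traceback). It does not by itself show which address the failing request used.

```python
import socket
from typing import List


def resolve_all(host: str, port: int) -> List[str]:
    """Return the distinct IP addresses `host` resolves to, in resolver order."""
    addrs: List[str] = []
    for _family, _type, _proto, _canon, sockaddr in socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    ):
        # sockaddr is (ip, port) for IPv4 and (ip, port, flowinfo, scope_id) for IPv6.
        ip = sockaddr[0]
        if ip not in addrs:
            addrs.append(ip)
    return addrs


print(resolve_all("localhost", 8050))
```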

But every time I run the crawl with

scrapy crawl spider_name

I get:

2021-09-03 14:03:26 [scrapy.downloadermiddlewares.retry] ERROR: Gave up retrying <GET https://ted.europa.eu/TED/browse/browseByPD.do via http://localhost:8050/execute> (failed 3 times): Connection was refused by other side: 61: Connection refused.
2021-09-03 14:03:26 [scrapy.core.scraper] ERROR: Error downloading <GET https://ted.europa.eu/TED/browse/browseByPD.do via http://localhost:8050/execute>
Traceback (most recent call last):
  File "/Users/sudipadh/Desktop/upwork/scrapy-rabbit/venv/lib/python3.8/site-packages/scrapy/core/downloader/middleware.py", line 44, in process_request
    return (yield download_func(request=request, spider=spider))
twisted.internet.error.ConnectionRefusedError: Connection was refused by other side: 61: Connection refused.
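The `61` in the message is an operating-system errno, not something Splash returns: on macOS (the `/Users/...` paths in the traceback suggest macOS) errno 61 is `ECONNREFUSED`, which usually means nothing was listening at the address and port the client actually connected to. Linux reports the same error as 111. A minimal sketch that reproduces it against a loopback port with no listener (the helper name `refused_errno` is hypothetical):

```python
import errno
import os
import socket


def refused_errno(host: str = "127.0.0.1") -> int:
    """Connect to a port with no listener and return the errno that is raised."""
    # Ask the OS for a free port, then close the socket so nothing listens on it.
    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    probe.bind((host, 0))
    port = probe.getsockname()[1]
    probe.close()

    try:
        socket.create_connection((host, port), timeout=2).close()
    except ConnectionRefusedError as exc:
        return exc.errno
    raise RuntimeError(f"something is listening on {host}:{port}")


code = refused_errno()
# On macOS ECONNREFUSED is 61; on Linux it is 111.
print(code, errno.errorcode[code], os.strerror(code))
```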

Scrapy settings for Splash:

DOWNLOADER_MIDDLEWARES = {
    'proactis.tor.middleware.TorMiddleware': 100,
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}

SPLASH_URL = 'http://localhost:8050/'
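Scrapy calls each downloader middleware's `process_request()` in increasing order of these numbers, so with these settings the project's own `proactis.tor.middleware.TorMiddleware` (100) handles each request before `SplashMiddleware` (725). Its code is not shown here, so what it changes is unknown; if it routes requests through a proxy, that could also affect how the connection to localhost:8050 is made. The ordering can be listed with plain Python; the function name is hypothetical, and the sketch neither loads Scrapy nor includes the built-in `DOWNLOADER_MIDDLEWARES_BASE` entries that Scrapy merges in.

```python
from typing import Dict, List, Optional

# The DOWNLOADER_MIDDLEWARES dict from the settings above.
DOWNLOADER_MIDDLEWARES: Dict[str, Optional[int]] = {
    "proactis.tor.middleware.TorMiddleware": 100,
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}


def process_request_order(middlewares: Dict[str, Optional[int]]) -> List[str]:
    """Return middleware paths in the order their process_request() would run."""
    # Entries set to None are disabled in Scrapy, so they are skipped here.
    enabled = [(path, prio) for path, prio in middlewares.items() if prio is not None]
    return [path for path, _ in sorted(enabled, key=lambda kv: kv[1])]


for position, path in enumerate(process_request_order(DOWNLOADER_MIDDLEWARES), start=1):
    print(position, path)
```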

Any help would be appreciated.

Tags: python, docker, web-scraping, scrapy-splash

Solution

