Cloudflare middleware bypasses protection recursively

Problem description

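I have installed cfscrape and am using it in my Scrapy project to bypass Cloudflare protection. It does not seem to work: the middleware reports a successful bypass, but the re-scheduled request is challenged again, and the cycle repeats. I can add more information about my code if needed. Here is some code from settings.py: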

DOWNLOAD_DELAY = 0.25

DEFAULT_REQUEST_HEADERS = {
  'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
  'Accept-Language': 'en',
}
USER_AGENT_LIST = [
    'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.7 (KHTML, like asdlkjqwdj) Chrome/16.0.912.36 Safari/535.7',
    'Mozilla/5.0 (Windows NT 2.1; WOW64) AppleWebKit/535.7 (KHTML, like kqjwdqwd) Chrome/16.0.912.36 Safari/123123.123',
    'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.7 (KHTML, like asdqwdqw) Chrome/16.0.912.36 Safari/535.7'
]

DOWNLOADER_MIDDLEWARES = {
    # The priority of 560 is important, because we want this middleware to kick in just before the scrapy built-in `RetryMiddleware`.
    'scrapy_cloudflare_middleware.middlewares.CloudFlareMiddleware': 560,
    'binaaz.middlewares.RandomUserAgentMiddleware': 400,
    'binaaz.middlewares.ProxyMiddleware': 410,
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
    # Disable compression middleware, so the actual HTML pages are cached
}

Here is the output:

2019-01-18 15:02:59 [scrapy.core.engine] INFO: Spider opened
2019-01-18 15:02:59 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2019-01-18 15:02:59 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2019-01-18 15:03:00 [cloudflaremiddleware] DEBUG: Cloudflare protection detected on https://bina.az/items/all, trying to bypass...
2019-01-18 15:03:00 [urllib3.connectionpool] DEBUG: Starting new HTTPS connection (1): bina.az
2019-01-18 15:03:00 [urllib3.connectionpool] DEBUG: https://bina.az:443 "GET /items/all HTTP/1.1" 503 None
2019-01-18 15:03:08 [urllib3.connectionpool] DEBUG: Resetting dropped connection: bina.az
2019-01-18 15:03:09 [urllib3.connectionpool] DEBUG: https://bina.az:443 "GET /cdn-cgi/l/chk_jschl?jschl_vc=6d688f88cb5562fd00a5ee16689e26c9&pass=1547809384.962-1OiFgM1jrn&jschl_answer=18.089701993600002 HTTP/1.1" 302 159
2019-01-18 15:03:09 [urllib3.connectionpool] DEBUG: https://bina.az:443 "GET /items/all HTTP/1.1" 503 None
2019-01-18 15:03:17 [urllib3.connectionpool] DEBUG: Resetting dropped connection: bina.az
2019-01-18 15:03:17 [urllib3.connectionpool] DEBUG: https://bina.az:443 "GET /cdn-cgi/l/chk_jschl?jschl_vc=e719be7793db058c78b89dc24f6e39b7&pass=1547809393.639-gfrUEjO5oa&jschl_answer=26.1093884035 HTTP/1.1" 302 159
2019-01-18 15:03:17 [urllib3.connectionpool] DEBUG: https://bina.az:443 "GET /items/all HTTP/1.1" 204 0
2019-01-18 15:03:17 [cloudflaremiddleware] DEBUG: Successfully bypassed the protection for https://bina.az/items/all, re-scheduling the request
2019-01-18 15:03:18 [cloudflaremiddleware] DEBUG: Cloudflare protection detected on https://bina.az/items/all, trying to bypass...
2019-01-18 15:03:18 [urllib3.connectionpool] DEBUG: Starting new HTTPS connection (1): bina.az
2019-01-18 15:03:18 [urllib3.connectionpool] DEBUG: https://bina.az:443 "GET /items/all HTTP/1.1" 503 None
2019-01-18 15:03:26 [urllib3.connectionpool] DEBUG: Resetting dropped connection: bina.az
2019-01-18 15:03:27 [urllib3.connectionpool] DEBUG: https://bina.az:443 "GET /cdn-cgi/l/chk_jschl?jschl_vc=faf6647605a0b45ed9958c055de11334&pass=1547809403.2-RdtkjUMCBv&jschl_answer=35.723157815600004 HTTP/1.1" 302 159
2019-01-18 15:03:27 [urllib3.connectionpool] DEBUG: https://bina.az:443 "GET /items/all HTTP/1.1" 503 None
2019-01-18 15:03:35 [urllib3.connectionpool] DEBUG: Resetting dropped connection: bina.az
2019-01-18 15:03:35 [urllib3.connectionpool] DEBUG: https://bina.az:443 "GET /cdn-cgi/l/chk_jschl?jschl_vc=8a0665b0b9df3707f15de0299d196b33&pass=1547809411.873-xrZecnfmeE&jschl_answer=-3.7878104051 HTTP/1.1" 302 159
2019-01-18 15:03:36 [urllib3.connectionpool] DEBUG: https://bina.az:443 "GET /items/all HTTP/1.1" 204 0
2019-01-18 15:03:36 [cloudflaremiddleware] DEBUG: Successfully bypassed the protection for https://bina.az/items/all, re-scheduling the request
2019-01-18 15:03:36 [cloudflaremiddleware] DEBUG: Cloudflare protection detected on https://bina.az/items/all, trying to bypass...
2019-01-18 15:03:36 [urllib3.connectionpool] DEBUG: Starting new HTTPS connection (1): bina.az
2019-01-18 15:03:36 [urllib3.connectionpool] DEBUG: https://bina.az:443 "GET /items/all HTTP/1.1" 503 None
2019-01-18 15:03:45 [urllib3.connectionpool] DEBUG: Resetting dropped connection: bina.az
2019-01-18 15:03:45 [urllib3.connectionpool] DEBUG: https://bina.az:443 "GET /cdn-cgi/l/chk_jschl?jschl_vc=671d8f3bfcb227960607a55f33183748&pass=1547809421.422-h5xJraFbRf&jschl_answer=20.2767620475 HTTP/1.1" 302 159
2019-01-18 15:03:45 [urllib3.connectionpool] DEBUG: https://bina.az:443 "GET /items/all HTTP/1.1" 204 0
2019-01-18 15:03:45 [cloudflaremiddleware] DEBUG: Successfully bypassed the protection for https://bina.az/items/all, re-scheduling the request

PS: I have updated cfscrape to the latest version.
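
The log shows the same detect / bypass / re-schedule cycle repeating for a single URL with no upper bound. While debugging, it can help to cap the number of bypass attempts per request so the spider fails fast instead of looping. The sketch below is only an illustration: the function name `should_retry_bypass` and the meta key `cf_bypass_attempts` are hypothetical and are not part of scrapy_cloudflare_middleware or cfscrape. It assumes the counter is kept in a dict such as a Scrapy request's `meta`.

```python
MAX_BYPASS_ATTEMPTS = 3  # arbitrary limit chosen for this example


def should_retry_bypass(meta, limit=MAX_BYPASS_ATTEMPTS):
    """Count one more bypass attempt in `meta` and say whether to try again.

    `meta` is any mutable mapping carried along with the request
    (for example `request.meta` in Scrapy).
    """
    attempts = meta.get("cf_bypass_attempts", 0) + 1
    meta["cf_bypass_attempts"] = attempts
    return attempts <= limit


if __name__ == "__main__":
    meta = {}
    for _ in range(5):
        print(should_retry_bypass(meta), meta["cf_bypass_attempts"])
```

A custom middleware could call this before solving the challenge again and return the challenge response unchanged once the limit is reached, which makes the failure visible in the crawl stats instead of hidden in an endless loop.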

Tags: python-3.x, scrapy

Solution
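
One possible cause, offered as an assumption rather than a confirmed diagnosis: the cfscrape documentation notes that the same User-Agent string must be used for obtaining the Cloudflare tokens and for the requests that carry them. In the settings above, `RandomUserAgentMiddleware` and `ProxyMiddleware` may change the User-Agent (and the outgoing IP) on the re-scheduled request, so the freshly obtained cookies could be rejected and the challenge served again. A minimal sketch of a user-agent middleware that picks one value per spider run and reuses it is shown below. The class name `StickyUserAgentMiddleware` is hypothetical, and the sketch assumes the list is read from the `USER_AGENT_LIST` setting shown in the question.

```python
import random


class StickyUserAgentMiddleware:
    """Hypothetical downloader middleware: one User-Agent for the whole run."""

    def __init__(self, user_agents):
        # Choose once, so every request (including re-scheduled ones)
        # presents the same User-Agent string.
        self.user_agent = random.choice(list(user_agents))

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this hook when it builds the middleware.
        return cls(crawler.settings.getlist("USER_AGENT_LIST"))

    def process_request(self, request, spider):
        request.headers["User-Agent"] = self.user_agent
        return None  # let the request continue through the chain
```

For this to help, the challenge solver would also need to be given the same User-Agent, and the request would need to leave from the same IP address the challenge was solved from, so it may be worth testing with the proxy middleware disabled first.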

