scrapy returns response.status 505

Problem description

Scrapy returns response.status 505 when I try to open a site:

505 HTTP Version Not Supported

The same site opens normally in a browser. Why does this happen, and how can I fix it?

I call scrapy from the console with this command:

scrapy shell 'https://xiaohua.zol.com.cn/detail60/59411.html'

Tags: web-scraping, scrapy

Solution


You should send the right request headers to extract the data. Here is a demo with its output:

import scrapy
from scrapy.crawler import CrawlerProcess


class Xiaohua(scrapy.Spider):
    name = 'xiaohua'
    start_urls = ['https://xiaohua.zol.com.cn/detail60/59411.html']

    def start_requests(self):
        # Browser-like headers copied from a Chrome session; the request
        # sent without them got the 505 response.
        headers = {
            'authority': 'xiaohua.zol.com.cn',
            'cache-control': 'max-age=0',
            'sec-ch-ua': '"Chromium";v="94", "Google Chrome";v="94", ";Not A Brand";v="99"',
            'sec-ch-ua-mobile': '?0',
            'sec-ch-ua-platform': '"Linux"',
            'upgrade-insecure-requests': '1',
            'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36',
            'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
            'sec-fetch-site': 'cross-site',
            'sec-fetch-mode': 'navigate',
            'sec-fetch-user': '?1',
            'sec-fetch-dest': 'document',
            'accept-language': 'en-US,en;q=0.9',
            'cookie': 'z_pro_city=s_provice%3Dmengjiala%26s_city%3Dnull; userProvinceId=1; userCityId=0; userCountyId=0; userLocationId=1; ip_ck=7sWD7/jzj7QuOTIyODI0LjE2MzQxMTQxNzg%3D; lv=1634114179; vn=1; Hm_lvt_ae5edc2bc4fc71370807f6187f0a2dd0=1634114179; _ga=GA1.3.116086394.1634114186; _gid=GA1.3.2021660129.1634114186; Hm_lpvt_ae5edc2bc4fc71370807f6187f0a2dd0=1634114447; questionnaire_pv=1634083202; z_day=ixgo20%3D1%26icnmo11564%3D1; 22aa20c0da0b6f1d9a3155e8bf4c364e=cq11lgg54n27u10p%7B%7BZ%7D%7D%7B%7BZ%7D%7Dnull; MyZClick_22aa20c0da0b6f1d9a3155e8bf4c364e=/html/body/div%5B5%5D/div/div/div%5B2%5D/p/a/',
        }
        # Attach the headers to the request for each start URL.
        for url in self.start_urls:
            yield scrapy.Request(url=url, callback=self.parse, headers=headers)

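    def parse(self, response):
        # Print the status code, the article title, and the navigation links.
        print(response.status)
        print('*' * 10)
        print(response.css('h1.article-title::text').get())
        print(response.css('ul.nav > li > a::text').getall())
        print('*' * 10)


# Run the spider from a plain Python script instead of the scrapy CLI.
process = CrawlerProcess()
process.crawl(Xiaohua)
process.start()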

Output

200
**********
导演你能认真点儿吗
['笑话首页', '最新笑话', '冷笑话', '搞笑趣图', '搞笑视频', '上传笑话']
**********
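
The demo sends a full set of browser headers, but the answer does not say which of them the site actually checks. One way to narrow that down is a leave-one-out test: drop each header in turn and see whether the status code changes. The sketch below is only an illustration of that idea. The names find_required_headers and fake_fetch_status are hypothetical, and the fake server rule (200 only when a user-agent is present, otherwise 505) is an assumption made for the demo, not a confirmed fact about xiaohua.zol.com.cn.

```python
from typing import Callable, Dict, List


def find_required_headers(
    headers: Dict[str, str],
    fetch_status: Callable[[Dict[str, str]], int],
    ok_status: int = 200,
) -> List[str]:
    """Return the header names whose removal changes the status code.

    Leave-one-out check: each header is dropped in turn while all the
    others are still sent.
    """
    required = []
    for name in headers:
        trimmed = {k: v for k, v in headers.items() if k != name}
        if fetch_status(trimmed) != ok_status:
            required.append(name)
    return required


def fake_fetch_status(headers: Dict[str, str]) -> int:
    # Stand-in for a real request (an assumption for this demo): it answers
    # 200 only when a user-agent is present, otherwise 505.
    return 200 if "user-agent" in headers else 505


candidate_headers = {
    "user-agent": "Mozilla/5.0 (X11; Linux x86_64)",
    "accept-language": "en-US,en;q=0.9",
    "upgrade-insecure-requests": "1",
}
required = find_required_headers(candidate_headers, fake_fetch_status)
print(required)
```

To use this against the real site, fake_fetch_status would have to be replaced by a function that sends the request with the given headers and returns the status code. Once the required headers are known, the headers dict in the spider can probably be trimmed to just those.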
