scrapy.Request does not download the page

Problem description

My query is: "Wider and Deeper, Cheaper and Faster: Tensorized LSTMs for Sequence Learning."

The get_url function returns this request URL: http://api.scraperapi.com/?api_key=apikey&url=https%3A%2F%2Fscholar.google.com%2Fscholar%3Fhl%3Den%26q%3DWider%2Band%2BDeeper%252C%2BCheaper%2Band%2BFaster%253A%2BTensorized%2BLSTMs%2Bfor%2BSequence%2BLearning.&country_code=us
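For reference, that proxy URL is just the result of two rounds of url-encoding: once when building the Scholar query string, and again when get_url packs the whole Scholar URL into the proxy payload. A minimal sketch (with 'apikey' as a placeholder API key) that reproduces it:

```python
from urllib.parse import urlencode

# First round of encoding: build the Google Scholar search URL.
scholar_url = 'https://scholar.google.com/scholar?' + urlencode({
    'hl': 'en',
    'q': 'Wider and Deeper, Cheaper and Faster: Tensorized LSTMs for Sequence Learning.',
})

# Second round of encoding: wrap the Scholar URL in the ScraperAPI proxy URL.
proxy_url = 'http://api.scraperapi.com/?' + urlencode({
    'api_key': 'apikey',  # placeholder, not a real key
    'url': scholar_url,
    'country_code': 'us',
})
print(proxy_url)
```

The double escaping (e.g. the comma appearing as %252C) is expected: %2C from the first pass has its % escaped to %25 on the second pass.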

However, Scrapy never invokes my callback self.parse for this request. For other queries, the callback does run.

My code

import scrapy
from urllib.parse import urlencode

# API_KEY is defined elsewhere (the ScraperAPI key, redacted here).

def get_url(url):
    # Wrap the target URL in a ScraperAPI proxy request.
    payload = {'api_key': API_KEY, 'url': url, 'country_code': 'us'}
    proxy_url = 'http://api.scraperapi.com/?' + urlencode(payload)
    print("request url:", proxy_url)
    return proxy_url

class ExampleSpider(scrapy.Spider):
    name = 'scholar'
    allowed_domains = ['api.scraperapi.com']

    def start_requests(self):
        queries = [
            "Wider and Deeper, Cheaper and Faster: Tensorized LSTMs for Sequence Learning.",
            "Decoding with Value Networks for Neural Machine Translation.",
        ]
        for query in queries:
            print("current query is:", query)
            self.query = query
            # Build the Google Scholar search URL, then route it through the proxy.
            url = 'https://scholar.google.com/scholar?' \
                  + urlencode({'hl': 'en', 'q': self.query})
            yield scrapy.Request(get_url(url), callback=self.parse,
                                 meta={'position': 0})
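Not a fix, but when a request silently never reaches its callback, both Scrapy's duplicate filter and its offsite middleware are common culprits, and both drop requests without raising an error. A settings sketch (standard Scrapy setting names) that makes such drops visible in the log:

```python
# settings.py (diagnostic sketch, not a fix)
LOG_LEVEL = 'DEBUG'        # surface "Filtered offsite request" messages
DUPEFILTER_DEBUG = True    # log every request dropped as a duplicate, not just the first
```

Alternatively, passing dont_filter=True to scrapy.Request bypasses the duplicate filter for that one request, which helps distinguish filtering from a download failure.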

Versions

Scrapy       : 2.4.1
lxml         : 4.6.1.0
libxml2      : 2.9.10
cssselect    : 1.1.0
parsel       : 1.6.0
w3lib        : 1.22.0
Twisted      : 20.3.0
Python       : 3.6.10 |Anaconda, Inc.| (default, Mar 25 2020, 23:51:54) - [GCC 7.3.0]
pyOpenSSL    : 19.1.0 (OpenSSL 1.1.1g  21 Apr 2020)
cryptography : 2.9.2
Platform     : Linux-5.4.0-53-generic-x86_64-with-debian-bullseye-sid

Tags: scrapy
