Where is my mistake? Offsite request filtered in Scrapy

Problem description

I am trying to resolve an offsite request error. Where is my mistake?

# -*- coding: utf-8 -*-
import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule

class BestMoviesSpider(CrawlSpider):
    name = 'best_movies'
    allowed_domains = ['imbd.com']
    start_urls = ['https://www.imdb.com/search/title/?groups=top_250&sort=user_rating']

    rules = (
        Rule(LinkExtractor(restrict_xpaths='//h3[@class="lister-item-header"]/a'), callback='parse_item', follow=True),
    )

    def parse_item(self, response):
        print(response.url)
                

The terminal in VSCode shows me:

[scrapy.spidermiddlewares.offsite] DEBUG: Filtered offsite request to 'www.imdb.com': <GET https://www.imdb.com/title/tt0111161/>

instead of the list of links.

Tags: scrapy

Solution


The domain is wrong. You have:

allowed_domains = ['imbd.com']

but it should match the site's actual domain:

allowed_domains = ['imdb.com']

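You have imbd and need to change it to imdb. Because www.imdb.com is neither imbd.com nor a subdomain of it, the offsite middleware drops every link the rule extracts, which is exactly what the DEBUG line reports.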
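To see why a single swapped letter filters every request, here is a minimal sketch of the kind of host check an offsite filter performs. It is a simplified approximation written with the standard library only, not Scrapy's actual implementation, and the function name is_offsite is hypothetical.

```python
from urllib.parse import urlparse


def is_offsite(url, allowed_domains):
    """Return True if the URL's host is neither an allowed domain
    nor a subdomain of one (simplified approximation, not Scrapy's code)."""
    host = (urlparse(url).hostname or "").lower()
    for domain in allowed_domains:
        domain = domain.lower()
        # Accept an exact match or any subdomain such as www.<domain>.
        if host == domain or host.endswith("." + domain):
            return False
    return True


url = "https://www.imdb.com/title/tt0111161/"

# With the typo, the host matches nothing in the list, so it counts as offsite.
print(is_offsite(url, ["imbd.com"]))

# With the corrected domain, www.imdb.com is a subdomain of imdb.com.
print(is_offsite(url, ["imdb.com"]))
```

With the misspelled domain the check treats the URL as offsite, and with the corrected one it lets the URL through. This matches the behaviour described in the question.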

