
Problem description

I need to scrape multiple URLs using Scrapy together with Splash. I tried writing the code below, but still no luck.
I have attached the URLs here:
'https://wunderground.com/forecast/us/ny/brooklyn/', 'https://www.wunderground.com/forecast/us/pa/california/', 'https://www.wunderground.com/forecast/us/ny/boston'

So I need to loop over these URLs and scrape them with Scrapy.
I can't fetch data with multiple URLs; it shows an error. Please help.
My question is: how do I crawl this list of URLs?

import scrapy
from scrapy_splash import SplashRequest
import scrapy_proxies  # imported but not used anywhere below

class WundergroundSpider(scrapy.Spider):
    name = 'wunderground'
    #allowed_domains = ['www.wunderground.com/forecast/us/ny/brooklyn']
    start_urls = []

    # Lua script for Splash; note that it is defined here but never passed
    # to the SplashRequest calls below
    script = '''
    function main(splash, args)
        splash.private_mode_enabled = false
        assert(splash:go(args.url))
        assert(splash:wait(10))
        return splash:html()
    end
    '''
    
    def start_requests(self):
        urls = [
        'https://wunderground.com/forecast/us/ny/brooklyn/',
        'https://www.wunderground.com/forecast/us/pa/california/',
        'https://www.wunderground.com/forecast/us/ny/boston'
        ]
        for url in urls:
            yield SplashRequest(url, self.parse, args={'wait': 8})

    def parse(self, response):
        tmps = {
            'tempHigh': response.xpath("//div[@class='forecast']/a[@class='navigate-to ng-star-inserted']/div[@class='obs-forecast']/span/span[@class='temp-hi']/text()")[0],
            'templow': response.xpath("//div[@class='forecast']/a[@class='navigate-to ng-star-inserted']/div[@class='obs-forecast']/span/span[@class='temp-lo']/text()")[0],
            'obsphs': response.xpath("//div[@class='forecast']/a[@class='navigate-to ng-star-inserted']/div[@class='obs-forecast']/div[@class='obs-phrase']/text()")[0]
            }
        yield tmps
    

Tags: python, web-scraping, scrapy, scrapy-splash

Solution
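
Two things stand out in the spider above: the Lua script is defined but never sent to Splash (the requests only pass args={'wait': 8}), and parse() yields raw Selector objects (the [0] indexing) instead of extracted strings, which breaks item export. Below is a minimal corrected sketch, assuming a Splash instance is reachable and the project settings are wired up as shown at the end; the XPath expressions are kept from the question and may need adjusting if wunderground.com changes its markup.

import scrapy
from scrapy_splash import SplashRequest


class WundergroundSpider(scrapy.Spider):
    name = 'wunderground'

    # Lua script executed by Splash: disable private mode, load the page,
    # wait for the JavaScript frontend to render, then return the HTML.
    script = '''
    function main(splash, args)
        splash.private_mode_enabled = false
        assert(splash:go(args.url))
        assert(splash:wait(10))
        return splash:html()
    end
    '''

    def start_requests(self):
        urls = [
            'https://wunderground.com/forecast/us/ny/brooklyn/',
            'https://www.wunderground.com/forecast/us/pa/california/',
            'https://www.wunderground.com/forecast/us/ny/boston',
        ]
        for url in urls:
            # endpoint='execute' runs the Lua script; the request URL is
            # available inside the script as args.url
            yield SplashRequest(
                url,
                callback=self.parse,
                endpoint='execute',
                args={'lua_source': self.script},
            )

    def parse(self, response):
        forecast = ("//div[@class='forecast']"
                    "/a[@class='navigate-to ng-star-inserted']"
                    "/div[@class='obs-forecast']")
        yield {
            # 'url' is added only so items from different pages can be told apart
            'url': response.url,
            # .get() returns the text content (or None) instead of a Selector,
            # so the yielded dict can be exported to JSON/CSV
            'tempHigh': response.xpath(forecast + "/span/span[@class='temp-hi']/text()").get(),
            'templow': response.xpath(forecast + "/span/span[@class='temp-lo']/text()").get(),
            'obsphs': response.xpath(forecast + "/div[@class='obs-phrase']/text()").get(),
        }

Running it with, for example, scrapy crawl wunderground -o forecasts.json should then produce one item per URL.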


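If the error comes from Splash itself (for example a connection refused, or plain non-rendered HTML coming back), scrapy-splash also needs a running Splash instance and the project settings from its README. A minimal settings.py sketch, assuming Splash listens on the default local port 8050 (e.g. started with docker run -p 8050:8050 scrapinghub/splash):

# settings.py -- minimal scrapy-splash wiring; adjust SPLASH_URL to your setup
SPLASH_URL = 'http://localhost:8050'

DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}

SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}

DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'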