Scrapy KeyError when reading data from a CSV and adding it dynamically to pagination URLs

Problem description

I have written a spider, and I have a URL:

https://www.proptiger.com/ahmedabad/property-sale?projectStatus=launchingSoon,newLaunch&page=1

The city name here is ahmedabad, but it is dynamic: I am reading the cities from a CSV. It works for the first page, but I am unable to add the city to the next-page URL.

I am sending the city as meta on the requests in start_requests, but it is not working properly.

CSV file link: sendanywhe.re/YMXGKU1A
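
To make the setup concrete, each line of output.csv is a city slug that gets interpolated into the listing URL together with the page number; for the ahmedabad example above that amounts to:

# one line read from output.csv, e.g. "ahmedabad"
city = "ahmedabad"
page = 1
url = (
    f"https://www.proptiger.com/{city}/property-sale"
    f"?projectStatus=launchingSoon,newLaunch&page={page}"
)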

Below is my code:

import scrapy
import csv

# from ..items import CommonfloorItem

class ABC(scrapy.Spider):
    name = 'common'
    allowed_domains = ['www.proptiger.com']
    page_number = 2
    # start_urls = ['https://www.proptiger.com/ahmedabad/property-sale?projectStatus=launchingSoon,newLaunch&page=1']
    BASE_URL = "https://www.proptiger.com/"

    def parse(self, response):
        # items = CommonfloorItem()

        city = response.request.meta["city"]
        proj_name = response.xpath("//*[@data-type='cluster-link-project']")
        
        for name in proj_name:
            Project_Name = name.xpath(".//div[@class='proj-name']//text()").get()
            Builder_Name = name.xpath(".//a[@data-type='cluster-link-builder']//text()").get()
            locality = name.xpath(".//div[@class='proj-address put-ellipsis']//*[1][@itemprop='address']//text()").get()
            city = name.xpath(".//div[@class='proj-address put-ellipsis']//*[3]//text()").get()
            posssession = name.xpath(".//div[@class='possession-wrap']//span/text()").get()
            rera = name.xpath(".//div[@class='rera-id put-ellipsis']//text()").get()
            link = ABC.BASE_URL + name.xpath(".//div[@class='proj-name']//@href").get()
            yield {
                "ProjectName": Project_Name,
                "link": link,
                "BuilerName": Builder_Name,
                "locality": locality,
                "city": city.replace(u'\xa0', u' ').replace(",", "").strip(),
                "posssession": posssession,
                "rera": rera,
            }

        
        # here I am facing the problem
        next_page = f'https://www.proptiger.com/{city.replace(",","").strip()}/property-sale?projectStatus=launchingSoon,newLaunch&page={self.page_number}'
        print("url>>>>>>>>>>",next_page)

        if (not len(proj_name) == 0):
            self.page_number += 1
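            # response.follow below is called without meta, so the next response will not carry the "city" key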
            yield response.follow(next_page,callback=self.parse)
            
        
                    
    def start_requests(self):
        # url = "https://www.proptiger.com/ahmedabad/property-sale?projectStatus=launchingSoon,newLaunch&page=1"
        with open("./output.csv", "r") as f:
            for line in f:
                yield scrapy.Request(
                    url=f"https://www.proptiger.com/{line.strip()}/property-sale?projectStatus=launchingSoon,newLaunch&page=1",
                    callback=self.parse,
                    headers={"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36"},
                    meta={"city": line.strip()},
                )
        

When I run this code, I also get the following error:

Traceback (most recent call last):
  File "/home/prince/.local/lib/python3.6/site-packages/scrapy/utils/defer.py", line 120, in iter_errback
    yield next(it)
  File "/home/prince/.local/lib/python3.6/site-packages/scrapy/utils/python.py", line 353, in __next__
    return next(self.data)
  File "/home/prince/.local/lib/python3.6/site-packages/scrapy/utils/python.py", line 353, in __next__
    return next(self.data)
  File "/home/prince/.local/lib/python3.6/site-packages/scrapy/core/spidermw.py", line 56, in _evaluate_iterable
    for r in iterable:
  File "/home/prince/.local/lib/python3.6/site-packages/scrapy/spidermiddlewares/offsite.py", line 29, in process_spider_output
    for x in result:
  File "/home/prince/.local/lib/python3.6/site-packages/scrapy/core/spidermw.py", line 56, in _evaluate_iterable
    for r in iterable:
  File "/home/prince/.local/lib/python3.6/site-packages/scrapy/spidermiddlewares/referer.py", line 342, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "/home/prince/.local/lib/python3.6/site-packages/scrapy/core/spidermw.py", line 56, in _evaluate_iterable
    for r in iterable:
  File "/home/prince/.local/lib/python3.6/site-packages/scrapy/spidermiddlewares/urllength.py", line 40, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/home/prince/.local/lib/python3.6/site-packages/scrapy/core/spidermw.py", line 56, in _evaluate_iterable
    for r in iterable:
  File "/home/prince/.local/lib/python3.6/site-packages/scrapy/spidermiddlewares/depth.py", line 58, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/home/prince/.local/lib/python3.6/site-packages/scrapy/core/spidermw.py", line 56, in _evaluate_iterable
    for r in iterable:
  File "/media/prince/New Volume/Projects/Python-Projects/Competition/commonfloor/commonfloor/spiders/common.py", line 16, in parse
    city = response.request.meta["city"]
KeyError: 'city'

Tags: python, python-3.x, beautifulsoup, python-requests, scrapy

Solution
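
The traceback points at response.request.meta["city"]: that key only exists on the requests created in start_requests, because the response.follow call for the next page is issued without any meta, so the second response arrives without "city" and the KeyError is raised. Two smaller problems compound it: the city variable used to build next_page is overwritten inside the for loop by the city text scraped from the page, and page_number is a single class attribute shared by every city read from the CSV. Below is a minimal sketch of one way to address all three points by carrying the city slug and the page number on every request via meta; it is not a tested drop-in replacement, the item extraction from the question is elided, and the custom User-Agent header is omitted for brevity:

import scrapy


class ABC(scrapy.Spider):
    name = "common"
    allowed_domains = ["www.proptiger.com"]
    BASE_URL = "https://www.proptiger.com/"

    def start_requests(self):
        with open("./output.csv", "r") as f:
            for line in f:
                city = line.strip()
                if not city:
                    continue
                yield scrapy.Request(
                    url=f"https://www.proptiger.com/{city}/property-sale"
                        f"?projectStatus=launchingSoon,newLaunch&page=1",
                    callback=self.parse,
                    meta={"city": city, "page": 1},
                )

    def parse(self, response):
        # the slug read from the CSV, carried along on every request for this city
        city_slug = response.meta["city"]
        page = response.meta["page"]

        proj_name = response.xpath("//*[@data-type='cluster-link-project']")
        for name in proj_name:
            ...  # item extraction exactly as in the question

        # only follow the next page while the current page still returned results
        if proj_name:
            yield response.follow(
                f"https://www.proptiger.com/{city_slug}/property-sale"
                f"?projectStatus=launchingSoon,newLaunch&page={page + 1}",
                callback=self.parse,
                meta={"city": city_slug, "page": page + 1},
            )

On Scrapy 1.7 and later the same values could be passed with cb_kwargs instead of meta; either way, the essential point is that every follow-up request for a given city has to carry the city forward explicitly rather than relying on a shared class attribute.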

