python - Scrapy KeyError when reading data from a CSV and dynamically adding it to a paginated URL
Problem description
I wrote a spider, and I have this URL:
https://www.proptiger.com/ahmedabad/property-sale?projectStatus=launchingSoon,newLaunch&page=1
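The city name here is ahmedabad, but it is dynamic: I am reading the cities from a CSV file. It works for the first page, but I can't add the city to the next-page URL.
I am sending the city as meta from start_requests, but it is not working properly.
CSV file link: sendanywhe.re/YMXGKU1A
Here is my code: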
import scrapy
import csv
# from ..items import CommonfloorItem


class ABC(scrapy.Spider):
    name = 'common'
    allowed_domains = ['www.proptiger.com']
    page_number = 2
    # start_urls = ['https://www.proptiger.com/ahmedabad/property-sale?projectStatus=launchingSoon,newLaunch&page=1']
    BASE_URL = "https://www.proptiger.com/"

    def parse(self, response):
        # items = CommonfloorItem()
        city = response.request.meta["city"]
        proj_name = response.xpath("//*[@data-type='cluster-link-project']")
        for name in proj_name:
            Project_Name = name.xpath(".//div[@class='proj-name']//text()").get()
            Builder_Name = name.xpath(".//a[@data-type='cluster-link-builder']//text()").get()
            locality = name.xpath(".//div[@class='proj-address put-ellipsis']//*[1][@itemprop='address']//text()").get()
            city = name.xpath(".//div[@class='proj-address put-ellipsis']//*[3]//text()").get()
            posssession = name.xpath(".//div[@class='possession-wrap']//span/text()").get()
            rera = name.xpath(".//div[@class='rera-id put-ellipsis']//text()").get()
            link = ABC.BASE_URL + name.xpath(".//div[@class='proj-name']//@href").get()
            yield {
                "ProjectName": Project_Name,
                "link": link,
                "BuilerName": Builder_Name,
                "locality": locality,
                "city": city.replace(u'\xa0', u' ').replace(",", "").strip(),
                "posssession": posssession,
                "rera": rera,
            }
        # here I am facing the problem
        next_page = f'https://www.proptiger.com/{city.replace(",","").strip()}/property-sale?projectStatus=launchingSoon,newLaunch&page={self.page_number}'
        print("url>>>>>>>>>>", next_page)
        if (not len(proj_name) == 0):
            self.page_number += 1
            yield response.follow(next_page, callback=self.parse)

    def start_requests(self):
        # url = "https://www.proptiger.com/ahmedabad/property-sale?projectStatus=launchingSoon,newLaunch&page=1"
        with open("./output.csv", "r") as f:
            for line in f:
                yield scrapy.Request(
                    url=f"https://www.proptiger.com/{line.strip()}/property-sale?projectStatus=launchingSoon,newLaunch&page=1",
                    callback=self.parse,
                    headers={'User-Agent': "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36"},
                    meta={"city": line.strip()},
                )
When I run this code, I also get an error:
Traceback (most recent call last):
  File "/home/prince/.local/lib/python3.6/site-packages/scrapy/utils/defer.py", line 120, in iter_errback
    yield next(it)
  File "/home/prince/.local/lib/python3.6/site-packages/scrapy/utils/python.py", line 353, in __next__
    return next(self.data)
  File "/home/prince/.local/lib/python3.6/site-packages/scrapy/utils/python.py", line 353, in __next__
    return next(self.data)
  File "/home/prince/.local/lib/python3.6/site-packages/scrapy/core/spidermw.py", line 56, in _evaluate_iterable
    for r in iterable:
  File "/home/prince/.local/lib/python3.6/site-packages/scrapy/spidermiddlewares/offsite.py", line 29, in process_spider_output
    for x in result:
  File "/home/prince/.local/lib/python3.6/site-packages/scrapy/core/spidermw.py", line 56, in _evaluate_iterable
    for r in iterable:
  File "/home/prince/.local/lib/python3.6/site-packages/scrapy/spidermiddlewares/referer.py", line 342, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "/home/prince/.local/lib/python3.6/site-packages/scrapy/core/spidermw.py", line 56, in _evaluate_iterable
    for r in iterable:
  File "/home/prince/.local/lib/python3.6/site-packages/scrapy/spidermiddlewares/urllength.py", line 40, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/home/prince/.local/lib/python3.6/site-packages/scrapy/core/spidermw.py", line 56, in _evaluate_iterable
    for r in iterable:
  File "/home/prince/.local/lib/python3.6/site-packages/scrapy/spidermiddlewares/depth.py", line 58, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/home/prince/.local/lib/python3.6/site-packages/scrapy/core/spidermw.py", line 56, in _evaluate_iterable
    for r in iterable:
  File "/media/prince/New Volume/Projects/Python-Projects/Competition/commonfloor/commonfloor/spiders/common.py", line 16, in parse
    city = response.request.meta["city"]
KeyError: 'city'
Solution