apache-kafka - Scrapy-cluster callback requests not working, stuck processing the meta passthrough middleware
Problem description
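I'm trying to use scrapy-cluster, but requests that go to a callback don't work. The same spider works fine in plain Scrapy, but not in scrapy-cluster. In the Kibana debug panel, the crawl gets stuck while processing the meta passthrough middleware, and no data is scraped.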
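For context, scrapy-cluster's meta passthrough middleware is, roughly, a Scrapy spider middleware that runs over everything a callback yields. For each new Request, it copies the keys of the current response's meta (such as `appid`, `crawlid` and `attrs`) into the request's meta unless the request already sets them. That is how a follow-up callback can still read `response.meta['crawlid']`. The sketch below models only that copying step in plain Python. `FakeRequest`, `passthrough_meta`, `example_callback` and `broken_callback` are hypothetical names, not scrapy-cluster's API. Because the middleware consumes the callback's generator, an exception raised inside the callback (for example an AttributeError from referencing a spider method that does not exist) surfaces while the middleware is iterating. That behavior is consistent with a crawl that appears to stop at this middleware.

```python
from typing import Any, Dict, Iterable, Iterator, Optional


class FakeRequest:
    """Hypothetical stand-in for scrapy.Request: just a URL and a meta dict."""

    def __init__(self, url: str, meta: Optional[Dict[str, Any]] = None) -> None:
        self.url = url
        self.meta = dict(meta or {})


def passthrough_meta(response_meta: Dict[str, Any],
                     results: Iterable[Any]) -> Iterator[Any]:
    """Copy the response's meta into each yielded request without overwriting."""
    for obj in results:
        # Only requests are updated; scraped items pass through untouched.
        if isinstance(obj, FakeRequest):
            for key, value in response_meta.items():
                # setdefault keeps any value the callback set explicitly.
                obj.meta.setdefault(key, value)
        yield obj


def example_callback() -> Iterator[FakeRequest]:
    """Hypothetical callback yielding follow-up requests, like parse_link."""
    yield FakeRequest("http://example.com/item/1")
    yield FakeRequest("http://example.com/item/2", meta={"attrs": {"page": 2}})


def broken_callback() -> Iterator[FakeRequest]:
    """Hypothetical callback that fails after yielding one request."""
    yield FakeRequest("http://example.com/item/1")
    raise AttributeError("spider has no such callback method")


if __name__ == "__main__":
    incoming_meta = {"appid": "testapp", "crawlid": "abc123", "attrs": None}
    for req in passthrough_meta(incoming_meta, example_callback()):
        print(req.url, req.meta)
    try:
        for req in passthrough_meta(incoming_meta, broken_callback()):
            print("yielded", req.url)
    except AttributeError as exc:
        # The callback's error propagates out of the middleware's loop.
        print("callback failed:", exc)
```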
```python
import scrapy

from crawling.items import RawResponseItem
from crawling.spiders.redis_spider import RedisSpider


class EbayItem(RawResponseItem):
    # RawResponseItem only declares scrapy-cluster's own fields (appid,
    # crawlid, url, body and so on), so the product fields must be declared
    # here; assigning an undeclared field to a scrapy.Item raises KeyError.
    p_brand = scrapy.Field()
    p_title = scrapy.Field()
    p_price = scrapy.Field()


class EbayDataSpider(RedisSpider):
    name = 'ebay_data'

    # Allow a custom parameter (-a flag in the scrapy command)
    def __init__(self, search="iphone 64GB", *args, **kwargs):
        self.search_string = search
        super(EbayDataSpider, self).__init__(*args, **kwargs)

    def parse(self, response):
        # Extract the trksid to build a search request
        trksid = response.css("input[type='hidden'][name='_trksid']").xpath(
            "@value").extract()[0]
        # Build the url and start the requests
        yield response.follow(url="http://www.ebay.com/sch/i.html?_from=R40&_trksid=" + trksid +
                              "&_nkw=" + self.search_string.replace(' ', '+') +
                              "&_sacat=0",
                              callback=self.parse_link)

    # Parse the search results
    def parse_link(self, response):
        # Extract the list of products
        results = response.xpath(
            '//div/div/ul/li[contains(@class, "s-item" )]')
        # Extract info for each product
        for product in results:
            product_url = product.xpath(
                './/a[@class="s-item__link"]/@href').extract_first()
            # Skip listings without a link; response.follow() rejects None
            if product_url:
                # The callback must name a method that exists on the spider
                yield response.follow(url=product_url, callback=self.parse_product_details)

    # Parse a single product page
    def parse_product_details(self, response):
        # capture raw response
        item = EbayItem()
        # populated from response.meta (carried over from the first
        # response by the meta passthrough middleware)
        item['appid'] = response.meta['appid']
        item['crawlid'] = response.meta['crawlid']
        item['attrs'] = response.meta['attrs']
        # populated from raw HTTP response
        item["url"] = response.request.url
        item["response_url"] = response.url
        item["status_code"] = response.status
        item["status_msg"] = "OK"
        item["response_headers"] = self.reconstruct_headers(response)
        item["request_headers"] = response.request.headers
        # item["body"] = response.body
        item["body"] = "This is empty body from amazon spider"
        item["links"] = []
        # Add more data from details page
        item['p_brand'] = response.xpath(
            "//div[@id='viTabs_0_is']//tbody//tr[1]//td[4]//span/text()").extract()
        item['p_title'] = response.xpath("//h1[@id='itemTitle']/text()").extract()
        item['p_price'] = response.xpath("//span[@id='prcIsum']/text()").extract()
        yield item
```
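The search URL in `parse` is built by string concatenation that escapes only spaces, so characters such as `&` or `+` in the search term would corrupt the query string. Here is a sketch using the standard library's `urllib.parse.urlencode`. It keeps the same base URL and parameters while escaping every value. The helper name `build_search_url` is hypothetical.

```python
from urllib.parse import urlencode

# Base URL and parameter names are taken from the spider's parse() method.
EBAY_SEARCH_URL = "http://www.ebay.com/sch/i.html"


def build_search_url(trksid: str, search: str) -> str:
    """Hypothetical helper: build the eBay search URL with every value escaped."""
    params = [
        ("_from", "R40"),
        ("_trksid", trksid),
        # urlencode uses quote_plus: spaces become '+', while '&' and '+'
        # are percent-encoded so they cannot break the query string.
        ("_nkw", search),
        ("_sacat", "0"),
    ]
    return EBAY_SEARCH_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_search_url("p2380057.m570.l1313", "iphone 64GB"))
```

Inside the spider, the result could be passed straight to `response.follow(build_search_url(trksid, self.search_string), callback=self.parse_link)`. For a plain search term like the default one, it produces the same URL as the concatenated version.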