How to select a specific element within an element with Scrapy

Problem description

import scrapy

class rlgSpider(scrapy.Spider):
    name = 'bot'

    start_urls = [
    'https://rocket-league.com/trading?filterItem=0&filterCertification=0&filterPaint=0&filterPlatform=1&filterSearchType=1&filterItemType=0&p=1']

    def parse(self, response):
        data = {}
        offers = response.xpath('//div[@class = "col-3-3"]')
        for offer in offers:
            for item in offer.xpath('//div[@class = "rlg-trade-display-container is--user"]/div[@class = "rlg-trade-display-items"]/div[@class = "col-1-2 rlg-trade-display-items-container"]/a'):
                data['name'] = item.xpath('//div/div[@position ="relative"]/h2').extract()
                yield data

This is what I have so far, and it does not work well. It scrapes the URL instead of the h2 tag. How do I get the h2 when it is nested inside so many divs?

Tags: python, web-scraping, scrapy

Solution


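To select within an element in Scrapy, start your XPath with "." (a dot). Without it, an expression beginning with "//" is evaluated against the whole response rather than the current element. This is the correct approach: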

def parse(self, response):
    offers = response.xpath('//div[@class = "col-3-3"]')
    for offer in offers:
        # The leading "." keeps the search inside the current offer.
        for item in offer.xpath('.//div[@class = "rlg-trade-display-container is--user"]/div[@class = "rlg-trade-display-items"]/div[@class = "col-1-2 rlg-trade-display-items-container"]/a'):
            data = {}
            # Take the text of the h2 nested under this <a>.
            data['name'] = item.xpath('.//h2/text()').extract_first()
            yield data
