Relative XPath does not work when used with a "dot" inside a loop

Problem description

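I'm new to Python and Scrapy. I wrote a spider, but I'm having trouble with relative paths. If I don't use the leading "dot" inside the loop, every iteration of the loop prints the same result. If I do use the "dot" inside the loop, the log shows that items were scraped, but the text fields are empty.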

import scrapy
from demo_proj.items import JokeItem
from scrapy.loader import ItemLoader
from scrapy import Selector


class JokesSpider(scrapy.Spider):
    name = 'jokes'
    allowed_domains=['kitco.com']
    start_urls = [
        'https://www.kitco.com/'
    ]


    def parse(self, response):
        for joke in response.xpath("//div[@class='top15']"):
            l=ItemLoader(item=JokeItem(),selector=joke)
            l.add_xpath('news',".//div[@class='top15']/a/h3")
            l.add_xpath('time',".//div[@class='top15']/span[@class='post-date']")
            l.add_xpath('source',".//div[@class='top15']/span[@class='source']")
            yield l.load_item()

Tags: python, xpath, scrapy

Solution


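The //div[@class='top15'] step is redundant inside your for loop. The response.xpath call that drives the loop has already narrowed the selection to those divs, so inside the loop joke is the div itself. An expression starting with .//div[@class='top15'] looks for another top15 div nested inside the current one, which is why the fields come back empty, while an expression without the dot is absolute and restarts from the document root, which is why every iteration returns the same result. Write the paths relative to joke and end them with /text() so that you extract the text rather than the element. The spider becomes: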

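import scrapy
from scrapy.loader import ItemLoader

from demo_proj.items import JokeItem


class JokesSpider(scrapy.Spider):
    name = 'jokes'
    allowed_domains = ['kitco.com']
    start_urls = [
        'https://www.kitco.com/'
    ]

    def parse(self, response):
        # Each joke selector is already one <div class="top15"> element.
        for joke in response.xpath("//div[@class='top15']"):
            l = ItemLoader(item=JokeItem(), selector=joke)
            # Paths start with "./", so they are relative to the current div.
            l.add_xpath('news', "./a/h3/text()")
            l.add_xpath('time', "./span[@class='post-date']/text()")
            l.add_xpath('source', "./span[@class='source']/text()")
            yield l.load_item()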

items.py is:

import scrapy


class JokeItem(scrapy.Item):
    news = scrapy.Field()
    time = scrapy.Field()
    source = scrapy.Field()

Here are a few lines from my log:

{'news': ['The real gold price rally hasn’t even started yet, says analyst who '
          '...'],
 'source': ['Kitco Video News'],
 'time': ['Dec  9']}
2019-12-10 10:08:20 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.kitco.com/>
{'news': ['Who will win the 2020 presidential election? Doug Casey weighs in '
          'on ...'],
 'source': ['Kitco News'],
 'time': ['Dec  9']}
2019-12-10 10:08:20 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.kitco.com/>
{'news': ['What kind of a gold investor are you?'],
 'source': ['Kitco News'],
 'time': ['Dec  9']}
