xpath - Scrapy - How to handle a random number of elements?

Problem description

I have a Scrapy spider that easily grabs the first paragraph I need, but sometimes there is a second or third paragraph as well.

response.xpath(f"string(//h2[contains(text(), '{card}')]/following-sibling::p)").get() is the XPath code I use to get that paragraph.

response.xpath(f"string(//h2[contains(text(), '{card}')]/following-sibling::p[1])").get() returns the same paragraph, but sometimes I also need response.xpath(f"string(//h2[contains(text(), '{card}')]/following-sibling::p[2])").get().

How can I account for this varying number of paragraphs when scraping?

Tags: xpath, web-scraping, scrapy

You can try using the wildcard *.

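Edit: with the string() function you only get the first paragraph, because string() applied to a node-set returns the string value of its first node only.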

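Just remove string() from the XPath expression to get all the paragraphs (assuming they sit under the same parent node) and store the result in a variable. In Scrapy, call .getall() on the result rather than .get(), since .get() returns only the first match.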

//h2[contains(text(), '{card}')]/following-sibling::p/text()

Alternative: if you know the maximum possible number of paragraphs, you can use concat().

concat(//h2[contains(text(), '{card}')]/following-sibling::p[1],'|',//h2[contains(text(), '{card}')]/following-sibling::p[2])
