python - Scrapy XPath not(contains()) does not work

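Problem description

I am using not(contains()) in my XPath, but it does not seem to work: the result still includes the element under the h2 titled "What I dislike about the company", which is the heading I tried to exclude.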

HTML：

<div itemprop="reviewBody" class="review-body"><h2 class="h3">Suggestions for improvement</h2><p></p><ul><li>Better managers the ones they have suck</li></ul><h2 class="h3">What I like about the company</h2><p>Great company thanks again for sure</p><h2 class="h3">What I dislike about the company</h2><p>The fact they didn't care about my health</p></div>

XPath:

response.xpath("(//div[@class='review-body'])/h2[contains(.,'What I like about the company') and not(contains(.,'What I dislike about the company'))]/following-sibling::p/text()").getall()

I need code that extracts the p under the h2 titled "What I like about the company", and not the p under "What I dislike about the company". Thanks.

Tags: python, html, xpath, scrapy

Solution

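If I understand correctly, you want the text of the first p that follows the h2 containing a specific text.

To achieve this, build the expression one step at a time:

Select the wanted h2: //h2[text()="What I like about the company"]
Select its first following sibling that is a p: /following-sibling::p[1]
Select its text: /text()

Putting it all together, we get this: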

>>> sel.xpath('//h2[text()="What I like about the company"]/following-sibling::p[1]/text()').get()
'Great company thanks again for sure'
