XPath, Scrapy: print the content of a div class using its id?

Question

Sample markup below:

<div class="accordion-content" data-tab-content="" role="tabpanel" aria-labelledby="fmh1ij-accordion-label" aria-hidden="true" id="fmh1ij-accordion">

Number of Seats:    Seventeen (17) certified seats for take-off &amp; landing - including jump seat
<br>

Forward Cabin:  Four (4) place executive club seats with pull-out tables
<br>
Mid Cabin:  Four (4) place conference group opposite three (3) place 16G divan
<br>
Aft Cabin:  Two (2) place executive club seats opposite three (3) place 16G divan
<br>
Lavatory Location(s):   Forward crew lavatory and fully enclosed aft lavatory
<br>

I need to extract the content under 'div class=accordion-content'. Is there a way to do this using the id, i.e. id="fmh1ij-accordion"?

The content I need to extract:

"Number of Seats: Seventeen (17) etc. Forward Cabin: Four (4) etc. ..."

I tried the code below, but it did not work.

response.xpath("//div[contains(@id,'fmh1ij-accordion')]//text()").extract()

Tags: html, python-3.x, xpath, web-scraping, scrapy

Answer


Since we are dealing with an id attribute, there is no need for contains. Match the element whose id is exactly equal to the one you are looking for:

response.xpath("//div[@id='fmh1ij-accordion']//text()").extract()

Try the snippet above and let me know if it works.

Edit

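After checking the source URL, it seems the id attribute is generated dynamically, so its value can change between page loads. In that case I suggest selecting the element by class or using a different XPath. Here are two suggestions: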

response.xpath('//a[contains(text(), "Interior")]/following-sibling::div//text()').extract()


response.xpath('//li[contains(@class,"accordion-item") and contains(a/text(), "Interior")]/div//text()').extract()
