xpath - Unable to understand XPath siblings behaviour
Problem description
I am trying to scrape an HTML page in a scenario where all I have is a run of consecutive tags carrying the information.
From the following markup I would like to get the text of the "a" tags (e.g. Name1, Name2, ...), taking into consideration:
"a" followed by "span" gives information about that ID being a Customer or not.
"a" followed by "a" means that ID is anonymous.
<span class="list">
<em>List 1:</em>
</span>
<a href="/ID/423006">Name1</a>,
<a href="/ID/115325">Name2</a>
<span class="small">(Customer)</span>,
<a href="/ID/248819">Name3</a>
<span class="small">(Non Customer)</span>,
<a href="/ID/658259">Name4</a>
<span class="small">(Customer)</span>,
<a href="/ID/294083">Name5</a>
<a href="/ID/218292">Name6</a>
<span class="small">(Non Customer)</span>
I'm using the following XPath to try to match an "a" that is followed by a "span":
//a[contains(@href,'ID/') and ./following-sibling::span[1][text() = '(Customer)']]/text()
This returns Name1, Name2 and Name4, even though Name1 is not a Customer. What am I doing wrong?
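Solution
It's because following-sibling::span[1] selects the first following sibling that is a span, skipping over any other elements in between. For Name1 that is the span after Name2, which does indeed equal "(Customer)".
What you should do instead is take the first following sibling of any type ( *[1] ), check that this sibling is a span ( [self::span] ), and only then check whether its text equals "(Customer)":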
//a[contains(@href,'ID/') and ./following-sibling::*[1][self::span][text() = '(Customer)']]/text()
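The difference between the two expressions can be sketched in plain Python. This is only an illustration, not part of the original answer: the standard library's xml.etree.ElementTree does not support the following-sibling axis, so the two predicates are emulated by walking the list of sibling elements. The wrapping div and the helper names (first_following_span, next_sibling_if_span, customers) are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Fragment from the question, wrapped in a hypothetical <div> so it parses as one tree.
FRAGMENT = """<div>
<span class="list"><em>List 1:</em></span>
<a href="/ID/423006">Name1</a>,
<a href="/ID/115325">Name2</a>
<span class="small">(Customer)</span>,
<a href="/ID/248819">Name3</a>
<span class="small">(Non Customer)</span>,
<a href="/ID/658259">Name4</a>
<span class="small">(Customer)</span>,
<a href="/ID/294083">Name5</a>
<a href="/ID/218292">Name6</a>
<span class="small">(Non Customer)</span>
</div>"""


def first_following_span(siblings, i):
    """Emulates following-sibling::span[1]: nearest later sibling that is a span."""
    return next((el for el in siblings[i + 1:] if el.tag == "span"), None)


def next_sibling_if_span(siblings, i):
    """Emulates following-sibling::*[1][self::span]: the very next sibling, if it is a span."""
    nxt = siblings[i + 1] if i + 1 < len(siblings) else None
    return nxt if nxt is not None and nxt.tag == "span" else None


def customers(pick):
    """Return the text of each ID link whose picked sibling reads '(Customer)'."""
    siblings = list(ET.fromstring(FRAGMENT))
    names = []
    for i, el in enumerate(siblings):
        if el.tag == "a" and "ID/" in el.get("href", ""):
            sib = pick(siblings, i)
            if sib is not None and sib.text == "(Customer)":
                names.append(el.text)
    return names


print(customers(first_following_span))   # original expression: Name1 slips in
print(customers(next_sibling_if_span))   # corrected expression
```

With the first helper, Name1 is matched because the search skips past Name2 to reach a span. With the second, the element right after Name1 is an "a", so the check fails before the text is ever compared.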