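python - Xpath scrapy result doesn't match what's expected
Problem description
I've been trying to get the value of a preceding tag. This is what I'm doing:
Structure of the HTML page: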
...
<tr class="destaque no-hover">
<td class="periodo" colspan="6">2020.1</td>
</tr>
<tr class="linhaPar">
<td>Text1</td>
<td align="center">01</td>
<td align="right">312h</td>
<td align="center">3T12</td>
</tr>
<tr class="linhaImpar">
<td>Text2</td>
<td align="center">01</td>
<td align="right">12h</td>
<td align="center">5M12</td>
</tr>
...
<tr class="destaque no-hover">
<td class="periodo" colspan="6">2016.1</td>
</tr>
<tr class="linhaPar">
<td>Text7</td>
<td align="center">01</td>
<td align="right">2h</td>
<td align="center">2N12</td>
</tr>
<tr class="linhaImpar">
<td>Text8</td>
<td align="center">01</td>
<td align="right">32h</td>
<td align="center">4T12</td>
</tr>
...
<tr class="destaque no-hover">
<td class="periodo" colspan="6">2014.2</td>
</tr>
<tr class="linhaPar">
<td>TextN-1</td>
<td align="center">01</td>
<td align="right">2h</td>
<td align="center">2N12</td>
</tr>
<tr class="linhaImpar">
<td>TextN</td>
<td align="center">01</td>
<td align="right">32h</td>
<td align="center">4T12</td>
</tr>
So, I'm trying to get the information from each tr with class="linhaPar" or class="linhaImpar":
for i in response.xpath('//tr[@class="linhaPar" or @class="linhaImpar"]'):
    _aux = i.xpath('./td[1]')
But I also need the value of the td[@class="periodo"] that comes before each of those rows, so I'm stuck on the XPath:
# I've tried this, but it returns a list of every matching element, not the closest one, which is what I want
_p = _aux.xpath('./preceding::tr[td[@class="periodo"]]')
# I've also tried this, but it doesn't work
_p = _aux.xpath('./preceding::tr[td[@class="periodo"] and position()=1]')
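Solved
Maybe I wasn't clear enough when I asked the question. The number of tr rows between consecutive periodo rows varies. Every way I tried to search returned either a list of possible results or nothing. To solve it, I went with a solution that keeps track of the periodo inside the for loop over the XPath results: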
_p = ""
for i in response.xpath('//tr[@class="linhaPar" or @class="linhaImpar" or @class="destaque no-hover"]'):
    # Check whether this is a period row (the tr holding td.periodo)
    if 'destaque no-hover' == i.xpath('./@class').get():
        _p = i.xpath('./td/text()').get()
        continue  # Skip to the next row; the data rows that follow belong to _p
    # ... process the linhaPar/linhaImpar row here; _p holds its period
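Solution
This XPath selects both the data rows and the period rows (identified by their td[@class="periodo"] cell), in document order: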
'//tr[@class="linhaPar" or @class="linhaImpar" or td[@class="periodo"]]'