XPath in Scrapy: results differ from what I expected

Problem description

I have been trying to get the value of a preceding tag. This is what I'm doing:

Structure of the HTML page:

...
<tr class="destaque no-hover">
    <td class="periodo" colspan="6">2020.1</td>
</tr>
<tr class="linhaPar">
    <td>Text1</td>
    <td align="center">01</td>
    <td align="right">312h</td>
    <td align="center">3T12</td>
</tr>
<tr class="linhaImpar">
    <td>Text2</td>
    <td align="center">01</td>
    <td align="right">12h</td>
    <td align="center">5M12</td>
</tr>
...
<tr class="destaque no-hover">
    <td class="periodo" colspan="6">2016.1</td>
</tr>
<tr class="linhaPar">
    <td>Text7</td>
    <td align="center">01</td>
    <td align="right">2h</td>
    <td align="center">2N12</td>
</tr>
<tr class="linhaImpar">
    <td>Text8</td>
    <td align="center">01</td>
    <td align="right">32h</td>
    <td align="center">4T12</td>
</tr>
...
<tr class="destaque no-hover">
    <td class="periodo" colspan="6">2014.2</td>
</tr>
<tr class="linhaPar">
    <td>TextN-1</td>
    <td align="center">01</td>
    <td align="right">2h</td>
    <td align="center">2N12</td>
</tr>
<tr class="linhaImpar">
    <td>TextN</td>
    <td align="center">01</td>
    <td align="right">32h</td>
    <td align="center">4T12</td>
</tr>

So I'm trying to get the information from each tr whose class is linhaPar or linhaImpar:

for i in response.xpath('//tr[@class="linhaPar" or @class="linhaImpar"]'):
    _aux = i.xpath('./td[1]')

However, I also need the matching td[@class="periodo"] value for each row, and that is where I'm stuck with the XPath.

# I tried this, but it returns a list of every matching element, not just the closest one, which is what I want
    _p = _aux.xpath('./preceding::tr[td[@class="periodo"]]')

# I also tried this, but it doesn't work
    _p = _aux.xpath('./preceding::tr[td[@class="periodo"] and position()=1]')

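Solved

Maybe I wasn't clear enough when I asked this question. The number of tr rows under each periodo varies. Every approach I tried returned either a list of candidate results or nothing at all. To solve this, I included the periodo rows in the XPath that the for loop iterates over: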

_p = ""
for i in response.xpath('//tr[@class="linhaPar" or @class="linhaImpar" or @class="destaque no-hover"]'):
    # Check whether this row is a period header row
    if 'destaque no-hover' == i.xpath('./@class').get():
        _p = i.xpath('./td/text()').get()
        continue  # Move on to the next row

Tags: python, python-3.x, xpath, scrapy, logic

Answer

This XPath:

'//tr[@class="linhaPar" or @class="linhaImpar" or td[@class="periodo"]]' 
