How do I select this element using Scrapy XPath?

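Problem description

The only requirement: the expression needs to reference the thread-navigation class, because the page has many other pagination elements.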

<section id="thread-navigation" class="group">
<div class="float-left">
<div class="pagination talign-mleft">
<span class="pages">Pages (6):</span>
<span class="pagination_current">1</span>
<a href="I want this text?page=2" class="pagination_page">2</a>

<a href="I want this text?page=3" class="pagination_page">3</a>
<a href="I want this text?page=4" class="pagination_page">4</a>
<a href="I want this text?page=5" class="pagination_page">5</a>
<a href="I want this text?page=6" class="pagination_last">6</a>
<a href="I want this text?page=2" class="pagination_next">Next &raquo;</a> //<--- this one
</div>
</div>
</section>

I am trying something like this: r.xpath('//*[@class="thread-navigation" and contains (., "Next")]').get() but it always returns None.

Thanks

Tags: xpath, scrapy, web-crawler

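Solution

thread-navigation is the value of the element's @id attribute, not of its @class attribute (the class is group), so the predicate @class="thread-navigation" never matches. Try this XPath 1.0 expression instead: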

r.xpath('//a[ancestor::*/@id="thread-navigation" and contains (text(), "Next")]/@href').get()

The result is

I want this text?page=2

