scala - How to scrape all links from an HTML table using net.ruippeixotog.scalascraper

Problem description

I am trying to extract all the links from an HTML table, so I type:

 doc >> elementList(".countries")

That gets me as far as this HTML:

<tr class="countries" valign="top"> 
 <td nowrap> </td>
 <td nowrap>
  <a href="https://ar.indeed.com/"><img src="/images/flags/ar.png"></a> 
  <a href="https://ar.indeed.com/">Argentina</a> <br> 
  <a href="https://au.indeed.com/"><img src="/images/flags/au.png"></a> 
  <a href="https://au.indeed.com/">Australia</a> <br> 
  <a href="https://at.indeed.com/"><img src="/images/flags/at.png"></a> 
  <a href="https://at.indeed.com/">Austria</a> <br> 
 </td> 
</tr>

Now I want to get all the links from it.
When I type:

 doc >> elementList(".countries") >> attr("href")("a")

I only get the first link: https://ar.indeed.com/

Tags: scala, web-scraping

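Solution

Use attrs instead of attr. attr returns the attribute value of the first matching element only, while attrs returns it for every matching element: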

 doc >> elementList(".countries") >> attrs("href")("a")
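
For reference, here is a minimal self-contained sketch of the same idea, assuming scala-scraper is on the classpath. The names `CountryLinks`, `html` and `links` are made up for illustration. Folding the row and anchor selectors into one CSS query and calling `distinct` are my own choices rather than part of the answer, and the sample row is wrapped in a `<table>` because an HTML parser drops a bare `<tr>`.

```scala
import net.ruippeixotog.scalascraper.browser.JsoupBrowser
import net.ruippeixotog.scalascraper.dsl.DSL._
import net.ruippeixotog.scalascraper.dsl.DSL.Extract._

object CountryLinks {
  // Sample markup taken from the question. The <table> wrapper is an
  // addition: without it the parser would discard the <tr> element.
  val html: String =
    """<table>
      |<tr class="countries" valign="top">
      | <td nowrap> </td>
      | <td nowrap>
      |  <a href="https://ar.indeed.com/"><img src="/images/flags/ar.png"></a>
      |  <a href="https://ar.indeed.com/">Argentina</a> <br>
      |  <a href="https://au.indeed.com/"><img src="/images/flags/au.png"></a>
      |  <a href="https://au.indeed.com/">Australia</a> <br>
      |  <a href="https://at.indeed.com/"><img src="/images/flags/at.png"></a>
      |  <a href="https://at.indeed.com/">Austria</a> <br>
      | </td>
      |</tr>
      |</table>""".stripMargin

  def links: List[String] = {
    val doc = JsoupBrowser().parseString(html)
    // attrs reads href from every <a> under .countries, not only the first.
    // distinct drops repeats (assumption: one entry per country is wanted).
    (doc >> attrs("href")(".countries a")).toList.distinct
  }

  def main(args: Array[String]): Unit = links.foreach(println)
}
```

In the sample row every URL appears twice, once on the flag image and once on the country name, so without `distinct` each link would be listed twice.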
