web-scraping - IMPORTXML function in Google Sheets

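You want to create an XPath that retrieves the Industry value from a given Wikipedia page.

If my understanding is correct, how about the following XPath as one more pattern? Please think of this as just one of several possible answers.

Sample formula: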

=IMPORTXML(A1,"//th[text()='Industry']/following-sibling::td")

The XPath is //th[text()='Industry']/following-sibling::td.
In this case, the URL https://en.wikipedia.org/wiki/Target_Corporation or https://en.wikipedia.org/wiki/Boohoo.com is put in cell "A1".

Reference:

XPath Axes

Added:

From your reply, I understand that you want to add 2 more URLs. So all of the URLs are as follows.

https://en.wikipedia.org/wiki/Target_Corporation
https://en.wikipedia.org/wiki/Boohoo.com
https://en.wikipedia.org/wiki/Woot
https://en.wikipedia.org/wiki/TripAdvisor

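Issue and workaround:

For the above URLs, the formula =IMPORTXML(A1,"//th[text()='Industry']/following-sibling::td") returns Retail, Fashion, Retail and, for the last URL, the two values Travel and services.

When the XPath is changed to //th[text()='Industry']/following-sibling::td/a, the results are Retail, #N/A, #N/A and Travel.

The reason is the following difference in the HTML of the pages.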

<tr>
  <th scope="row">Industry</th>
  <td class="category"><a href="/wiki/Travel" title="Travel">Travel</a> services</td>
</tr>

and

<tr>
  <th scope="row" style="padding-right:0.5em;">Industry</th>
  <td class="category" style="line-height:1.35em;"><a href="/wiki/Retail" title="Retail">Retail</a></td>
</tr>

and

<tr>
  <th scope="row" style="padding-right:0.5em;">Industry</th>
  <td class="category" style="line-height:1.35em;">Fashion</td>
</tr>

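Because of this, I think that, unfortunately, Travel, Retail and Fashion cannot all be retrieved directly from the above HTML with only one XPath: the value is sometimes inside an a tag and sometimes plain text in the td. So I used a built-in function for this situation.

Workaround:

In this workaround, I used INDEX. Please think of this as just one of several possible answers.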

=INDEX(IMPORTXML(A1,"//th[text()='Industry']/following-sibling::td"),1,1)

The XPath is //th[text()='Industry']/following-sibling::td. It is not modified.
In this case, the URL is put in cell "A1".
When 2 values are retrieved, only the first value is kept. This is why I used INDEX.
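
To see offline why the td/a variant misses some rows while taking the first value works for all of them, the three HTML fragments above can be checked with a small Python sketch. This is only an emulation under assumptions, not how IMPORTXML works internally: the standard-library ElementTree has no following-sibling axis, so the equivalent path .//tr[th='Industry']/td is used instead, each row is wrapped in a table tag so that it parses, and the names ROWS, industry_values, industry_links and first_industry are hypothetical.

```python
import xml.etree.ElementTree as ET

# The three Industry rows quoted above, wrapped in <table> so each parses as XML.
# The dictionary keys are hypothetical labels used only in this sketch.
ROWS = {
    "link_plus_text": (
        '<table><tr><th scope="row">Industry</th>'
        '<td class="category"><a href="/wiki/Travel" title="Travel">Travel</a>'
        ' services</td></tr></table>'
    ),
    "link_only": (
        '<table><tr><th scope="row" style="padding-right:0.5em;">Industry</th>'
        '<td class="category" style="line-height:1.35em;">'
        '<a href="/wiki/Retail" title="Retail">Retail</a></td></tr></table>'
    ),
    "plain_text": (
        '<table><tr><th scope="row" style="padding-right:0.5em;">Industry</th>'
        '<td class="category" style="line-height:1.35em;">Fashion</td></tr></table>'
    ),
}

# ElementTree lacks the following-sibling axis, so select the td that shares
# a row with the "Industry" th instead (an assumption of this sketch).
TD_PATH = ".//tr[th='Industry']/td"


def industry_values(html):
    """Return the non-empty text pieces of the Industry cell, in document order."""
    td = ET.fromstring(html).find(TD_PATH)
    if td is None:
        return []
    return [piece.strip() for piece in td.itertext() if piece.strip()]


def industry_links(html):
    """Return the text of <a> children only, like the td/a variant of the XPath."""
    return [a.text for a in ET.fromstring(html).findall(TD_PATH + "/a")]


def first_industry(html):
    """Keep only the first value, mirroring INDEX(..., 1, 1) in the formula."""
    values = industry_values(html)
    return values[0] if values else None


if __name__ == "__main__":
    for label, html in ROWS.items():
        print(label, industry_values(html), industry_links(html), first_industry(html))
```

In this sketch the plain-text row yields no links at all, which corresponds to the #N/A seen with the td/a XPath, while taking the first text piece gives one value for every row.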
