Using a boolean to run different XPath expressions with Python lxml

Problem description

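I am trying to scrape weather data from a website with a Python script and lxml. The wind speed data will be extracted and appended to a list for later processing. When the markup is formatted like this, I can get the information I need: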

<div class = "day-fcst">
  <div class = "wind">
    <div class = "gust">
      "Gusts to 20-30mph"
    </div>
  </div>
</div>

However, when winds are light, the website adds a child span under the "gust" div, like this:

<div class = "gust">
  <span class = "nowind">
    "Gusts less than 20mph"
  </span>
</div>

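My idea is to check whether the span exists; if it does, run an XPath expression that pulls the text under the span, otherwise run an XPath expression that pulls the text under the "gust" div. I searched for examples that use XPath boolean functions but could not get any of them to work (neither in Safari's Web Inspector nor in my script).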

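My current code uses Python to check whether the span's class equals "nowind" and then runs an if/else statement, but only the else branch ever executes. My current code looks like this: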

from lxml import html
import requests

wind = []

source=requests.get('website')
tree = html.fromstring(source.content)

if tree.xpath('//div[@class = "day-fcst"]/div[@class = "wind"]/div[@class = "gust"]/span/@class') == 'nowind':
  wind.append(tree.xpath('//div[@class = "day-fcst"]/div[@class = "wind"]/div[@class = "gust"]/span/text()'))
else:
  wind.append(tree.xpath('//div[@class = "day-fcst"]/div[@class = "wind"]/div[@class = "gust"]/text()'))

print(wind)

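I would like to solve this with an XPath expression that yields a boolean instead of my current workaround. Any help would be appreciated. I am still very new to XPath, so I am not familiar with any of its functions.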

Tags: python, xpath, web-scraping, lxml, boolean-operations

Solution


The same XPath expression can serve both cases. Just use //div[@class = "day-fcst"]/div[@class = "wind"]/div[@class = "gust"]//text()
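On markup that is indented like the snippets in the question, //text() also returns the whitespace-only text nodes around the span, so the result usually needs cleaning up. The following is a minimal sketch, assuming the two sample fragments from the question; the names GUST, windy, calm and gust_text are hypothetical and only serve the illustration.

```python
from lxml import html

# Path to the gust div, shared by both cases.
GUST = '//div[@class="day-fcst"]/div[@class="wind"]/div[@class="gust"]'

# Sample markup copied from the question (high-wind case).
windy = html.fromstring('''<div class="day-fcst">
  <div class="wind">
    <div class="gust">
      "Gusts to 20-30mph"
    </div>
  </div>
</div>''')

# Sample markup copied from the question (low-wind case with the extra span).
calm = html.fromstring('''<div class="day-fcst">
  <div class="wind">
    <div class="gust">
      <span class="nowind">
        "Gusts less than 20mph"
      </span>
    </div>
  </div>
</div>''')


def gust_text(tree):
    """Join every text node under the gust div and collapse whitespace."""
    parts = tree.xpath(GUST + '//text()')
    return ' '.join(' '.join(parts).split())


print(gust_text(windy))
print(gust_text(calm))

# XPath 1.0's normalize-space() does the same clean-up inside the expression.
print(calm.xpath('normalize-space(%s)' % GUST))
```

Both helpers return a single string per page rather than a list, which may be more convenient when appending to the wind list.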

Alternatively, you can get the <div class = "gust"> element and then call its text_content() method to get the text content.

In [1]: from lxml import html

In [2]: first_html = '<div class = "day-fcst"><div class = "wind"><div class = "gust">"Gusts to 20-30mph"</div></div></div>'

In [3]: second_html = '<div class = "day-fcst"><div class = "wind"><div class = "gust"><span class = "nowind">"Gusts to 20-30mph"</span></div></div></div>'

In [4]: f = html.fromstring(first_html)

In [5]: s = html.fromstring(second_html)

In [6]: f.xpath('//div[@class = "day-fcst"]/div[@class = "wind"]/div[@class = "gust"]')[0].text_content()
Out[6]: '"Gusts to 20-30mph"'

In [7]: s.xpath('//div[@class = "day-fcst"]/div[@class = "wind"]/div[@class = "gust"]')[0].text_content()
Out[7]: '"Gusts to 20-30mph"'

In [8]: print(f.xpath('//div[@class = "day-fcst"]/div[@class = "wind"]/div[@class = "gust"]//text()'))
['"Gusts to 20-30mph"']

In [9]: print(s.xpath('//div[@class = "day-fcst"]/div[@class = "wind"]/div[@class = "gust"]//text()'))
['"Gusts to 20-30mph"']
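If a boolean test is still wanted, as the question asks, XPath's boolean() function can be evaluated through lxml, which hands the result back as a Python bool. The original if never matched because xpath() returns a list for a node-set query, and a list is never equal to the string 'nowind'. A minimal sketch, assuming the sample markup from the question; the names GUST, windy, calm and has_nowind are hypothetical.

```python
from lxml import html

# Path to the gust div, shared by both cases.
GUST = '//div[@class="day-fcst"]/div[@class="wind"]/div[@class="gust"]'

windy = html.fromstring(
    '<div class="day-fcst"><div class="wind">'
    '<div class="gust">"Gusts to 20-30mph"</div></div></div>')
calm = html.fromstring(
    '<div class="day-fcst"><div class="wind"><div class="gust">'
    '<span class="nowind">"Gusts less than 20mph"</span></div></div></div>')


def has_nowind(tree):
    """Return True when the gust div has a <span class="nowind"> child."""
    return tree.xpath('boolean(%s/span[@class="nowind"])' % GUST)


# A node-set query returns a list, so comparing it to a string is always False.
classes = calm.xpath(GUST + '/span/@class')
print(classes == 'nowind')   # a list compared with a str
print('nowind' in classes)   # membership is the test that was intended

# The if/else from the question, driven by the boolean XPath expression.
wind = []
for tree in (windy, calm):
    if has_nowind(tree):
        wind.extend(tree.xpath(GUST + '/span/text()'))
    else:
        wind.extend(tree.xpath(GUST + '/text()'))
print(wind)
```

The sketch uses extend() rather than append() so that wind stays a flat list of strings instead of a list of lists.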
