Beautiful Soup: scraping a value that is not the first match

Question

Following "Python - ETF daily data web scraping", I am trying to scrape the expense ratio.

import requests
from bs4 import BeautifulSoup

html = requests.get("https://www.marketwatch.com/investing/fund/ivv").text

soup = BeautifulSoup(html, "html.parser")

if soup.h1.string == "Pardon Our Interruption...":
    print("They detected we are a bot. We hit a captcha.")
else:
    price = soup.find("li", class_="kv__item").find("span").string

    print(price)

However, this returns the Open value. How can I tell it to take the 10th li instead of the first one?

Tags: python, web-scraping, beautifulsoup

Answer


With a recent version of bs4, you can use the CSS nth-of-type selector:

import requests
from bs4 import BeautifulSoup

html = requests.get("https://www.marketwatch.com/investing/fund/ivv").text

soup = BeautifulSoup(html, "lxml")

if soup.h1.string == "Pardon Our Interruption...":
    print("They detected we are a bot. We hit a captcha.")
else:
    price = soup.select_one('.list--kv li:nth-of-type(10) span').string
    print(price)

You can even shorten the selector to:

li:nth-of-type(10) span
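
To see how nth-of-type picks an item by position without hitting the live site, here is a minimal offline sketch. The markup in SAMPLE_HTML is hypothetical and only imitates a key/value list, and nth_value is an illustrative helper, not part of bs4. Note that the shortened selector matches the n-th li of any list on the page, so the sketch keeps the .list--kv scope.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that imitates a key/value list; NOT the real page.
SAMPLE_HTML = """
<ul class="list--kv">
  <li class="kv__item"><small>Open</small><span>410.00</span></li>
  <li class="kv__item"><small>Day Range</small><span>408.10 - 412.30</span></li>
  <li class="kv__item"><small>Expense Ratio</small><span>0.03%</span></li>
</ul>
"""


def nth_value(html, n):
    """Return the span text of the n-th li (1-based) in .list--kv, or None."""
    soup = BeautifulSoup(html, "html.parser")
    node = soup.select_one(f".list--kv li:nth-of-type({n}) span")
    # select_one returns None when nothing matches, so guard before reading text.
    return node.get_text(strip=True) if node is not None else None


print(nth_value(SAMPLE_HTML, 1))   # first item in the sample list
print(nth_value(SAMPLE_HTML, 3))   # third item in the sample list
print(nth_value(SAMPLE_HTML, 10))  # no tenth item in this sample
```

Guarding against None matters on the real page too: if the layout changes and there is no tenth li, calling .string on the result of select_one would raise an AttributeError.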

To get a list of all the spans inside the li elements, use:

.list--kv li span
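
The list selector above is meant for select(), which returns every match as a Python list, so the tenth item is index 9. The sketch below shows that on hypothetical sample markup (SAMPLE_HTML is made up, with labels in small tags as an assumption), and also a lookup by label text, which may hold up better than a fixed position if the page reorders its items. values_by_label is an illustrative helper, not a bs4 function.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that imitates a key/value list; NOT the real page.
SAMPLE_HTML = """
<ul class="list--kv">
  <li class="kv__item"><small>Open</small><span>410.00</span></li>
  <li class="kv__item"><small>Day Range</small><span>408.10 - 412.30</span></li>
  <li class="kv__item"><small>Expense Ratio</small><span>0.03%</span></li>
</ul>
"""


def values_by_label(html):
    """Map each li's label text to its span text (assumes <small> holds the label)."""
    soup = BeautifulSoup(html, "html.parser")
    result = {}
    for li in soup.select(".list--kv li"):
        label = li.find("small")
        value = li.find("span")
        if label is not None and value is not None:
            result[label.get_text(strip=True)] = value.get_text(strip=True)
    return result


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# select() returns a list, so positions are 0-based: the third span is index 2.
spans = soup.select(".list--kv li span")
print(spans[2].get_text(strip=True))

# Look the value up by its label instead of by position.
print(values_by_label(SAMPLE_HTML).get("Expense Ratio"))
```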
