Beautiful Soup nested loops

Problem description

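I would like to build a list of all the firms named on this page. I was hoping each winner would have its own section in the HTML, but it looks like several winners are grouped together across multiple divs. How would you suggest approaching this? I was able to pull all of the divs, but I don't know how to loop through them properly. Thanks!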

import requests
from bs4 import BeautifulSoup
import csv

request = requests.get("https://growthcapadvisory.com/growthcaps-top-40-under-40-growth-investors-of-2020/")
text = request.text

soup = BeautifulSoup(text, 'html.parser')
element = soup.find()

person = soup.find_all('div', class_="under40")

Tags: python-3.x, beautifulsoup

Solution


This solution uses CSS selectors:

import requests
from bs4 import BeautifulSoup

request = requests.get("https://growthcapadvisory.com/growthcaps-top-40-under-40-growth-investors-of-2020/")
text = request.text

soup = BeautifulSoup(text, 'html.parser')
# on older Soup Sieve versions, use :contains instead of :-soup-contains
firm_tags = soup.select('h5:-soup-contains("Firm") strong')
# extract the text from the selected bs4.Tags
firms = [tag.text for tag in firm_tags]
# if there is extra whitespace
clean_firms = [f.strip() for f in firms]

It works by selecting every strong tag nested inside an h5 tag whose text contains the word "Firm".
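To see the selector in isolation, here is a minimal sketch that runs against an inline snippet instead of the live page. The markup in SAMPLE_HTML and the helper name extract_firms are assumptions made up for illustration; the real page's structure may differ, so treat this as a way to check the selector's behavior rather than a description of the site.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented for this example; the real page may differ.
SAMPLE_HTML = """
<div class="under40">
  <h5>Name: <strong>Jane Doe</strong></h5>
  <h5>Firm: <strong> Example Capital </strong></h5>
</div>
<div class="under40">
  <h5>Name: <strong>John Roe</strong></h5>
  <h5>Firm: <strong>Sample Partners</strong></h5>
</div>
"""


def extract_firms(html):
    """Return the text of every <strong> inside an <h5> that mentions "Firm"."""
    soup = BeautifulSoup(html, "html.parser")
    # :-soup-contains matches on the h5's text; the space before "strong"
    # is the descendant combinator.
    tags = soup.select('h5:-soup-contains("Firm") strong')
    # get_text(strip=True) drops the surrounding whitespace in one step.
    return [tag.get_text(strip=True) for tag in tags]


firms = extract_firms(SAMPLE_HTML)
print(firms)
```

Because the selector matches on the h5 text, the "Name" headings are skipped even though they also contain strong tags.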

For more information about CSS selectors in bs4, see the Soup Sieve documentation.
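If you would rather loop over the divs explicitly, as the question attempts with find_all('div', class_="under40"), a nested loop can collect the same values. This is only a sketch under assumptions: SAMPLE_HTML is invented markup in which one div groups two winners, and firms_by_nested_loop is a hypothetical helper name. It also assumes the under40 divs are not nested inside each other, since nesting would make the outer loop visit the same headings twice.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented for this example: the first div groups two winners.
SAMPLE_HTML = """
<div class="under40">
  <h5>Name: <strong>Jane Doe</strong></h5>
  <h5>Firm: <strong>Example Capital</strong></h5>
  <h5>Name: <strong>John Roe</strong></h5>
  <h5>Firm: <strong>Sample Partners</strong></h5>
</div>
<div class="under40">
  <h5>Name: <strong>Ann Poe</strong></h5>
  <h5>Firm: <strong>Placeholder Ventures</strong></h5>
</div>
"""


def firms_by_nested_loop(html):
    """Collect firm names by walking each div, then each h5 inside it."""
    soup = BeautifulSoup(html, "html.parser")
    firms = []
    # Outer loop: every block of winners.
    for block in soup.find_all("div", class_="under40"):
        # Inner loop: every heading in that block, however many winners it holds.
        for heading in block.find_all("h5"):
            if "Firm" not in heading.get_text():
                continue
            strong = heading.find("strong")
            if strong is not None:
                firms.append(strong.get_text(strip=True))
    return firms


firms = firms_by_nested_loop(SAMPLE_HTML)
print(firms)
```

The inner loop is what handles several winners sharing one div: it does not matter how the headings are grouped, as long as each firm sits in its own h5.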

