Return table information based on a condition in Beautiful Soup/Python

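Question

I am trying to scrape this page: https://www.nysenate.gov/legislation/bills/2019/s8450

I only want to extract information from the table (the one that appears when you click "View Actions") if it contains the following string: "Delivered To Governor".

I can iterate over the table, but I'm having trouble stripping out all the extra markup text.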

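import requests
from bs4 import BeautifulSoup

url = "https://www.nysenate.gov/legislation/bills/2019/s8450"
raw_html = requests.get(url).content
soup = BeautifulSoup(raw_html, "html.parser")

# Grabs the first <tbody> on the page, with all of its markup
bill_life_cycle_table = soup.find("tbody")
print(bill_life_cycle_table)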

Tags: python, beautifulsoup

Answer


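You can add an if condition that checks whether the string is present in a cell and, when it is, look up the value of the preceding cell. The table is located with CSS selectors via select_one() and select():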

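from bs4 import BeautifulSoup
import requests

url = "https://www.nysenate.gov/legislation/bills/2019/s8450"
raw_html = requests.get(url).content
soup = BeautifulSoup(raw_html, "html.parser")

# Select the body of the bill actions table by its CSS classes
tablebody = soup.select_one(".table.c-bill--actions-table > tbody")

for item in tablebody.select("td"):
    # Compare in lowercase so "Delivered To Governor" matches regardless of case
    if "delivered to governor" in item.text.lower():
        # The date is in the cell immediately before the action text
        print(item.find_previous("td").text)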

Console output:

Dec 11, 2020
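
If you would rather work row by row than rely on find_previous(), the same idea can be sketched as below. This is only an illustration under stated assumptions: SAMPLE_HTML is made-up markup that imitates a two-column date/action table (the real page may be structured differently), and find_action_dates is a hypothetical helper name, not part of Beautiful Soup. It also uses get_text(strip=True), which is one way to drop the surrounding whitespace and markup mentioned in the question.

```python
from bs4 import BeautifulSoup

# Made-up sample markup for illustration only; the live page may differ.
SAMPLE_HTML = """
<table class="table c-bill--actions-table">
  <tbody>
    <tr><td>Dec 15, 2020</td><td> <span>some later action</span> </td></tr>
    <tr><td>Dec 11, 2020</td><td> <span>delivered to governor</span> </td></tr>
    <tr><td>Jul 23, 2020</td><td> <span>some earlier action</span> </td></tr>
  </tbody>
</table>
"""


def find_action_dates(html, phrase):
    """Return the first-column text of every row whose second column contains phrase.

    Hypothetical helper: assumes each row has a date cell followed by an action cell.
    """
    soup = BeautifulSoup(html, "html.parser")
    dates = []
    for row in soup.select("table.c-bill--actions-table > tbody > tr"):
        cells = row.find_all("td")
        if len(cells) < 2:
            continue  # skip rows that do not have both a date and an action
        date_text = cells[0].get_text(strip=True)
        action_text = cells[1].get_text(" ", strip=True)
        # Case-insensitive substring match on the cleaned action text
        if phrase.lower() in action_text.lower():
            dates.append(date_text)
    return dates


print(find_action_dates(SAMPLE_HTML, "Delivered To Governor"))
```

Returning a list rather than printing inside the loop makes it easy to handle the case where the phrase appears in several rows, or in none.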
