How to extract a substring of a string when its siblings share a parent of the same name (BeautifulSoup)

Problem description

For example:

<ul class="key-dates">
            
                <li>
                    Birthday: Monday 26 April 2021
                </li>
                <li>
                    Christmas: Saturday 25 December 2021
                </li>
                <li>
                    New Years: Saturday 1 January 2021
                </li>
            
        </ul>

Say I only want to extract the birthday date. How would I do that?

import requests
import bs4

info = requests.get('url')


Solution


You can use a CSS selector (:contains, or :-soup-contains in newer versions):

from bs4 import BeautifulSoup

html_doc = """
<ul class="key-dates">
            
                <li>
                    Birthday: Monday 26 April 2021
                </li>
                <li>
                    Christmas: Saturday 25 December 2021
                </li>
                <li>
                    New Years: Saturday 1 January 2021
                </li>
            
        </ul>
"""

soup = BeautifulSoup(html_doc, "html.parser")

birthday = soup.select_one('.key-dates li:-soup-contains("Birthday")')
print(birthday.text.strip())

Prints:

Birthday: Monday 26 April 2021
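
The selector above returns the whole list item, label included. If only the date itself is wanted, one possible approach is to split the text on the first colon. The sketch below assumes every item has a "label: value" shape and that the installed soupsieve supports :-soup-contains; the helper name date_for is hypothetical, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

html_doc = """
<ul class="key-dates">
    <li>
        Birthday: Monday 26 April 2021
    </li>
    <li>
        Christmas: Saturday 25 December 2021
    </li>
</ul>
"""

soup = BeautifulSoup(html_doc, "html.parser")


def date_for(soup, label):
    """Return the text after the first colon in the matching <li>, or None.

    Hypothetical helper for illustration only.
    """
    item = soup.select_one(f'.key-dates li:-soup-contains("{label}")')
    if item is None:
        # No list item mentions this label
        return None
    # partition splits on the first colon only; value is "" if there is no colon
    _, _, value = item.get_text(strip=True).partition(":")
    return value.strip()


birthday_date = date_for(soup, "Birthday")
print(birthday_date)
```

Checking for None before reading the text avoids an AttributeError when the label is not on the page.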

Or without CSS:

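# string= replaces the deprecated text= argument; the guard skips tags whose .string is None
birthday = soup.find("li", string=lambda t: t and "Birthday" in t)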
print(birthday.text.strip())
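
If several of the dates are needed, it may be simpler to read every sibling once and look items up by label. This is a minimal sketch, assuming each li holds a single "label: value" string; the dictionary name key_dates is chosen here for illustration.

```python
from bs4 import BeautifulSoup

html_doc = """
<ul class="key-dates">
    <li>
        Birthday: Monday 26 April 2021
    </li>
    <li>
        Christmas: Saturday 25 December 2021
    </li>
    <li>
        New Years: Saturday 1 January 2021
    </li>
</ul>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# Map each label to its date by splitting every <li> on the first colon
key_dates = {}
for li in soup.select(".key-dates li"):
    label, sep, value = li.get_text(strip=True).partition(":")
    if sep:  # skip items that do not contain a colon
        key_dates[label.strip()] = value.strip()

print(key_dates["Birthday"])
```

Because all the li elements share the same parent and tag name, the label inside the text is the only thing that tells them apart, so keying the dictionary on it keeps lookups explicit.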
