python - BeautifulSoup - Getting the text after the <em> tag

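I think this is what you're looking for. It finds the parent p element, converts that soup object to a string, removes the strong element's markup from the string, and then parses the string back into a soup object.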

from bs4 import BeautifulSoup

soup = BeautifulSoup("<p><strong>High School Honors: </strong><em>Parade </em>All-American; <em>Chicago Sun-Times </em>Illinois Player of the Year honors; rushed for 2,100 yards and 31 TDs as a senior; led team to 14-0 record and Class 4A State Championship as a junior with 1,820 yards and 26 TDs; also lettered in baseball.</p>", 'html.parser')
headerList = []
infoList = []

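for strong_tag in soup.find_all('strong'):
    # Locate the enclosing <p> element.
    parent = strong_tag.find_parent('p')
    # Serialize the paragraph and strip out the <strong> element's markup.
    content = str(parent).replace(str(strong_tag), '')
    # Parse the remaining markup back into a soup object.
    souped_content = BeautifulSoup(content, 'html.parser')
    infoList.append(souped_content)
    headerList.append(strong_tag)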

print(headerList)
print(infoList)

This prints the following:

[<strong>High School Honors: </strong>]
[<p><em>Parade </em>All-American; <em>Chicago Sun-Times </em>Illinois Player of the Year honors; rushed for 2,100 yards and 31 TDs as a senior; led team to 14-0 record and Class 4A State Championship as a junior with 1,820 yards and 26 TDs; also lettered in baseball.</p>]
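
If only the plain text is needed rather than markup, a possible variation avoids the string round trip: copy the parent element, remove the strong element from the copy, and read the remaining text. This is a minimal sketch, not part of the answer above. The shortened HTML string and the names header_texts and info_texts are made up for illustration, and it assumes a Beautiful Soup version where copy.copy() works on tags.

```python
import copy

from bs4 import BeautifulSoup

# Shortened sample markup (illustrative, not the full paragraph from the answer).
html = (
    "<p><strong>High School Honors: </strong><em>Parade </em>"
    "All-American; also lettered in baseball.</p>"
)
soup = BeautifulSoup(html, "html.parser")

header_texts = []  # hypothetical name: text of each <strong> header
info_texts = []    # hypothetical name: paragraph text without the header

for strong_tag in soup.find_all("strong"):
    parent = strong_tag.find_parent("p")
    if parent is None:
        # Skip <strong> elements that are not inside a <p>.
        continue
    # Work on a detached copy so the original soup stays unchanged.
    parent_copy = copy.copy(parent)
    # Drop the first <strong> element from the copy.
    parent_copy.find("strong").decompose()
    header_texts.append(strong_tag.get_text(strip=True))
    info_texts.append(parent_copy.get_text())

print(header_texts)
print(info_texts)
```

Like the string-replace version, this leaves the text inside the em elements in place. It only removes the first strong element in each paragraph, so a paragraph with several headers would need a loop over parent_copy.find_all("strong").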
