How to get the tag from text in Python using BeautifulSoup

Problem description

I want to get the tag and the class name from a piece of text.

Example HTML:

<a _sp="p2481888.m1379.l3250" href="https://www.ebay.com/b/Electronics/bn_7000259124">Electronics</a>

How do I get the tag and the class name, i.e. a and p2481888.m1379.l3250?

from bs4 import BeautifulSoup
import requests

Source = input("Enter the source: ")
Request = requests.get(Source, headers={"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36 Edg/91.0.864.59"})
Soup = BeautifulSoup(Request.text, "html.parser")

Target = Soup.find_all(text="Electronics")

print(Target)


Tags: python, beautifulsoup

Solution


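When you call find_all(text="Electronics"), in your case it returns the text Electronics itself (a NavigableString), not the tag. To get the a tag, you can use .previous_element, and then use .name to get the tag's name. The text p2481888.m1379.l3250 is not a class name but the value of the tag's _sp attribute; access it with [], as you would a dictionary key: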

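from bs4 import BeautifulSoup


html = """<a _sp="p2481888.m1379.l3250" href="https://www.ebay.com/b/Electronics/bn_7000259124">Electronics</a>"""
soup = BeautifulSoup(html, "html.parser")

# Returns the matching text nodes (NavigableString objects), not the tags.
target = soup.find_all(text="Electronics")

for text_node in target:
    # The element parsed just before the text is the enclosing <a> tag.
    a_tag = text_node.previous_element
    print("Tag name:", a_tag.name)
    # Attribute values are read like dictionary keys.
    print(a_tag["_sp"])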

Output:

Tag name: a
p2481888.m1379.l3250
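One caveat: .previous_element is simply whatever was parsed immediately before the text, so it equals the enclosing tag only when the text is that tag's first child. The minimal sketch below assumes a hypothetical variant of the example HTML with an icon (icon.png is made up for illustration) placed before the link text. It compares .previous_element with .parent, which always returns the tag that directly contains the text, and shows .attrs and .get() for reading attributes. It uses string=, the newer name for the text= argument (Beautiful Soup 4.4.0 and later).

```python
from bs4 import BeautifulSoup

# Hypothetical variant of the question's link: an <img> (icon.png is made up)
# now sits before the text, so the text is no longer the tag's first child.
html = (
    '<a _sp="p2481888.m1379.l3250" '
    'href="https://www.ebay.com/b/Electronics/bn_7000259124">'
    '<img src="icon.png"/>Electronics</a>'
)
soup = BeautifulSoup(html, "html.parser")

# find() with string= returns the first matching text node (a NavigableString).
text_node = soup.find(string="Electronics")

# .previous_element is whatever was parsed just before the text: the <img> here.
print("previous_element:", text_node.previous_element.name)

# .parent is the tag that directly contains the text: the <a>.
a_tag = text_node.parent
print("parent:", a_tag.name)

# .attrs is a dict of all attributes; .get() returns None for a missing
# attribute, whereas a_tag["class"] would raise KeyError.
print("attrs:", a_tag.attrs)
print("_sp:", a_tag.get("_sp"))
print("class:", a_tag.get("class"))
```

With the markup exactly as in the question, both properties return the a tag; .parent just keeps working if the markup around the text changes.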

Alternatively, you can pass the tag name directly to find_all():

# Returns the <a> tags themselves whose text is "Electronics".
target = soup.find_all("a", text="Electronics")

for tag in target:
    print("Tag name:", tag.name)
    print(tag["_sp"])
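Combining a tag name with the text filter matches a tag only when its .string equals the text, and .string is None whenever a tag has more than one child. The short sketch below assumes hypothetical markup (the second link and its _sp value are invented for illustration) to contrast this with searching the text nodes and climbing to .parent:

```python
from bs4 import BeautifulSoup

# The first link is from the question; the second is a hypothetical link whose
# text follows an <img>, so that <a> has two children.
html = (
    '<a _sp="p2481888.m1379.l3250" '
    'href="https://www.ebay.com/b/Electronics/bn_7000259124">Electronics</a>'
    '<a _sp="hypothetical.value" href="#"><img src="icon.png"/>Electronics</a>'
)
soup = BeautifulSoup(html, "html.parser")

# Tag name + string=: only <a> tags whose .string is exactly "Electronics".
# The second <a> has two children, so its .string is None and it is skipped.
by_tag = soup.find_all("a", string="Electronics")
print([a["_sp"] for a in by_tag])

# Searching the text nodes and taking .parent finds both links.
by_text = [s.parent for s in soup.find_all(string="Electronics")]
print([a["_sp"] for a in by_text])
```

If the links on the real page can contain icons or other markup around the text, the text-node approach is the safer of the two.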
