How would I use Beautiful Soup to scrape data from this website?

Question

[![Question][1]][1]

The HTML

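Above are the HTML, what the website looks like, and my code. I am trying to extract this information into a dictionary, for example {"Official Symbol": "ELF4"} and so on. I have looked at a few tutorials, but I am still confused. Can anyone help me?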

import requests
from bs4 import BeautifulSoup
url = "https://www.ncbi.nlm.nih.gov/gene/2000"
r  = requests.get(url)
data = r.content
soup = BeautifulSoup(data, 'html.parser')
#text_found = soup.find("dd",attrs={"class":"noline"}).text

dd_data = soup.find_all("dd")
for dditem in dd_data:
    if dditem != "None":
        print(dditem.string)

da_data = soup.find_all("dt")
for daitem in da_data:
    if daitem != "None":
        print(daitem.string)

Tags: python, beautifulsoup

Solution


To scrape the data into a dict, see the following example:

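import requests
from bs4 import BeautifulSoup


URL = "https://www.ncbi.nlm.nih.gov/gene/2000"
soup = BeautifulSoup(requests.get(URL).content, "html.parser")

# Pair each <dt class="noline"> label with the <dd class="noline"> at the same position.
# zip() avoids the nested loop, which would map every label to the last value.
result = {
    # Collapse runs of spaces and newlines inside the label into single spaces.
    " ".join(k.text.split()): v.find_next(string=True)
    for k, v in zip(soup.select("dt.noline"), soup.select("dd.noline"))
}


print(result)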

Output:

{'Official Symbol': 'ELF4'}
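
The example above only picks up the single noline pair. If the goal is a dictionary of every label and value on the page, one possible approach is to walk all dt tags and take the dd that follows each one. The sketch below is only an illustration: the SAMPLE_HTML markup and the dl_to_dict helper are hypothetical, made up here so the code runs without a network request, and the real page may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, loosely modeled on a definition list.
# It is NOT copied from the real page.
SAMPLE_HTML = """
<dl>
  <dt class="noline">Official
      Symbol</dt>
  <dd class="noline">ELF4<span class="prov">provided by HGNC</span></dd>
  <dt>Gene type</dt>
  <dd>protein coding</dd>
  <dt>Organism</dt>
  <dd><a href="#">Homo sapiens</a></dd>
</dl>
"""


def dl_to_dict(soup):
    """Map each <dt> label to the first piece of text in the following <dd>."""
    result = {}
    for dt in soup.find_all("dt"):
        dd = dt.find_next_sibling("dd")
        if dd is None:
            continue  # label without a value, skip it
        # Collapse newlines and repeated spaces inside the label.
        key = " ".join(dt.get_text().split())
        # stripped_strings yields each text fragment with whitespace trimmed;
        # only the first fragment is kept, as in the answer above.
        value = next(dd.stripped_strings, "")
        result[key] = value
    return result


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
info = dl_to_dict(soup)
print(info)
```

This may also explain the None values printed by the code in the question: a tag's .string is None when the tag has more than one child, as with a dd that holds both text and a span, so reading the text through get_text() or stripped_strings is usually more forgiving.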
