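python-3.x - Not getting the proper soup object from a website when scraping

Problem description

I am trying to scrape the Yahoo Finance website using BeautifulSoup and requests, but I am not getting the correct soup. Instead of the site's actual HTML, I get the HTML of a 404 "page not found" page. Here is my code.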


from bs4 import BeautifulSoup
import requests

soup = BeautifulSoup(requests.get('https://finance.yahoo.com/quote/FBRX/profile?p=FBRX').text, 'lxml')
print(soup)

This is my output:

Can you help me scrape this website?

Tags: python-3.x, web-scraping

Try setting the User-Agent HTTP header so that the server returns the correct response:

import requests
from bs4 import BeautifulSoup

url = "https://finance.yahoo.com/quote/FBRX/profile?p=FBRX"

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:92.0) Gecko/20100101 Firefox/92.0"
}

soup = BeautifulSoup(requests.get(url, headers=headers).content, "html.parser")
print(soup.h1.text)

Prints:

Forte Biosciences, Inc. (FBRX)
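
If the request still comes back as an error page, it may help to fail loudly rather than parse whatever HTML was returned. The following is only a minimal sketch built on the answer above: `fetch_soup` and `HEADERS` are hypothetical names introduced for illustration, the 10-second timeout is an arbitrary assumption, and the User-Agent string is simply the one used above. It relies on `raise_for_status()` from requests, which raises `requests.HTTPError` for 4xx and 5xx responses.

```python
import requests
from bs4 import BeautifulSoup

# Browser-like User-Agent taken from the answer above (an assumption:
# any current browser string would probably work as well).
HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:92.0) Gecko/20100101 Firefox/92.0"
}


def fetch_soup(url, session=None, timeout=10):
    """Fetch url with a browser-like User-Agent and return the parsed soup.

    fetch_soup is a hypothetical helper, not part of requests or bs4.
    """
    if session is None:
        session = requests.Session()
    response = session.get(url, headers=HEADERS, timeout=timeout)
    # Raise requests.HTTPError on 4xx/5xx instead of silently parsing
    # the HTML of an error page such as the 404 from the question.
    response.raise_for_status()
    return BeautifulSoup(response.content, "html.parser")
```

With this helper, `fetch_soup("https://finance.yahoo.com/quote/FBRX/profile?p=FBRX")` would either return a soup for the real page or raise an exception that shows the status code, which makes a blocked request easier to spot than a printed 404 document.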
