Element not found on the page when scraping wsj.com

Problem description

I am using Python to scrape a web page. Here is my code:

import requests
from bs4 import BeautifulSoup

# Set local variables 
URL = 'https://www.wsj.com/market-data/bonds'
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')

# Get Master data table and Last update from URL
table = soup.find("table", attrs={"class": "WSJTables--table--1QzSOCfq "})

print(table)

This code prints None: the table is not found, and I don't know why.

Any suggestions?

Tags: python, python-3.x, web-scraping, beautifulsoup

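Solution

You need to add a User-Agent header; otherwise the site treats the request as coming from a bot and blocks it. Also note that your class name has an extra trailing space.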

import requests
from bs4 import BeautifulSoup

URL = 'https://www.wsj.com/market-data/bonds'

# Send a browser-like User-Agent so the request is not rejected as a bot
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36"
}

page = requests.get(URL, headers=HEADERS)
soup = BeautifulSoup(page.content, 'html.parser')

# Class name without the trailing space
table = soup.find("table", attrs={"class": "WSJTables--table--1QzSOCfq"})
print(table)
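
As a follow-up, here is a minimal sketch of a slightly more defensive variant. It assumes the suffix of the class name (`1QzSOCfq`) is generated by the site's build and may change, so it matches on the prefix `WSJTables--table` instead. The helper names `find_table` and `fetch_table`, the `CLASS_PREFIX` constant, the 10-second timeout, and the sample HTML are all hypothetical, introduced only for illustration. The sample HTML lets you check the matching logic without hitting the site.

```python
import requests
from bs4 import BeautifulSoup

# Browser-like User-Agent, copied from the answer above.
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36"
}

# Assumption: only this prefix of the class name is stable.
CLASS_PREFIX = "WSJTables--table"


def find_table(html):
    """Return the first <table> with a class starting with CLASS_PREFIX, or None."""
    soup = BeautifulSoup(html, "html.parser")
    # The function receives each class value (or None if the tag has no class).
    return soup.find(
        "table",
        class_=lambda c: c is not None and c.startswith(CLASS_PREFIX),
    )


def fetch_table(url):
    """Download url with the headers above and look for the table."""
    response = requests.get(url, headers=HEADERS, timeout=10)
    # Raise an exception on 4xx/5xx responses instead of parsing an error page.
    response.raise_for_status()
    return find_table(response.content)


# Offline check of the matching logic on a hypothetical snippet.
SAMPLE_HTML = '<table class="WSJTables--table--1QzSOCfq"><tr><td>demo</td></tr></table>'
print(find_table(SAMPLE_HTML) is not None)
```

To run it against the real page, call `fetch_table('https://www.wsj.com/market-data/bonds')`. If the request is blocked, `raise_for_status()` raises an error rather than leaving you with a silent None; a None result means the response was fetched but contained no matching table.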
