How do I handle a read timeout error when scraping Edmunds.com with Python?

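Problem description

I'm a beginner at web scraping, and I'm trying to scrape customer reviews from https://www.edmunds.com/ for research.

However, even this basic code only produces a read timeout error.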

import requests 
from bs4 import BeautifulSoup 
result = requests.get("https://www.edmunds.com/")
print(result.status_code)

Can you help?

Tags: python, error-handling, beautifulsoup, nlp, timeout

Solution


Use requests_html, or add a User-Agent to the request headers:

# Approach using requests_html
from requests_html import HTMLSession

session = HTMLSession()

try:
    # Make the request inside the try block so network errors are caught
    response = session.get('https://www.edmunds.com/')
    print(response.status_code)
except Exception as e:
    print(e)

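# Approach using requests with a browser-like User-Agent header
import requests

headers = {
    "User-Agent": "Mozilla/5.0"
}

try:
    # Make the request inside the try block so a read timeout is caught here
    result = requests.get(url="https://www.edmunds.com", headers=headers)
    print(result.status_code)
except Exception as e:
    print(e)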
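
If the timeout still shows up now and then, one possible extension is to set an explicit timeout and retry a few times. The following is only a sketch: the function name `fetch_with_retries`, its `get` parameter, and the attempt count, timeout, and backoff values are all assumptions chosen for illustration, not something the site requires.

```python
import time

import requests


def fetch_with_retries(url, get=requests.get, attempts=3,
                       timeout=(5, 30), backoff=1.0):
    """Hypothetical helper: GET a URL, retrying when the request times out.

    timeout is (connect_seconds, read_seconds); the values are assumptions.
    """
    headers = {"User-Agent": "Mozilla/5.0"}
    last_error = None
    for attempt in range(attempts):
        try:
            return get(url, headers=headers, timeout=timeout)
        except requests.exceptions.Timeout as exc:
            # Timeout covers both ConnectTimeout and ReadTimeout
            last_error = exc
            if attempt < attempts - 1:
                # Wait longer after each failed attempt
                time.sleep(backoff * 2 ** attempt)
    raise last_error
```

Calling `fetch_with_retries("https://www.edmunds.com")` returns the response object on success, so `.status_code` can be read as before. If every attempt times out, the last timeout exception is raised again.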

Before you start scraping, check https://www.edmunds.com/robots.txt
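
The robots.txt check can also be done in code with the standard library's `urllib.robotparser`. This is a rough sketch: the helper names, the `my-research-bot` user agent, and the sample rules are made up for illustration and are not the contents of the real Edmunds file.

```python
from urllib.robotparser import RobotFileParser


def can_fetch_from_rules(robots_txt, user_agent, url):
    """Check a URL against robots.txt text that is already in memory."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def can_fetch_live(robots_url, user_agent, url):
    """Download robots.txt from robots_url, then check the URL against it."""
    parser = RobotFileParser()
    parser.set_url(robots_url)
    parser.read()  # performs a network request
    return parser.can_fetch(user_agent, url)


# Hypothetical rules for demonstration only
SAMPLE_RULES = """User-agent: *
Disallow: /private/
"""
```

For the real site, something like `can_fetch_live("https://www.edmunds.com/robots.txt", "my-research-bot", page_url)` would tell you whether `page_url` is permitted for that user agent.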

