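python - HTTP 406 Not Acceptable client error when using urlopen
Question
I am using urllib.request.urlopen to query the URL http://dblp.org/db/conf/lak/index. For some reason I cannot access the site with Python's urllib module, because I get the following HTTP status code error:
HTTPError: HTTP Error 406: Not Acceptable
This is the code I use to make the request: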
from urllib.request import urlopen
from bs4 import BeautifulSoup
url = 'http://dblp.org/db'
html = urlopen(url).read()
# Name the parser explicitly so BeautifulSoup does not have to guess one
soup = BeautifulSoup(html, 'html.parser')
print(soup.prettify())
I am not sure what is causing this error, and I need help resolving it.
Here is the stack trace for the error:
HTTPError Traceback (most recent call last)
<ipython-input-5-b158a1e893a0> in <module>
----> 1 html = urlopen("https://dblp.org/db").read()
2 #print(html)
3 soup = BeautifulSoup(html)
4 soup.prettify()
~\Anaconda3\lib\urllib\request.py in urlopen(url, data, timeout, cafile, capath, cadefault, context)
220 else:
221 opener = _opener
--> 222 return opener.open(url, data, timeout)
223
224 def install_opener(opener):
~\Anaconda3\lib\urllib\request.py in open(self, fullurl, data, timeout)
529 for processor in self.process_response.get(protocol, []):
530 meth = getattr(processor, meth_name)
--> 531 response = meth(req, response)
532
533 return response
~\Anaconda3\lib\urllib\request.py in http_response(self, request, response)
639 if not (200 <= code < 300):
640 response = self.parent.error(
--> 641 'http', request, response, code, msg, hdrs)
642
643 return response
~\Anaconda3\lib\urllib\request.py in error(self, proto, *args)
567 if http_err:
568 args = (dict, 'default', 'http_error_default') + orig_args
--> 569 return self._call_chain(*args)
570
571 # XXX probably also want an abstract factory that knows when it makes
~\Anaconda3\lib\urllib\request.py in _call_chain(self, chain, kind, meth_name, *args)
501 for handler in handlers:
502 func = getattr(handler, meth_name)
--> 503 result = func(*args)
504 if result is not None:
505 return result
~\Anaconda3\lib\urllib\request.py in http_error_default(self, req, fp, code, msg, hdrs)
647 class HTTPDefaultErrorHandler(BaseHandler):
648 def http_error_default(self, req, fp, code, msg, hdrs):
--> 649 raise HTTPError(req.full_url, code, msg, hdrs, fp)
650
651 class HTTPRedirectHandler(BaseHandler):
HTTPError: HTTP Error 406: Not Acceptable
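Answer
I looked into the 406 error code: it occurs when the server cannot produce a response that matches the Accept headers specified in the request. If I can get urlopen to work, I will post that answer as well.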
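To make the header explanation concrete, here is a minimal sketch that inspects, without touching the network, which headers urllib would attach to a request. The helper name build_request and its default header values are hypothetical and only for illustration; which header dblp.org actually objects to is an assumption, not something confirmed here.

```python
from urllib.request import Request, build_opener

URL = "https://dblp.org/db/conf/lak/index"


def build_request(url, accept="text/html,*/*;q=0.8", user_agent="Mozilla/5.0"):
    """Hypothetical helper: a Request with explicit Accept and User-Agent headers."""
    request = Request(url)
    request.add_header("Accept", accept)
    request.add_header("User-Agent", user_agent)
    return request


# Headers the default opener adds to every request (no network access needed).
default_opener_headers = dict(build_opener().addheaders)

bare = Request(URL)
explicit = build_request(URL)

print(bare.header_items())      # [] : nothing is set on the request itself
print(default_opener_headers)   # only a Python-urllib User-agent entry
print(explicit.header_items())  # now carries Accept and User-agent
```

A bare Request carries no Accept header at all, and the default opener only contributes a Python-urllib User-agent, so a server that negotiates on either header has little to work with. Note that urllib stores header names capitalized, so 'User-Agent' is looked up as 'User-agent'.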
I do not get this error when using the Python requests library:
import requests
from bs4 import BeautifulSoup
user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Firefox/78.0'
# Send the browser-like User-Agent defined above along with the request
raw_html = requests.get('http://dblp.org/db/conf/lak/index', headers={'User-Agent': user_agent})
soup = BeautifulSoup(raw_html.content, 'html.parser')
print(soup.prettify())
The code below uses urlopen and does not produce the 406 error:
from urllib.request import Request
from urllib.request import urlopen
from bs4 import BeautifulSoup
raw_request = Request('https://dblp.org/db/conf/lak/index')
raw_request.add_header('User-Agent', 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Firefox/78.0')
raw_request.add_header('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8')
resp = urlopen(raw_request)
raw_html = resp.read()
soup = BeautifulSoup(raw_html, 'html.parser')
print(soup.prettify())
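If the 406 can still occur (for example when the headers are changed), it may help to catch it explicitly instead of letting the traceback propagate. The following is only a sketch under stated assumptions: fetch_html, BROWSER_HEADERS, and the opener parameter are hypothetical names introduced here, and the opener parameter exists solely so the function can be exercised without network access.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen

# Assumed browser-like headers, mirroring the ones used in the answer above.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) "
                  "Gecko/20100101 Firefox/78.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
}


def fetch_html(url, headers=BROWSER_HEADERS, opener=urlopen):
    """Return the response body as bytes, or None if the server answers 406."""
    request = Request(url, headers=headers)
    try:
        with opener(request) as response:
            return response.read()
    except HTTPError as err:
        if err.code == 406:
            # Report what was sent so the rejected header is easy to spot.
            print(f"406 Not Acceptable for {url}; "
                  f"sent Accept: {request.get_header('Accept')}")
            return None
        raise  # any other HTTP error is not handled here
```

Calling fetch_html('https://dblp.org/db/conf/lak/index') and passing the result to BeautifulSoup(..., 'html.parser') would then replace the urlopen and read calls above; other status codes are re-raised unchanged.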