Web Crawler Basics

ikong 2019-03-10 10:04


urllib.request.urlopen


The standard library: urlopen vs. Request


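Calling urlopen directly with a URL string gives you no way to build more complex header information. For that you need to construct a Request object and pass it to urlopen.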
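For context, a short sketch (not part of the original article) of what urllib sends when you set no headers yourself: the opener that urlopen builds internally adds a default User-Agent of the form Python-urllib/3.x, which makes it easy for a server to tell the request came from a script. A User-Agent set on a Request takes precedence over that default. The httpbin URL below is only used to construct a Request; nothing is sent.

```python
from urllib import request

# build_opener() returns the same kind of OpenerDirector that urlopen() creates
# internally; its addheaders list holds the headers added to every request.
opener = request.build_opener()
default_headers = dict(opener.addheaders)
print(default_headers)  # {'User-agent': 'Python-urllib/<major>.<minor>'}

# Building a Request does not send anything, so this part runs offline.
req = request.Request('http://httpbin.org/get', headers={'User-Agent': 'Mozilla/5.0'})
# Request normalises header names with str.capitalize(), hence the key 'User-agent'
print(req.get_header('User-agent'))  # Mozilla/5.0
```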

from urllib import request, parse

# Pass the headers directly to the Request constructor
url = 'http://httpbin.org/post'
headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36",
    "Host": "httpbin.org"
}
form = {
    'name': 'Germey'
}
# The POST body must be bytes: URL-encode the dict, then encode it as UTF-8
data = bytes(parse.urlencode(form), encoding='utf-8')
req = request.Request(url=url, data=data, headers=headers, method='POST')
response = request.urlopen(req)
print(response.read().decode('utf-8'))

# Alternatively, create the Request without headers and add them with add_header()
url = 'http://httpbin.org/post'
form = {
    'name': 'Germey'
}
data = bytes(parse.urlencode(form), encoding='utf-8')
req = request.Request(url=url, data=data, method='POST')
# add_header() sets one header per call
req.add_header("User-Agent", "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36")
response = request.urlopen(req)
print(response.read().decode('utf-8'))
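Passing headers to the constructor and adding them with add_header() build the same request. A quick offline check, written as a sketch with a shortened User-Agent string, compares two Request objects without sending them (parse.urlencode(...).encode('utf-8') is equivalent to the bytes(..., encoding='utf-8') call above):

```python
from urllib import parse, request

URL = 'http://httpbin.org/post'
UA = 'Mozilla/5.0'  # shortened User-Agent, just for the comparison
data = parse.urlencode({'name': 'Germey'}).encode('utf-8')

# Headers passed to the constructor
req1 = request.Request(URL, data=data, headers={'User-Agent': UA}, method='POST')

# Headers added afterwards, one add_header() call per header
req2 = request.Request(URL, data=data, method='POST')
req2.add_header('User-Agent', UA)

print(req1.header_items() == req2.header_items())  # True
print(req1.get_method(), req1.data)  # POST b'name=Germey'
```

Two related details: a Request that has data but no method argument already defaults to POST, and if no Content-Type header is set, urllib adds Content-Type: application/x-www-form-urlencoded when the request is actually sent.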

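The standard library is clumsy to work with: adding request headers, sending request data, configuring a proxy, setting cookies and so on all take extra boilerplate. For that reason the third-party Requests library is the better choice.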
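To make the complaint concrete, here is a sketch of what proxies and cookies take in urllib: each feature is a separate handler (ProxyHandler, HTTPCookieProcessor) that has to be chained into a custom opener with build_opener(). The proxy address 127.0.0.1:8080 is a hypothetical placeholder, and no request is sent.

```python
from http.cookiejar import CookieJar
from urllib import request

# Hypothetical proxy address; replace it with a real proxy before sending anything.
PROXY = 'http://127.0.0.1:8080'

cookie_jar = CookieJar()  # stores cookies received from servers
proxy_handler = request.ProxyHandler({'http': PROXY, 'https': PROXY})
cookie_handler = request.HTTPCookieProcessor(cookie_jar)

# Every extra feature is another handler chained into a custom opener.
opener = request.build_opener(proxy_handler, cookie_handler)
opener.addheaders = [('User-Agent', 'Mozilla/5.0')]

# opener.open(url) would now go through the proxy and keep cookies in cookie_jar.
# Nothing has been sent yet, so the jar is still empty.
print(len(cookie_jar))  # 0
```

Calling request.install_opener(opener) would make plain urlopen() use this opener too.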

Install it with: pip3 install requests
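For comparison, a sketch of the same form POST written with Requests, assuming the package is installed. Preparing the request without sending it shows that Requests URL-encodes the dict and sets the Content-Type header itself, so there is no bytes()/urlencode() step:

```python
import requests

url = 'http://httpbin.org/post'
headers = {'User-Agent': 'Mozilla/5.0'}
form = {'name': 'Germey'}

# Build and prepare the request without sending it (works offline)
prepared = requests.Request('POST', url, data=form, headers=headers).prepare()
print(prepared.method, prepared.body)    # POST name=Germey
print(prepared.headers['Content-Type'])  # application/x-www-form-urlencoded

# Sending it for real is a single call (needs network access):
# response = requests.post(url, data=form, headers=headers)
# print(response.json()['form'])
```

Proxies and cookies are plain keyword arguments (proxies=..., cookies=...) on requests.post() and the other request functions, and a requests.Session() keeps cookies between requests.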

