Fixing the 'gbk' encoding error when scraping

junjun511 2019-03-05 20:05


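import re

import requests

# Request the page once and save it to a file, so later runs can work from the
# local copy instead of hitting the site repeatedly and triggering anti-scraping.
r = requests.get('https://www.pearvideo.com/video_1522192')
html = r.content.decode("utf-8")
print(html)
# Name the encoding explicitly. Without it, open() uses the platform default
# (gbk on Chinese-locale Windows), which is typically where the gbk error comes from.
with open("./test.html", "w", encoding="gbk") as f:
    # Encode to gbk and decode again with "ignore" to drop characters gbk cannot represent
    f.write(html.encode("gbk", "ignore").decode("gbk", "ignore"))

# Read the saved file back
with open("./test.html", encoding="gbk") as file_obj:
    contents = file_obj.read()
# Match the video URL with a regular expression
regex = re.compile('srcUrl="(.+?)"')
print(regex.findall(contents))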
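
As an alternative to the encode/decode round trip above, the encoding can probably be handled by open() itself. The sketch below is only an illustration under a few assumptions: the helper names (save_text, load_text, find_src_urls) and the sample string are made up for this example and are not part of the original post, and a temporary directory stands in for test.html so that no network request is needed. It shows two options: saving as utf-8, which keeps every character, or saving as gbk with errors="ignore", which silently drops whatever gbk cannot represent.

```python
import os
import re
import tempfile

# Same pattern as in the post: capture the value of srcUrl="..."
SRC_URL_RE = re.compile(r'srcUrl="(.+?)"')


def save_text(path, text, encoding="utf-8", errors="strict"):
    """Write text with an explicit encoding instead of the platform default."""
    with open(path, "w", encoding=encoding, errors=errors) as f:
        f.write(text)


def load_text(path, encoding="utf-8"):
    """Read text back using the same encoding it was written with."""
    with open(path, encoding=encoding) as f:
        return f.read()


def find_src_urls(text):
    """Return every srcUrl value found in the text."""
    return SRC_URL_RE.findall(text)


# Made-up sample standing in for the downloaded page (an assumption, not real
# site output). U+1F600 is an emoji that gbk cannot encode.
sample = 'var srcUrl="https://example.com/a.mp4"; // \U0001F600'

tmp_dir = tempfile.mkdtemp()
utf8_path = os.path.join(tmp_dir, "utf8.html")
gbk_path = os.path.join(tmp_dir, "gbk.html")

# Option 1: utf-8 can represent every character, so nothing is lost.
save_text(utf8_path, sample)

# Option 2: keep gbk, but let open() drop characters gbk cannot encode.
save_text(gbk_path, sample, encoding="gbk", errors="ignore")

print(find_src_urls(load_text(utf8_path)))
print(find_src_urls(load_text(gbk_path, encoding="gbk")))
```

Either way, the point is that the file is written and read with the same, explicitly named encoding, so the result does not depend on the operating system's locale. If the unencodable characters matter, utf-8 is the safer of the two choices.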

 
