Scraping the link list in runoob's table-of-contents sidebar with Python

zhuyu139 2019-12-19 12:14

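import re
from urllib.parse import urljoin

import requests

url = 'https://www.runoob.com/python3/python3.html'
response = requests.get(url, timeout=10)
response.encoding = 'utf-8'  # must be set before reading .text to take effect
html = response.text

# Cut out the left navigation column (the tutorial's table of contents)
dl = re.findall(r'<div class="design" id="leftcolumn">.*?</div>', html, re.S)[0]
# Each match is a (title, href) pair
tree = re.findall(r'title="(.*?)".*?href="(.*?)"', dl)

lst = []


def get_data(link):
    """Download one chapter page and report progress."""
    lst.append(link)
    ht = requests.get(link, timeout=10)
    print('Downloaded', len(lst), 'pages')
    return ht


# Open the output file once; a raw string keeps the Windows backslashes literal
with open(r'D:\Desktop\测试\html.txt', 'a', encoding='utf-8') as f:
    for tree_info in tree:
        # urljoin resolves the href against the page URL instead of naive concatenation
        link = urljoin(url, tree_info[1])
        f.write(link + '\n')  # the newline goes to the file only, not into the requested URL
        get_data(link)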
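The regex approach above depends on `title` appearing before `href`, and `.*?</div>` stops at the first closing `</div>`, so any nested `<div>` inside the sidebar would cut the list short. As an alternative, here is a minimal sketch that uses the standard library's `html.parser` instead of regular expressions. The class `LeftColumnLinks`, the function `extract_toc` and the `SAMPLE` markup are hypothetical, written only for illustration; the real runoob sidebar markup may differ. The sketch runs offline against the sample and resolves every href with `urljoin`.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = 'https://www.runoob.com/python3/python3.html'

# Hypothetical sidebar markup, not copied from the live site
SAMPLE = """
<div class="design" id="leftcolumn">
  <a target="_top" title="Python3 Tutorial" href="/python3/python3-tutorial.html">Python3 Tutorial</a>
  <div class="group"><a title="Python3 Introduction" href="python3-intro.html">Python3 Introduction</a></div>
  <a title="Python3 Install" href="/python3/python3-install.html">Python3 Install</a>
</div>
<div id="content"><a title="Other" href="/other.html">not in the sidebar</a></div>
"""


class LeftColumnLinks(HTMLParser):
    """Collect (title, absolute_url) pairs from <a> tags inside div#leftcolumn."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.depth = 0  # > 0 while inside div#leftcolumn; counts nested divs
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'div':
            if self.depth:
                self.depth += 1  # a div nested inside the sidebar
            elif attrs.get('id') == 'leftcolumn':
                self.depth = 1  # entering the sidebar
        elif tag == 'a' and self.depth and attrs.get('href'):
            title = attrs.get('title') or ''
            self.links.append((title, urljoin(self.base_url, attrs['href'])))

    def handle_endtag(self, tag):
        if tag == 'div' and self.depth:
            self.depth -= 1  # reaches 0 when the sidebar itself closes


def extract_toc(html, base_url):
    """Return the sidebar links of `html` as (title, absolute_url) tuples."""
    parser = LeftColumnLinks(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == '__main__':
    for title, link in extract_toc(SAMPLE, BASE_URL):
        print(title, link)
```

With this sample, the regex from the script above would stop at the nested group's closing `</div>` and miss the third link; the parser tracks div depth instead, and the link in `div#content` is ignored. To try it on the live page, pass `response.text` and the page URL to `extract_toc`.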
