How do I crawl a prepared list of URLs?

Problem description

I want to crawl the web with Python. I have the URLs saved in a CSV or TXT file, and I want to load them in my code and crawl them one page at a time. How can I do that?

import urllib.request
from bs4 import BeautifulSoup
import pandas as pd

with open('crawlingweb.csv') as f:
    content=f.readlines()
    content=[x.strip() for x in content]

url='#I want to bring url from csv or txt file'
html=urllib.request.urlopen(url).read()
soup=BeautifulSoup(html,'lxml')
text=soup.get_text()
print(text)

Tags: python, web-crawler

Solution


import urllib.request
from bs4 import BeautifulSoup

# Read one URL per line from the file, stripping whitespace/newlines
with open('crawlingweb.csv') as f:
    urls = [line.strip() for line in f if line.strip()]

# Crawl the pages one at a time
for url in urls:
    html = urllib.request.urlopen(url).read()
    soup = BeautifulSoup(html, 'lxml')
    text = soup.get_text()
    print(text)
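The loop above only prints each page's text. Since the question also asks to save the results to a CSV or TXT file, one way (a minimal sketch; `save_pages` is a hypothetical helper name, not part of the original answer) is to collect `(url, text)` pairs while crawling and write them out with the standard-library `csv` module:

```python
import csv

def save_pages(rows, path):
    """Write (url, extracted_text) pairs to a CSV file with a header row."""
    with open(path, 'w', newline='', encoding='utf-8') as out:
        writer = csv.writer(out)
        writer.writerow(['url', 'text'])  # header row
        writer.writerows(rows)            # one row per crawled page

# Inside the crawl loop, append (url, text) to a list instead of printing,
# then after the loop: save_pages(pages, 'output.csv')
```

Writing each row with `csv` rather than plain `print` redirection keeps URLs and page text properly quoted even when the text contains commas or newlines.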
