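python-3.x - Login times out while scraping
Problem description
When I scrape boxrec (www.boxrec.com) with BeautifulSoup and urllib, my login times out and the process stops. I have to log out and log back in manually to resume it.
The libraries I am using: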
import csv
import http.client
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

# This is the final function that is getting interrupted
def AllDataWriter(rankingURL, nfileNameT):
    uClientRanking = uReq(rankingURL)
    try:
        rankingURL_html = uClientRanking.read()
    except http.client.IncompleteRead as e:
        # Keep whatever was received before the connection dropped
        rankingURL_html = e.partial
    rankingURL_soup = soup(rankingURL_html, 'html.parser')
    rankingFighters = rankingURL_soup.findAll('a', {'class': 'personLink'})
    with open(nfileNameT, mode='w', encoding="utf-8", newline='') as csv_file:
        f = csv.writer(csv_file)
        f.writerow(dataBaseHeader)  # dataBaseHeader is defined elsewhere
        for item in rankingFighters:
            thisURL = 'http://boxrec.com' + item['href']
            fighterArray = getFighterData(thisURL)  # getFighterData is defined elsewhere
            # fighterArray is flat; every 38 consecutive fields form one CSV row
            for d in range(len(fighterArray) // 38):
                u = d * 38
                f.writerow(fighterArray[u:u + 38])
Solution
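No answer was recorded for this question. One possible direction, sketched here under assumptions rather than verified against boxrec, is to keep the session cookies in a single opener and to log in again automatically whenever a fetched page looks like a logged-out page. The names make_cookie_opener and fetch_with_relogin are hypothetical helpers, and the fetch, login, and looks_logged_out callables are placeholders: how the site's login form works and how a logged-out page can be recognized are not stated in the question and would need to be checked by hand.

```python
import http.cookiejar
import time
import urllib.request
from typing import Callable


def make_cookie_opener() -> urllib.request.OpenerDirector:
    """Build an opener that stores cookies, so a login persists across requests."""
    jar = http.cookiejar.CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def fetch_with_relogin(
    url: str,
    fetch: Callable[[str], str],
    login: Callable[[], None],
    looks_logged_out: Callable[[str], bool],
    max_attempts: int = 3,
    delay_seconds: float = 0.0,
) -> str:
    """Fetch url; if the page looks logged out, log in again and retry.

    fetch, login and looks_logged_out are supplied by the caller (assumptions):
    - fetch(url) returns the page HTML as text
    - login() submits the site's login form through the same opener
    - looks_logged_out(html) returns True when the session has expired
    """
    for _ in range(max_attempts):
        html = fetch(url)
        if not looks_logged_out(html):
            return html
        # Session appears to have expired: log in again, pause, then retry
        login()
        time.sleep(delay_seconds)
    raise RuntimeError(f"still logged out after {max_attempts} attempts: {url}")
```

In the code from the question, getFighterData could obtain its HTML through fetch_with_relogin, with fetch wrapping opener.open(url).read().decode('utf-8') on an opener from make_cookie_opener. Passing a non-zero delay_seconds, and pausing between fighter pages in general, may also reduce how often the session is dropped, although that is a guess about the site's behavior.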