Login times out while scraping

Problem description

When I scrape BoxRec (www.boxrec.com) with BeautifulSoup and urllib, my login times out and the process stops. I have to log out and log back in manually to resume it.
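Because the scrape only continues after logging in again, one option is to have the scraper detect a logged-out page and re-authenticate by itself. Below is a minimal, non-BoxRec-specific sketch of that idea. `fetch_with_relogin`, `fetch`, `login`, `looks_logged_out` and `SessionExpired` are hypothetical names. How to recognise a logged-out BoxRec page and how to submit its login form are assumptions left to the caller. If BoxRec tracks the login with a cookie (also an assumption), urllib only keeps that cookie when requests go through an opener built with `urllib.request.build_opener(urllib.request.HTTPCookieProcessor(http.cookiejar.CookieJar()))`, not through the bare `urlopen`.

```python
from typing import Callable


class SessionExpired(Exception):
    """Raised when a page still looks logged out after re-authenticating."""


def fetch_with_relogin(
    url: str,
    fetch: Callable[[str], str],
    login: Callable[[], None],
    looks_logged_out: Callable[[str], bool],
    max_logins: int = 2,
) -> str:
    """Fetch `url`; if the page looks logged out, log in again and retry.

    `fetch`, `login` and `looks_logged_out` are hypothetical hooks supplied
    by the caller; nothing here is specific to BoxRec.
    """
    html = fetch(url)
    logins = 0
    while looks_logged_out(html):
        if logins == max_logins:
            raise SessionExpired(f"still logged out after {logins} logins: {url}")
        login()            # re-authenticate, e.g. re-submit the login form
        logins += 1
        html = fetch(url)  # retry the same page with the fresh session
    return html


# Demo: a fake site whose session expires after every two page loads.
state = {"pages_left": 2, "logins": 0}


def fake_fetch(url: str) -> str:
    if state["pages_left"] == 0:
        return "<form id='login'>"  # stands in for a logged-out page
    state["pages_left"] -= 1
    return f"<html>{url}</html>"


def fake_login() -> None:
    state["pages_left"] = 2
    state["logins"] += 1


def looks_logged_out(html: str) -> bool:
    return "id='login'" in html


pages = [fetch_with_relogin(u, fake_fetch, fake_login, looks_logged_out)
         for u in ["a", "b", "c"]]
print(pages)            # ['<html>a</html>', '<html>b</html>', '<html>c</html>']
print(state["logins"])  # 1
```

In the scraper, the page fetch inside `getFighterData` (not shown in the question) is where such a wrapper would sit. The fake site in the demo only imitates a session that expires.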

The libraries I'm using:

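import csv
import http.client
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

# dataBaseHeader (the CSV header row) and getFighterData() are defined
# elsewhere in my script (not shown).

# This is the final function, the one that gets interrupted
def AllDataWriter(rankingURL, nfileNameT):
    # Download the ranking page, keeping whatever arrived if the read is cut short
    with uReq(rankingURL) as uClientRanking:
        try:
            rankingURL_html = uClientRanking.read()
        except http.client.IncompleteRead as e:
            rankingURL_html = e.partial

    rankingURL_soup = soup(rankingURL_html, 'html.parser')
    # Each fighter on the ranking page is a link with class "personLink"
    rankingFighters = rankingURL_soup.find_all('a', {'class': 'personLink'})

    # newline='' is what the csv module expects (avoids blank lines on Windows)
    with open(nfileNameT, mode='w', encoding='utf-8', newline='') as csv_file:
        f = csv.writer(csv_file)
        f.writerow(dataBaseHeader)

        for item in rankingFighters:
            thisURL = 'http://boxrec.com' + item['href']
            fighterArray = getFighterData(thisURL)

            # Write the flat list back out as rows of 38 fields each
            for d in range(len(fighterArray) // 38):
                u = d * 38
                f.writerow(fighterArray[u:u + 38])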
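The inner loop in `AllDataWriter` writes the flat list returned by `getFighterData` in rows of 38 fields. Because the row count is rounded down, any trailing partial row is skipped. A small helper, `rows_of` (a hypothetical name), does the same slicing and makes that behaviour easy to check on its own:

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def rows_of(flat: Sequence[T], width: int = 38) -> List[Sequence[T]]:
    """Split a flat field list into consecutive rows of `width` fields.

    Like the loop in AllDataWriter, only complete rows are returned, so the
    last len(flat) % width fields are dropped.
    """
    full_rows = len(flat) // width
    return [flat[i * width:(i + 1) * width] for i in range(full_rows)]


fields = list(range(80))        # 2 full rows of 38, plus 4 leftover fields
rows = rows_of(fields)
print(len(rows))                # 2
print(rows[1][0], rows[1][-1])  # 38 75
```

If `getFighterData` ever returns a length that is not a multiple of 38, the leftover fields never reach the CSV file, which is worth checking when rows seem to be missing.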

Tags: python-3.x

Solution

