Too many requests error while crawling users reputation from Stack Overflow

Problem description

I have a list of user ids and I'm interested in crawling their reputation.

I wrote a script using BeautifulSoup that crawls each user's reputation. The problem is that I get a "Too many requests" error before the script has even run for a minute, and after that I can no longer open Stack Overflow manually in a browser either.

My question is: how do I crawl the reputation without triggering the "Too many requests" error?

My code is given below:

from requests import get
from bs4 import BeautifulSoup

# df is a pandas DataFrame whose 'target' column holds the user ids
for user_id in df['target']:
    url = 'https://stackoverflow.com/users/' + str(user_id)
    print(url)
    response = get(url)
    html_soup = BeautifulSoup(response.text, 'html.parser')
    site_title = html_soup.find("title").contents[0]
    if "Page Not Found - Stack Overflow" in site_title:
        reputation = "NA"
    else:
        reputation = html_soup.find(class_='grid--cell fs-title fc-dark').contents[0].replace(',', '')
        print(reputation)

Tags: python, beautifulsoup, web-crawler

Solution


I would suggest using the Python `time` module and throwing a `time.sleep(5)` into your for loop. The error comes from making too many requests in too short a period of time. You may have to play around with the actual sleep value to get it right, though.
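A minimal sketch of the idea, using only the standard library so it runs without network access: a small throttling decorator enforces a minimum gap between calls, which is what dropping `time.sleep(5)` into the loop achieves. The `fetch` function below is a hypothetical stand-in for the `requests.get` call in the question, and the 0.2 s interval is just to keep the demo quick; in practice you would start around 5 seconds as suggested above.

```python
import time

def throttled(min_interval):
    """Decorator: enforce at least `min_interval` seconds between calls,
    so a scraping loop cannot fire requests faster than the limit."""
    def wrap(func):
        last_call = [0.0]  # mutable cell holding the previous call time
        def inner(*args, **kwargs):
            wait = min_interval - (time.monotonic() - last_call[0])
            if wait > 0:
                time.sleep(wait)  # pause only as long as needed
            last_call[0] = time.monotonic()
            return func(*args, **kwargs)
        return inner
    return wrap

# Hypothetical stand-in for the real HTTP request in the question's loop;
# swap in requests.get(url) when using this for real.
@throttled(0.2)  # use ~5 seconds against a live site
def fetch(user_id):
    return 'https://stackoverflow.com/users/' + str(user_id)

start = time.monotonic()
urls = [fetch(uid) for uid in (1, 2, 3)]
elapsed = time.monotonic() - start
print(urls[0])  # -> https://stackoverflow.com/users/1
print(elapsed)  # roughly two enforced 0.2 s gaps
```

The advantage over a bare `time.sleep(5)` inside the loop body is that the delay lives in one place, so the same throttle can wrap every request-making function in the script.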

