Page loop not working for python webscrape

Problem

I'm new to Python and have written a script using BeautifulSoup to parse a table on a website. I've tried everything, but I can't get the loop to move through the pages: it just repeats the data from the first page 8 times (the number of pages).

Can anyone help?

Code:

import requests
from bs4 import BeautifulSoup

first_year = 2020
last_year = 2020
for i in range(last_year-first_year+1):
    year = str(first_year + i)
    print("Running for year:", year)
    text = requests.get("https://finalsiren.com/AFLPlayerStats.asp?SeasonID="+year).text
    soup = BeautifulSoup(text, "html.parser")
    options = soup.findAll("option")
    opts = []
    for option in options:
        if not option['value'].startswith("20") and not option['value'].startswith("19") and option["value"]:
            opts.append({option["value"]: option.contents[0]})
    for opt in opts:
        for key, value in opt.items():
            print("Doing option:", value)
            text = requests.get("https://finalsiren.com/AFLPlayerStats.asp?SeasonID=" + year + "&Round=" + key).text
            pages_soup = BeautifulSoup(text, "html.parser")
            p = pages_soup.findAll("a")
            pages = 8
            if "&Page=" in str(p[-2]):
                pages = int(p[-2].contents[0])
            for j in range(pages):
                print("Page {}/{}".format(str(j+1), str(pages)))
                parse = requests.get("https://finalsiren.com/AFLPlayerStats.asp?SeasonID={}&Round={}&Page={}".format(year, key, j+1)).text
                p_soup = BeautifulSoup(text, "html.parser")
                tbody = pages_soup.findAll("tbody")
                tbody_soup = BeautifulSoup(str(tbody), "html.parser")
                tr = tbody_soup.findAll("tr")
                for t in tr:
                    t = str(t).replace("</tr>", "").replace("<tr>", "").replace("amp;", "")
                    t = t[4:len(t)-5].split('</td><td>')
                    t.append(str(j+1))
                    t.append(str(value))
                    t.append(str(year))
                    open("output.csv", "a").write("\n" + ";".join(t))

Thank you.

Tags: python, beautifulsoup

Solution


The inner page loop downloads each page into `parse`, but then builds the soup from the old `text` variable (the round page fetched before the loop) and reads rows from the equally stale `pages_soup`, so every iteration re-processes page 1. Parse the response you just fetched instead:
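The failure mode is easy to reproduce without any scraping: the loop assigns each fresh page to one variable but keeps reading another. A minimal sketch, where the `pages` dict is made-up data standing in for the site's paginated responses:

```python
# Stand-in for three paginated HTTP responses (hypothetical data).
pages = {1: "page-1-rows", 2: "page-2-rows", 3: "page-3-rows"}

text = pages[1]               # the request made before the page loop
seen_buggy, seen_fixed = [], []
for j in pages:
    parse = pages[j]          # fresh "download" on each iteration
    seen_buggy.append(text)   # bug: reuses the stale variable every time
    seen_fixed.append(parse)  # fix: use what was just fetched

# seen_buggy repeats page 1 three times; seen_fixed sees all three pages.
```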

for j in range(pages):
    print("Page {}/{}".format(str(j+1), str(pages)))
    parse = requests.get("https://finalsiren.com/AFLPlayerStats.asp?SeasonID={}&Round={}&Page={}".format(year, key, j+1)).text
    p_soup = BeautifulSoup(parse, "html.parser")  # was BeautifulSoup(text, ...): stale HTML
    tbody = p_soup.findAll("tbody")               # was pages_soup.findAll(...): also stale
    tbody_soup = BeautifulSoup(str(tbody), "html.parser")
    tr = tbody_soup.findAll("tr")
    for t in tr:
        t = str(t).replace("</tr>", "").replace("<tr>", "").replace("amp;", "")
        t = t[4:len(t)-5].split('</td><td>')
        t.append(str(j+1))
        t.append(str(value))
        t.append(str(year))
        open("output.csv", "a").write("\n" + ";".join(t))
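As a side note, appending rows with a bare `open(...).write` leaves the file handle for the garbage collector to close and breaks if a cell ever contains the `;` delimiter. A sketch of the same append step using the standard `csv` module (the row values here are made up for illustration):

```python
import csv

# Hypothetical scraped cells plus the page/round/year columns the script appends.
row = ["Player A", "23", "11"] + ["1", "Round 1", "2020"]

# The with-block closes the file deterministically; csv.writer quotes
# any value that happens to contain the ";" delimiter.
with open("output.csv", "a", newline="") as f:
    writer = csv.writer(f, delimiter=";")
    writer.writerow(row)
```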
