Home > Solutions > Scraping all odds tables from a webpage with Beautiful Soup

Problem description

I want to use the following code to fetch all the tables from the URL below.

import csv
import requests
from bs4 import BeautifulSoup


urls = [
'https://g10oal.com/match/c81e21f3-7804-4961-ac74-4e2804a19784/odds'
]


all_data = []
for url in urls:
    page = requests.get(url)

    soup = BeautifulSoup(page.content, "html.parser")
    table = soup.findAll("class", {"class":"table table-sm  odds-compare-table"})[0]

    # here I store all rows to list `all_data`
    for row in table.findAll('tr'):
    tds = [cell.get_text(strip=True, separator=' ') for cell in row.findAll(["td", "th"])]
        all_data.append(tds)
        print(*tds)

# write list `all_data` to CSV
with open("c:/logs/test.csv", "wt+", newline="") as f:
    writer = csv.writer(f)
    for row in all_data:
        writer.writerow(row)

Running the code raises "IndexError: list index out of range".

Tags: python-3.x, beautifulsoup

Solution


The first argument to .findAll is the tag/element name, not an attribute. You passed "class" as the tag name, so nothing matched and indexing [0] into the empty result list raised the IndexError.

You should do:

# make sure to use a single space only between table-sm and odds-compare-table
tables = soup.findAll("table", {"class": "table table-sm odds-compare-table"})

# or, pass the classes as a list (note: a list matches tags that have
# ANY of these classes, so this can over-match on some pages)
tables = soup.findAll("table", {"class": ["table", "table-sm", "odds-compare-table"]})
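As a side note, a CSS selector via soup.select matches tags carrying all of the listed classes, regardless of class order or extra whitespace in the attribute. A minimal sketch against a tiny inline document (a hypothetical stand-in for the real page):

```python
from bs4 import BeautifulSoup

# Small inline HTML standing in for the scraped page
html = """
<table class="table table-sm  odds-compare-table"><tr><td>A</td></tr></table>
<table class="table"><tr><td>B</td></tr></table>
"""
soup = BeautifulSoup(html, "html.parser")

# select() requires ALL three classes, so only the first table matches
tables = soup.select("table.table.table-sm.odds-compare-table")
print(len(tables))  # -> 1
```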

Then loop over tables:

for table in tables:
  # here I store all rows to list `all_data`
  for row in table.findAll('tr'):
    tds = [cell.get_text(strip=True, separator=' ') for cell in row.findAll(["td", "th"])]
    all_data.append(tds)
    print(*tds)

If you only want the first table, you can use find instead of findAll:

table = soup.find("table", {"class": ["table", "table-sm", "odds-compare-table"]})
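Putting the fix together, the row-extraction logic can be factored into a small helper so it is testable without a network call (a sketch; parse_odds_tables is my name, not from the original, and find_all is the modern spelling of findAll):

```python
import csv
from bs4 import BeautifulSoup


def parse_odds_tables(html):
    """Return every row (as a list of cell strings) from all matching tables."""
    soup = BeautifulSoup(html, "html.parser")
    all_data = []
    for table in soup.find_all("table", {"class": "table table-sm odds-compare-table"}):
        for row in table.find_all("tr"):
            all_data.append([cell.get_text(strip=True, separator=" ")
                             for cell in row.find_all(["td", "th"])])
    return all_data


# Usage against the live page (unverified here):
# import requests
# rows = parse_odds_tables(requests.get(url).text)
# with open("test.csv", "w", newline="") as f:
#     csv.writer(f).writerows(rows)
```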
