How to scrape football results from sofascore using Python

Problem description

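I am working on this project in Python 3.8. I need to download the data into a pandas DataFrame and finally write it to a database (SQL or Access) for every Premier League team for 2018 and 2019. I am trying to use BeautifulSoup. I have code that works for soccerbase.com, but it does not work for sofascore.com. @oppressionslayer has helped with the code so far. Can anyone help me?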
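The final step mentioned here, writing the results to a SQL database, can be sketched with pandas' `DataFrame.to_sql` and Python's built-in `sqlite3` module. This is a minimal sketch assuming SQLite is an acceptable target (Access would need an ODBC driver and is not shown). The table name `results` is a hypothetical choice, and the single row is the example result quoted in the question, not scraped data.

```python
import sqlite3

import pandas as pd

# One sample row: the example result quoted in the question
df = pd.DataFrame(
    [["31/10/2019", "Premier League", "Chelsea", 1, "Manchester United", 2, "1-2"]],
    columns=["Event date", "Competition", "Home Team", "Home Score",
             "Away Team", "Away Score", "Score"],
)

# ":memory:" keeps the sketch self-contained; pass a file path instead to persist it
conn = sqlite3.connect(":memory:")

# Table name "results" is hypothetical; if_exists="replace" overwrites it on each run
df.to_sql("results", conn, if_exists="replace", index=False)

# Read the table back to confirm the write
stored = pd.read_sql("SELECT * FROM results", conn)
print(stored)
conn.close()
```

Switching `if_exists` to `"append"` would let repeated scraping runs add rows to the same table instead of replacing it.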

import requests

url = "https://www.sofascore.com/football///json"
r = requests.get(url)
r.raise_for_status()  # stop on HTTP errors instead of parsing an error page

# The endpoint returns JSON, so BeautifulSoup is not needed: parse it directly
json_object = r.json()

# First event of the first tournament in the feed
event = json_object['sportItem']['tournaments'][0]['events'][0]

print(event['homeTeam']['name'])      # e.g. 'Sheffield United'
print(event['awayTeam']['name'])      # e.g. 'Manchester United'
print(event['homeScore']['current'])  # e.g. 3
print(event['awayScore']['current'])

print(json_object)  # dump the whole feed to inspect its structure

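How can I loop over this code to get all the teams? My goal is to get each team's data, where each row has the fields ["Event date", "Competition", "Home Team", "Home Score", "Away Team", "Away Score", "Score"], for example: 31/10/2019 Premier League Chelsea 1 Manchester United 2 1-2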
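One way to produce that row layout is to map each event dict to a list in the requested column order and build the combined `Score` string from the two scores. The sketch below uses a hypothetical helper, `event_to_row` (not part of any library). It assumes the key names used on this page (`homeTeam`, `homeScore.current`, `formatedStartDate`, and so on), and the sample event is hand-built from the question's example match, not real feed output.

```python
def event_to_row(tournament_name, event):
    """Map one event dict to the row layout requested in the question.

    Hypothetical helper; assumes the key names used elsewhere on this page.
    A missing score becomes None and leaves the "Score" column empty.
    """
    home_score = event.get("homeScore", {}).get("current")
    away_score = event.get("awayScore", {}).get("current")

    # Combined "Score" column such as "1-2", only when both scores exist
    if home_score is not None and away_score is not None:
        score = f"{home_score}-{away_score}"
    else:
        score = ""

    return [
        event.get("formatedStartDate"),  # Event date
        tournament_name,                 # Competition
        event["homeTeam"]["name"],       # Home Team
        home_score,                      # Home Score
        event["awayTeam"]["name"],       # Away Team
        away_score,                      # Away Score
        score,                           # Score
    ]


# Hand-built sample event mirroring the question's example (hypothetical values)
sample_event = {
    "homeTeam": {"name": "Chelsea"},
    "awayTeam": {"name": "Manchester United"},
    "homeScore": {"current": 1},
    "awayScore": {"current": 2},
    "formatedStartDate": "31/10/2019",
}

row = event_to_row("Premier League", sample_event)
print(row)  # ['31/10/2019', 'Premier League', 'Chelsea', 1, 'Manchester United', 2, '1-2']
```

Calling `event_to_row(tournament["tournament"]["name"], event)` for every event in every tournament, then passing the collected rows to `pd.DataFrame(rows, columns=[...])`, gives the table described in the question.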

I'm a beginner. How can I achieve this?

Tags: python, web-scraping, beautifulsoup

Solution


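This code works. It does not capture everything the site has, only the events in the single JSON feed it requests, but it is a solid starting point for a scraper: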

import pandas as pd
import requests

url = "https://www.sofascore.com/football///json"
r = requests.get(url)
r.raise_for_status()  # stop on HTTP errors instead of parsing an error page
json_object = r.json()  # the endpoint returns JSON; no HTML parsing needed

headers = ['Tournament', 'Home Team', 'Home Score', 'Away Team', 'Away Score', 'Status', 'Start Date']
rows = []
for tournament in json_object['sportItem']['tournaments']:
    for event in tournament["events"]:
        rows.append([
            tournament["tournament"]["name"],
            event["homeTeam"]["name"],
            # -1 when the feed has no 'current' score for this event
            event["homeScore"].get("current", -1),
            event["awayTeam"]["name"],
            event["awayScore"].get("current", -1),
            event["status"]["type"],
            event["formatedStartDate"],  # key name as spelled in the feed
        ])

df = pd.DataFrame(rows, columns=headers)

# 'Path.csv' is a placeholder: replace it with your output path
df.to_csv(r'Path.csv', sep=',', encoding='utf-8-sig', index=False)

Courtesy of Praful Surve (@praful-surve)
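For the "each team" part of the question, the consolidated DataFrame can be split by team: a team's matches are the rows where it appears as either the home or the away side. `team_matches` and `split_by_team` are hypothetical helper names, and the two sample rows below are placeholders in the answer's column layout ("Team A" and the status and date values are made up), not scraped data.

```python
import pandas as pd

HEADERS = ['Tournament', 'Home Team', 'Home Score', 'Away Team', 'Away Score', 'Status', 'Start Date']


def team_matches(df, team):
    """Rows in which `team` played, either at home or away."""
    mask = (df["Home Team"] == team) | (df["Away Team"] == team)
    return df[mask].reset_index(drop=True)


def split_by_team(df):
    """Map every team name that appears in `df` to its own DataFrame of matches."""
    teams = sorted(set(df["Home Team"]) | set(df["Away Team"]))
    return {team: team_matches(df, team) for team in teams}


# Placeholder rows in the answer's column layout (hypothetical values)
df = pd.DataFrame(
    [
        ["Premier League", "Chelsea", 1, "Manchester United", 2, "finished", "31/10/2019"],
        ["Premier League", "Team A", 0, "Chelsea", 0, "finished", "01/11/2019"],
    ],
    columns=HEADERS,
)

per_team = split_by_team(df)
print(per_team["Chelsea"])  # both placeholder matches involve Chelsea
```

Each DataFrame in `per_team` can then be written out on its own, for example with `to_csv` or `to_sql`.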

