Scrape a JSON table into pandas with BeautifulSoup

Problem description

How can I get the data from this website? The response appears to be a JSON structure. Is it possible to get it with BeautifulSoup?

URL: https://www.ultimatetennisstatistics.com/statsLeadersTable?current=1&rowCount=-1&sort%5Bvalue%5D=desc&searchPhrase=&category=aces&season=&fromDate=&toDate=&level=&bestOf=&surface=&indoor=&speed=&round=&result=&tournamentId=&opponent=&countryId=&minEntries=&active=true&_=1622884929848

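import requests
from bs4 import BeautifulSoup
import pandas as pd

# The URL given above
url = "https://www.ultimatetennisstatistics.com/statsLeadersTable?current=1&rowCount=-1&sort%5Bvalue%5D=desc&searchPhrase=&category=aces&season=&fromDate=&toDate=&level=&bestOf=&surface=&indoor=&speed=&round=&result=&tournamentId=&opponent=&countryId=&minEntries=&active=true&_=1622884929848"

r = requests.get(url)
soup = BeautifulSoup(r.text, 'lxml')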

Tags: python, json, web-scraping, beautifulsoup

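Solution

The URL returns JSON, so BeautifulSoup is not needed: you can construct a pandas DataFrame directly from the data. For example: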

import requests
import pandas as pd

url = "https://www.ultimatetennisstatistics.com/statsLeadersTable?current=1&rowCount=-1&sort%5Bvalue%5D=desc&searchPhrase=&category=aces&season=&fromDate=&toDate=&level=&bestOf=&surface=&indoor=&speed=&round=&result=&tournamentId=&opponent=&countryId=&minEntries=&active=true&_=1622884929848"

# Decode the JSON response body into a Python dict
data = requests.get(url).json()

# Flatten the list under "rows" into a DataFrame (nested keys become dotted columns)
df = pd.json_normalize(data["rows"])
print(df)

# Write the table to a CSV file without the index column
df.to_csv("data.csv", index=False)

Prints:

     rank  playerId                          name  value               country.name country.id country.code
0       1      3333                  Ivo Karlovic  13687                    Croatia        CRO           hr
1       2      4544                    John Isner  12806              United States        USA           us
2       3      3819                 Roger Federer  11371                Switzerland        SUI           ch
3       5      3852               Feliciano Lopez   9920                      Spain        ESP           es
4       8      5016                   Sam Querrey   8466              United States        USA           us
5       9      5670                  Milos Raonic   8130                     Canada        CAN           ca
6      13      4728                Kevin Anderson   7262               South Africa        RSA           za
7      14      5220                   Marin Cilic   7246                    Croatia        CRO           hr
8      17      4541            Jo Wilfried Tsonga   6634                     France        FRA           fr
9      18      4789                  Gael Monfils   6245                     France        FRA           fr
10     20      4920                Novak Djokovic   6069                     Serbia        SRB           rs
11     21      4526                 Stan Wawrinka   5900                Switzerland        SUI           ch

...
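The dotted column names in the output (country.name, country.id, country.code) come from pd.json_normalize flattening a nested object in each row. The following is a minimal offline sketch of that step. It assumes each entry of data["rows"] has the shape implied by the printed table; the sample payload is reconstructed from the first two printed rows, not fetched from the site.

```python
import pandas as pd

# Hypothetical payload shaped like the response; the values are copied from
# the printed output above, and the nesting under "country" is an assumption.
sample = {
    "rows": [
        {"rank": 1, "playerId": 3333, "name": "Ivo Karlovic", "value": 13687,
         "country": {"name": "Croatia", "id": "CRO", "code": "hr"}},
        {"rank": 2, "playerId": 4544, "name": "John Isner", "value": 12806,
         "country": {"name": "United States", "id": "USA", "code": "us"}},
    ]
}

# Each nested key is joined to its parent with "." to form a flat column name
flat = pd.json_normalize(sample["rows"])
print(flat.columns.tolist())
```

If a different separator is preferred, json_normalize accepts a sep argument, for example sep="_".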

and saves data.csv (screenshot from LibreOffice):

[Image: data.csv opened in LibreOffice]
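The same request can also be written with an explicit parameter dictionary and basic error handling. This is only a sketch, not part of the original answer: the names BASE_URL, PARAMS, fetch_rows and rows_to_frame are hypothetical, and dropping the empty query parameters (and the trailing "_" cache-busting value) assumes the server treats missing parameters the same as empty ones.

```python
import pandas as pd
import requests

BASE_URL = "https://www.ultimatetennisstatistics.com/statsLeadersTable"

# Only the non-empty query parameters from the URL above. Omitting the empty
# ones is an assumption about how the server handles missing values.
PARAMS = {
    "current": 1,
    "rowCount": -1,
    "sort[value]": "desc",  # requests percent-encodes the brackets
    "category": "aces",
    "active": "true",
}


def fetch_rows(session=None, params=PARAMS, timeout=30):
    """Return the "rows" list from the JSON response, raising on HTTP errors."""
    http = session or requests
    response = http.get(BASE_URL, params=params, timeout=timeout)
    response.raise_for_status()
    return response.json()["rows"]


def rows_to_frame(rows):
    """Flatten the list of row dicts into a DataFrame."""
    return pd.json_normalize(rows)
```

A call such as df = rows_to_frame(fetch_rows()) would then produce the same kind of table as above, provided the assumption about the omitted parameters holds.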
