首页 > 解决方案 > 无法使用 Selenium 或 BeautifulSoup 抓取动态内容

问题描述

我正在尝试从 URL 中抓取动态内容:https://www.prokabaddi.com/stats/0-102-total-points-statistics。尝试过使用 selenium,BeautifulSoup 但都给我一个空列表。我的代码是:

url = "https://www.prokabaddi.com/stats/0-102-total-points-statistics"

# create a new Chrome session
driver = webdriver.Chrome()
driver.get(url) 
soup.find_all("div", class_="sipk-lb-playerName")

这将返回一个空列表。当我在控制台中检查数据时数据存在,但在页面源中数据和div标签不存在。我相信这与 js 渲染的内容有关。

如何从此 URL 中提取玩家名称和积分。

标签: pythonpython-3.xseleniumweb-scrapingbeautifulsoup

解决方案


进入开发工具并查看 XHR。您将看到直接提取数据的 url。它以 json 格式返回,但可以将其转换为表格:

代码:

import requests
from pandas.io.json import json_normalize

url = 'https://www.prokabaddi.com/sifeeds/kabaddi/static/json/1_0_102_stats.json'
jsonData = requests.get(url).json()

table = json_normalize(jsonData['data'])

输出:

print (table.head(5).to_string())
   match_played  player_id         player_name  position_id position_name rank team        team_full_name  team_id team_name  value
0           101        197      Pardeep Narwal          8.0        Raider    1  PAT         Patna Pirates        6       PAT   1055
1           116         81     Rahul Chaudhari          8.0        Raider    2   TT       Tamil Thalaivas       29        TT    987
2           118         41  Deepak Niwas Hooda          1.0   All Rounder    3  JAI  Jaipur Pink Panthers        3       JAI    892
3           115         26         Ajay Thakur          8.0        Raider    4   TT       Tamil Thalaivas       29        TT    811
4            88        326         Rohit Kumar          8.0        Raider    5  BEN       Bengaluru Bulls        1       BEN    689

并过滤以仅获取名称和积分:

print (table[['player_name','value']])
                     player_name  value
0                 Pardeep Narwal   1055
1                Rahul Chaudhari    987
2             Deepak Niwas Hooda    892
3                    Ajay Thakur    811
4                    Rohit Kumar    689
5                 Maninder Singh    673
6               Rishank Devadiga    619
7                Kashiling Adake    612
8                     Anup Kumar    596
9           Pawan Kumar Sehrawat    572
10              Manjeet Chhillar    562
11                Sandeep Narwal    533
12                    Monu Goyat    475
13                  Jang Kun Lee    462
14                 Sachin Tanwar    456
15                   Nitin Tomar    445
16                  Jasvir Singh    412
17                 Rajesh Narwal    397
18                  Sukesh Hegde    395
19                  Meraj Sheykh    393
20                  Naveen Kumar    364
21                Vikash Kandola    358
22           Prashanth Kumar Rai    358
23                  K. Prapanjan    357
24               Shrikant Jadhav    342
25        Siddharth Sirish Desai    337
26                     Ran Singh    319
27                Ravinder Pahal    317
28                 Deepak Narwal    306
29                   Wazir Singh    300
..                           ...    ...
359         Rohit Kumar Prajapat      1
360              Kazuhiro Takano      1
361             Inderpal Bishnoi      1
362                   Amit Kumar      1
363          Sunil Subhash Lande      1
364                  Atif Waheed      1
365                  Nithesh B R      1
366  Mohammad Taghi Paein Mahali      1
367                  Yong Joo Ok      1
368               Vishnu Uthaman      1
369               Ajvender Singh      1
370                        Sanju      1
371              Ravinandan G.M.      1
372                 Navjot Singh      1
373                Parvesh Attri      1
374                Hardeep Duhan      1
375               Parveen Narwal      1
376                   Ajay Singh      1
377                  Nitin Kumar      1
378                       Jishnu      1
379                Naveen Narwal      1
380                   M. Abishek      1
381               Vikas Chhillar      1
382                         Aman      1
383                      Satywan      1
384               Vikram Kandola      1
385             Emad Sedaghatnia      1
386                Aashish Nagar      1
387        Ajinkya Rohidas Kapre      1
388                       Munish      1

[389 rows x 2 columns]

推荐阅读