Beautifulsoup only returns 100 elements

Question

I'm new to web scraping and want to scrape player names and salaries from spotrac for a university project. Here is what I have so far.

import requests
from bs4 import BeautifulSoup   

URL = 'https://www.spotrac.com/nfl/rankings/'

reqs = requests.get(URL)
soup = BeautifulSoup(reqs.text, 'lxml')
print("List of all the h1, h2, h3 :")
for my_tag in soup.find_all(class_="team-name"):
    print(my_tag.text)

for my_tag in soup.find_all(class_="info"):
    print(my_tag.text)    

The output contains only 100 names, but the page has 1000 elements. Is there a reason for this?

Tags: python, web-scraping, beautifulsoup

Answer


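A plain GET returns only the first 100 rows; the rest appear to be loaded by the page through Ajax. To get all the names and the other information, make the Ajax POST request to https://www.spotrac.com/nfl/rankings/ yourself: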

import requests
from bs4 import BeautifulSoup


url = 'https://www.spotrac.com/nfl/rankings/'
# Form fields sent with the POST request
data = {
    'ajax': 'true',
    'mobile': 'false'
}

soup = BeautifulSoup(requests.post(url, data=data).content, 'html.parser')
# Each player name is in an <h3>; the salary is the next element with class "rank-value"
for h3 in soup.select('h3'):
    print(h3.text)
    print(h3.find_next(class_="rank-value").text)
    print('-' * 80)

Prints:

Dak Prescott
$31,409,000  
--------------------------------------------------------------------------------
Russell Wilson
$31,000,000  
--------------------------------------------------------------------------------


...all the way to


--------------------------------------------------------------------------------
Willie Gay Jr.
$958,372  
--------------------------------------------------------------------------------
Jace Sternberger
$956,632  
--------------------------------------------------------------------------------
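
For a project you will probably want the values in a data structure rather than printed. Below is a minimal sketch of one way to do that, using the same `h3` and `rank-value` selectors as the code above. The function name `parse_rankings` and the `SAMPLE` markup are hypothetical, made up for illustration; the real response may be structured differently, and the sketch assumes every salary looks like `$31,409,000`.

```python
from bs4 import BeautifulSoup


def parse_rankings(html):
    """Return a list of (name, salary) tuples from ranking HTML.

    Assumes each name is in an <h3> and the salary is in the next
    element with class "rank-value", formatted like "$31,409,000".
    """
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for h3 in soup.select("h3"):
        value = h3.find_next(class_="rank-value")
        if value is None:
            # No salary element after this heading: skip it
            continue
        # Strip whitespace, the dollar sign and thousands separators
        text = value.get_text(strip=True).replace("$", "").replace(",", "")
        rows.append((h3.get_text(strip=True), int(text)))
    return rows


# Hypothetical sample markup, not copied from the real page
SAMPLE = """
<div><h3>Dak Prescott</h3><span class="rank-value">$31,409,000  </span></div>
<div><h3>Russell Wilson</h3><span class="rank-value">$31,000,000  </span></div>
"""

if __name__ == "__main__":
    for name, salary in parse_rankings(SAMPLE):
        print(name, salary)
```

To use it on the live data, pass the text of the POST response, for example `parse_rankings(requests.post(url, data=data).text)`.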
