python - Python BeautifulSoup: next page
Problem description
This is my current code for scraping specific player data from a website:
import requests
from bs4 import BeautifulSoup
import pandas as pd

# Fetch and parse the first page of the player list.
page = requests.get('https://www.futbin.com/players?page=1')
soup = BeautifulSoup(page.content, 'lxml')
pool = soup.find(id='repTb')

# Select the name, price and rating cells inside the table.
pnames = pool.find_all(class_='player_name_players_table')
pprice = pool.find_all(class_='ps4_color font-weight-bold')
prating = pool.select('span[class*="form rating ut20"]')

all_player_names = [name.getText() for name in pnames]
all_prices = [price.getText() for price in pprice]
all_pratings = [rating.getText() for rating in prating]

fut_data = pd.DataFrame(
    {
        'Player': all_player_names,
        'Rating': all_pratings,
        'Price': all_prices,
    })

# The context manager saves the file on exit; writer.save() was removed in pandas 2.0.
with pd.ExcelWriter('file.xlsx', engine='xlsxwriter') as writer:
    fut_data.to_excel(writer, sheet_name='Futbin')
print(fut_data)
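This works for the first page. However, I need to go through 609 pages in total and collect the data from all of them.
Could you help me rewrite this code so that it does that? I'm still a beginner and am using this project to learn.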
Solution
You can loop over all 609 pages, parse each one, and save the collected data to file.xlsx at the end:
import requests
from bs4 import BeautifulSoup
import pandas as pd

all_player_names = []
all_pratings = []
all_prices = []

# Pages are numbered 1..609; range() excludes its upper bound.
for i in range(1, 610):
    page = requests.get('https://www.futbin.com/players?page={}'.format(i))
    soup = BeautifulSoup(page.content, 'lxml')
    pool = soup.find(id='repTb')
    pnames = pool.find_all(class_='player_name_players_table')
    pprice = pool.find_all(class_='ps4_color font-weight-bold')
    prating = pool.select('span[class*="form rating ut20"]')
    # Append this page's values to the running lists.
    all_player_names.extend([name.getText() for name in pnames])
    all_prices.extend([price.getText() for price in pprice])
    all_pratings.extend([rating.getText() for rating in prating])

# Build the table once, after every page has been collected.
fut_data = pd.DataFrame({'Player': all_player_names,
                         'Rating': all_pratings,
                         'Price': all_prices})

# The context manager saves the file on exit; writer.save() was removed in pandas 2.0.
with pd.ExcelWriter('file.xlsx', engine='xlsxwriter') as writer:
    fut_data.to_excel(writer, sheet_name='Futbin')
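
Over 609 requests, a failed download or an unexpectedly empty page can happen at some point. The following is a minimal sketch, not part of the original answer, of the same loop with a pause between requests, a small retry, and an early stop. The names scrape_pages, fetch, parse, delay and retries are hypothetical and introduced only for illustration; the downloading and parsing steps are passed in as functions so the loop itself can be tried without touching the network.

```python
import time
from typing import Callable, Iterable, List, Optional, Tuple

Row = Tuple[str, str, str]  # (player, rating, price)


def scrape_pages(
    fetch: Callable[[int], Optional[str]],
    parse: Callable[[str], List[Row]],
    pages: Iterable[int],
    delay: float = 1.0,
    retries: int = 2,
) -> List[Row]:
    """Collect rows from every page, pausing between requests.

    fetch(page_number) returns the page HTML, or None on failure.
    parse(html) returns the rows found on that page (empty if none).
    Both callables are hypothetical hooks, not part of any library.
    """
    rows: List[Row] = []
    for number in pages:
        html = None
        for _ in range(retries + 1):
            html = fetch(number)
            if html is not None:
                break
            time.sleep(delay)  # wait before retrying the same page
        if html is None:
            # Skip a page that keeps failing instead of aborting the run.
            print(f"page {number}: giving up after {retries + 1} attempts")
            continue
        page_rows = parse(html)
        if not page_rows:
            # Assumption: an empty page means we ran past the last page
            # or were served something other than the player table.
            print(f"page {number}: no rows found, stopping")
            break
        rows.extend(page_rows)
        time.sleep(delay)  # pause before requesting the next page
    return rows
```

To use it with the code above, fetch could wrap requests.get and return page.text when page.ok is true (None otherwise), and parse could wrap the find_all and select calls from the loop and return list(zip(names, ratings, prices)). Calling scrape_pages(fetch, parse, range(1, 610)) then yields rows that can be passed to pd.DataFrame(rows, columns=['Player', 'Rating', 'Price']). Returning one tuple per player also keeps the three columns the same length, which pd.DataFrame requires.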