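python - Automate searches from a list and scrape tables
Question
I want to automate the search process on a website and scrape the table for each individual player (I get the players' names from an Excel sheet). I then want to add the scraped information to the existing Excel sheet that contains the list of players. For every year the player was in the league, the player's name needs to be in the first column. So far I can read the information from the existing Excel sheet, but I'm not sure how to use it to automate the search process. I'm also not sure whether Selenium can help. The website is https://basketball.realgm.com/.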
import openpyxl

path = r"C:\Users\Name\Desktop\NBAPlayers.xlsx"
workbook = openpyxl.load_workbook(path)
sheet = workbook.active
rows = sheet.max_row
cols = sheet.max_column
print(rows)
print(cols)
for r in range(2, rows + 1):
    for c in range(2, cols + 1):
        print(sheet.cell(row=r, column=c).value, end=" ")
    print()
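One way to connect the two steps is to read the names into a plain list and turn each name into a search URL. This is a minimal sketch, not the asker's code: it assumes the names are in column A with a header in row 1 (the question doesn't say which column holds them), and read_player_names and search_url are hypothetical helper names. Using urlencode also copes with names of more than two words and percent-encodes characters such as apostrophes.

```python
from urllib.parse import urlencode

import openpyxl


def read_player_names(path, column=1):
    """Return the non-empty values of one column, skipping the header row.

    Assumption (hypothetical layout): names are in column A, header in row 1.
    """
    workbook = openpyxl.load_workbook(path)
    sheet = workbook.active
    names = []
    for (value,) in sheet.iter_rows(min_row=2, min_col=column, max_col=column,
                                    values_only=True):
        text = str(value).strip() if value is not None else ""
        if text:  # skip blank cells
            names.append(text)
    return names


def search_url(name):
    """Build a realgm.com search URL; urlencode turns spaces into '+'."""
    return "https://basketball.realgm.com/search?" + urlencode({"q": name})


if __name__ == "__main__":
    for name in read_player_names(r"C:\Users\Name\Desktop\NBAPlayers.xlsx"):
        print(search_url(name))
```

The resulting list can replace the hard-coded playernames list in the answer below.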
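Answer
I assume you already have the names from the Excel sheet, so I used a list of names. Since the search results are available at a plain URL (search?q=first+last), I used the Python requests module to fetch the page text, then Beautiful Soup to get the table content, and then pandas to load it into a DataFrame.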
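pd.read_html does the work of turning the table's HTML into a DataFrame. To see what that step operates on, here is a minimal offline sketch that pulls the header and rows out of a table with class tablesaw using only BeautifulSoup. The HTML snippet and the table_rows helper are made up for illustration (the cell values are borrowed from the Lester Hudson output); the real markup on realgm.com may differ.

```python
from bs4 import BeautifulSoup

# Made-up HTML for illustration only; the real realgm.com markup may differ.
SAMPLE_HTML = """
<html><body>
<table class="tablesaw">
  <thead><tr><th>Season</th><th>Team</th><th>PTS</th></tr></thead>
  <tbody>
    <tr><td>2010-11</td><td>WAS</td><td>1.64</td></tr>
    <tr><td>2014-15</td><td>LAC</td><td>3.60</td></tr>
  </tbody>
</table>
</body></html>
"""


def table_rows(html, selector=".tablesaw"):
    """Return (header, rows) for the first table matching the CSS selector, or None."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.select_one(selector)
    if table is None:  # no matching table on the page
        return None
    header = [th.get_text(strip=True) for th in table.select("thead th")]
    rows = [
        [td.get_text(strip=True) for td in tr.find_all("td")]
        for tr in table.select("tbody tr")
    ]
    return header, rows


header, rows = table_rows(SAMPLE_HTML)
print(header)  # ['Season', 'Team', 'PTS']
print(rows)    # [['2010-11', 'WAS', '1.64'], ['2014-15', 'LAC', '3.60']]
```

Checking for None matters on real pages: if no table matches, passing str(None) to pd.read_html raises an error instead of producing an empty result.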
import requests
import pandas as pd
from bs4 import BeautifulSoup
playernames=['Dominique Jones', 'Joe Young', 'Darius Adams', 'Lester Hudson', 'Marcus Denmon', 'Courtney Fortson']
for name in playernames:
fname=name.split(" ")[0]
lname=name.split(" ")[1]
url="https://basketball.realgm.com/search?q={}+{}".format(fname,lname)
print(url)
r=requests.get(url)
soup=BeautifulSoup(r.text,'html.parser')
table=soup.select_one(".tablesaw ")
dfs=pd.read_html(str(table))
for df in dfs:
print(df)
Output:
https://basketball.realgm.com/search?q=Dominique+Jones
Player Pos HT ... Draft Year College NBA
0 Dominique Jones G 6-4 ... 2010 South Florida Dallas Mavericks
1 Dominique Jones G 6-2 ... 2009 Liberty -
2 Dominique Jones PG 5-9 ... 2011 Fort Hays State -
[3 rows x 8 columns]
https://basketball.realgm.com/search?q=Joe+Young
Player Pos HT ... Draft Year College NBA
0 Joe Young F 6-6 ... 2007 Holy Cross -
1 Joe Young G 6-0 ... 2009 Canisius -
2 Joe Young G 6-2 ... 2015 Oregon Indiana Pacers
3 Joe Young G 6-2 ... 2009 Central Missouri -
[4 rows x 8 columns]
https://basketball.realgm.com/search?q=Darius+Adams
Player Pos HT ... Draft Year College NBA
0 Darius Adams PG 6-1 ... 2011 Indianapolis -
1 Darius Adams G 6-0 ... 2018 Coast Guard Academy -
[2 rows x 8 columns]
https://basketball.realgm.com/search?q=Lester+Hudson
Season Team GP GS MIN ... STL BLK PF TOV PTS
0 2009-10 * All Teams 25 0 5.3 ... 0.32 0.12 0.48 0.56 2.32
1 2009-10 * BOS 16 0 4.4 ... 0.19 0.12 0.44 0.56 1.38
2 2009-10 * MEM 9 0 6.8 ... 0.56 0.11 0.56 0.56 4.00
3 2010-11 WAS 11 0 6.7 ... 0.36 0.09 0.91 0.64 1.64
4 2011-12 * All Teams 16 0 20.9 ... 0.88 0.19 1.62 2.00 10.88
5 2011-12 * CLE 13 0 24.2 ... 1.08 0.23 2.00 2.31 12.69
6 2011-12 * MEM 3 0 6.5 ... 0.00 0.00 0.00 0.67 3.00
7 2014-15 LAC 5 0 11.1 ... 1.20 0.20 0.80 0.60 3.60
8 CAREER NaN 57 0 10.4 ... 0.56 0.14 0.91 0.98 4.70
[9 rows x 23 columns]
https://basketball.realgm.com/search?q=Marcus+Denmon
Season Team Location GP GS ... STL BLK PF TOV PTS
0 2012-13 SAN Las Vegas 5 0 ... 0.4 0.0 1.60 0.20 5.40
1 2013-14 SAN Las Vegas 5 1 ... 0.8 0.0 2.20 1.20 10.80
2 2014-15 SAN Las Vegas 6 2 ... 0.5 0.0 1.50 0.17 5.00
3 2015-16 SAN Salt Lake City 2 0 ... 0.0 0.0 0.00 0.00 0.00
4 CAREER NaN NaN 18 3 ... 0.5 0.0 1.56 0.44 6.17
[5 rows x 24 columns]
https://basketball.realgm.com/search?q=Courtney+Fortson
Season Team GP GS MIN FGM ... AST STL BLK PF TOV PTS
0 2011-12 * All Teams 10 0 9.5 1.10 ... 1.00 0.3 0.0 0.50 1.00 3.50
1 2011-12 * HOU 6 0 8.2 1.00 ... 0.83 0.5 0.0 0.33 0.83 3.00
2 2011-12 * LAC 4 0 11.5 1.25 ... 1.25 0.0 0.0 0.75 1.25 4.25
3 CAREER NaN 10 0 9.5 1.10 ... 1.00 0.3 0.0 0.50 1.00 3.50
[4 rows x 23 columns]
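The question also asks for one row per season with the player's name in the first column. Judging from the output above, a name with a single match lands on that player's page (the first table has a Season column), while an ambiguous name returns a list of matching players. The sketch below assumes you collect each player's DataFrame into a dict keyed by name; stack_player_tables is a hypothetical helper that skips tables without a Season column, drops the CAREER summary row, and inserts the name as the first column.

```python
import pandas as pd


def stack_player_tables(tables):
    """Combine {player name: stats DataFrame} into one DataFrame.

    Hypothetical helper: puts the player's name in the first column of every
    season row, skips tables without a "Season" column (e.g. search-result
    lists for ambiguous names) and drops the summary "CAREER" row.
    """
    frames = []
    for name, df in tables.items():
        if "Season" not in df.columns:
            continue  # ambiguous search: a list of players, not a stats table
        df = df[df["Season"].astype(str).str.upper() != "CAREER"].copy()
        df.insert(0, "Player", name)
        frames.append(df)
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)
```

Tables with different columns are still combined (Marcus Denmon's has an extra Location column); missing cells become NaN. To write the result into the existing workbook as a new sheet, pandas can open it in append mode, for example with pd.ExcelWriter(path, mode="a", engine="openpyxl", if_sheet_exists="replace") as writer, followed by combined.to_excel(writer, sheet_name="Stats", index=False). The sheet name is arbitrary, and if_sheet_exists needs pandas 1.3 or later.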