Automating a search over a list of names and scraping tables

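Problem description

I want to automate the search process on a website and scrape each individual player's table (I get the players' names from an Excel sheet). I want to add the scraped information to an existing Excel sheet that contains the list of players. For each year the player has been in the league, the player's name needs to be in the first column. So far I have been able to read the information from the existing Excel sheet, but I am not sure how to use it to automate the search process. I am also not sure whether Selenium can help. The website is https://basketball.realgm.com/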

import openpyxl

path = r"C:\Users\Name\Desktop\NBAPlayers.xlsx"

workbook = openpyxl.load_workbook(path)

sheet = workbook.active

rows = sheet.max_row
cols = sheet.max_column

print(rows)
print(cols)

for r in range(2, rows+1):
    for c in range(2,cols+1):
        print(sheet.cell(row=r,column=c).value, end=" ")

    print()

Tags: python, pandas, selenium, beautifulsoup, openpyxl

Solution


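I assume you already have the names from the Excel sheet, so I used a plain list of names. For each name I fetch the page text with the Python requests module, use BeautifulSoup to pick out the table, and then load it into a pandas DataFrame.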
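
The answer works from a hard-coded list of names. As a minimal sketch, the list could instead be built from the worksheet loaded in the question. This assumes the names sit in the first column below a single header row. `names_from_sheet` is a hypothetical helper, not part of openpyxl, and relies only on the worksheet's `iter_rows` method.

```python
def names_from_sheet(sheet, column=1, first_data_row=2):
    """Collect player names from one column of an openpyxl worksheet.

    Assumptions (not from the original post): names are in `column`,
    row 1 is a header, and blank cells and repeated names are skipped.
    """
    names = []
    rows = sheet.iter_rows(
        min_row=first_data_row,
        min_col=column,
        max_col=column,
        values_only=True,  # yield plain cell values instead of Cell objects
    )
    for (value,) in rows:
        if value is None:
            continue
        name = str(value).strip()
        # Keep the first occurrence only, preserving sheet order.
        if name and name not in names:
            names.append(name)
    return names
```

With the workbook loaded as in the question, `playernames = names_from_sheet(workbook.active)` would then replace the hard-coded list.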

Code

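import requests
import pandas as pd
from io import StringIO
from urllib.parse import quote_plus
from bs4 import BeautifulSoup

playernames = ['Dominique Jones', 'Joe Young', 'Darius Adams', 'Lester Hudson', 'Marcus Denmon', 'Courtney Fortson']

for name in playernames:
    # URL-encode the whole name (spaces become "+"), so names with more
    # or fewer than two parts also work.
    url = "https://basketball.realgm.com/search?q={}".format(quote_plus(name))
    print(url)
    r = requests.get(url)
    soup = BeautifulSoup(r.text, 'html.parser')
    # First element on the returned page that has the "tablesaw" class.
    table = soup.select_one(".tablesaw")
    if table is None:
        print("No table found for", name)
        continue
    # Wrap the HTML in StringIO: recent pandas versions deprecate
    # passing a literal HTML string to read_html.
    dfs = pd.read_html(StringIO(str(table)))
    for df in dfs:
        print(df)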

Output

https://basketball.realgm.com/search?q=Dominique+Jones
            Player Pos   HT  ...  Draft Year          College               NBA
0  Dominique Jones   G  6-4  ...        2010    South Florida  Dallas Mavericks
1  Dominique Jones   G  6-2  ...        2009          Liberty                 -
2  Dominique Jones  PG  5-9  ...        2011  Fort Hays State                 -

[3 rows x 8 columns]
https://basketball.realgm.com/search?q=Joe+Young
      Player Pos   HT  ... Draft Year           College             NBA
0  Joe Young   F  6-6  ...       2007        Holy Cross               -
1  Joe Young   G  6-0  ...       2009          Canisius               -
2  Joe Young   G  6-2  ...       2015            Oregon  Indiana Pacers
3  Joe Young   G  6-2  ...       2009  Central Missouri               -

[4 rows x 8 columns]
https://basketball.realgm.com/search?q=Darius+Adams
         Player Pos   HT  ...  Draft Year              College  NBA
0  Darius Adams  PG  6-1  ...        2011         Indianapolis    -
1  Darius Adams   G  6-0  ...        2018  Coast Guard Academy    -

[2 rows x 8 columns]
https://basketball.realgm.com/search?q=Lester+Hudson
      Season       Team  GP  GS   MIN  ...   STL   BLK    PF   TOV    PTS
0  2009-10 *  All Teams  25   0   5.3  ...  0.32  0.12  0.48  0.56   2.32
1  2009-10 *        BOS  16   0   4.4  ...  0.19  0.12  0.44  0.56   1.38
2  2009-10 *        MEM   9   0   6.8  ...  0.56  0.11  0.56  0.56   4.00
3    2010-11        WAS  11   0   6.7  ...  0.36  0.09  0.91  0.64   1.64
4  2011-12 *  All Teams  16   0  20.9  ...  0.88  0.19  1.62  2.00  10.88
5  2011-12 *        CLE  13   0  24.2  ...  1.08  0.23  2.00  2.31  12.69
6  2011-12 *        MEM   3   0   6.5  ...  0.00  0.00  0.00  0.67   3.00
7    2014-15        LAC   5   0  11.1  ...  1.20  0.20  0.80  0.60   3.60
8     CAREER        NaN  57   0  10.4  ...  0.56  0.14  0.91  0.98   4.70

[9 rows x 23 columns]
https://basketball.realgm.com/search?q=Marcus+Denmon
    Season Team        Location  GP  GS  ...  STL  BLK    PF   TOV    PTS
0  2012-13  SAN       Las Vegas   5   0  ...  0.4  0.0  1.60  0.20   5.40
1  2013-14  SAN       Las Vegas   5   1  ...  0.8  0.0  2.20  1.20  10.80
2  2014-15  SAN       Las Vegas   6   2  ...  0.5  0.0  1.50  0.17   5.00
3  2015-16  SAN  Salt Lake City   2   0  ...  0.0  0.0  0.00  0.00   0.00
4   CAREER  NaN             NaN  18   3  ...  0.5  0.0  1.56  0.44   6.17

[5 rows x 24 columns]
https://basketball.realgm.com/search?q=Courtney+Fortson
      Season       Team  GP  GS   MIN   FGM  ...   AST  STL  BLK    PF   TOV   PTS
0  2011-12 *  All Teams  10   0   9.5  1.10  ...  1.00  0.3  0.0  0.50  1.00  3.50
1  2011-12 *        HOU   6   0   8.2  1.00  ...  0.83  0.5  0.0  0.33  0.83  3.00
2  2011-12 *        LAC   4   0  11.5  1.25  ...  1.25  0.0  0.0  0.75  1.25  4.25
3     CAREER        NaN  10   0   9.5  1.10  ...  1.00  0.3  0.0  0.50  1.00  3.50

[4 rows x 23 columns]
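
The question also asks for the player's name in the first column of every season row. The output above suggests two kinds of result: a list of matching players when several share a name, and a per-season table (with a `Season` column) when the search resolves to a single player. A minimal sketch under that assumption follows. `season_rows` is a hypothetical helper, and dropping the `CAREER` summary row is an assumption about what "each year" should mean.

```python
def season_rows(name, columns, rows):
    """Yield one list per season row, with the player's name first.

    Tables without a "Season" column (search-result lists for names
    shared by several players) yield nothing, so they can be reviewed
    by hand instead of being written out.
    """
    columns = list(columns)
    if "Season" not in columns:
        return
    season_idx = columns.index("Season")
    for row in rows:
        row = list(row)
        # Assumption: the CAREER summary line is not a season, so skip it.
        if str(row[season_idx]).strip().upper() == "CAREER":
            continue
        yield [name, *row]


# Demo with three columns and rows taken from the Lester Hudson output.
columns = ["Season", "Team", "GP"]
rows = [("2010-11", "WAS", 11), ("2014-15", "LAC", 5), ("CAREER", None, 57)]
for out in season_rows("Lester Hudson", columns, rows):
    print(out)
# ['Lester Hudson', '2010-11', 'WAS', 11]
# ['Lester Hudson', '2014-15', 'LAC', 5]
```

For a DataFrame `df` from the loop above, the rows could be produced with `season_rows(name, df.columns, df.itertuples(index=False))`. Each one could then be added to the existing sheet with openpyxl's `sheet.append(row)`, followed by `workbook.save(path)`. It may be worth replacing missing values (NaN) with None before writing.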
