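python - Web scraping with BS4 - "Length of passed values is 0, index implies 7"
Problem description
I'm getting an error saying that the length of the passed values is 0.
Here is my code: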
import bs4 as bs
import urllib
import urllib.request
import pandas as pd
draft2018 = "https://en.wikipedia.org/wiki/2018_NBA_draft"
draftpage = urllib.request.urlopen(draft2018)
soup = bs.BeautifulSoup(draftpage, "html.parser")
columns = ['Round', 'Pick', 'Player', 'Position',
           'Nationality', 'Team', 'School/club team']
df = pd.DataFrame(columns=columns)
table = soup.find("table", {"class": "wikitable sortable plainrowheaders"}).tbody
trs = table.find_all("tr")
for tr in trs:
    tds = tr.find_all('td')
    row = [td.text.replace('\n', '') for td in tds]
    df = df.append(pd.Series(row, index=columns), ignore_index=True)
Can someone explain the reason behind this?
Solution
Use read_html, which returns a list of DataFrames, and select the fourth DataFrame by indexing with [3]. Then use rename to rename the columns with a dictionary:
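import pandas as pd

draft2018 = "https://en.wikipedia.org/wiki/2018_NBA_draft"
# Map the abbreviated Wikipedia headers to the desired column names
d = {'Rnd.':'Round','Pos.':'Position','Nationality[n 1]':'Nationality'}
# read_html parses every table on the page; [3] picks the fourth one
df = pd.read_html(draft2018)[3].rename(columns=d)
print(df.head())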
   Round  Pick             Player  Position    Nationality  \
0      1     1      Deandre Ayton         C        Bahamas
1      1     2  Marvin Bagley III        PF  United States
2      1     3        Luka Dončić     PG/SF       Slovenia
3      1     4  Jaren Jackson Jr.        PF  United States
4      1     5         Trae Young        PG  United States

                                      Team    School / club team
0                             Phoenix Suns         Arizona (Fr.)
1                         Sacramento Kings            Duke (Fr.)
2      Atlanta Hawks (traded to Dallas)[a]   Real Madrid (Spain)
3                        Memphis Grizzlies  Michigan State (Fr.)
4  Dallas Mavericks (traded to Atlanta)[a]        Oklahoma (Fr.)
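As for why the original loop fails: the most likely cause is that the table's first row is a header row containing only th cells, so tr.find_all('td') returns an empty list and pd.Series(row, index=columns) receives 0 values for an index of 7. The sketch below reproduces this on a small inline table and shows one way to skip such rows. The HTML string is a hypothetical, simplified stand-in for the Wikipedia markup, and the assumption that the player name sits in a th cell is not confirmed by the question. Rows are collected in a list and the DataFrame is built once, because DataFrame.append was removed in pandas 2.0.

```python
import bs4 as bs
import pandas as pd

# Hypothetical, simplified stand-in for the Wikipedia table markup.
html = """
<table class="wikitable sortable plainrowheaders">
  <tr><th>Rnd.</th><th>Pick</th><th>Player</th></tr>
  <tr><td>1</td><td>1</td><th scope="row">Deandre Ayton</th></tr>
  <tr><td>1</td><td>2</td><th scope="row">Marvin Bagley III</th></tr>
</table>
"""

columns = ["Round", "Pick", "Player"]
soup = bs.BeautifulSoup(html, "html.parser")
table = soup.find("table", {"class": "wikitable"})
trs = table.find_all("tr")

# The header row has no <td> cells at all, which is what produces
# "Length of passed values is 0" in the original loop.
header_td_count = len(trs[0].find_all("td"))
print(header_td_count)

rows = []
for tr in trs:
    if not tr.find_all("td"):
        continue  # skip rows made up only of <th> cells (the header)
    # Collect both <td> and <th> cells so row headers are not dropped.
    cells = tr.find_all(["td", "th"])
    row = [cell.get_text(strip=True) for cell in cells]
    if len(row) == len(columns):  # ignore rows that do not fit the schema
        rows.append(row)

# Build the DataFrame once instead of appending row by row.
df = pd.DataFrame(rows, columns=columns)
print(df)
```

On the real page the cell layout may differ (merged cells, footnote markers), which is part of why read_html is the more robust choice here.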