python - Web Scraping - transfermarkt most valuable players
Question
I am new to web scraping.
I can't find my mistake in this code:
import requests
import csv
from bs4 import BeautifulSoup
url = "https://www.transfermarkt.co.uk/spieler-statistik/wertvollstespieler/marktwertetop"
response=requests.get(url)
html_icerigi=response.content
soup=BeautifulSoup(html_icerigi,"html.parser")
footballer = soup.find_all("a",{"class":"spielprofil_tooltip tooltipstered"})
footballer_list=[]
for footballer in footballer_list:
    footballer=footballer.text
    footballer=footballer.strip()
    footballer=footballer.replace("\n","")
    footballer_list.append(["Futbolcu:{}".format(footballer)])
print(footballer_list)
Answer
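BeautifulSoup can handle this. The problems in your code:
- The site has anti-scraping protection, so you need to set a User-Agent header on the request.
- The class tooltipstered is attached dynamically, so it is not in the HTML that requests receives. Remove it from the search.
- Use response.text (the decoded text) rather than response.content (the raw bytes).
- You are iterating over the empty list footballer_list = [] (for footballer in footballer_list:) instead of the list of a elements returned by find_all.
- The repeated multi-line reassignment of footballer is unnecessary, and the list structure is probably not what you want. Did you mean to append dicts instead of building a nested list like this?
[['Futbolcu:Kylian Mbappé'], ......, ['Futbolcu:Marlon Freitas']]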
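The class point is easy to miss, so here is a minimal offline sketch of it. The HTML snippet is made up for illustration (it is not the real transfermarkt markup); it assumes the server sends only the spielprofil_tooltip class, as described above.

```python
from bs4 import BeautifulSoup

# Hypothetical static HTML: only "spielprofil_tooltip" is present, because
# "tooltipstered" would be added later by a script running in a browser.
html = """
<table>
  <tr><td><a class="spielprofil_tooltip" href="#">Kylian Mbappé</a></td></tr>
  <tr><td><a class="spielprofil_tooltip" href="#">Marlon Freitas</a></td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# Searching for the two-class string finds nothing in this HTML.
both_classes = soup.find_all("a", {"class": "spielprofil_tooltip tooltipstered"})

# Searching for the single class that is actually present finds both links.
one_class = soup.find_all("a", {"class": "spielprofil_tooltip"})
names = [a.text.strip() for a in one_class]

print(len(both_classes))
print(names)
```

Searching by the single class would also match elements that carry extra classes, so it stays valid whether or not tooltipstered is present.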
Fixed code:
import requests
import csv
from bs4 import BeautifulSoup

url = "https://www.transfermarkt.co.uk/spieler-statistik/wertvollstespieler/marktwertetop"
# Browser-like User-Agent, needed because the site blocks requests without one.
heads = {'User-Agent' : 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36'}
response = requests.get(url, headers=heads)
html_icerigi = response.text
soup = BeautifulSoup(html_icerigi, "html.parser")
# Search only for the class that is present in the served HTML.
footballers = soup.find_all("a",{"class":"spielprofil_tooltip"})
footballer_list = []
for footballer in footballers:
    # strip() removes the surrounding whitespace and newlines.
    footballer_list.append({"Futbolcu" : footballer.text.strip()})
print(footballer_list)
# Name of the sixth player in the list.
print(footballer_list[5]["Futbolcu"])
Result:
[
{'Futbolcu': 'Kylian Mbappé'},
.......,
{'Futbolcu': 'Marlon Freitas'}
]
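The script imports csv but never uses it. If the aim is to save the names, a minimal sketch with csv.DictWriter could look like the following. The helper name write_footballers and the two sample rows are assumptions for illustration only; in practice the rows would come from the scraped footballer_list.

```python
import csv
import io

# Sample rows in the same shape as footballer_list (illustrative data only).
footballer_list = [
    {"Futbolcu": "Kylian Mbappé"},
    {"Futbolcu": "Marlon Freitas"},
]

def write_footballers(rows, fileobj):
    """Write a list of {"Futbolcu": name} dicts as CSV with a header row."""
    writer = csv.DictWriter(fileobj, fieldnames=["Futbolcu"])
    writer.writeheader()
    writer.writerows(rows)

# Write to an in-memory buffer so the sketch runs without touching the disk.
buffer = io.StringIO()
write_footballers(footballer_list, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write a real file instead, pass a file object opened with newline="" and encoding="utf-8", for example open("footballers.csv", "w", newline="", encoding="utf-8"), where the file name is a placeholder.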