Web Scraping - transfermarkt most valuable players

Problem description

I am new to web scraping.

I can't find my mistake in this code:

import requests
import csv
from bs4 import BeautifulSoup
url = "https://www.transfermarkt.co.uk/spieler-statistik/wertvollstespieler/marktwertetop"
response=requests.get(url)
html_icerigi=response.content
soup=BeautifulSoup(html_icerigi,"html.parser")
footballer = soup.find_all("a",{"class":"spielprofil_tooltip tooltipstered"})
footballer_list=[]
for footballer in footballer_list:
    footballer=footballer.text
    footballer=footballer.strip()
    footballer=footballer.replace("\n","")
    footballer_list.append(["Futbolcu:{}".format(footballer)])
print(footballer_list)

Tags: python, beautifulsoup, python-requests

Solution


This can be solved with BeautifulSoup. The issues:

  1. The site has anti-scraping protection, so you need to set a User-Agent request header.

  2. tooltipstered is attached dynamically by JavaScript, so you can drop it from the selector.

  3. Use response.text (decoded text) instead of the raw byte string response.content.

  4. You are iterating over the empty list instead of the list of a elements:

    footballer_list=[]
    for footballer in footballer_list:
    
  5. The repeated multi-line reassignment of the variable is unnecessary, and the nested list of lists is probably not what you want; you likely meant to append dicts rather than produce:

    [['Futbolcu:Kylian Mbappé'], ......, ['Futbolcu:Marlon Freitas']]
    
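Point 2 matters because the server-rendered HTML does not contain tooltipstered at all; it is added later by JavaScript. BeautifulSoup also matches a multi-word class string against the full class attribute, so searching for both classes finds nothing in the raw HTML. A minimal offline sketch (the HTML snippet is made up for illustration):

```python
from bs4 import BeautifulSoup

# Server-rendered HTML: "tooltipstered" is only added later by JavaScript.
html = '<a class="spielprofil_tooltip" href="#">Kylian Mbappé</a>'
soup = BeautifulSoup(html, "html.parser")

# Searching for both classes compares against the full class string,
# so it finds nothing in the raw HTML:
print(soup.find_all("a", {"class": "spielprofil_tooltip tooltipstered"}))  # -> []

# Searching for the single static class works:
print([a.text for a in soup.find_all("a", {"class": "spielprofil_tooltip"})])  # -> ['Kylian Mbappé']
```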

Fixed code:

import requests
import csv
from bs4 import BeautifulSoup

url = "https://www.transfermarkt.co.uk/spieler-statistik/wertvollstespieler/marktwertetop"
# A browser-like User-Agent gets past the site's bot check
heads = {'User-Agent' : 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36'}
response = requests.get(url, headers=heads)
html_icerigi = response.text  # decoded text, not raw bytes
soup = BeautifulSoup(html_icerigi, "html.parser")
# "tooltipstered" is added dynamically, so match only the static class
footballers = soup.find_all("a", {"class": "spielprofil_tooltip"})
footballer_list = []
for footballer in footballers:
    footballer_list.append({"Futbolcu": footballer.text.strip()})

print(footballer_list)
print(footballer_list[5]["Futbolcu"])

Result:

[
 {'Futbolcu': 'Kylian Mbappé'}, 
 ......., 
 {'Futbolcu': 'Marlon Freitas'}
]
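The code imports csv but never uses it. If the goal is to save the scraped names, here is a minimal sketch with csv.DictWriter, reusing the dict shape of footballer_list (the sample rows and the players.csv filename are placeholders):

```python
import csv

# Placeholder rows in the same shape the fixed code produces.
footballer_list = [
    {"Futbolcu": "Kylian Mbappé"},
    {"Futbolcu": "Marlon Freitas"},
]

# "players.csv" is an example filename; newline="" avoids blank rows on Windows.
with open("players.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["Futbolcu"])
    writer.writeheader()               # header row: Futbolcu
    writer.writerows(footballer_list)  # one row per player
```

DictWriter maps dict keys to columns, so adding a second field later (say, market value) only requires extending fieldnames and the dicts.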
