Web scraping with a loop returns only a single element

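Problem description

When I run a for loop to collect elements inside <div> tags, it returns only one of the elements that share the same class.

For example: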

import requests
from bs4 import BeautifulSoup

r = requests.get("https://one-versus-one.com/en/rankings/all/statistics")

soup = BeautifulSoup(r.content, 'lxml')

data = {
    'players': [],
    'club': [],
    'rank': []
}
def getstuff(soup):
    products = soup.find_all('div', {'class':'rankings-table'})
    for name in products:
        players = name.find('div', {'class':'player-name rankings-table__player-name'}).text
        club = name.find('span', {'class':'rankings-table__club-name'}).text
        rank = name.find('div', {'class':'rankings-table-cell value rankings-table__value'}).text.strip()
        data['players'] = players
        data['club'] = club
        data['rank'] = rank
        print(data)

getstuff(soup)

This returns:

{'players': 'Lionel Messi', 'club': 'Barcelona', 'rank': '100'}

I want every player, club, and rank on the page to be printed.

Tags: python, web-scraping

Solution

You can try this:

import requests
from bs4 import BeautifulSoup

r = requests.get("https://one-versus-one.com/en/rankings/all/statistics")
soup = BeautifulSoup(r.content, 'lxml')

data = {'players': [],'club': [],'rank': []}

def getstuff(soup):
    products = soup.find('div', {'class':'rankings-table'}).find_all("a")
    for name in products:
        players = name.find('div', {'class':'player-name rankings-table__player-name'}).text
        club = name.find('span', {'class':'rankings-table__club-name'}).text
        rank = name.find('div', {'class':'rankings-table-cell value rankings-table__value'}).text.strip()
        data['players'].append(players)
        data['club'].append(club)
        data['rank'].append(rank)
    print(data)

getstuff(soup)
"""
{'players': ['Lionel Messi', 'Junior Neymar', 'Robert Lewandowski', 'Joao Cancelo', 'Kevin de Bruyne', 'Rodri', 'Jesse Lingard', 'Riyad Mahrez', 'Ilkay Gundogan', 'John Stones'], 'club': ['Barcelona', 'Paris Saint-Germain', 'Bayern Munich', 'Manchester City', 'Manchester City', 'Manchester City', 'West Ham United', 'Manchester City', 'Manchester City', 'Manchester City'], 'rank': ['100', '95', '93', '92', '91', '90', '90', '89', '88', '88']}
"""

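You have to use .find_all("a") to get the information for every player: the page contains a single div with the class rankings-table, so looping over the result of find_all on that div runs only once, whereas each player row is an <a> element inside it. Also, your loop assigned each new value to data['players'] instead of appending it, so the list was replaced by a single string; the same applies to club and rank.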
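The difference between assigning and appending can be seen without touching the network. The following is a minimal sketch, assuming a small hard-coded list named rows that stands in for the scraped values; rows, collect_by_assignment, and collect_by_append are hypothetical names introduced only for this illustration.

```python
# Hypothetical sample rows standing in for the scraped (player, club, rank) values.
rows = [
    ("Lionel Messi", "Barcelona", "100"),
    ("Junior Neymar", "Paris Saint-Germain", "95"),
    ("Robert Lewandowski", "Bayern Munich", "93"),
]


def collect_by_assignment(rows):
    """Mimic the question's loop body: each iteration replaces the stored value."""
    data = {'players': [], 'club': [], 'rank': []}
    for player, club, rank in rows:
        data['players'] = player  # the list is replaced by a single string
        data['club'] = club
        data['rank'] = rank
    return data


def collect_by_append(rows):
    """Mimic the answer's loop body: each iteration adds to the lists."""
    data = {'players': [], 'club': [], 'rank': []}
    for player, club, rank in rows:
        data['players'].append(player)  # the list grows by one entry per row
        data['club'].append(club)
        data['rank'].append(rank)
    return data


overwritten = collect_by_assignment(rows)
appended = collect_by_append(rows)
print(overwritten)
print(appended)
```

With assignment, only the values from the last row processed survive; with append, every row is kept in order.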

