python-3.x - Unable to scrape all the data using Beautiful Soup
Problem description
URL = r"https://www.vault.com/best-companies-to-work-for/law/top-100-law-firms-rankings/year/"
My_list = ['2007','2008','2009','2010']
Year = []
CompanyName = []
Rank = []
Score = []
for I, Page in enumerate(My_list, start=1):
    url = r'https://www.vault.com/best-companies-to-work-for/law/top-100-law-firms-rankings/year/{}'.format(Page)
    print(url)
    Res = requests.get(url)
    soup = BeautifulSoup(Res.content , 'html.parser')
    data = soup.find('div' ,{'id':'main-content'})
    for Data in data:
        Title = data.findAll('h3')
        for title in Title:
            CompanyName.append(title.text.strip())
        Rank = data.findAll('div' ,{'class':'rank RankNumber'})
        for rank in Rank:
            Rank.append(rank)
        Score = data.findAll('div' ,{'class':'rank RankNumber'})
        for score in Score:
            Score.append(score)
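I can't get all the data for the titles, ranks, and scores. I'm not sure whether I have identified the correct tags, and I can't extract the values from the Rank list.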
Solution
This should get you started. First find all the div.RankItem elements, then find the title, rank, and score inside each one.
from bs4 import BeautifulSoup
import requests
resp = requests.get('https://www.vault.com/best-companies-to-work-for/law/top-100-law-firms-rankings/year/2010')
soup = BeautifulSoup(resp.content , 'html.parser')
for i, item in enumerate(soup.find_all("div", {"class": "RankItem"})):
    title = item.find("h3", {"class": "MainLink"}).get_text().strip()
    rank = item.find("div", {"class": "RankNumber"}).get_text().strip()
    score = item.find("div", {"class": "score"}).get_text().strip()
    print(i+1, title, rank, score)
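To cover all four years from the question, the same per-item lookup could be wrapped in a function and collected into one list. The following is only a sketch: the helper names (text_of, parse_rank_items, scrape_years) and the fetch parameter are hypothetical, and it assumes every year's page uses the same RankItem / MainLink / RankNumber / score class names as the 2010 page. It also keeps the result list separate from the parsed tags; the question's code reuses Rank and Score for both, so the accumulator lists are overwritten by the findAll results.

```python
from bs4 import BeautifulSoup

# URL pattern taken from the question; {} is filled with the year.
BASE_URL = "https://www.vault.com/best-companies-to-work-for/law/top-100-law-firms-rankings/year/{}"


def text_of(parent, name, css_class):
    """Return the stripped text of the first matching child, or None if absent."""
    node = parent.find(name, class_=css_class)
    return node.get_text().strip() if node is not None else None


def parse_rank_items(html, year):
    """Build one dict per div.RankItem found in the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for item in soup.find_all("div", class_="RankItem"):
        rows.append({
            "year": year,
            "company": text_of(item, "h3", "MainLink"),
            "rank": text_of(item, "div", "RankNumber"),
            "score": text_of(item, "div", "score"),
        })
    return rows


def scrape_years(years, fetch):
    """Collect rows for several years; `fetch` is any callable mapping a URL to HTML."""
    rows = []
    for year in years:
        rows.extend(parse_rank_items(fetch(BASE_URL.format(year)), year))
    return rows
```

With requests installed, a call such as scrape_years(['2007', '2008', '2009', '2010'], lambda url: requests.get(url).text) would fetch the four pages from the question. Passing the fetcher in as an argument keeps the parsing testable against saved HTML, and a missing element yields None instead of raising AttributeError.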