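python - How can I limit the collected data so that it does not pick up the entire element tree beneath it?
Question
When I scrape the href values, the loop also ends up collecting the links from the lower levels, such as level-3, but I want to collect only those in level-2. What should I change to prevent this?
This is the website:
https://int.soccerway.com/international/europe/european-championships/2020/group-stage/r38188/
Relevant part of the code: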
ls = soup.find('ul', class_='level-2').findAll('li')
for i in ls:
    print(i.find('a')['href'])
    print('\n')
Full code:
import bs4 as bs
import requests
url = 'https://int.soccerway.com/international/europe/european-championships/2020/group-stage/r38188/'
headers = {"User-agent":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36"}
resp = requests.get(url, headers=headers)
soup = bs.BeautifulSoup(resp.text, 'lxml')
ls = soup.find('ul', class_='level-2').findAll('li')
for i in ls:
    print(i.find('a')['href'])
    print('\n')
Expected output:
/international/europe/european-championships/2020/group-stage/r38188/
/international/europe/european-championships/2020/s13030/final-stages/
Solution
Since you only want the <li> tags that are direct children of a specific tag, just pass the argument recursive=False to findAll:
import bs4 as bs
import requests
url = 'https://int.soccerway.com/international/europe/european-championships/2020/group-stage/r38188/'
headers = {"User-agent":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36"}
resp = requests.get(url, headers=headers)
soup = bs.BeautifulSoup(resp.text, 'lxml')
# recursive=False restricts the search to <li> tags directly under the level-2 <ul>
ls = soup.find('ul', class_='level-2').findAll('li', recursive=False)
for i in ls:
    print(i.find('a')['href'])
    print('\n')
Output:
/international/europe/european-championships/2020/group-stage/r38188/
/international/europe/european-championships/2020/s13030/final-stages/
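To see the effect of recursive=False without hitting the network, here is a minimal offline sketch. The SAMPLE_HTML markup and its href values are hypothetical, written only to mimic a nested level-2 / level-3 menu, and are not copied from the site. It uses the built-in html.parser instead of lxml so that no extra parser is needed, and find_all, which is the current name for findAll. The last variant, a CSS selector with the child combinator ">", is an alternative that is not part of the answer above.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating a nested menu; not the real page source.
SAMPLE_HTML = """
<ul class="level-2">
  <li><a href="/group-stage/">Group Stage</a>
    <ul class="level-3">
      <li><a href="/group-stage/group-a/">Group A</a></li>
      <li><a href="/group-stage/group-b/">Group B</a></li>
    </ul>
  </li>
  <li><a href="/final-stages/">Final Stages</a></li>
</ul>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
menu = soup.find("ul", class_="level-2")

# Default (recursive=True): every <li> descendant, including the level-3 items.
all_hrefs = [li.find("a")["href"] for li in menu.find_all("li")]

# recursive=False: only <li> tags that are direct children of the level-2 <ul>.
direct_hrefs = [li.find("a")["href"] for li in menu.find_all("li", recursive=False)]

# Alternative: a CSS selector, where ">" matches direct children only.
css_hrefs = [a["href"] for a in soup.select("ul.level-2 > li > a")]

print(all_hrefs)
print(direct_hrefs)
print(css_hrefs)
```

On this sample, all_hrefs holds four links while direct_hrefs and css_hrefs hold only the two level-2 links. Note that li.find('a') is itself a recursive search, but it returns the first match in document order, which here is the link belonging to the level-2 item itself.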