python - What is the difference between find() and find_all() in Beautiful Soup?
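Problem description
I'm doing web scraping, but I'm stuck on (and confused by) find() and find_all().
For example, when should I use find_all() and when should I use find()?
Also, where can I use these methods, e.g. inside a for loop or on a ul/li list?
Here is the code I tried: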
from bs4 import BeautifulSoup
import requests
urls = "https://www.flipkart.com/offers-list/latest-launches?screen=dynamic&pk=themeViews%3DAug19-Latest-launch-Phones%3ADTDealcard~widgetType%3DdealCard~contentType%3Dneo&wid=7.dealCard.OMU_5&otracker=hp_omu_Latest%2BLaunches_5&otracker1=hp_omu_WHITELISTED_neo%2Fmerchandising_Latest%2BLaunches_NA_wc_view-all_5"
source = requests.get(urls)
soup = BeautifulSoup(source.content, 'html.parser')
divs = soup.find_all('div', class_='MDGhAp')
names = divs.find_all('a')
full_name = names.find_all('div', class_='iUmrbN').text
print(full_name)
and I get an error like this:
File "C:/Users/ASUS/Desktop/utube/sunil.py", line 9, in <module>
names = divs.find_all('a')
File "C:\Users\ASUS\AppData\Local\Programs\Python\Python38-32\lib\site-packages\bs4\element.py", line 1601, in __getattr__
raise AttributeError(
AttributeError: ResultSet object has no attribute 'find_all'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?
So can anyone explain where I should use find() and where I should use find_all()?
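Solution
find() - returns only the first element that matches the search and then stops. The return type is <class 'bs4.element.Tag'>, or None if nothing matches.
find_all() - returns every match, i.e. it scans the whole document and collects all results. The return type is <class 'bs4.element.ResultSet'>, which behaves like a list of Tag objects. A ResultSet has no find() or find_all() of its own, which is why divs.find_all('a') in the question raises AttributeError: loop over the ResultSet (or index into it) and call find()/find_all() on each Tag instead.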
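To make the two return types concrete, here is a minimal sketch that runs without any network access. The HTML snippet and the variable names are made up for illustration; only find(), find_all() and the class_ keyword come from Beautiful Soup itself.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippet, used only for illustration.
html = """
<ul>
  <li class="item">one</li>
  <li class="item">two</li>
  <li class="item">three</li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

first = soup.find("li", class_="item")       # a single Tag: the first match
every = soup.find_all("li", class_="item")   # a ResultSet (list-like) of Tags

print(type(first), first.text)
print(type(every), len(every))

# When nothing matches, find() returns None and find_all() returns
# an empty ResultSet, so check for None before using a find() result.
missing_one = soup.find("table")
missing_all = soup.find_all("table")
print(missing_one, list(missing_all))
```

As a rule of thumb, find() fits when exactly one element is expected (or only the first one matters), and find_all() fits when the results will be looped over.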
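# Example with RoboBrowser, whose browser object exposes the same find() and find_all() methods.
from robobrowser import RoboBrowser

# Create the browser once, passing both options together.
browser = RoboBrowser(history=True, parser='html.parser')
browser.open('http://www.stackoverflow.com')

# find() returns the first matching <h3> as a single Tag.
res = browser.find('h3')
print(type(res), res)
print(" ")

# find_all() returns every matching <h3> in a ResultSet.
res = browser.find_all('h3')
print(type(res), res)
print(" ")

print("Iterating the Resultset")
print(" ")
# A ResultSet is list-like, so it can be looped over or indexed.
for x, tag in enumerate(res):
    print(x, tag)
print(" ")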
Output:
<class 'bs4.element.Tag'> <h3><a href="https://stackoverflow.com">current community</a>
</h3>
<class 'bs4.element.ResultSet'> [<h3><a href="https://stackoverflow.com">current community</a>
</h3>, <h3>
your communities </h3>, <h3><a href="https://stackexchange.com/sites">more stack exchange communities</a>
</h3>, <h3 class="w90 mx-auto ta-center p-ff-roboto-slab-bold fs-headline2 mb24">Questions are everywhere, answers are on Stack Overflow</h3>, <h3 class="w90 mx-auto ta-center p-ff-roboto-slab-bold fs-headline2 mb24">Learn and grow with Stack Overflow</h3>, <h3 class="mx-auto w90 wmx12 p-ff-roboto-slab-bold fs-headline2 mb24 lg:ta-center">Looking for a job?</h3>]
Iterating the Resultset
0 <h3><a href="https://stackoverflow.com">current community</a>
</h3>
1 <h3>
your communities </h3>
2 <h3><a href="https://stackexchange.com/sites">more stack exchange communities</a>
</h3>
3 <h3 class="w90 mx-auto ta-center p-ff-roboto-slab-bold fs-headline2 mb24">Questions are everywhere, answers are on Stack Overflow</h3>
4 <h3 class="w90 mx-auto ta-center p-ff-roboto-slab-bold fs-headline2 mb24">Learn and grow with Stack Overflow</h3>
5 <h3 class="mx-auto w90 wmx12 p-ff-roboto-slab-bold fs-headline2 mb24 lg:ta-center">Looking for a job?</h3>
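Applied to the code in the question, the same idea might look like the sketch below. The inline HTML is a hypothetical stand-in for the downloaded page (the real Flipkart markup may differ and may be rendered by JavaScript); only the class names MDGhAp and iUmrbN are taken from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the downloaded page; only the class
# names MDGhAp and iUmrbN come from the question.
html = """
<div class="MDGhAp">
  <a href="/p1"><div class="iUmrbN">Phone A</div></a>
  <a href="/p2"><div class="iUmrbN">Phone B</div></a>
</div>
<div class="MDGhAp">
  <a href="/p3"><div class="iUmrbN">Phone C</div></a>
  <a href="/p4"><span>no name here</span></a>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

full_names = []
for div in soup.find_all("div", class_="MDGhAp"):   # ResultSet: loop over it
    for link in div.find_all("a"):                  # each div is a Tag, so find_all() works
        name = link.find("div", class_="iUmrbN")    # one name per link, so find()
        if name is not None:                        # find() returns None when nothing matches
            full_names.append(name.text.strip())

print(full_names)  # ['Phone A', 'Phone B', 'Phone C']
```

The same pattern covers a ul/li list: call find('ul') to get the single list element, then loop over its find_all('li') results.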