python - How to use BeautifulSoup find() to extract all elements inside a specific div of an HTML page
Question
This is the URL I'm using.
I'm trying to get the Username column using soup.find(). I'm not sure how to reference its div, because the last div I managed to find is soup.find("div", {"id": "sort-by"}).contents, which returns:
['\n',
<div id="sort-by-container">
<div id="sort-by-current"><i aria-hidden="true" class="fa fa-sort"></i> <span id="sort-by-current-title">Sorted by: Followers</span></div>
<div class="border-box no-select" id="sort-by-dropdown">
<div class="sort-by-select" data-sort="most-followers" data-title="Sorted by: Followers">Sort by Followers</div>
<div class="sort-by-select" data-sort="most-following" data-title="Sorted by: Following">Sort by Following</div>
<div class="sort-by-select" data-sort="most-uploads" data-title="Sorted by: Uploads">Sort by Uploads</div>
<div class="sort-by-select" data-sort="most-likes" data-title="Sorted by: Likes">Sort by Likes</div>
</div>
</div>,
'\n',
<div style="clear: both;"></div>]
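Ultimately, I'm trying to get every row under Username (charli d’amelio, addison rae, and so on), that is, the contents of the <a href=""> tags.
Here is the complete code I have tried so far: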
from bs4 import BeautifulSoup
with open('Top 50 TikTok users sorted by Followers - Socialblade TikTok Stats _ TikTok Statistics.html') as file:
    soup = BeautifulSoup(file)
soup.find('title').contents
soup.find("div", {"id": "sort-by"}).contents
Answer
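To find all the names under the "Username" column, you can use the :nth-of-type(n) CSS selector: div div:nth-of-type(n+5) > div > a. It matches every <a> that is a direct child of a div whose parent is the fifth or later div among its siblings.
To use a CSS selector, call the .select() method instead of .find_all().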
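As a rough illustration of how that selector behaves, here is a minimal sketch on made-up markup. The HTML string, the id "wrapper", and the names alice and bob are assumptions for demonstration only; they are not taken from the actual page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first four child divs stand in for headers and
# controls, the later ones stand in for data rows.
html = """
<div id="wrapper">
  <div>header 1</div>
  <div>header 2</div>
  <div>header 3</div>
  <div>header 4</div>
  <div><div><a href="/u/alice">alice</a></div></div>
  <div><div><a href="/u/bob">bob</a></div></div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# .select() understands CSS selectors; n+5 means "the 5th div sibling onward".
links = soup.select("div div:nth-of-type(n+5) > div > a")
names = [a.get_text(strip=True) for a in links]
print(names)

# .find_all() treats its argument as a tag name, so a selector string
# matches nothing.
not_found = soup.find_all("div > a")
print(len(not_found))
```

The offset 5 only fits pages where exactly four sibling divs precede the data rows, so it would need adjusting for other layouts.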
In your example:
from bs4 import BeautifulSoup

with open("file.html", "r", encoding="utf-8") as file:
    # Pass the file object directly; BeautifulSoup reads and parses it
    soup = BeautifulSoup(file, "html.parser")

# Each matching <a> holds one username
for tag in soup.select("div div:nth-of-type(n+5) > div > a"):
    print(tag.text)
Output:
charli d’amelio
addison rae
Bella Poarch
Zach King
TikTok
...
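Since the question also asks about the <a href=""> tags themselves, the same selector can return the link target alongside the text. This is only a sketch under assumptions: the markup, the "Example One" and "Example Two" names, and the href paths below are hypothetical placeholders, not values from the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: four placeholder divs, then two data rows.
html = """
<div id="wrapper">
  <div></div><div></div><div></div><div></div>
  <div><div><a href="/tiktok/user/example-one"> Example One </a></div></div>
  <div><div><a href="/tiktok/user/example-two">Example Two</a></div></div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Collect (username, href) pairs; .get() returns None if href is missing.
users = [
    (a.get_text(strip=True), a.get("href"))
    for a in soup.select("div div:nth-of-type(n+5) > div > a")
]

for name, href in users:
    print(name, href)
```

Using get_text(strip=True) rather than .text trims surrounding whitespace, which helps when the saved HTML pads link text with spaces or newlines.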