How do I scrape an element inside a div that I'm scraping?

Problem description

I can't print an element that sits inside a div. This is the tag I want to scrape:

<div class="page-box house-lst-page-box" comp-module="page" page-url="/ershoufang/miyun/pg{page}" page-data='{"totalPage":73,"curPage":1}'>

I want my code to print the integer stored under totalPage, which is 73.

Thanks in advance!

Tags: python, web-scraping, beautifulsoup

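Solution

The page-data attribute holds a JSON string, so read the attribute and decode it with json.loads. Try: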

import json
from bs4 import BeautifulSoup

# Sample markup: the JSON inside page-data is HTML-escaped (&quot;), as in the page source.
html_doc = """<div class="page-box house-lst-page-box" comp-module="page" page-url="/ershoufang/miyun/pg{page}" page-data="{&quot;totalPage&quot;:73,&quot;curPage&quot;:1}"><a class="on" href="/ershoufang/miyun/" data-page="1">1</a><a href="/ershoufang/miyun/pg2" data-page="2">2</a><a href="/ershoufang/miyun/pg3" data-page="3">3</a><span>...</span><a href="/ershoufang/miyun/pg73" data-page="73">73</a><a href="/ershoufang/miyun/pg2" data-page="2">下一页</a></div>"""

soup = BeautifulSoup(html_doc, "html.parser")

# Select the first div that carries a page-data attribute; the parser unescapes &quot; to ".
data = soup.select_one("div[page-data]")["page-data"]
# The attribute value is a JSON string, so decode it into a dict.
data = json.loads(data)

print("Total page:", data["totalPage"])

Prints:

Total page: 73
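
The snippet above raises an exception if the div is missing or the attribute is not valid JSON, which can happen when the site returns an empty or blocked page. A more defensive variant might look like the sketch below. The helper name get_total_page is hypothetical (it is not part of BeautifulSoup), and returning None on failure is an assumption about what the caller wants.

```python
import json
from typing import Optional

from bs4 import BeautifulSoup


def get_total_page(html: str) -> Optional[int]:
    """Return totalPage from the first div with a page-data attribute, or None.

    Hypothetical helper for illustration; not a BeautifulSoup API.
    """
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.select_one("div[page-data]")
    if tag is None:
        return None  # no div with a page-data attribute in this document
    try:
        data = json.loads(tag["page-data"])
    except json.JSONDecodeError:
        return None  # attribute is present but does not contain valid JSON
    if not isinstance(data, dict):
        return None  # valid JSON, but not the expected object
    return data.get("totalPage")  # None if the key is absent


sample = '<div page-data="{&quot;totalPage&quot;:73,&quot;curPage&quot;:1}"></div>'
print(get_total_page(sample))                       # 73
print(get_total_page("<p>no pagination here</p>"))  # None
```

With a helper like this, a crawl loop can stop or retry when the result is None instead of crashing on a KeyError or TypeError.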
