python-3.x - How to remove a div from a BeautifulSoup filter result
Question
I am trying to remove a div, selected by its class, from a BeautifulSoup result, as shown below:
response = requests.get(url)
# this works
cnbeta_article_content = BeautifulSoup(response.content, "html.parser").find("div", {"class": "cnbeta-article-body"})
# this fails
removed_share_content = BeautifulSoup(cnbeta_article_content, "html.parse").find("div", {"class": "article-share-code"}).decompose()
result_text = removed_share_content.prettify()
return result_text
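I first get the div with class cnbeta-article-body, then try to remove the div with class article-share-code from that filtered result, but it does not seem to work. What should I do to fix it? The URL is: https://www.cnbeta.com/articles/tech/1097507.htm
Answer
The HTML of the div with class cnbeta-article-body is as follows: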
<div class="cnbeta-article-body">
<div class="article-summary">
<div class="topic"><a href="https://www.cnbeta.com/topics/741.htm" target="_blank"><img src="https://static.cnbetacdn.com/topics/9a78aa447fb90ef.png" title="手机 - OnePlus 一加"/></a></div>
<p>一加即将发布年度旗舰一加9系列,随着新旗舰的到来,一加8系列机型价格开始下调。今天,一加宣布,<strong>一加8 Pro最高优惠1000元,起售价只要4599元,支持24期免息分期</strong>,提供青空、黑镜、蓝调三种配色。</p> </div>
<div class="article-content" id="artibody">
<div class="article-global"><p><strong>访问:</strong></p><p><a href="https://click.aliyun.com/m/1000245338/" target="_blank"><strong><span style="color: rgb(192, 0, 0);">2021阿里云上云采购季:采购补贴、充值返券、爆款抢先购……</span></strong></a></p></div> <div class="article-topic"><p>
<strong>访问购买页面:</strong>
</p>
<p>
<a href="https://c.duomai.com/track.php?site_id=242986&aid=942&euid=&t=http%3A%2F%2Fwww.oneplus.com" target="_blank">一加自营旗舰店</a>
</p></div><p style="text-align:center"><img src="https://static.cnbetacdn.com/article/2021/0304/8158fddd4c92c53.jpg"/></p><p style="text-align: left;">一加8 Pro最大的看点之一是屏幕,<strong>其屏幕尺寸为6.78英寸,分辨率为2K+,刷新率为120Hz,触控采样率为240Hz,被称之为“屏幕机皇”。</strong></p><p style="text-align: left;">DisplayMate评价一加8 Pro:<strong>教科书般完<a data-link="1" href="https://c.duomai.com/track.php?site_id=242986&euid=&t=https://mideajiadian.jd.com/" target="_blank">美的</a>校准精度和性能表现</strong>,创造13项智能<a data-link="1" href="https://c.duomai.com/track.php?site_id=242986&euid=&t=https://shouji.jd.com/" target="_blank">手机</a>显示记录。</p><p style="text-align: left;">规格方面,一加8 Pro搭载高通骁龙865旗舰平台,前置1600万像素,后置4800万超清四摄,电池容量为4510mAh,支持30W Warp无线闪充、Warp闪充30T有线充电。</p><p style="text-align: left;">此外,一加8T全面现货发售,起售价3399元,一加8降至3299元。</p><p style="text-align: center;"><a href="https://static.cnbetacdn.com/article/2021/0304/cfdb4208167012e.jpg" target="_blank"><img src="https://static.cnbetacdn.com/thumb/article/2021/0304/cfdb4208167012e.jpg"/></a><a href="https://static.cnbetacdn.com/article/2021/0304/e7bb8bbd2e5b913.jpg" target="_blank"><img src="https://static.cnbetacdn.com/thumb/article/2021/0304/e7bb8bbd2e5b913.jpg"/></a><a href="https://static.cnbetacdn.com/article/2021/0304/2e6cad84f505f43.jpg" target="_blank"><img src="https://static.cnbetacdn.com/thumb/article/2021/0304/2e6cad84f505f43.jpg"/></a><a href="https://static.cnbetacdn.com/article/2021/0304/b77d443a3761049.jpg" target="_blank"><img src="https://static.cnbetacdn.com/thumb/article/2021/0304/b77d443a3761049.jpg"/></a><a href="https://static.cnbetacdn.com/article/2021/0304/ea9ebd51f33109f.jpg" target="_blank"><img src="https://static.cnbetacdn.com/thumb/article/2021/0304/ea9ebd51f33109f.jpg"/></a><a href="https://static.cnbetacdn.com/article/2021/0304/db97040429b984a.jpg" target="_blank"><img src="https://static.cnbetacdn.com/thumb/article/2021/0304/db97040429b984a.jpg"/></a><a href="https://static.cnbetacdn.com/article/2021/0304/6a9943e38585768.jpg" target="_blank"><img src="https://static.cnbetacdn.com/thumb/article/2021/0304/6a9943e38585768.jpg"/></a><a href="https://static.cnbetacdn.com/article/2021/0304/06ceef5e21085e6.jpg" target="_blank"><img src="https://static.cnbetacdn.com/thumb/article/2021/0304/06ceef5e21085e6.jpg"/></a><a href="https://static.cnbetacdn.com/article/2021/0304/c052adf0ce81f58.jpg" target="_blank"><img src="https://static.cnbetacdn.com/thumb/article/2021/0304/c052adf0ce81f58.jpg"/></a><a href="https://static.cnbetacdn.com/article/2021/0304/5e3036dd27cbd45.jpg" target="_blank"><img src="https://static.cnbetacdn.com/thumb/article/2021/0304/5e3036dd27cbd45.jpg"/></a></p> </div>
<div class="tac">
<div class="tal cbv"><script type="text/javascript"><!--
google_ad_client = "ca-pub-3507708728694406";
/* cnBeta.COM 文章页文末通栏 #1 */
google_ad_slot = "1385693419";
google_ad_width = 810;
google_ad_height = 100;
//-->
</script>
<script src="//pagead2.googlesyndication.com/pagead/show_ads.js" type="text/javascript">
</script></div>
<div class="tal cbv">
<a href="https://click.aliyun.com/m/1000245337/" target="_blank"><img src="https://static.cnbetacdn.com/article/2021/03/7bcc0f26b07694b.jpg"/></a>
</div>
<div class="tal cbv">
<script type="text/javascript"><!--
google_ad_client = "ca-pub-3507708728694406";
/* cnBeta.COM 文章页文末通栏 #2 */
google_ad_slot = "8489727379";
google_ad_width = 810;
google_ad_height = 100;
//-->
</script>
<script src="//pagead2.googlesyndication.com/pagead/show_ads.js" type="text/javascript">
</script>
</div>
<div class="cbv810">
<div class="left500"><script type="text/javascript">
(function() {
var s = "_" + Math.random().toString(36).slice(2);
document.write('<div style="" id="' + s + '"></div>');
(window.slotbydup = window.slotbydup || []).push({
id: "u4395341",
container: s
});
})();
</script><script async="async" defer="defer" src="//cpro.baidustatic.com/cpro/ui/c.js" type="text/javascript">
</script>
</div>
<div class="right300"><script type="text/javascript"><!--
google_ad_client = "ca-pub-3507708728694406";
/* cnBeta.COM V5 文章页文末画中画 #2 */
google_ad_slot = "5755245019";
google_ad_width = 300;
google_ad_height = 250;
//-->
</script>
<script src="//pagead2.googlesyndication.com/pagead/show_ads.js" type="text/javascript">
</script></div>
</div> </div>
<div class="article-share-code">
<div class="share-unit"><div class="share-btns bdsharebuttonbox"><a class="bds_tsina share-btn weibo" data-cmd="tsina" href="#" title="分享到新浪微博">新浪微博</a><a class="bds_qzone share-btn qzone" data-cmd="qzone" href="#" title="分享到QQ空间">QQ空间</a><a class="bds_tqq share-btn tqq" data-cmd="tqq" href="#" title="分享到腾讯微博">腾讯微博</a><a class="bds_sqq share-btn sqq" data-cmd="sqq" href="#" title="分享到QQ好友">QQ好友</a><a class="bds_weixin share-btn weixin" data-cmd="weixin" href="#" title="分享到微信">微信</a><a class="bds_douban share-btn douban" data-cmd="douban" href="#" title="分享到豆瓣网">豆瓣网</a><a class="bds_youdao share-btn youdao" data-cmd="youdao" href="#" title="分享到有道云笔记">有道云笔记</a><a class="bds_tieba share-btn tieba" data-cmd="tieba" href="#" title="分享到百度贴吧">百度贴吧</a><a class="bds_linkedin share-btn linkedin" data-cmd="linkedin" href="#" title="分享到linkedin">Linkedin</a><div class="more"></div></div></div>
<label><img src="//static.cnbetacdn.com/share/r2.gif"/></label>
</div>
<div class="article-global"></div> </div>
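If you look closely, the div with class article-share-code is a child node of the div with class cnbeta-article-body. Deleting a parent node deletes all of its child nodes as well.
So if you run the following code, the child nodes are removed along with the parent: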
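import requests
from bs4 import BeautifulSoup
res = requests.get("https://www.cnbeta.com/articles/tech/1097507.htm")
soup = BeautifulSoup(res.text, "html.parser")  # name the parser explicitly
# Removes the whole article body, including every div nested inside it
soup.find("div", {"class": "cnbeta-article-body"}).decompose()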
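The parent/child behaviour can be checked without fetching the page. The sketch below uses a made-up HTML snippet; the class names `parent` and `child` and the variable names are illustrative assumptions, not taken from the cnBeta page.

```python
from bs4 import BeautifulSoup

# Made-up markup standing in for the real page: a parent div with one child div.
html = '<div class="parent"><div class="child">share</div></div><p>rest</p>'
soup = BeautifulSoup(html, "html.parser")

# Decomposing the parent destroys its whole subtree, child included.
soup.find("div", {"class": "parent"}).decompose()

# The child can no longer be found because it was removed with its parent.
child_after = soup.find("div", {"class": "child"})
# Only the sibling paragraph is left in the tree.
remaining = str(soup)
```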
To remove only the div with class article-share-code, use the following code:
soup.find("div", {"class": "article-share-code"}).decompose()
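Putting this together with the code from the question, one possible end-to-end version is sketched below. It rests on three observations about the original attempt: `decompose()` modifies the tree in place and returns None, so its result cannot be assigned and then prettified; a Tag returned by `find()` can be searched directly, so there is no need to pass it to `BeautifulSoup` again; and the parser name is `html.parser`. The function name `strip_share_block` and the `sample` markup are hypothetical, used here so the example runs without network access. With the live page, `response.content` would be passed in instead.

```python
from bs4 import BeautifulSoup


def strip_share_block(html):
    """Return the article body markup without the share widget."""
    soup = BeautifulSoup(html, "html.parser")
    body = soup.find("div", {"class": "cnbeta-article-body"})
    if body is None:
        raise ValueError("no div with class cnbeta-article-body found")

    # Search inside the Tag directly; it does not need to be re-parsed.
    share = body.find("div", {"class": "article-share-code"})
    if share is not None:
        # decompose() works in place and returns None,
        # so its return value is deliberately not used.
        share.decompose()

    return body.prettify()


# Hypothetical miniature of the page structure shown above.
sample = (
    '<div class="cnbeta-article-body">'
    '<div class="article-content"><p>text</p></div>'
    '<div class="article-share-code"><a href="#">share</a></div>'
    "</div>"
)
cleaned = strip_share_block(sample)
```

Keeping a reference to the body Tag and calling `prettify()` on it after the removal gives the filtered markup the question was after.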