BeautifulSoup: get text from one tag, but ignore text in another tag



        <title>What Music Do You Build Robots to?</title>
        <dc:creator><![CDATA[@TaranMayer TaranMayer ]]></dc:creator>
        <description><![CDATA[ <aside class="quote no-group" data-username="DanMantz" data-post="34" data-topic="84065" data-full="true">
<div class="title">
<div class="quote-controls"></div>
<img alt="" width="20" height="20" src="https://www.vexforum.com/user_avatar/www.vexforum.com/danmantz/40/2285_2.png" class="avatar"> DanMantz:</div>
<p>Classic Rock and Motown. I didn’t even consider that there are other options… <img src="https://www.vexforum.com/images/emoji/apple/slight_smile.png?v=9" title=":slight_smile:" class="emoji" alt=":slight_smile:"></p>
<p>This implies that you do indeed build robots. May we see some of your creations?</p> ]]></description>
        <pubDate>Wed, 02 Sep 2020 17:24:19 +0000</pubDate>
        <guid isPermaLink="false">www.vexforum.com-post-669073</guid>

Using bs4, I want to get the text of everything inside the <description> tag, except for the content inside the <blockquote> tag. I want to get this:

This implies that you do indeed build robots. May we see some of your creations?


Tags: python, beautifulsoup, tags



from bs4 import BeautifulSoup, CData

txt = """<item>
        <title>What Music Do You Build Robots to?</title>
        <dc:creator><![CDATA[@TaranMayer TaranMayer ]]></dc:creator>
        <description><![CDATA[ <aside class="quote no-group" data-username="DanMantz" data-post="34" data-topic="84065" data-full="true">
<div class="title">
<div class="quote-controls"></div>
<img alt="" width="20" height="20" src="https://www.vexforum.com/user_avatar/www.vexforum.com/danmantz/40/2285_2.png" class="avatar"> DanMantz:</div>
<p>Classic Rock and Motown. I didn’t even consider that there are other options… <img src="https://www.vexforum.com/images/emoji/apple/slight_smile.png?v=9" title=":slight_smile:" class="emoji" alt=":slight_smile:"></p>
<p>This implies that you do indeed build robots. May we see some of your creations?</p> ]]></description>
        <pubDate>Wed, 02 Sep 2020 17:24:19 +0000</pubDate>
        <guid isPermaLink="false">www.vexforum.com-post-669073</guid>
    </item>"""

# load main soup:
soup = BeautifulSoup(txt, "html.parser")

# find CData in description
desc = soup.find("description").find_next(string=lambda t: isinstance(t, CData))
# create new soup
desc = BeautifulSoup(desc, "html.parser")

# extract tags we don't want
for a in desc.select("aside"):
    a.extract()

# print the text:
print(desc.text.strip())

Prints:

This implies that you do indeed build robots. May we see some of your creations?
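
The same idea can be wrapped in a reusable function. The sketch below is one possible variant, not the answer's exact method: it reads the outer item with the standard library's xml.etree.ElementTree, which returns the CDATA payload as plain text, so it does not depend on how the HTML parser treats CDATA sections. The function name description_text and its skip_tags parameter are hypothetical names chosen for illustration. It assumes the item is well-formed XML with every namespace prefix declared, which is why the shortened sample leaves out the dc:creator element.

```python
import xml.etree.ElementTree as ET

from bs4 import BeautifulSoup


def description_text(item_xml, skip_tags=("aside", "blockquote")):
    """Return the text of <description>, ignoring the tags in skip_tags.

    Hypothetical helper: the name and the skip_tags parameter are
    illustrative and not part of bs4.
    """
    # ElementTree hands back the CDATA payload as ordinary text.
    # Assumes item_xml is well-formed XML with every prefix declared.
    item = ET.fromstring(item_xml)
    raw_html = item.findtext("description") or ""

    # Parse the embedded HTML on its own, as the answer above does.
    inner = BeautifulSoup(raw_html, "html.parser")

    # Remove unwanted elements one at a time, together with their contents.
    names = list(skip_tags)
    tag = inner.find(names)
    while tag is not None:
        tag.decompose()
        tag = inner.find(names)

    # Join the remaining text nodes, trimming surrounding whitespace.
    return inner.get_text(" ", strip=True)


# Shortened sample: dc:creator is omitted because its prefix is undeclared.
sample = """<item>
<title>What Music Do You Build Robots to?</title>
<description><![CDATA[ <aside class="quote no-group">
<div class="title">DanMantz:</div>
<p>Classic Rock and Motown.</p></aside>
<p>This implies that you do indeed build robots. May we see some of your creations?</p> ]]></description>
</item>"""

print(description_text(sample))
```

Passing a different skip_tags value, for example ("blockquote",), removes only those elements. decompose() is used here instead of extract() because the removed elements are not needed afterwards.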
