Looking for the contents of a tag in BeautifulSoup, but it returns blank

Problem description

The XML I am trying to parse looks like this:

<item>
<title>Port on brain, some functions not working</title>
<dc:creator>
<![CDATA[ @nathankmiles Nathan ]]>
</dc:creator>
<description>
<![CDATA[ <p>Sorry, I thought we had already included the code that would be needed. Here is what we have been using for testing. In the code below, the problem is on Port 1 (LeftFrontDriveMotor).</p> <p>Here is the code from main.cpp.</p> <p><span class="hashtag">#include</span> “vex.h”&lt;br> <span class="hashtag">#include</span> “robot-config.h”&lt;/p> <p>using namespace vex;</p> <p>competition Competiton;</p> <p>void leftDrive() {<br> LeftFrontDriveMotor.spin(directionType::fwd, Controller1.Axis3.value(),velocityUnits::pct);<br> LeftBackDriveMotor.spin(directionType::fwd, Controller1.Axis3.value(),velocityUnits::pct);<br> }</p> <p>void pre_auton( void ) {<br> // Initializing Robot Configuration. DO NOT REMOVE!<br> vexcodeInit();<br> }</p> <p>void autonomous( void ) {</p> <p>}</p> <p>void usercontrol( void ) {<br> while(true) {<br> Controller1.Axis3.changed(leftDrive);<br> }<br> }</p> <p>int main() {<br> pre_auton();</p> <p>Competiton.autonomous( autonomous );<br> Competiton.drivercontrol( usercontrol );</p> <p>while(true) {<br> vex::task::sleep(100);<br> }<br> }</p> <p>And, here is the code from robot-config.cpp</p> <p><span class="hashtag">#include</span> “vex.h”&lt;br> using namespace vex;</p> <p>// A global instance of brain used for printing to the V5 brain screen<br> brain Brain;</p> <p>//VEXcode Devices<br> controller Controller1 = controller(primary);<br> motor LeftFrontDriveMotor (PORT1, ratio18_1,false);<br> motor LeftBackDriveMotor (PORT11, ratio18_1,false);<br> motor RightFrontDriveMotor (PORT10, ratio18_1,true);<br> motor RightBackDriveMotor (PORT20, ratio18_1,true);</p> <p>/**</p> <ul> <li>Used to initialize code/tasks/devices added using tools in VEXcode Text.</li> <li> </li><li>This should be called at the start of your int main function.<br> */</li> </ul> <p>void vexcodeInit(void) {<br> // Nothing to initialize<br> }</p> ]]>
</description>
<link>https://www.vexforum.com/t/port-on-brain-some-functions-not-working/83135/8</link>
<pubDate>Sun, 19 Jul 2020 16:38:10 +0000</pubDate>
<guid isPermaLink="false">www.vexforum.com-post-655101</guid>
</item>

I need the text between the two <dc:creator> tags, but when I search with

soup.find('dc:creator')

it just returns

<dc:creator></dc:creator>

I think this may have something to do with the angle brackets (<>) around the text, but I'm not sure.

How can I find the contents of the <dc:creator> tag with BeautifulSoup?

Tags: python, beautifulsoup

Solution


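If the XML namespaces are not declared in the document, the xml parser strips the namespace prefixes, so you can search for the <creator> tag instead: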

from bs4 import BeautifulSoup

txt = '''<item>
<title>Port on brain, some functions not working</title>
<dc:creator>
<![CDATA[ @nathankmiles Nathan ]]>
</dc:creator>
<description>
<![CDATA[ <p>Sorry, I thought we had already included the code that would be needed. Here is what we have been using for testing. In the code below, the problem is on Port 1 (LeftFrontDriveMotor).</p> <p>Here is the code from main.cpp.</p> <p><span class="hashtag">#include</span> “vex.h”&lt;br> <span class="hashtag">#include</span> “robot-config.h”&lt;/p> <p>using namespace vex;</p> <p>competition Competiton;</p> <p>void leftDrive() {<br> LeftFrontDriveMotor.spin(directionType::fwd, Controller1.Axis3.value(),velocityUnits::pct);<br> LeftBackDriveMotor.spin(directionType::fwd, Controller1.Axis3.value(),velocityUnits::pct);<br> }</p> <p>void pre_auton( void ) {<br> // Initializing Robot Configuration. DO NOT REMOVE!<br> vexcodeInit();<br> }</p> <p>void autonomous( void ) {</p> <p>}</p> <p>void usercontrol( void ) {<br> while(true) {<br> Controller1.Axis3.changed(leftDrive);<br> }<br> }</p> <p>int main() {<br> pre_auton();</p> <p>Competiton.autonomous( autonomous );<br> Competiton.drivercontrol( usercontrol );</p> <p>while(true) {<br> vex::task::sleep(100);<br> }<br> }</p> <p>And, here is the code from robot-config.cpp</p> <p><span class="hashtag">#include</span> “vex.h”&lt;br> using namespace vex;</p> <p>// A global instance of brain used for printing to the V5 brain screen<br> brain Brain;</p> <p>//VEXcode Devices<br> controller Controller1 = controller(primary);<br> motor LeftFrontDriveMotor (PORT1, ratio18_1,false);<br> motor LeftBackDriveMotor (PORT11, ratio18_1,false);<br> motor RightFrontDriveMotor (PORT10, ratio18_1,true);<br> motor RightBackDriveMotor (PORT20, ratio18_1,true);</p> <p>/**</p> <ul> <li>Used to initialize code/tasks/devices added using tools in VEXcode Text.</li> <li> </li><li>This should be called at the start of your int main function.<br> */</li> </ul> <p>void vexcodeInit(void) {<br> // Nothing to initialize<br> }</p> ]]>
</description>
<link>https://www.vexforum.com/t/port-on-brain-some-functions-not-working/83135/8</link>
<pubDate>Sun, 19 Jul 2020 16:38:10 +0000</pubDate>
<guid isPermaLink="false">www.vexforum.com-post-655101</guid>
</item>'''

soup = BeautifulSoup(txt, 'xml')
print(soup.find('creator').get_text(strip=True))

Prints:

@nathankmiles Nathan
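
For comparison, here is a minimal sketch of the case where the namespace is declared. It uses the standard library's xml.etree.ElementTree rather than BeautifulSoup, and it assumes a hypothetical <rss> wrapper that binds the dc prefix to the Dublin Core namespace URI. The question shows only the <item>, so the wrapper is an illustration and not the actual feed:

```python
import xml.etree.ElementTree as ET

# Assumption: a hypothetical <rss> root that declares the dc prefix.
# The <item> is a shortened copy of the one in the question.
feed = """<rss xmlns:dc="http://purl.org/dc/elements/1.1/">
<item>
<title>Port on brain, some functions not working</title>
<dc:creator>
<![CDATA[ @nathankmiles Nathan ]]>
</dc:creator>
</item>
</rss>"""

# Map the prefix used in the search path to the namespace URI.
ns = {"dc": "http://purl.org/dc/elements/1.1/"}

root = ET.fromstring(feed)
# ElementTree folds the CDATA section into the element's ordinary text.
creator = root.find("./item/dc:creator", ns).text.strip()
print(creator)
```

With the prefix declared, the element can be addressed by its qualified name. Without the declaration, ElementTree rejects the document with an "unbound prefix" parse error instead of silently dropping the prefix.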

Alternatively, you can use html.parser together with bs4.CData (txt is the XML snippet from the question):

from bs4 import BeautifulSoup, CData

soup = BeautifulSoup(txt, 'html.parser')
# string= is the current name of the older text= argument (Beautiful Soup 4.4.0+)
print(soup.find('dc:creator').find_next(string=lambda x: isinstance(x, CData)).strip())

Prints:

@nathankmiles Nathan
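
The html.parser approach can be wrapped in a small helper, sketched below. The name cdata_text is hypothetical, and the sample is a shortened copy of the <item> from the question. Unlike find_next, which keeps scanning the rest of the document, this version uses tag.find so that only CDATA nodes inside the requested tag are considered:

```python
from bs4 import BeautifulSoup, CData

# Shortened copy of the <item> from the question, kept small for the demo.
sample = """<item>
<title>Port on brain, some functions not working</title>
<dc:creator>
<![CDATA[ @nathankmiles Nathan ]]>
</dc:creator>
</item>"""


def cdata_text(soup, tag_name):
    """Return the stripped CDATA text inside the first `tag_name` tag, or None."""
    tag = soup.find(tag_name)
    if tag is None:
        return None
    # Search only the descendants of the tag for a CData node.
    node = tag.find(string=lambda s: isinstance(s, CData))
    return None if node is None else node.strip()


soup = BeautifulSoup(sample, "html.parser")
creator = cdata_text(soup, "dc:creator")
title_cdata = cdata_text(soup, "title")  # <title> holds plain text, not CDATA
print(creator)
```

Returning None when the tag is missing or has no CDATA section avoids the AttributeError that a chained .strip() would raise on a failed lookup.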
