python - Looking for the contents of a tag in BeautifulSoup, but it returns blank
Problem description
The XML I am trying to parse looks like this:
<item>
<title>Port on brain, some functions not working</title>
<dc:creator>
<![CDATA[ @nathankmiles Nathan ]]>
</dc:creator>
<description>
<![CDATA[ <p>Sorry, I thought we had already included the code that would be needed. Here is what we have been using for testing. In the code below, the problem is on Port 1 (LeftFrontDriveMotor).</p> <p>Here is the code from main.cpp.</p> <p><span class="hashtag">#include</span> “vex.h”<br> <span class="hashtag">#include</span> “robot-config.h”</p> <p>using namespace vex;</p> <p>competition Competiton;</p> <p>void leftDrive() {<br> LeftFrontDriveMotor.spin(directionType::fwd, Controller1.Axis3.value(),velocityUnits::pct);<br> LeftBackDriveMotor.spin(directionType::fwd, Controller1.Axis3.value(),velocityUnits::pct);<br> }</p> <p>void pre_auton( void ) {<br> // Initializing Robot Configuration. DO NOT REMOVE!<br> vexcodeInit();<br> }</p> <p>void autonomous( void ) {</p> <p>}</p> <p>void usercontrol( void ) {<br> while(true) {<br> Controller1.Axis3.changed(leftDrive);<br> }<br> }</p> <p>int main() {<br> pre_auton();</p> <p>Competiton.autonomous( autonomous );<br> Competiton.drivercontrol( usercontrol );</p> <p>while(true) {<br> vex::task::sleep(100);<br> }<br> }</p> <p>And, here is the code from robot-config.cpp</p> <p><span class="hashtag">#include</span> “vex.h”<br> using namespace vex;</p> <p>// A global instance of brain used for printing to the V5 brain screen<br> brain Brain;</p> <p>//VEXcode Devices<br> controller Controller1 = controller(primary);<br> motor LeftFrontDriveMotor (PORT1, ratio18_1,false);<br> motor LeftBackDriveMotor (PORT11, ratio18_1,false);<br> motor RightFrontDriveMotor (PORT10, ratio18_1,true);<br> motor RightBackDriveMotor (PORT20, ratio18_1,true);</p> <p>/**</p> <ul> <li>Used to initialize code/tasks/devices added using tools in VEXcode Text.</li> <li> </li><li>This should be called at the start of your int main function.<br> */</li> </ul> <p>void vexcodeInit(void) {<br> // Nothing to initialize<br> }</p> ]]>
</description>
<link>https://www.vexforum.com/t/port-on-brain-some-functions-not-working/83135/8</link>
<pubDate>Sun, 19 Jul 2020 16:38:10 +0000</pubDate>
<guid isPermaLink="false">www.vexforum.com-post-655101</guid>
</item>
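I need the text between the two <dc:creator> tags, but when I search with
soup.find('dc:creator')
it just returns
<dc:creator></dc:creator>
I think this may have something to do with the <![CDATA[ ]]> around the text, but I'm not sure.
How do I get the contents of the <dc:creator> tag with BeautifulSoup?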
Solution
If you don't define the XML namespaces, the xml parser will strip them. So you can search for the <creator> tag:
from bs4 import BeautifulSoup
txt = '''<item>
<title>Port on brain, some functions not working</title>
<dc:creator>
<![CDATA[ @nathankmiles Nathan ]]>
</dc:creator>
<description>
<![CDATA[ <p>Sorry, I thought we had already included the code that would be needed. Here is what we have been using for testing. In the code below, the problem is on Port 1 (LeftFrontDriveMotor).</p> <p>Here is the code from main.cpp.</p> <p><span class="hashtag">#include</span> “vex.h”<br> <span class="hashtag">#include</span> “robot-config.h”</p> <p>using namespace vex;</p> <p>competition Competiton;</p> <p>void leftDrive() {<br> LeftFrontDriveMotor.spin(directionType::fwd, Controller1.Axis3.value(),velocityUnits::pct);<br> LeftBackDriveMotor.spin(directionType::fwd, Controller1.Axis3.value(),velocityUnits::pct);<br> }</p> <p>void pre_auton( void ) {<br> // Initializing Robot Configuration. DO NOT REMOVE!<br> vexcodeInit();<br> }</p> <p>void autonomous( void ) {</p> <p>}</p> <p>void usercontrol( void ) {<br> while(true) {<br> Controller1.Axis3.changed(leftDrive);<br> }<br> }</p> <p>int main() {<br> pre_auton();</p> <p>Competiton.autonomous( autonomous );<br> Competiton.drivercontrol( usercontrol );</p> <p>while(true) {<br> vex::task::sleep(100);<br> }<br> }</p> <p>And, here is the code from robot-config.cpp</p> <p><span class="hashtag">#include</span> “vex.h”<br> using namespace vex;</p> <p>// A global instance of brain used for printing to the V5 brain screen<br> brain Brain;</p> <p>//VEXcode Devices<br> controller Controller1 = controller(primary);<br> motor LeftFrontDriveMotor (PORT1, ratio18_1,false);<br> motor LeftBackDriveMotor (PORT11, ratio18_1,false);<br> motor RightFrontDriveMotor (PORT10, ratio18_1,true);<br> motor RightBackDriveMotor (PORT20, ratio18_1,true);</p> <p>/**</p> <ul> <li>Used to initialize code/tasks/devices added using tools in VEXcode Text.</li> <li> </li><li>This should be called at the start of your int main function.<br> */</li> </ul> <p>void vexcodeInit(void) {<br> // Nothing to initialize<br> }</p> ]]>
</description>
<link>https://www.vexforum.com/t/port-on-brain-some-functions-not-working/83135/8</link>
<pubDate>Sun, 19 Jul 2020 16:38:10 +0000</pubDate>
<guid isPermaLink="false">www.vexforum.com-post-655101</guid>
</item>'''
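# The 'xml' parser requires lxml; it drops the undeclared dc: prefix,
# so the element is found as 'creator'
soup = BeautifulSoup(txt, 'xml')
print(soup.find('creator').get_text(strip=True))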
Prints:
@nathankmiles Nathan
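For comparison, here is a minimal sketch that uses only the standard library's xml.etree.ElementTree instead of BeautifulSoup. It assumes the full feed declares the dc: prefix on its root element with the Dublin Core URI http://purl.org/dc/elements/1.1/ (the snippet in the question leaves that declaration out); the <rss> wrapper and the get_creator helper are hypothetical. Once the prefix is bound, dc:creator can be queried through a namespace map, and the CDATA section comes back as ordinary element text.

```python
import xml.etree.ElementTree as ET

# Assumption: the real feed binds dc: to the Dublin Core namespace on its root.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Shortened <item> from the question, wrapped in a hypothetical <rss> root
# that declares xmlns:dc.
feed = f"""<rss xmlns:dc="{DC_NS}">
<item>
<title>Port on brain, some functions not working</title>
<dc:creator>
<![CDATA[ @nathankmiles Nathan ]]>
</dc:creator>
</item>
</rss>"""


def get_creator(xml_text: str) -> str:
    """Return the stripped text of the first <dc:creator> element (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    # Map the 'dc' prefix used in the query to its namespace URI.
    creator = root.find(".//dc:creator", {"dc": DC_NS})
    if creator is None:
        raise ValueError("no <dc:creator> element found")
    # The CDATA section is exposed as plain text; strip the surrounding whitespace.
    return (creator.text or "").strip()


print(get_creator(feed))  # @nathankmiles Nathan

# Without an xmlns:dc declaration, a strict XML parser rejects the prefix.
try:
    ET.fromstring("<item><dc:creator>x</dc:creator></item>")
except ET.ParseError as err:
    print("ParseError:", err)
```

The last part shows why the bare snippet is awkward for a strict parser: dc: is an unbound prefix until some ancestor declares it.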
Alternatively, you can use html.parser together with bs4.CData (txt is the snippet from the question):
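from bs4 import BeautifulSoup, CData
# html.parser keeps the 'dc:creator' name as-is and stores CDATA sections as CData objects
soup = BeautifulSoup(txt, 'html.parser')
# Find the first CData string after <dc:creator> and strip its whitespace
print(soup.find('dc:creator').find_next(string=lambda x: isinstance(x, CData)).strip())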
Prints:
@nathankmiles Nathan