python - How to search for a specific tag in an XML file with Python
Problem description
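I have a very large and complex XML file from which I want to extract the text_body content. I need to skip the other trees and branches and get only the specific parts, which look like this: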
<req id="1">
<text_body>
Upon the USB being plugged in the system shall be able to be deployed and operational in less than 1 minute.
</text_body>
</req>
<req id="2">
<text_body>
The system shall be able to handle 1000 customers logged in concurrently at the same time.
</text_body>
</req>
<req id="CO-1">
<text_body>
Must use a SQL based database. SQL standard is the most widely used database format. Restricting to SQL allows easy of use and compatibility for Web Store.
</text_body>
</req>
<req id="CO-2">
<text_body>
Compatibility is only tested and verified for Microsoft Internet Explorer version 6 and 7, Netscape Communicator Version 4 and 5. Other versions may not be 100% compatible. Also other browsers such as Mozilla or Firefox may not be 100% compatible.
</text_body>
</req>
<req id="3">
<text_body>
The system shall adhere to the following hardware requirements:
<itemize>
<item>4GB Flash ram chip</item>
<item>128MB SDRAM</item>
<item>Intel XScale PXA270 520-MHz chipset</item>
<item>OS: Apache web server</item>
<item>Database: MySQL</item>
</itemize>
</text_body>
</req>
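I need the string inside text_body, but how can I write my code so that it means "return the string for any id"? As you can see, the IDs differ. The last one also has an itemize inside its text_body, which I don't need. There are similar questions, such as Q1 and Q2. I tried to get help from them, but they didn't return what I need. How can I do this?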
Update: I need output like this:
Requirement 1: first text_body
Requirement 2: second text_body
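A minimal sketch of the format requested in the update, using the standard-library ElementTree instead of BeautifulSoup. It assumes that the real file has a single root element (here a made-up root wraps a shortened excerpt of the question's data) and that tags carry no namespace. The function name requirement_lines is hypothetical:

```python
import xml.etree.ElementTree as ET

# Shortened excerpt of the question's data, wrapped in a made-up <root>
# element because a well-formed XML document needs exactly one root.
SAMPLE = """<root>
<req id="1"><text_body>
Upon the USB being plugged in the system shall be able to be deployed and operational in less than 1 minute.
</text_body></req>
<req id="CO-1"><text_body>
Must use a SQL based database.
</text_body></req>
<req id="3"><text_body>
The system shall adhere to the following hardware requirements:
<itemize><item>128MB SDRAM</item></itemize>
</text_body></req>
</root>"""


def requirement_lines(root):
    """Yield one 'Requirement <id>: <text>' line per <req> element."""
    # iter("req") finds <req> elements at any depth; other branches are skipped.
    for req in root.iter("req"):
        body = req.find("text_body")
        if body is None:
            continue
        # body.text excludes nested elements such as <itemize>.
        text = (body.text or "").strip()
        yield f"Requirement {req.get('id')}: {text}"


lines = list(requirement_lines(ET.fromstring(SAMPLE)))
for line in lines:
    print(line)
```

Because the id attribute is read directly, this works for any ID, numeric or not (for example CO-1).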
Solution
Is this what you are looking for?
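from bs4 import BeautifulSoup

# Parse the file with the lxml-backed parser, as in the original answer.
with open('test.xml') as f:
    soup = BeautifulSoup(f.read(), features='lxml')
# Print the stripped text of the first two <text_body> elements only.
for text_body in soup.find_all('text_body')[:2]:
    print(text_body.get_text().strip())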
Output
Upon the USB being plugged in the system shall be able to be deployed and operational in less than 1 minute.
The system shall be able to handle 1000 customers logged in concurrently at the same time.
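The answer's code reads the whole file into memory and keeps only the first two matches ([:2]). Since the question mentions a very large file, a streaming variant may help. The following is a sketch using ElementTree.iterparse, assuming no XML namespaces on the tags. The function name iter_requirements and the sample data are hypothetical:

```python
import io
import xml.etree.ElementTree as ET


def iter_requirements(source):
    """Stream (id, text) pairs from <req> elements as they finish parsing.

    `source` is a filename or a binary file object, as accepted by ET.iterparse.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "req":
            body = elem.find("text_body")
            # body.text excludes nested elements such as <itemize>.
            text = (body.text or "").strip() if body is not None else ""
            yield elem.get("id"), text
            # Clear the finished <req> so its children can be garbage-collected.
            elem.clear()


# Hypothetical stand-in for the large file: a tiny document with one root.
data = b"""<spec>
<other><branch>ignored</branch></other>
<req id="1"><text_body>First requirement.</text_body></req>
<req id="2"><text_body>Second requirement.<itemize><item>x</item></itemize></text_body></req>
</spec>"""

results = list(iter_requirements(io.BytesIO(data)))
for req_id, text in results:
    print(f"Requirement {req_id}: {text}")
```

For a real file, pass the path instead, e.g. iter_requirements('test.xml').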