How to search for a specific tag in an XML file using Python

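Problem description

I have a very large and complex XML file, and I want to extract the text_body elements from it. I need to skip the other trees and branches and get only the specific parts that look like this: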

<req id="1">
    <text_body>
        Upon the USB being plugged in the system shall be able to be deployed and operational in less than 1 minute.
    </text_body>
</req>
<req id="2">
    <text_body>
    The system shall be able to handle 1000 customers logged in concurrently at the same time.
    </text_body>
</req>
<req id="CO-1">
    <text_body>
        Must use a SQL based database. SQL standard is the most widely used database format. Restricting to SQL allows easy of use and compatibility for Web Store.
    </text_body>
</req>
<req id="CO-2">
    <text_body>
        Compatibility is only tested and verified for Microsoft Internet Explorer version 6 and 7, Netscape Communicator Version 4 and 5. Other versions may not be 100&#37; compatible. Also other browsers such as Mozilla or Firefox may not be 100&#37; compatible.
    </text_body>
</req>
<req id="3">
    <text_body>
The system shall adhere to the following hardware requirements:
    <itemize>
        <item>4GB Flash ram chip</item>
        <item>128MB SDRAM</item>
        <item>Intel XScale PXA270 520-MHz chipset</item>
        <item>OS: Apache web server</item>
        <item>Database: MySQL</item>
    </itemize>
    </text_body>
</req>

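I need the text_body strings, but how can I write my code so that it returns the string for any id? As you can see, the ids differ. The last text_body also contains an itemize element that I don't need. There are similar questions, such as Q1 and Q2; I tried to get help from them, but they did not return what I need. How can I do this?

Update: I need output like this:
Requirement 1: first text_body
Requirement 2: second text_body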

Tags: python, xml, parsing

Solution

Is this what you are looking for?

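from bs4 import BeautifulSoup

# Parse the file with the lxml parser (requires the lxml package).
with open('test.xml') as f:
    soup = BeautifulSoup(f.read(), features='lxml')
# Print the stripped text of the first two <text_body> elements only.
for text_body in soup.find_all('text_body')[:2]:
    print(text_body.get_text().strip())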

Output

Upon the USB being plugged in the system shall be able to be deployed and operational in less than 1 minute.
The system shall be able to handle 1000 customers logged in concurrently at the same time.
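
The snippet above prints only the first two text_body elements and does not show their ids. A minimal sketch of one way to cover every req, using the standard library's xml.etree.ElementTree, might look like the following. It assumes the req elements sit somewhere under a single root element; the `<requirements>` wrapper, the shortened `SAMPLE` string, and the helper names `own_text` and `requirement_texts` are illustrative and not part of the original question.

```python
import xml.etree.ElementTree as ET

# Shortened sample built from the question's data. The <requirements> root
# is an assumption standing in for whatever root the real file has.
SAMPLE = """<requirements>
<req id="1">
    <text_body>
        Upon the USB being plugged in the system shall be able to be deployed and operational in less than 1 minute.
    </text_body>
</req>
<req id="3">
    <text_body>
The system shall adhere to the following hardware requirements:
    <itemize>
        <item>4GB Flash ram chip</item>
        <item>128MB SDRAM</item>
    </itemize>
    </text_body>
</req>
</requirements>"""


def own_text(element):
    """Return the element's direct text, skipping text inside child elements."""
    # element.text is the text before the first child; each child's tail is
    # the text that follows that child inside the parent.
    parts = [element.text or ""]
    parts.extend(child.tail or "" for child in element)
    # Collapse runs of whitespace and newlines into single spaces.
    return " ".join("".join(parts).split())


def requirement_texts(root):
    """Yield (id, text) for every <req> that has a <text_body> child."""
    for req in root.iter("req"):
        body = req.find("text_body")
        if body is not None:
            yield req.get("id"), own_text(body)


root = ET.fromstring(SAMPLE)
for req_id, text in requirement_texts(root):
    print(f"Requirement {req_id}: {text}")
```

To read from a file instead of a string, `ET.parse('test.xml').getroot()` can replace the `ET.fromstring(SAMPLE)` call. Because `own_text` collects only the direct text of text_body, the contents of the nested itemize element are left out.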
