Why does unrelated text show up when I print nodeName?

Question

Suppose I have the following XML file:

<?xml version="1.0" encoding="utf-8"?>
<library attrib1="att11" attrib2="att22">
    library-text
    <book isbn="1111111111">
        <title lang="en">T1 T1 T1 T1 T1</title>
        <date>2001</date>
        <author>A1 A1 A1 A1 A1</author>     
        <price>10.00</price>
    </book>
    <book isbn="2222222222">
        <title lang="en">T2 T2 T2 T2 T2</title>
        <date>2002</date>
        <author>A2 A2 A2 A2 A2</author>     
        <price>20.00</price>
    </book>
    <book isbn="3333333333">
        <title lang="en">T3 T3 T3 T3</title>
        <date>2003</date>
        <author>A3 A3 A3 A3 A3y</author>        
        <price>30.00</price>
    </book>
</library>

Main file

import xml.dom.minidom as minidom

xml_fname = "library.xml"

dom = minidom.parse(xml_fname) 

for node in dom.firstChild.childNodes:
    print(node.nodeName)

Output

#text
book
#text
book
#text
book
#text

Why does the output show #text? Where does it come from?

Tags: python-3.x, xml, minidom

Answer


If you change print(node.nodeName) to print(node), you will see this output:

<DOM Text node "'\n    libra'...">
<DOM Element: book at 0x11f48ec8>
<DOM Text node "'\n    '">
<DOM Element: book at 0x11f50070>
<DOM Text node "'\n    '">
<DOM Element: book at 0x11f501d8>
<DOM Text node "'\n'">

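minidom treats "free text" as real DOM text nodes. They have no tag name of their own, so their nodeName is reported as #text. In this file, those nodes are the library-text string and the whitespace (newlines and indentation) between the book elements.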

If you only want the book nodes, ask for them explicitly:

for node in dom.getElementsByTagName('book'):
    print(node.nodeName)

Output

book
book
book
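
If you would rather keep iterating over childNodes, another option is to skip the text nodes by checking nodeType. The following is a minimal sketch, not part of the original answer; it assumes a shortened inline copy of the XML (the XML constant and the element_names variable are illustrative names) so that it runs without library.xml.

```python
import xml.dom.minidom as minidom

# Shortened inline copy of the library XML (an assumption for this sketch).
XML = """<library attrib1="att11" attrib2="att22">
    library-text
    <book isbn="1111111111"><title lang="en">T1 T1 T1 T1 T1</title></book>
    <book isbn="2222222222"><title lang="en">T2 T2 T2 T2 T2</title></book>
</library>"""

dom = minidom.parseString(XML)

# documentElement is the root <library> element.
element_names = [
    node.nodeName
    for node in dom.documentElement.childNodes
    if node.nodeType == node.ELEMENT_NODE  # skips the #text nodes
]
print(element_names)
```

Unlike getElementsByTagName, which searches all descendants, this only looks at the direct children of the root element.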

Keep in mind that using minidom is discouraged. From the official Python documentation:

Users who are not already proficient with the DOM should consider using the xml.etree.ElementTree module for their XML processing instead.

Consider using ElementTree:

import xml.etree.ElementTree as ET

xml_fname = "library.xml"

tree = ET.parse(xml_fname)
root = tree.getroot()  # the <library> element

for node in root.findall('book'):
    print(node.tag)

This also outputs

book
book
book
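
ElementTree does not create separate nodes for free text, which is why no #text entries appear here. The text is still available through the text and tail attributes of the elements. The sketch below illustrates this under the assumption of a shortened inline copy of the XML; the XML constant and the child_tags variable are illustrative names, not part of the original answer.

```python
import xml.etree.ElementTree as ET

# Shortened inline copy of the library XML (an assumption for this sketch).
XML = """<library attrib1="att11" attrib2="att22">
    library-text
    <book isbn="1111111111"><title lang="en">T1 T1 T1 T1 T1</title></book>
    <book isbn="2222222222"><title lang="en">T2 T2 T2 T2 T2</title></book>
</library>"""

root = ET.fromstring(XML)  # returns the root <library> element

# Iterating over an Element yields only its child elements, never text.
child_tags = [child.tag for child in root]
print(child_tags)

# The free text before the first child element is stored in root.text.
print(repr(root.text.strip()))

# The whitespace that follows each <book> is stored in that element's tail.
print([repr(book.tail) for book in root.findall('book')])
```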
