
How to fix XMLSyntaxError: Premature end of data in tag

Problem description

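I have seen a few answers to similar questions, but none of them seems to solve this problem. I am trying to parse a simple file with the lxml.etree.parse method, but I keep getting this error message: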

lxml.etree.XMLSyntaxError: Premature end of data in tag body line 2, line 2, column 32

The error is raised on this line:

tree = etree.parse( infile, parser )

Here is the simple, well-formed XML file:

<?xml version="1.0" encoding="UTF-8"?>
<note>
    <to>Tove</to>
    <from>Jani</from>
    <heading>Reminder</heading>
    <body>Dont forget me this weekend</body>
</note>
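
As a sanity check on the claim that the file is well-formed, here is a minimal sketch. It uses the standard library's xml.etree.ElementTree as a stand-in for lxml (an assumption made so the snippet runs without extra packages), and the helper name is_well_formed is made up for illustration. It shows that the document above parses cleanly and that a copy cut off before the closing tags does not, which is the general situation in which a parser reports input ending while a tag is still open.

```python
import xml.etree.ElementTree as ET

# The sample document from the question, as bytes so the encoding
# declaration is honoured by the parser.
NOTE_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<note>
    <to>Tove</to>
    <from>Jani</from>
    <heading>Reminder</heading>
    <body>Dont forget me this weekend</body>
</note>
"""


def is_well_formed(data):
    """Hypothetical helper: True if data parses as XML, False otherwise."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


# Simulate a truncated stream: everything before <heading> is kept,
# so <note> is opened but never closed.
truncated = NOTE_XML[:NOTE_XML.index(b"<heading>")]

print(is_well_formed(NOTE_XML))
print(is_well_formed(truncated))
```

If the full file passes a check like this but parsing the FixNS result still fails, the data reaching etree.parse is probably not identical to the file on disk.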

Here is my implementation, a main function that parses the XML file (parseXmlFile):

from lxml import etree
from path import path
from FixNS import FixNS
....

def parseXmlFile(xmlFilePath):
    nsPrefixMap = dict()
    fixns = FixNS()
    fixns.fixNS( xmlFilePath )
    infile = fixns.getResult()
    fixns.getNSPrefixMap( nsPrefixMap )

    parser = etree.XMLParser( remove_blank_text = True, ns_clean = True, huge_tree = True )
    tree = etree.parse( infile, parser )
    root = tree.getroot()
    return tree, root, nsPrefixMap, fixns

The FixNS class and helper functions, added for completeness:

from cStringIO import StringIO
import xml.sax.expatreader
from xml.sax import make_parser, SAXNotRecognizedException, SAXNotSupportedException
from xml.sax.handler import property_lexical_handler, feature_namespaces, feature_validation
from xml.sax.saxutils import XMLGenerator, quoteattr
from blzip import ReadBLZip

class FixNS(XMLGenerator):

    def __init__(self):
        XMLGenerator.__init__(self)

    def fixNS(self, infilename):
        XMLGenerator.__init__(self, StringIO())
        self._out = StringIO()
        self._result = StringIO()
        self._inFileName = infilename
        self._nsDeclPos = 0
        self._wasBLZipped = False
        self._inContent = file(self._inFileName, 'rb').read()
        if self._inContent.startswith('BLZIP'):
            self._inContent = ReadBLZip(self._inFileName)
            self._wasBLZipped = True
        self._knownNsPrefixes = set()
        self._collectedNsPrefixes = dict()
        self._isroot = True
        self._in_entity = 0
        self._in_cdata = 0
        self._line = 0
        self._column = 0
        self._parser = make_parser(['xml.sax.expatreader'])
        self._parser.setContentHandler(self)
        self._parser.setProperty(property_lexical_handler, self)
        try:
            self._parser.setFeature(feature_namespaces, 0)
        except (SAXNotRecognizedException, SAXNotSupportedException):
            pass

        try:
            self._parser.setFeature(feature_validation, 0)
        except (SAXNotRecognizedException, SAXNotSupportedException):
            pass

        self._parser.parse(StringIO(self._inContent))

    def getResult(self):
        return StringIO(self._result.getvalue())
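
One way to narrow this down might be to look at what getResult() actually returns before it is passed to etree.parse. The sketch below is only a debugging aid under stated assumptions: peek_stream is a hypothetical helper (not part of FixNS or lxml), it assumes the stream is seekable, and it is written against io.BytesIO, although it relies only on tell, seek and read.

```python
import io


def peek_stream(stream, tail=40):
    """Hypothetical debugging helper.

    Returns (total_length, last_bytes) for a seekable stream and restores
    the original read position, so the stream can still be parsed afterwards.
    """
    pos = stream.tell()      # remember where the caller left the stream
    stream.seek(0)
    data = stream.read()     # read everything the parser would see
    stream.seek(pos)         # put the position back
    return len(data), data[-tail:]


# Example with a deliberately incomplete buffer (illustrative data only).
buf = io.BytesIO(b"<note><to>Tove</to>")
size, ending = peek_stream(buf)
print(size, repr(ending))
```

If the reported length is much smaller than the file on disk, or the tail does not end with the closing root tag, the buffer was already incomplete before lxml saw it.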

I am using Python 2.7 and lxml-2.3. Any help on how to resolve this parse error would be appreciated.

