Parsing file object XML with lxml returns external entity error

Problem description

I'm trying to fetch a gzipped XML file from an FTP server, parse the XML, and pull out data using XPath expressions, all without storing the files on disk. The code I've got is:

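import gzip
import io
from ftplib import FTP
from lxml import etree

# hostname, user and password are defined elsewhere
ftp = FTP()
ftp.connect(hostname)
ftp.login(user, password)  # "pass" is a reserved word, so it cannot be a variable name

flo = io.BytesIO()

ftp.retrbinary('RETR myfile.xml.gz', flo.write)
flo.seek(0, 0)
uncompressed = gzip.decompress(flo.read())
# This is the call that raises the error described below
tree = etree.parse(uncompressed, etree.XMLParser(encoding='utf-8', ns_clean=True, recover=True))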

Everything works up to the etree.parse() call. That call fails and prints the entire contents of the XML file to the screen, starting with: OSError: Error reading file 'b'<?xml version="1.0" ... and ending with failed to load external entity "b'<?xml version="1.0" encoding="UTF-8"?><merchandiser xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNam

If I write the uncompressed file to disk first and then load it back in, the parse command works. I've also tried a parser created with resolve_entities=False, but the output does not change.

I've seen posts such as "Error 'failed to load external entity' when using Python lxml", but they are about passing a string to etree.parse(), whereas I'm dealing with a bytes object:

type(uncompressed)
<class 'bytes'> 

Any help is much appreciated. Thanks

Tags: python, xml, xml-parsing, lxml

Solution
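No answer text survives on this page, so what follows is only a sketch of one likely fix, not the original poster's accepted solution. etree.parse() expects a filename, a URL, or a file-like object. When it is handed a bytes value, lxml treats that value as a path, which would explain why the whole document appears inside the "Error reading file" message. Wrapping the decompressed bytes in io.BytesIO gives the parser a file-like object instead. The helper name parse_gzipped_xml and the sample document are made up for illustration, and the FTP download is replaced with an in-memory gzip so the snippet runs on its own.

```python
import gzip
import io

from lxml import etree


def parse_gzipped_xml(gz_bytes):
    """Parse gzip-compressed XML held in memory and return an lxml ElementTree.

    Hypothetical helper for illustration; it is not part of lxml.
    """
    parser = etree.XMLParser(encoding='utf-8', ns_clean=True, recover=True)
    xml_bytes = gzip.decompress(gz_bytes)
    # Wrap the bytes in a file-like object: etree.parse() reads a plain
    # bytes/str argument as a filename or URL, not as document content.
    return etree.parse(io.BytesIO(xml_bytes), parser)


# Stand-in for the FTP download: a tiny made-up document, gzipped in memory.
sample = (b'<?xml version="1.0" encoding="UTF-8"?>'
          b'<merchandiser><product id="1"/></merchandiser>')
tree = parse_gzipped_xml(gzip.compress(sample))

print(tree.getroot().tag)
print(tree.xpath('//product/@id'))
```

In the code from the question, this would correspond to passing io.BytesIO(uncompressed) rather than uncompressed itself to etree.parse(). If only the root element is needed, etree.fromstring(uncompressed, parser) should also work, since it takes the document content directly; note that it returns the root Element rather than an ElementTree.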

