python - Parsing file object XML with lxml returns external entity error
Problem description
I'm trying to fetch a gzipped XML file from an FTP server, parse the XML, and extract data with XPath, all without storing the files on disk. My code is:
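import gzip
import io
from ftplib import FTP
from lxml import etree

ftp = FTP()
ftp.connect(hostname)            # hostname, user and password are defined elsewhere
ftp.login(user, password)        # 'pass' is a reserved word in Python, so it can't be a variable name
flo = io.BytesIO()
ftp.retrbinary('RETR myfile.xml.gz', flo.write)
flo.seek(0)
uncompressed = gzip.decompress(flo.read())
tree = etree.parse(uncompressed, etree.XMLParser(encoding='utf-8', ns_clean=True, recover=True))  # fails, see below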
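A minimal sketch of the same in-memory pipeline, assuming the downloaded .gz bytes already sit in a BytesIO buffer like flo above. Instead of decompressing to a bytes value, it wraps the buffer in gzip.GzipFile, which is a file-like object, so etree.parse() can read and decompress it in one pass; it then runs an XPath query. The helper name parse_gzipped_xml, the sample document and the //item query are hypothetical placeholders, not taken from the original file.

```python
import gzip
import io

from lxml import etree


def parse_gzipped_xml(buf: io.BytesIO) -> etree._ElementTree:
    """Parse gzip-compressed XML held in an in-memory buffer (hypothetical helper)."""
    buf.seek(0)
    parser = etree.XMLParser(ns_clean=True, recover=True)
    # GzipFile exposes .read(), so lxml treats it as a file-like source and
    # decompresses on the fly; nothing is written to disk.
    with gzip.GzipFile(fileobj=buf, mode="rb") as gz:
        return etree.parse(gz, parser)


# Usage with a small synthetic document standing in for myfile.xml.gz
sample = b'<?xml version="1.0" encoding="UTF-8"?><root><item>a</item><item>b</item></root>'
flo = io.BytesIO(gzip.compress(sample))
tree = parse_gzipped_xml(flo)
print(tree.xpath("//item/text()"))  # ['a', 'b']
```

Streaming through GzipFile also avoids holding both the compressed and the fully decompressed copy of a large feed in memory at once.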
Everything works until the etree.parse() call, which fails and prints the contents of the XML file inside the error message, starting with:
OSError: Error reading file 'b'<?xml version="1.0" ...
and ending with:
failed to load external entity "b'<?xml version="1.0" encoding="UTF-8"?><merchandiser xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNam
If I write the uncompressed data to a file on disk first and then parse that file, the parse call works. I've also tried a parser with resolve_entities=False, but the output doesn't change.
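etree.parse() expects a source: a filename or path, a URL, or a file-like object. A str or bytes argument is taken as a filename, so lxml tries to open the whole document as a path and reports "failed to load external entity"; resolve_entities=False has no effect because the "entity" that fails to load is the input itself. Writing to disk works because parse() then receives a real path. A minimal sketch of the same idea without the disk, assuming uncompressed holds the decompressed bytes as in the question (the sample document below is a hypothetical stand-in): wrap the bytes in io.BytesIO so parse() gets a file-like object.

```python
import io

from lxml import etree

# Hypothetical stand-in for the decompressed bytes from the question
uncompressed = b'<?xml version="1.0" encoding="UTF-8"?><merchandiser><product/></merchandiser>'

parser = etree.XMLParser(encoding="utf-8", ns_clean=True, recover=True)

# etree.parse(uncompressed, parser) would treat the bytes as a filename and fail.
# Wrapping them in BytesIO hands parse() a file-like object instead.
tree = etree.parse(io.BytesIO(uncompressed), parser)
root = tree.getroot()
print(root.tag)  # merchandiser
```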
I've seen posts such as "Error 'failed to load external entity' when using Python lxml", but they deal with parsing a string with etree.parse(), whereas I'm dealing with a bytes object:
type(uncompressed)
<class 'bytes'>
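Whether the argument is str or bytes makes no difference to parse(): both are treated as a filename. For a document that is already in memory, etree.fromstring() is the call that takes the content itself, and bytes are the preferred input there, since lxml rejects a str that contains an encoding declaration with a ValueError. A minimal sketch, with a hypothetical sample document standing in for uncompressed; note that fromstring() returns the root element rather than an ElementTree.

```python
from lxml import etree

# Hypothetical stand-in for the decompressed bytes
uncompressed = b'<?xml version="1.0" encoding="UTF-8"?><merchandiser><item>x</item></merchandiser>'

parser = etree.XMLParser(ns_clean=True, recover=True)

# fromstring() parses the document content itself and returns the root element
root = etree.fromstring(uncompressed, parser)
print(root.tag)                     # merchandiser
print(root.xpath("//item/text()"))  # ['x']

# If a full ElementTree is needed (for example to call .write()), get it from the root
tree = root.getroottree()

# Decoding to str first backfires: lxml refuses str input that carries an encoding declaration
try:
    etree.fromstring(uncompressed.decode("utf-8"))
except ValueError as exc:
    print("str input rejected:", exc)
```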
Any help is much appreciated. Thanks
Solution