xmltodict module does not work from the command line

Problem description

I tried some of the commands mentioned here:

https://tech.marksblogg.com/working-with-data-feeds.html

But the xmltodict module does not seem to work as expected:

wget https://dumps.wikimedia.org/enwiki/20210801/enwiki-20210801-pages-articles2.xml-p41243p151573.bz2

bunzip2 enwiki-20210801-pages-articles2.xml-p41243p151573.bz2

git clone https://github.com/martinblech/xmltodict.git

cat enwiki-20210801-pages-articles2.xml-p41243p151573 | xmltodict/xmltodict.py 2 > save.txt

Is there another way to convert XML to a Python dict?
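
One alternative that needs no third-party package is the standard library's `xml.etree.ElementTree`. The following is a minimal sketch, not a drop-in replacement for xmltodict: `element_to_dict` is an illustrative helper name, it ignores attributes and mixed content, and it builds the whole tree in memory, so it only suits small documents.

```python
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Convert an Element to nested dicts; leaf elements become their text."""
    children = list(elem)
    if not children:
        return elem.text
    result = {}
    for child in children:
        value = element_to_dict(child)
        if child.tag in result:
            # Repeated tags under the same parent are collected into a list.
            existing = result[child.tag]
            if not isinstance(existing, list):
                result[child.tag] = [existing]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


root = ET.fromstring("<page><title>Example</title><id>1</id></page>")
page = {root.tag: element_to_dict(root)}
print(page)  # {'page': {'title': 'Example', 'id': '1'}}
```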


I have checked that the following works as expected:

# python
Python 3.9.5 (default, May 12 2021, 14:30:06)
[GCC 8.3.0] on linux
>>> import xmltodict
>>> xml = """<DECL>!! आप की सेवा में पुनः पधारे !!</DECL>"""
>>> xmltodict.parse(xml, process_namespaces=True)
OrderedDict([('DECL', '!! आप की सेवा में पुनः पधारे !!')])

However, it does not work on the file above, possibly because the file is too large.


I also tried a similar command from the tutorial above.

# cat enwiki-20210801-pages-articles2.xml-p41243p151573 | xmltodict/xmltodict.py 2 | python /tmp/dump_pages.py
Traceback (most recent call last):
  File "/tmp/dump_pages.py", line 7, in <module>
    _, page = marshal.load(sys.stdin)
TypeError: file.read() returned not bytes but str
Traceback (most recent call last):
  File "/tmp/stack/xmltodict/xmltodict.py", line 533, in <module>
    root = parse(stdin,
  File "/tmp/stack/xmltodict/xmltodict.py", line 368, in parse
    parser.ParseFile(xml_input)
  File "/usr/src/python/Modules/pyexpat.c", line 461, in EndElement
  File "/tmp/stack/xmltodict/xmltodict.py", line 132, in endElement
    should_continue = self.item_callback(self.path, item)
  File "/tmp/stack/xmltodict/xmltodict.py", line 529, in handle_item
    marshal.dump((path, item), stdout)
BrokenPipeError: [Errno 32] Broken pipe
Exception ignored in: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>
BrokenPipeError: [Errno 32] Broken pipe

Contents of the dump script:

# cat /tmp/dump_pages.py
import json
import marshal
import sys

while True:
    try:
        _, page = marshal.load(sys.stdin)
        print (json.dumps(page))
    except EOFError:
        break
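
The `TypeError` in the first traceback points at the reader rather than at the file size: `marshal.load` expects a binary stream, while `sys.stdin` in Python 3 is a text stream. The `BrokenPipeError` from xmltodict is then most likely just a consequence of the reader exiting early. Below is a minimal sketch of a reader that takes a binary stream, assuming the xmltodict command-line script writes binary marshal records of the form `(path, item)` as the traceback suggests. `iter_marshalled` and `dump_pages` are illustrative names, not part of xmltodict.

```python
import io
import json
import marshal
import sys


def iter_marshalled(stream):
    """Yield objects from a binary stream until marshal reports end of input."""
    while True:
        try:
            yield marshal.load(stream)
        except EOFError:
            return


def dump_pages(stream, out=None):
    """Print the item of each (path, item) record as one JSON line."""
    for _, page in iter_marshalled(stream):
        print(json.dumps(page), file=out)  # file=None means sys.stdout


# Self-contained demonstration with an in-memory binary stream.
# The record below is made up; real records come from xmltodict.
buf = io.BytesIO()
marshal.dump((["mediawiki", "page"], {"title": "Example"}), buf)
buf.seek(0)
dump_pages(buf)
```

In the pipeline from the question, the demonstration lines would be replaced by a single call, `dump_pages(sys.stdin.buffer)`, so that the script reads bytes from the pipe.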

I just want to convert the Wikipedia XML dump to CSV (only certain columns).
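
For that goal, going through a dict may not be necessary at all. The sketch below streams the dump with the standard library's `xml.etree.ElementTree.iterparse` and writes selected columns with `csv`. It assumes each article is a `<page>` element whose direct children include `<title>` and `<id>`, and that the root declares a default namespace. `local_name` and `pages_to_csv` are illustrative names, and the sample document is made up.

```python
import csv
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def pages_to_csv(xml_file, csv_file, columns=("title", "id")):
    """Stream <page> elements from xml_file and write the chosen columns as CSV."""
    writer = csv.writer(csv_file)
    writer.writerow(columns)
    for _, elem in ET.iterparse(xml_file, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        # Only direct children of <page> are considered, so the nested
        # <revision><id> does not overwrite the page's own <id>.
        fields = {local_name(child.tag): (child.text or "") for child in elem}
        writer.writerow([fields.get(col, "") for col in columns])
        # Drop the page's subtree; the emptied elements still stay on the root.
        elem.clear()


sample = io.BytesIO(
    b'<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">'
    b"<page><title>Example</title><ns>0</ns><id>42</id>"
    b"<revision><id>7</id><text>body</text></revision></page>"
    b"</mediawiki>"
)
out = io.StringIO()
pages_to_csv(sample, out)
print(out.getvalue())
```

For the real dump, the same function could be called with the decompressed file opened in binary mode and a CSV file opened with `newline=""`, for example `pages_to_csv(open("enwiki-20210801-pages-articles2.xml-p41243p151573", "rb"), open("pages.csv", "w", newline=""))`, where `pages.csv` is a placeholder output name.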

Tags: python, xml, xmltodict

Solution

