Converting nested and repeating XML data to a file or Hive table using unix/python

Problem description

I am looking for help converting a fairly complex XML document into a file or a Hive table, using either a unix or a Python script.

The XML data is:
<FOLD name="ENTERTAINMENT" title="1">
<FOLD name="MOVIE1" title="2">
<EVNT_UNIX id="1234"  name="USA"/>
</FOLD>
</FOLD>
<FOLD name="MOVIE2" title="2">
<EVNT_UNIX id="12345"  name="INDIA"/>
<WORKS id='W123' name ='KRISH'/>
<WORKS id='W456' name ='Jhon'/>
<WORKS id='W789' name ='Nancy'/>
</FOLD>
<FOLD name="MOVIE3" title="4">
<EVNT_UNIX id="1234" name="INDIA"/>
<WORKS id='W123' name ='KRISH'/>
<FOLD name="MOVIE3-1" title="4">
<WORKS id='W456' name ='Jhon'/>
<LOCATION space='available' name ='TEST'/>
</FOLD>
</FOLD>

..

So far I have tried the sample code below in Python. It only retrieves the details of one specific tag and ignores the details of the tags nested inside it.

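from xml.dom import minidom

doc = minidom.parse("test.xml")

# Tag names are case-sensitive, so match "FOLD" as written in the XML
foldlist = doc.getElementsByTagName("FOLD")
for fold in foldlist:
    fname = fold.getAttribute("name")
    # title is an attribute of FOLD, not a child element
    title = fold.getAttribute("title")
    print("fname:%s, title:%s" % (fname, title))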

The expected output is:

ENTERTAINMENT,MOVIE1,1234,USA
ENTERTAINMENT,MOVIE2,12345,INDIA
ENTERTAINMENT,MOVIE2,w123,KRISH
ENTERTAINMENT,MOVIE2,w456,Jhon
ENTERTAINMENT,MOVIE2,W789,Nancy
ENTERTAINMENT,MOVIE3,1234,INDIA
ENTERTAINMENT,MOVIE3,W123,KRISH
ENTERTAINMENT,MOVIE3-1,4
ENTERTAINMENT,MOVIE3-1,W456,Jhon
ENTERTAINMENT,MOVIE3-1,available,TEST

Tags: python, python-3.x, unix, pyspark

Solution
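
One possible approach, offered as a minimal sketch rather than a definitive answer, is to walk the tree recursively with the standard-library xml.etree.ElementTree module and emit one comma-separated row per non-FOLD element. The sketch assumes that the ENTERTAINMENT FOLD is meant to enclose every other FOLD (the expected output suggests this, although the sample data closes it after MOVIE1). The names SAMPLE, flatten, and to_csv_lines are hypothetical helpers introduced here for illustration. It does not produce the ENTERTAINMENT,MOVIE3-1,4 row, and it keeps attribute values in their original case.

```python
import xml.etree.ElementTree as ET

# Assumed input: the question's data, with ENTERTAINMENT wrapping every other FOLD.
SAMPLE = """
<FOLD name="ENTERTAINMENT" title="1">
  <FOLD name="MOVIE1" title="2">
    <EVNT_UNIX id="1234" name="USA"/>
  </FOLD>
  <FOLD name="MOVIE2" title="2">
    <EVNT_UNIX id="12345" name="INDIA"/>
    <WORKS id="W123" name="KRISH"/>
    <WORKS id="W456" name="Jhon"/>
    <WORKS id="W789" name="Nancy"/>
  </FOLD>
  <FOLD name="MOVIE3" title="4">
    <EVNT_UNIX id="1234" name="INDIA"/>
    <WORKS id="W123" name="KRISH"/>
    <FOLD name="MOVIE3-1" title="4">
      <WORKS id="W456" name="Jhon"/>
      <LOCATION space="available" name="TEST"/>
    </FOLD>
  </FOLD>
</FOLD>
"""


def flatten(fold, top_name=None):
    """Yield one row per non-FOLD element:
    [outermost FOLD name, enclosing FOLD name, attribute values in document order]."""
    name = fold.get("name")
    top_name = top_name or name  # remember the outermost FOLD's name
    for child in fold:
        if child.tag == "FOLD":
            # Descend into nested FOLDs, carrying the outermost name along
            yield from flatten(child, top_name)
        else:
            yield [top_name, name, *child.attrib.values()]


def to_csv_lines(xml_text):
    """Parse the XML text and return the flattened rows as comma-joined strings."""
    root = ET.fromstring(xml_text)
    return [",".join(row) for row in flatten(root)]


if __name__ == "__main__":
    for line in to_csv_lines(SAMPLE):
        print(line)
```

The printed lines can be redirected to a text file, which could then be loaded into a Hive table declared with comma-delimited fields. If the real file has several top-level elements, it would need to be wrapped in a single root element before parsing, since ElementTree only accepts well-formed XML.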

