Extracting tags from a large XML file to CSV

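Problem description

I need to open a large XML dataset in Excel (the file is the LEI file linked here). The number of observations does not exceed Excel's workbook limits. I am trying to extract four columns from the XML into a CSV. My code is: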

cols = ["StartNodeID", "EndNodeID", "RelationshipType", "RelationshipDate_EndDate"]
rows = [] 

xmlparse = Xet.parse('D:\Descargas\concatenated_lei2file_StartNodeID.xml')
root = xmlparse.getroot()

Relationships = tree.findall('./RelationshipRecord/Relationship')

for elem in Relationships:
   StartNodeID = elem.find("./RelationshipRecord/Relationship/EndNode/EndNodeID").text
   EndNodeID = elem.find("EndNodeID").text
   RelationshipType = elem.find("RelationshipType").text
   RelationshipDate_EndDate = elem.find("RelationshipDate_EndDate").text

rows.append({"StartNodeID": StartNodeID,
             "EndNodeID": EndNodeID,
             "RelationshipType": RelationshipType,
             "RelationshipDate_EndDate": RelationshipDate_EndDate})

df = pd.DataFrame(rows, columns=cols)

df.to_csv('D:\Descargas\concatenated_lei2file_output.csv')

I get this error:

----> 4     StartNodeID = elem.find("./RelationshipRecord/Relationship/EndNode/EndNodeID").text
  5     EndNodeID = elem.find("EndNodeID").text
  6     RelationshipType = elem.find("RelationshipType").text
AttributeError: 'NoneType' object has no attribute 'text'

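This is a fairly large dataset. I was able to view the tree structure using the firstobject XML editor.

[Image: tree structure]

I tried:

but I keep getting the same error.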

Tags: python xml csv data-science large-files

Solution
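
The AttributeError most likely comes from the search path. Inside the loop, elem is already a Relationship element, so elem.find("./RelationshipRecord/Relationship/EndNode/EndNodeID") looks for a RelationshipRecord child underneath Relationship, finds nothing and returns None, and None has no .text. Likewise, elem.find("EndNodeID") only matches direct children, so it also returns None if EndNodeID is nested inside EndNode. Two other things in the posted code would cause trouble later: tree is never defined (the parsed objects are called xmlparse and root), and rows.append(...) sits outside the loop, so at most one row would be collected.

Below is a minimal sketch of one way to do the extraction, not a definitive implementation. The tree-structure image is not available, so the layout is an assumption: each Relationship element is taken to contain the four tags somewhere among its descendants, under the column names used in the question. The helper names (local_name, iter_relationship_rows, write_csv) and the SAMPLE document are made up for illustration. The sketch streams the file with iterparse instead of loading it whole, uses findtext so that a missing tag yields an empty cell rather than an exception, and relies on the "{*}" namespace wildcard, which needs Python 3.8 or later.

```python
import csv
import io
import xml.etree.ElementTree as ET

COLS = ["StartNodeID", "EndNodeID", "RelationshipType", "RelationshipDate_EndDate"]


def local_name(tag):
    """Strip a '{namespace}' prefix from an element tag, if present."""
    return tag.rsplit("}", 1)[-1]


def iter_relationship_rows(source):
    """Yield one dict per Relationship element while streaming the XML."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "Relationship":
            continue
        # ".//{*}name" searches all descendants, in any namespace (3.8+).
        # findtext returns None when the tag is absent instead of raising.
        yield {col: elem.findtext(".//{*}" + col) for col in COLS}
        # Drop the children and text of the processed element to save memory.
        elem.clear()


def write_csv(source, target):
    """Write the extracted rows to an open text file; return the row count."""
    writer = csv.DictWriter(target, fieldnames=COLS)
    writer.writeheader()
    count = 0
    for row in iter_relationship_rows(source):
        writer.writerow(row)  # None values are written as empty cells
        count += 1
    return count


# Made-up sample standing in for the real file (assumed layout, see above).
SAMPLE = b"""<RelationshipData>
  <RelationshipRecord>
    <Relationship>
      <StartNode><StartNodeID>A1</StartNodeID></StartNode>
      <EndNode><EndNodeID>B1</EndNodeID></EndNode>
      <RelationshipType>IS_DIRECTLY_CONSOLIDATED_BY</RelationshipType>
      <RelationshipDate_EndDate>2020-01-01</RelationshipDate_EndDate>
    </Relationship>
  </RelationshipRecord>
  <RelationshipRecord>
    <Relationship>
      <StartNode><StartNodeID>A2</StartNodeID></StartNode>
      <EndNode><EndNodeID>B2</EndNodeID></EndNode>
      <RelationshipType>IS_ULTIMATELY_CONSOLIDATED_BY</RelationshipType>
    </Relationship>
  </RelationshipRecord>
</RelationshipData>"""

out = io.StringIO()
n_rows = write_csv(io.BytesIO(SAMPLE), out)
print(out.getvalue())
```

For the real file, pass the XML path as source and an output file opened with open(path, "w", newline="", encoding="utf-8") as target. Writing Windows paths as raw strings, for example r'D:\Descargas\...', avoids accidental backslash escapes. If the actual tag names or nesting differ from the assumption above, only the entries in COLS and the search path need adjusting.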

