python - 如何将 xml 解析为具有兄弟元素的表?
问题描述
我有看起来像这样的xml:
xml = """
<portfolio>
<assets>600000</assets>
<assetClassDetails>
<assetClassName>Bonds</assetClassName>
<assetAmount>100000</assetAmount>
</assetClassDetails>
<assetClassDetails>
<assetClassName>Equities</assetClassName>
<assetAmount>500000</assetAmount>
</assetClassDetails>
<rateOfReturn>6.3</rateOfReturn>
</portfolio>
"""
我通过这样做将每个元素解析为一个表:
root = etree.fromstring(xml)
tag = []
text = []
parent = []
double_parent = []
for element in root.iter():
try:
element_parent = element.getparent().tag
except AttributeError:
element_parent = 'none'
try:
element_double_parent = element.getparent().getparent().tag
except AttributeError:
element_double_parent = 'none'
tag.append(element.tag)
text.append(element.text)
parent.append(element_parent)
double_parent.append(element_double_parent)
df = pd.DataFrame({'tag' : tag, 'text' : text, 'parent' : parent, 'double_parent' : double_parent})
这导致:
tag text parent double_parent
portfolio \n none none
assets 600000 portfolio none
assetClassDetails \n portfolio none
assetClassName Bonds assetClassDetails portfolio
assetAmount 100000 assetClassDetails portfolio
assetClassDetails \n portfolio none
assetClassName Equities assetClassDetails portfolio
assetAmount 500000 assetClassDetails portfolio
rateOfReturn 6.3 portfolio none
我正在努力解决如何对数据进行透视,以便将资产类别名称和金额配对并绑定到投资组合标签(及其直系子代)。如何在结果中获得配对兄弟标签?
我想要的结果如下所示:
type assets rateOfReturn assetClassName assetAmount
portfolio 600000 6.3 Bonds 100000
portfolio 600000 6.3 Equities 500000
解决方案
尝试类似:
rows = []
columns = ['assets', 'rateOfReturn', 'assetClassName', 'assetAmount']
for entry in root.xpath('//assetClassDetails'):
row = []
row.extend([entry.xpath('preceding-sibling::assets/text()')[0],
entry.xpath('following-sibling::rateOfReturn/text()')[0],
entry.xpath('./assetClassName/text()')[0],
entry.xpath('./assetAmount/text()')[0]])
rows.append(row)
pd.DataFrame(rows,columns=columns)
输出:
assets rateOfReturn assetClassName assetAmount
0 600000 6.3 Bonds 100000
1 600000 6.3 Equities 500000
另一种有趣的方法是使用另一个库来做到这一点:
import pandas_read_xml as pdx
df1 = pdx.read_xml(r'path\to\myfile.xml',['portfolio','assetClassDetails'])
df2 = pdx.read_xml(r'path\to\myfile.xml',['portfolio'])
pd.concat([df2[['assets','rateOfReturn']],df1], axis=1)
输出:
assets rateOfReturn assetClassName assetAmount
0 600000 6.3 Bonds 100000
1 600000 6.3 Equities 500000
推荐阅读
- python - Python Selenium 销售组 FB
- javascript - YouTube 嵌入不更新和抛出 net::ERR_BLOCKED_BY_CLIENT
- go - 如何正确使用for循环获取数据库数据
- tensorflow - 在 conda jupyter 实验室笔记本中使用 tensorflow 的问题
- c# - MissingMethodException Serilog.Context.LogContext.Push
- java - 如何使用 JOOQ 将 TIMESTAMP WITH TIME ZONE 插入 HSQLDB
- node.js - 我如何使用 Jest 和 puppeteer-cluster
- html - 什么是 data-gr-cs-loaded="true" 属性添加到我的代码
- html - 奇怪的 CSS 背景行为
- powershell - 从 Windows 容器中识别 Kubernetes 命名空间