python - How do I set the `xpath` for pandas `read_xml`?
Problem description
I want to parse the data in the `Component` elements of an XML file:
<Component>
<UnderlyingSecurityID>300001</UnderlyingSecurityID>
<UnderlyingSecurityIDSource>102</UnderlyingSecurityIDSource>
<UnderlyingSymbol>特锐德</UnderlyingSymbol>
<ComponentShare>300.00</ComponentShare>
<SubstituteFlag>1</SubstituteFlag>
<PremiumRatio>0.25000</PremiumRatio>
<CreationCashSubstitute>0.0000</CreationCashSubstitute>
<RedemptionCashSubstitute>0.0000</RedemptionCashSubstitute>
</Component>
<Component>
<UnderlyingSecurityID>300003</UnderlyingSecurityID>
<UnderlyingSecurityIDSource>102</UnderlyingSecurityIDSource>
<UnderlyingSymbol>乐普医疗</UnderlyingSymbol>
<ComponentShare>600.00</ComponentShare>
<SubstituteFlag>1</SubstituteFlag>
<PremiumRatio>0.25000</PremiumRatio>
<CreationCashSubstitute>0.0000</CreationCashSubstitute>
<RedemptionCashSubstitute>0.0000</RedemptionCashSubstitute>
</Component>
I have installed the latest versions of lxml and pandas and tried the code below, with no luck.
Python 3.9.4 (tags/v3.9.4:1f2e308, Apr 6 2021, 13:40:21) [MSC v.1928 64 bit (AMD64)]
Type 'copyright', 'credits' or 'license' for more information
IPython 7.25.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import pandas as pd
In [2]: pd.__version__
Out[2]: '1.3.0'
In [3]: xml = pd.read_xml('https://www.huaan.com.cn/etf/159949/etffiledownload.jsp?etffilename=pcf_159949_20210707.xml', xpath='//component')
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-3-67d228028cc9> in <module>
----> 1 xml = pd.read_xml('https://www.huaan.com.cn/etf/159949/etffiledownload.jsp?etffilename=pcf_159949_20210707.xml', xpath='//component')
...
501 if elems == []:
--> 502 raise ValueError(msg)
503
504 if elems != [] and attrs == [] and children == []:
ValueError: xpath does not return any nodes. Be sure row level nodes are in xpath. If document uses namespaces denoted with xmlns, be sure to define namespaces and use them in xpath.
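A quick check with the standard library's `xml.etree.ElementTree` (no pandas involved) points to two separate reasons why `//component` finds nothing. XML names are case-sensitive, so `component` never matches `Component`. And, assuming the real file declares the default namespace `http://ts.szse.cn/Fund` (the URI passed as `namespaces` in the second attempt), every element lives in that namespace, so even the correctly cased name without a prefix does not match. The two-record document below is a hypothetical, trimmed stand-in for the file; the root name `Components` is an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the downloaded file: the root element name is
# assumed, and the default namespace is the URI from the `namespaces` attempt.
SAMPLE = """<Components xmlns="http://ts.szse.cn/Fund">
  <Component>
    <UnderlyingSecurityID>300001</UnderlyingSecurityID>
    <UnderlyingSymbol>特锐德</UnderlyingSymbol>
  </Component>
  <Component>
    <UnderlyingSecurityID>300003</UnderlyingSecurityID>
    <UnderlyingSymbol>乐普医疗</UnderlyingSymbol>
  </Component>
</Components>"""

NS = {"com": "http://ts.szse.cn/Fund"}

root = ET.fromstring(SAMPLE)

# Wrong case: XML names are case-sensitive, so nothing matches.
lower = root.findall(".//component")
# Right case, no namespace: the elements are in the default namespace.
unprefixed = root.findall(".//Component")
# Right case plus a prefix bound to the document's namespace URI.
prefixed = root.findall(".//com:Component", NS)

print(len(lower), len(unprefixed), len(prefixed))  # → 0 0 2
```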
In [4]: xml = pd.read_xml('https://www.huaan.com.cn/etf/159949/etffiledownload.jsp?etffilename=pcf_159949_20210707.xml', xpath='//component', namespaces={'com': 'http://ts.szse.cn/Fund'})
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-4-52fbe542dadb> in <module>
----> 1 xml = pd.read_xml('https://www.huaan.com.cn/etf/159949/etffiledownload.jsp?etffilename=pcf_159949_20210707.xml', xpath='//component', namespaces={'com': 'http://ts.szse.cn/Fund'})
...
501 if elems == []:
--> 502 raise ValueError(msg)
503
504 if elems != [] and attrs == [] and children == []:
ValueError: xpath does not return any nodes. Be sure row level nodes are in xpath. If document uses namespaces denoted with xmlns, be sure to define namespaces and use them in xpath.
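The second attempt binds the prefix `com` but never uses it in the expression, and still spells the element name in lower case. Below is a sketch of the call with both fixed, run against a hypothetical two-row excerpt that is assumed to declare the same default namespace (the root name `Components` is an assumption). It uses `parser="etree"` only so that it runs without lxml; that parser understands ElementTree's limited path syntax, hence `.//` rather than `//`.

```python
from io import BytesIO

import pandas as pd

# Hypothetical excerpt: root element name assumed; the default namespace
# is the URI from the `namespaces` argument in the failing call.
SAMPLE = """<Components xmlns="http://ts.szse.cn/Fund">
  <Component>
    <UnderlyingSecurityID>300001</UnderlyingSecurityID>
    <UnderlyingSymbol>特锐德</UnderlyingSymbol>
    <ComponentShare>300.00</ComponentShare>
  </Component>
  <Component>
    <UnderlyingSecurityID>300003</UnderlyingSecurityID>
    <UnderlyingSymbol>乐普医疗</UnderlyingSymbol>
    <ComponentShare>600.00</ComponentShare>
  </Component>
</Components>""".encode("utf-8")

df = pd.read_xml(
    BytesIO(SAMPLE),
    # Use the bound prefix *and* the exact case of the element name.
    xpath=".//com:Component",
    namespaces={"com": "http://ts.szse.cn/Fund"},
    # Standard-library parser: keeps the example runnable without lxml.
    parser="etree",
)
print(df)
```

With the default lxml parser and the original URL, the equivalent should be `xpath='//com:Component', namespaces={'com': 'http://ts.szse.cn/Fund'}`; that variant is untested against the live site.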
I also tried lxml directly, and that seems to work:
In [5]: from lxml import etree
In [6]: import requests
In [7]: content = requests.get('https://www.huaan.com.cn/etf/159949/etffiledownload.jsp?etffilename=pcf_159949_20210707.xml').content
In [8]: html = etree.HTML(content)
In [9]: html.xpath('//component')
Out[9]:
[<Element component at 0x1d493cb23c0>,
<Element component at 0x1d493cb2340>,
<Element component at 0x1d493cb2240>,
<Element component at 0x1d493cb22c0>,
<Element component at 0x1d493cb2140>,
<Element component at 0x1d493cb2040>,
<Element component at 0x1d493cb2c40>,
<Element component at 0x1d493cb61c0>,
<Element component at 0x1d493cb63c0>,
<Element component at 0x1d493cb2200>,
...
I don't know why `read_xml` doesn't work. Any help would be appreciated!
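The lxml attempt most likely works only because `etree.HTML` is an HTML parser: the output lists `Element component` in lower case although the file spells it `Component`, and HTML parsing does not treat `xmlns` as a namespace declaration, so `//component` happens to match. Since the bytes are already in `content`, one option is to hand them to `read_xml` through a `BytesIO` buffer. The sketch below also uses ElementTree's `{*}` namespace wildcard (Python 3.8+), which, as far as I can tell, `parser="etree"` passes straight to `findall`, so the namespace URI does not have to be spelled out. The helper name `components_frame` and the sample document are hypothetical.

```python
from io import BytesIO

import pandas as pd


def components_frame(content: bytes) -> pd.DataFrame:
    """Hypothetical helper: parse already-downloaded PCF bytes into a DataFrame.

    `{*}Component` matches a Component element in any namespace
    (ElementTree path syntax, Python 3.8+).
    """
    return pd.read_xml(BytesIO(content), xpath=".//{*}Component", parser="etree")


# Hypothetical two-row excerpt; with the real download this would be
# components_frame(requests.get(url).content).
sample = """<Components xmlns="http://ts.szse.cn/Fund">
  <Component>
    <UnderlyingSecurityID>300001</UnderlyingSecurityID>
    <UnderlyingSymbol>特锐德</UnderlyingSymbol>
  </Component>
  <Component>
    <UnderlyingSecurityID>300003</UnderlyingSecurityID>
    <UnderlyingSymbol>乐普医疗</UnderlyingSymbol>
  </Component>
</Components>""".encode("utf-8")

df = components_frame(sample)
print(df)
```

The explicit `namespaces` mapping is stricter (it will not silently match an element from some other namespace), while the wildcard is more forgiving if the URI ever changes.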