search in the values of my XML with Python

Problem description

I have XML like this, and I would like to obtain the value of the line with tag=035 and code=a for the node where tag=035 and code=9 is "BAI". I have tried to identify the node where BAI appears with this, and then ask for its parent node:

[ _sub.getparent() for _sub in _xml.findall(".//*[@tag='035']/*[@code='9']") if(_sub.text=='BAI') ]

But the parent is empty. How do I get my 035,a at the node where 035,9='BAI'?

Tags: python, xml, lxml

Solution


You can do all of this in pure XPath, like this:

//*[@tag='035']/*[@code='9'][. = 'BAI']/following-sibling::*[@code='a']

That formulation presumes that whatever validates and/or publishes your data enforces that every [@code='a'] follows its [@code='9'].

You can also, and perhaps ideally, write the XPath like this:

//*[@tag='035']/*[@code='9'][. = 'BAI']/../*[@code='a']

Or like this:

//*[@tag='035'][subfield[@code='9' and . = 'BAI']]/subfield[@code='a']

Or, more generally:

//*[@tag='035'][child::*[@code='9' and . = 'BAI']]/child::*[@code='a']

These formulations make no assumptions about the order of the child elements.
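To see the difference that ordering makes, here is a minimal sketch using lxml's xpath() method. The sample document is hypothetical: the `record` and `datafield` element names and the identifier values are assumptions made for illustration, since the original XML is not shown. In the second field the code 'a' child deliberately comes before the code '9' child.

```python
from lxml import etree

# Hypothetical sample data; element names and values are assumed.
# The second datafield lists code 'a' BEFORE code '9'.
SAMPLE = b"""<record>
  <datafield tag="035">
    <subfield code="9">BAI</subfield>
    <subfield code="a">first-id</subfield>
  </datafield>
  <datafield tag="035">
    <subfield code="a">second-id</subfield>
    <subfield code="9">BAI</subfield>
  </datafield>
  <datafield tag="035">
    <subfield code="9">OTHER</subfield>
    <subfield code="a">third-id</subfield>
  </datafield>
</record>"""

root = etree.fromstring(SAMPLE)

# Order-dependent: only finds 'a' when it comes after the matching '9'.
sibling_expr = (
    "//*[@tag='035']/*[@code='9'][. = 'BAI']"
    "/following-sibling::*[@code='a']"
)
# Order-independent: selects the 035 element first, then its 'a' children.
parent_expr = (
    "//*[@tag='035'][child::*[@code='9' and . = 'BAI']]"
    "/child::*[@code='a']"
)

sibling_hits = [el.text for el in root.xpath(sibling_expr)]
parent_hits = [el.text for el in root.xpath(parent_expr)]

print(sibling_hits)  # ['first-id']
print(parent_hits)   # ['first-id', 'second-id']
```

On this sample the following-sibling version misses the second field, while the version that filters on the parent finds both.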

XPath is a very powerful language, and XPath 3.0 in particular is fully Turing-complete, which makes it even more powerful and awesome.

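As for lxml, its findall() understands only the limited ElementPath subset and will not take all of these formulations (the xpath() method does evaluate full XPath 1.0). Fortunately, the shortest and sweetest one is accepted, so: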

from lxml import etree

# Parse the XML file from disk
tree = etree.parse("data/search.xml")

# A leading "." keeps the path relative to the root element
print(tree.findall(".//*[@tag='035']/*[@code='9'][. = 'BAI']/../*[@code='a']"))
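The attempt from the question can also be finished in Python instead of in the path expression: find the code '9' children, keep those whose text is "BAI", then look up the code 'a' children on each parent. The following is only a sketch on hypothetical in-memory data; the `record` and `datafield` element names and the identifier values are assumptions, as the real file is not shown.

```python
from lxml import etree

# Hypothetical sample data; element names and values are assumed.
SAMPLE = b"""<record>
  <datafield tag="035">
    <subfield code="9">BAI</subfield>
    <subfield code="a">first-id</subfield>
  </datafield>
  <datafield tag="035">
    <subfield code="9">OTHER</subfield>
    <subfield code="a">third-id</subfield>
  </datafield>
</record>"""

root = etree.fromstring(SAMPLE)

values = []
for sub in root.findall(".//*[@tag='035']/*[@code='9']"):
    if sub.text != "BAI":
        continue  # skip 035 fields whose code '9' is something else
    parent = sub.getparent()  # the enclosing tag='035' element
    # Collect the text of every code 'a' child of that parent.
    values.extend(a.text for a in parent.findall("*[@code='a']"))

print(values)  # ['first-id']
```

This does the same filtering as the path expression above, only spelled out step by step, which can be easier to debug.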

Hope this helps!

