Traversing XML and selecting specific ElementTree content

Problem description

I have an XML document that looks like this:

<openie>
  <triple confidence="1.000">
    <subject begin="0" end="1">
      <text>PAF</text>
      <lemma>paf</lemma>
    </subject>
    <relation begin="1" end="2">
      <text>gets</text>
      <lemma>get</lemma>
    </relation>
    <object begin="2" end="6">
      <text>name of web site</text>
      <lemma>name of web site</lemma>
    </object>
  </triple>
  <triple confidence="1.000">
    <subject begin="0" end="1">
      <text>PAF</text>
      <lemma>paf</lemma>
    </subject>
    <relation begin="1" end="2">
      <text>gets</text>
      <lemma>get</lemma>
    </relation>
    <object begin="2" end="3">
      <text>name</text>
      <lemma>name</lemma>
    </object>
  </triple>
</openie>

The openie element is nested as follows: root > document > sentences > sentence > openie

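In my function, I am trying to print the contents of each triple, which contains subject, relation, and object elements. Unfortunately, I can't get it to work, because I can't reach these three elements and their text children. Which part is wrong?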

def get_openie():
    print('OpenIE parser start...')
    tree = ET.parse('./tmp/nlp_output.xml')
    root = tree.getroot()
    for triple in root.findall('./document/sentences/sentence/openie/triple'):
        t_subject = triple.find('subject/text').text
        t_relation = triple.find('relation/text').text
        t_object = triple.get('object/text').text
        print(t_subject,t_relation,t_object)

The output for the two triples should look like this:

PAF gets name of web site

PAF gets name

Tags: python, xml, xml-parsing

Solution


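For t_object you are calling triple.get() instead of triple.find(). Element.get() looks up an attribute by name, so triple.get('object/text') returns None, and accessing .text on None raises an AttributeError. Changing the call to triple.find() fixes your problem.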
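To make the difference concrete, here is a minimal sketch that contrasts the two methods on a single triple. The inline XML string is a trimmed copy of the data in the question, used here only for illustration.

```python
import xml.etree.ElementTree as ET

# A single triple, trimmed from the XML in the question (illustrative only).
triple = ET.fromstring(
    '<triple confidence="1.000">'
    '<object begin="2" end="3"><text>name</text></object>'
    '</triple>'
)

# get() reads an attribute of the element itself.
print(triple.get('confidence'))     # 1.000
print(triple.get('object/text'))    # None: no attribute has that name

# find() follows a path to a descendant element.
print(triple.find('object/text').text)  # name
```

Applied to the function from the question, the fix looks like this: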

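import xml.etree.ElementTree as ET

def get_openie():
    print('OpenIE parser start...')
    tree = ET.parse('./tmp/nlp_output.xml')
    root = tree.getroot()
    # Each <triple> sits under root > document > sentences > sentence > openie.
    for triple in root.findall('./document/sentences/sentence/openie/triple'):
        t_subject = triple.find('subject/text').text
        t_relation = triple.find('relation/text').text
        # find() follows a path to a child element; get() only reads attributes.
        t_object = triple.find('object/text').text
        print(t_subject, t_relation, t_object)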
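If some triples might lack one of the three parts, a slightly more defensive variant could use findtext(), which returns a default value instead of raising when the path does not match. The sketch below is one possible way to do this, not part of the original answer. It parses an inline string rather than ./tmp/nlp_output.xml so that it runs on its own. The SAMPLE string and the extract_triples helper are hypothetical names introduced for illustration, and the wrapper elements are assumed from the nesting described in the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the question's <openie> block wrapped in the
# root > document > sentences > sentence nesting described in the question.
SAMPLE = """
<root><document><sentences><sentence>
<openie>
  <triple confidence="1.000">
    <subject begin="0" end="1"><text>PAF</text><lemma>paf</lemma></subject>
    <relation begin="1" end="2"><text>gets</text><lemma>get</lemma></relation>
    <object begin="2" end="6"><text>name of web site</text><lemma>name of web site</lemma></object>
  </triple>
  <triple confidence="1.000">
    <subject begin="0" end="1"><text>PAF</text><lemma>paf</lemma></subject>
    <relation begin="1" end="2"><text>gets</text><lemma>get</lemma></relation>
    <object begin="2" end="3"><text>name</text><lemma>name</lemma></object>
  </triple>
</openie>
</sentence></sentences></document></root>
"""

def extract_triples(root):
    """Return a (subject, relation, object) text tuple for every triple."""
    triples = []
    for triple in root.findall('./document/sentences/sentence/openie/triple'):
        # findtext() returns the default instead of raising when a path is missing.
        parts = tuple(
            triple.findtext(f'{name}/text', default='')
            for name in ('subject', 'relation', 'object')
        )
        triples.append(parts)
    return triples

root = ET.fromstring(SAMPLE.strip())
for subject, relation, obj in extract_triples(root):
    print(subject, relation, obj)
```

Returning tuples instead of printing inside the loop keeps the parsing separate from the output, which makes the function easier to reuse and test.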
