In Hive, how do I read NULL/empty tags in XML using explode(XPATH(..))?

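Problem description

In the Hive query below, I also need to read the null/empty "string" tags from the XML content. At the moment, the list returned by XPATH() contains only the non-empty "string" tags.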

with your_data as (
select  '<ParentArray>
    <ParentFieldArray>
        <Name>ABCD</Name>
        <Value>
            <string>111</string>
            <string></string>
            <string>222</string>
        </Value>
    </ParentFieldArray>
    <ParentFieldArray>
        <Name>EFGH</Name>
        <Value>
            <string/>
            <string>444</string>
            <string></string>
            <string>555</string>

        </Value>
    </ParentFieldArray>
</ParentArray>' as xmlinfo
)

select Name, Value 
  from your_data d
       lateral view outer explode(XPATH(xmlinfo, 'ParentArray/ParentFieldArray/Name/text()')) pf as  Name
       lateral view outer explode(XPATH(xmlinfo, concat('ParentArray/ParentFieldArray[Name="', pf.Name, '"]/Value/string/text()'))) vl as Value;

Expected output of the query:

Name    Value
ABCD    111
ABCD    
ABCD    222
EFGH    
EFGH    444
EFGH    
EFGH    555

Tags: sql, xml, xpath, hive, hiveql

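Solution

The problem here is that XPATH returns a NodeList of text nodes, and an empty element has no text node, so it contributes nothing to the list.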
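To make the cause concrete, here is a minimal sketch in Python (not Hive) using the standard-library DOM parser. It assumes Hive's XPATH UDF behaves like an ordinary DOM-based XPath engine for `string/text()`. The helper name `text_nodes` and the shortened XML sample are illustrative only.

```python
from xml.dom.minidom import parseString

# Shortened sample: two filled <string> elements and two empty ones.
xml = "<Value><string>111</string><string></string><string/><string>222</string></Value>"
doc = parseString(xml)


def text_nodes(document):
    """Collect the text-node children of every <string> element,
    which is what an expression like string/text() selects."""
    found = []
    for element in document.getElementsByTagName("string"):
        for child in element.childNodes:
            if child.nodeType == child.TEXT_NODE:
                found.append(child.data)
    return found


elements = doc.getElementsByTagName("string")
values = text_nodes(doc)

# Four elements exist, but only two of them own a text node.
print(len(elements), values)
```

The empty elements are present in the tree, but they have no child text node, so any selection of text nodes skips them.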

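Concatenating with some string inside the XPath expression, for example concat(/Value/string/text()," "), does not work here, because concat() returns a string rather than a NodeList:

Caused by: javax.xml.xpath.XPathExpressionException: com.sun.org.apache.xpath.internal.XPathException: Can not convert #STRING to a NodeList!

at com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl.evaluate(XPathExpressionImpl.java:195)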

A simple solution is to replace <string></string> and <string/> with <string>NULL</string>; you can then convert the 'NULL' string back to null.

Demo:

with your_data as (
select  '<ParentArray>
    <ParentFieldArray>
        <Name>ABCD</Name>
        <Value>
            <string>111</string>
            <string></string>
            <string>222</string>
        </Value>
    </ParentFieldArray>
    <ParentFieldArray>
        <Name>EFGH</Name>
        <Value>
            <string/>
            <string>444</string>
            <string></string>
            <string>555</string>
        </Value>
    </ParentFieldArray>
</ParentArray>' as xmlinfo
)

select name, case when value='NULL' then null else value end value
  from (select regexp_replace(xmlinfo,'<string></string>|<string/>','<string>NULL</string>') xmlinfo 
          from your_data d
       ) d
       lateral view outer explode(XPATH(xmlinfo, 'ParentArray/ParentFieldArray/Name/text()')) pf as  Name
       lateral view outer explode(XPATH(xmlinfo, concat('ParentArray/ParentFieldArray[Name="', pf.Name, '"]/Value/string/text()'))) vl as value

Result:

name    value
ABCD    111
ABCD    
ABCD    222
EFGH    
EFGH    444
EFGH    
EFGH    555
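
The placeholder round trip can also be cross-checked outside Hive. The following is a hedged Python sketch, not the Hive implementation: it applies the same regular expression as the regexp_replace call, reads every "string" element, and maps the 'NULL' placeholder back to None. The function name `rows` is hypothetical.

```python
import re
from xml.dom.minidom import parseString

XML = """<ParentArray>
    <ParentFieldArray>
        <Name>ABCD</Name>
        <Value>
            <string>111</string>
            <string></string>
            <string>222</string>
        </Value>
    </ParentFieldArray>
    <ParentFieldArray>
        <Name>EFGH</Name>
        <Value>
            <string/>
            <string>444</string>
            <string></string>
            <string>555</string>
        </Value>
    </ParentFieldArray>
</ParentArray>"""


def rows(xml):
    """Return (name, value) pairs, with empty <string> tags reported as None."""
    # Same pattern as the Hive query: give every empty tag a placeholder text node.
    patched = re.sub(r"<string></string>|<string/>", "<string>NULL</string>", xml)
    doc = parseString(patched)
    out = []
    for field in doc.getElementsByTagName("ParentFieldArray"):
        name = field.getElementsByTagName("Name")[0].firstChild.data
        for item in field.getElementsByTagName("string"):
            value = item.firstChild.data  # always present after patching
            out.append((name, None if value == "NULL" else value))
    return out


result = rows(XML)
for name, value in result:
    print(name, value)
```

One caveat of this design: a tag whose real content is the literal text NULL cannot be told apart from an empty tag, so pick a placeholder that cannot occur in the data if that matters. The pattern also matches only the two exact spellings shown; variants such as a self-closing tag with a space before the slash are left untouched.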
