Perl XML to Hash: removing the last XML node level and forcing an array

Problem description

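I have an XML document that I need to convert into a hash with a specific layout, in which some nodes must be held in arrays. I tried XML::Simple, but I cannot get rid of one level of XML nodes.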

#!/usr/bin/perl
use Data::Dumper::Simple;
use XML::Simple;

use warnings;
use strict;

my $xml = <<'XML';
<?xml version="1.0"?>
<release id="9999" status="Accepted">
  <images>
    <image height="511" type="primary" uri="" uri150="" width="600"/>
    <image height="519" type="secondary" uri="" uri150="" width="600"/>
    <image height="521" type="secondary" uri="" uri150="" width="600"/>
    <image height="217" type="secondary" uri="" uri150="" width="500"/>
    <image height="597" type="secondary" uri="" uri150="" width="600"/>
    <image height="89" type="secondary" uri="" uri150="" width="600"/>
  </images>
  <artists>
    <artist>
      <id>45</id>
      <name>Aphex Twin</name>
      <anv/>
      <join/>
      <role/>
      <tracks/>
    </artist>
  </artists>
</release>
XML

my $xml_hash = XMLin($xml, ForceArray => qr{image}x );
print Dumper $xml_hash; 

Desired output

       'images' => [
                     {
                       'type' => 'primary',
                       'width' => 600,
                       'resource_url' => '',
                       'uri150' => '',
                       'height' => 511,
                       'uri' => ''
                     },
                     {
                       'width' => 600,
                       'type' => 'secondary',
                       'resource_url' => '',
                       'uri150' => '',
                       'uri' => '',
                       'height' => 519
                     }, etc...

What my sample code produces

$xml_hash = {
              'images' => [
                            {
                              'image' => [
                                           {
                                             'uri150' => '',
                                             'type' => 'primary',
                                             'uri' => '',
                                             'height' => '511',
                                             'width' => '600'
                                           },
                                           {
                                             'type' => 'secondary',
                                             'uri150' => '',
                                             'uri' => '',
                                             'height' => '519',
                                             'width' => '600'
                                           },
                                           {
                                             'uri' => '',
                                             'height' => '521',
                                             'width' => '600',
                                             'type' => 'secondary',
                                             'uri150' => ''
                                           },
                              etc...

How do I get rid of

'image' => [

and have

'images' => [

contain all the hashes?

Thanks, George

Tags: arrays, xml, perl, hash

Solution


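Any attempt to represent an entire XML document as a Perl data structure will be fraught with edge cases and awkward design decisions because of the nature of the two formats. There are many options for parsing and traversing XML in a way that suits the format, such as XML::LibXML or XML::Twig. Here is how I would do it with Mojo::DOM, which uses CSS selectors for traversal: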

use strict;
use warnings;
use Mojo::DOM;
use Mojo::Util 'dumper';

my $xml = <<'XML';
<?xml version="1.0"?>
<release id="9999" status="Accepted">
  <images>
    <image height="511" type="primary" uri="" uri150="" width="600"/>
    <image height="519" type="secondary" uri="" uri150="" width="600"/>
    <image height="521" type="secondary" uri="" uri150="" width="600"/>
    <image height="217" type="secondary" uri="" uri150="" width="500"/>
    <image height="597" type="secondary" uri="" uri150="" width="600"/>
    <image height="89" type="secondary" uri="" uri150="" width="600"/>
  </images>
  <artists>
    <artist>
      <id>45</id>
      <name>Aphex Twin</name>
      <anv/>
      <join/>
      <role/>
      <tracks/>
    </artist>
  </artists>
</release>
XML

my $dom = Mojo::DOM->new->xml(1)->parse($xml);
my @images = $dom->find('release#9999 > images > image')->map('attr')->each;
print dumper \@images;

Output:

[
  {
    "height" => 511,
    "type" => "primary",
    "uri" => "",
    "uri150" => "",
    "width" => 600
  },
  {
    "height" => 519,
    "type" => "secondary",
    "uri" => "",
    "uri150" => "",
    "width" => 600
  },
  {
    "height" => 521,
    "type" => "secondary",
    "uri" => "",
    "uri150" => "",
    "width" => 600
  },
  {
    "height" => 217,
    "type" => "secondary",
    "uri" => "",
    "uri150" => "",
    "width" => 500
  },
  {
    "height" => 597,
    "type" => "secondary",
    "uri" => "",
    "uri150" => "",
    "width" => 600
  },
  {
    "height" => 89,
    "type" => "secondary",
    "uri" => "",
    "uri150" => "",
    "width" => 600
  }
]
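
For comparison, the same extraction can be sketched with XML::LibXML, one of the alternatives mentioned above, using an XPath expression instead of a CSS selector. This is only a minimal sketch: the XML is a shortened copy of the question's document, and the %release wrapper hash is a hypothetical name added to mirror the 'images' => [ ... ] shape the question asks for.

```perl
use strict;
use warnings;
use XML::LibXML;
use Data::Dumper;

# Shortened copy of the question's document (two images instead of six).
my $xml = <<'XML';
<?xml version="1.0"?>
<release id="9999" status="Accepted">
  <images>
    <image height="511" type="primary" uri="" uri150="" width="600"/>
    <image height="519" type="secondary" uri="" uri150="" width="600"/>
  </images>
</release>
XML

my $doc = XML::LibXML->load_xml(string => $xml);

# Build one hash per <image> element, keyed by attribute name.
my @images = map {
    my $image = $_;
    +{ map { ( $_->nodeName => $_->value ) } $image->attributes };
} $doc->findnodes('/release/images/image');

# Hypothetical wrapper hash (not part of the original answer), shaped
# like the question's desired output: images => [ {...}, {...} ].
my %release = ( images => \@images );

local $Data::Dumper::Sortkeys = 1;
print Dumper(\%release);
```

Attribute values arrive as strings here, so any numeric treatment of height and width is left to the calling code.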
