perl - 如何使用 XML::LibXML 解析 xml 字典
问题描述
如何使用 XML::LibXML 解析 xml 字典
至少我认为它被称为字典。我没有找到好的资源来描述:
- 这种 XML 格式
- XML::LibXML。Perldoc 甚至没有提到我在 StackOverflow 上找到的 findvalue 或 textContent。
我有这个xml:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Clippings</key>
<array>
<dict>
<key>Abbreviation</key>
<string>;adeb</string>
<key>Creation Date</key>
<string>2017-04-22T22:02:32Z</string>
<key>DateLastUsed</key>
<string>2021-05-21T13:53:20Z</string>
<key>Label</key>
<string></string>
<key>Modification Date</key>
<string>2018-04-30T18:16:36Z</string>
<key>Option</key>
<string></string>
<key>Plain Text</key>
<string>print STDERR "(debug ⌘) $_\n" for @⌘;</string>
<key>SortOrder</key>
<string>1</string>
<key>TotUsed</key>
<integer>579</integer>
<key>uuidString</key>
<string>96707AF9-E9C4-4770-A930-B6889C354243</string>
</dict>
<dict>
<key>Abbreviation</key>
<string>;ahtml</string>
<key>DateLastUsed</key>
<string>2016-03-09T15:50:43Z</string>
<key>Modification Date</key>
<string>2016-05-18T09:35:49Z</string>
<key>Option</key>
<string></string>
<key>Plain Text</key>
<string>print "(debug) Enter to continue:\n";
my $debug = <STDIN>;</string>
<key>Service</key>
<string></string>
<key>SortOrder</key>
<string>130</string>
<key>TotUsed</key>
<integer>1</integer>
<key>uuidString</key>
<string>E74E6ADC-7BE3-493B-AD1E-32729CAB2B77</string>
</dict>
</array>
<key>sortOrderForThisSet</key>
<string>0</string>
<key>theAbbsFilePath</key>
<string>/Users/user/path/to/file</string>
<key>unsavedChanges</key>
<false/>
<key>uuidCheckDone</key>
<string>yes</string>
</dict>
</plist>
到目前为止,尝试这个 perl:
#!/usr/bin/perl
use strict;
use warnings;
my $filename = '/Users/user/path/to/file';
use XML::LibXML;
# Parse the XML
my $xml = XML::LibXML->load_xml(location => $filename);
for my $entry ($xml->findnodes('//dict')) {
my $key = $entry->findvalue('@key');
my $value = $entry->textContent;
print "$key = $value";
}
value 将所有值一起打印,而 key 根本不打印任何内容。
解决方案
首先,让我们解决您绝对错误的说法,即“ Perldoc 甚至没有提到 findvalue 或 textContent ”。
文档节点记录在XML::LibXML::Document中。以下几乎是该文件所说的第一件事:
它继承了 DOM 规范中指定的XML::LibXML::Node的所有函数。
元素节点记录在XML::LibXML::Element中。以下是该文件说的第一件事:
该类继承自XML::LibXML::Node。此处未列出继承方法的文档。
两者findvalue
和textContent
都记录在XML::LibXML::Node中。
到实际问题。
设计 XML 模式的人被告知要使用 XML,但显然不理解 XML。即使模式对 JSON 等任意数据结构进行编码也是很糟糕的。设计它的人只打算在使用之前将数据转换为不同的格式。以下是这样做的:
use strict;
use warnings;
use feature qw( say state );
use Carp qw( croak );
use Types::Serialiser qw( );
use XML::LibXML qw( );
sub qname {
my ($node) = @_;
my $ns = $node->namespaceURI();
my $name = $node->nodeName();
return defined($ns) ? "{$ns}$name" : $name;
}
sub deserialize_array {
my ($array_node) = @_;
return [ map { deserialize_value($_) } $array_node->findnodes("*") ];
}
sub deserialize_dict {
my ($dict_node) = @_;
my $dict = {};
my @children = $dict_node->findnodes("*");
while (@children) {
my $key_node = shift(@children);
qname($key_node) eq "key"
or croak("Expected key");
my $val_node = shift(@children)
or croak("Expected value");
my $key = $key_node->textContent();
my $val = deserialize_value($val_node);
$dict->{$key} = $val;
}
return $dict;
}
sub deserialize_value {
my ($val_node) = @_;
state $deserializers = {
string => sub { $_[0]->textContent() },
integer => sub { 0 + $_[0]->textContent() },
real => sub { 0 + $_[0]->textContent() },
true => sub { $Types::Serialiser::true },
false => sub { $Types::Serialiser::false },
data => sub { croak("data values not currently supported"); },
date => sub { croak("date values not currently supported"); },
array => \&deserialize_array,
dict => \&deserialize_dict,
};
my $val_type = qname($val_node);
my $deserializer = $deserializers->{$val_type}
or croak("Unrecognized value type \"$val_type\"");
return $deserializer->($val_node);
}
sub deserialize_doc {
my ($doc) = @_;
my @children = $doc->documentElement->findnodes("*");
croak("Root element has too few children") if @children == 0;
croak("Root element has too many children") if @children > 1;
return deserialize_value($children[0]);
}
{
my $doc = XML::LibXML->load_xml( location => $ARGV[0] );
my $prop_list = deserialize_doc($doc);
...
}
如果你转储$prop_list
,它看起来像
$prop_list = {
"Clippings" => [
{
"Creation Date" => "2017-04-22T22:02:32Z",
"TotUsed" => 579,
"DateLastUsed" => "2021-05-21T13:53:20Z",
"Abbreviation" => ";adeb",
"Plain Text" => "print STDERR \"(debug \x{2318}) \$_\\n\" for \@\x{2318};",
"uuidString" => "96707AF9-E9C4-4770-A930-B6889C354243",
"Modification Date" => "2018-04-30T18:16:36Z",
"SortOrder" => 1,
"Label" => "",
"Option" => ""
},
{
"Option" => "",
"Modification Date" => "2016-05-18T09:35:49Z",
"SortOrder" => 130,
"uuidString" => "E74E6ADC-7BE3-493B-AD1E-32729CAB2B77",
"Plain Text" => "print \"(debug) Enter to continue:\\n\";\nmy \$debug = <STDIN>;",
"Abbreviation" => ";ahtml",
"Service" => "",
"TotUsed" => 1,
"DateLastUsed" => "2016-03-09T15:50:43Z"
}
],
"unsavedChanges" => $false,
"sortOrderForThisSet" => 0,
"uuidCheckDone" => "yes",
"theAbbsFilePath" => "/Users/user/path/to/file"
};
以上是使用获得的
use Data::Dumper;
local $Data::Dumper::Indent = 1;
local $Data::Dumper::Useqq = 1;
print(Data::Dumper->Dump(
[ $Types::Serialiser::true, $Types::Serialiser::false, $prop_list ],
[qw( true false prop_list )],
));
推荐阅读
- javascript - 单击不属于“关闭”按钮的单独按钮无论如何都会触发关闭事件
- c - 如何通过带有 c 的串行端口从带有 NMEA 协议的 GPS 模块获取数据
- node.js - 使用 next.js 和 express.js 以及使用会话的最佳实践是什么
- python - 断言 API 结果响应
- spring - 跨多个 WebSecurityConfigurerAdapter 共享 SecurityContextHolder
- c++ - 如何在 C++ 中使用 cin.get() 提供更多数字?
- java - 目标编辑器未在 Eclipse 中显示
- javascript - 如何锚定固定高度到a的右下角而不侵入上面的内容?
- postgresql - 如何从 postgresql 中的解释输出中确定仅索引扫描?
- html - 如何减少表格单元格内按钮的宽度?