Extracting attribute values from an XML file (xmlstarlet)

Problem description

I have a small XML file with five entries:

<?xml version="1.0" encoding="utf-8"?>
<posts>
  <row Id="4" PostTypeId="1" AcceptedAnswerId="7" CreationDate="2008-07-31T21:42:52.667" Score="630" ViewCount="42817" Body="&lt;p&gt;I want to use a track-bar to change a form's opacity.&lt;/p&gt;&#xA;&#xA;&lt;p&gt;This is my code:&lt;/p&gt;&#xA;&#xA;&lt;pre&gt;&lt;code&gt;decimal trans = trackBar1.Value / 5000;&#xA;this.Opacity = trans;&#xA;&lt;/code&gt;&lt;/pre&gt;&#xA;&#xA;&lt;p&gt;When I build the application, it gives the following error:&lt;/p&gt;&#xA;&#xA;&lt;blockquote&gt;&#xA;  &lt;p&gt;Cannot implicitly convert type &lt;code&gt;'decimal'&lt;/code&gt; to &lt;code&gt;'double'&lt;/code&gt;&lt;/p&gt;&#xA;&lt;/blockquote&gt;&#xA;&#xA;&lt;p&gt;I tried using &lt;code&gt;trans&lt;/code&gt; and &lt;code&gt;double&lt;/code&gt; but then the control doesn't work. This code worked fine in a past VB.NET project.&lt;/p&gt;&#xA;" OwnerUserId="8" LastEditorUserId="3641067" LastEditorDisplayName="Rich B" LastEditDate="2019-07-19T01:39:54.173" LastActivityDate="2019-07-19T01:39:54.173" Title="Convert Decimal to Double?" Tags="&lt;c#&gt;&lt;floating-point&gt;&lt;type-conversion&gt;&lt;double&gt;&lt;decimal&gt;" AnswerCount="13" CommentCount="2" FavoriteCount="48" CommunityOwnedDate="2012-10-31T16:42:47.213" />
  <row Id="6" PostTypeId="1" AcceptedAnswerId="31" CreationDate="2008-07-31T22:08:08.620" Score="281" ViewCount="18214" Body="&lt;p&gt;I have an absolutely positioned &lt;code&gt;div&lt;/code&gt; containing several children, one of which is a relatively positioned &lt;code&gt;div&lt;/code&gt;. When I use a &lt;code&gt;percentage-based width&lt;/code&gt; on the child &lt;code&gt;div&lt;/code&gt;, it collapses to &lt;code&gt;0 width&lt;/code&gt; on IE7, but not on Firefox or Safari.&lt;/p&gt;&#xA;&#xA;&lt;p&gt;If I use &lt;code&gt;pixel width&lt;/code&gt;, it works. If the parent is relatively positioned, the percentage width on the child works.&lt;/p&gt;&#xA;&#xA;&lt;ol&gt;&#xA;&lt;li&gt;Is there something I'm missing here?&lt;/li&gt;&#xA;&lt;li&gt;Is there an easy fix for this besides the &lt;code&gt;pixel-based width&lt;/code&gt; on the&#xA;child?&lt;/li&gt;&#xA;&lt;li&gt;Is there an area of the CSS specification that covers this?&lt;/li&gt;&#xA;&lt;/ol&gt;&#xA;" OwnerUserId="9" LastEditorUserId="3641067" LastEditorDisplayName="Rich B" LastEditDate="2019-07-19T01:43:04.077" LastActivityDate="2019-07-19T01:43:04.077" Title="Percentage width child element in absolutely positioned parent on Internet Explorer 7" Tags="&lt;html&gt;&lt;css&gt;&lt;internet-explorer-7&gt;" AnswerCount="6" CommentCount="0" FavoriteCount="10" />
  <row Id="7" PostTypeId="2" ParentId="4" CreationDate="2008-07-31T22:17:57.883" Score="425" Body="&lt;p&gt;An explicit cast to double like this isn't necessary:&lt;/p&gt;&#xA;&#xA;&lt;pre&gt;&lt;code&gt;double trans = (double) trackBar1.Value / 5000.0;&#xA;&lt;/code&gt;&lt;/pre&gt;&#xA;&#xA;&lt;p&gt;Identifying the constant as &lt;code&gt;5000.0&lt;/code&gt; (or as &lt;code&gt;5000d&lt;/code&gt;) is sufficient:&lt;/p&gt;&#xA;&#xA;&lt;pre&gt;&lt;code&gt;double trans = trackBar1.Value / 5000.0;&#xA;double trans = trackBar1.Value / 5000d;&#xA;&lt;/code&gt;&lt;/pre&gt;&#xA;" OwnerUserId="9" LastEditorUserId="4020527" LastEditDate="2017-12-16T05:06:57.613" LastActivityDate="2017-12-16T05:06:57.613" CommentCount="0" />
  <row Id="9" PostTypeId="1" AcceptedAnswerId="1404" CreationDate="2008-07-31T23:40:59.743" Score="1742" ViewCount="555183" Body="&lt;p&gt;Given a &lt;code&gt;DateTime&lt;/code&gt; representing a person's birthday, how do I calculate their age in years?  &lt;/p&gt;&#xA;" OwnerUserId="1" LastEditorUserId="3956566" LastEditorDisplayName="Rich B" LastEditDate="2018-04-21T17:48:14.477" LastActivityDate="2019-06-26T15:25:44.253" Title="How do I calculate someone's age in C#?" Tags="&lt;c#&gt;&lt;.net&gt;&lt;datetime&gt;" AnswerCount="63" CommentCount="5" FavoriteCount="436" CommunityOwnedDate="2011-08-16T19:40:43.080" />
  <row Id="11" PostTypeId="1" AcceptedAnswerId="1248" CreationDate="2008-07-31T23:55:37.967" Score="1444" ViewCount="149445" Body="&lt;p&gt;Given a specific &lt;code&gt;DateTime&lt;/code&gt; value, how do I display relative time, like:&lt;/p&gt;&#xA;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;2 hours ago&lt;/li&gt;&#xA;&lt;li&gt;3 days ago&lt;/li&gt;&#xA;&lt;li&gt;a month ago&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;" OwnerUserId="1" LastEditorUserId="6479704" LastEditorDisplayName="user2370523" LastEditDate="2017-06-04T15:51:19.780" LastActivityDate="2019-05-26T02:31:53.863" Title="Calculate relative time in C#" Tags="&lt;c#&gt;&lt;datetime&gt;&lt;time&gt;&lt;datediff&gt;&lt;relative-time-span&gt;" AnswerCount="37" CommentCount="3" FavoriteCount="539" CommunityOwnedDate="2009-09-04T13:15:59.820" />
</posts>

I am interested in extracting only the title and body from the 5 entries. So far, the best I could come up with is xmlstarlet:

xmlstarlet sel -T -t -m '/posts/row' -v "concat(@Body,'|', @Title)" -n very_small_posts.xml

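very_small_posts.xml is the XML file containing the 5 entries above. The problem is that the output contains XML tags, and I also see blank lines. It looks like this (output for the first row):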

<p>I want to use a track-bar to change a form's opacity.</p>

<p>This is my code:</p>

<pre><code>decimal trans = trackBar1.Value / 5000;
this.Opacity = trans;
</code></pre>

<p>When I build the application, it gives the following error:</p>

<blockquote>
  <p>Cannot implicitly convert type <code>'decimal'</code> to <code>'double'</code></p>
</blockquote>

<p>I tried using <code>trans</code> and <code>double</code> but then the control doesn't work. This code worked fine in a past VB.NET project.</p>
|Convert Decimal to Double?

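Is there a way to clean these up? I just want to see regular text without any tags as output, possibly with the attribute name before the value. I don't want to see <p>, </p>, <blockquote>, and so on in my output. Any suggestions?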

Preferred output (first row):

Body=I want to use a track-bar to change a form's opacity. This is my code: decimal trans = trackBar1.Value / 5000; this.Opacity = trans; When I build the application, it gives the following error: Cannot implicitly convert type 'decimal' to 'double'I tried using trans and double but then the control doesn't work. This code worked fine in a past VB.NET project.
Title=Convert Decimal to Double?

Tags: xmlstarlet

Solution


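If the output sample in the original post (the one containing tags) were itself in an XML file, extracting the normalized text would be easy. So, to produce the desired output, create such an XML wrapper and query it: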

xmlstarlet sel -t \
-e doc \
  -m 'posts/row' \
    -e Body -v '@Body' -b \
    -e Title -v '@Title' -b \
file.xml |
xmlstarlet unesc |
xmlstarlet sel --text -t -m '*/*' -v 'concat(name(),"=",normalize-space())' --nl

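where:

  • the 1st sel creates container elements for the @Body and @Title content
  • unesc converts XML character entities, e.g. &lt; to <
  • the 2nd sel extracts the second-level elements as normalized text prefixed with the element name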

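Using unesc this way to alter the document structure is of course not recommended in general, but it is workable when you control the input.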
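
For readers who do not have xmlstarlet at hand, the same three steps (read the attributes, unescape the markup, keep only normalized text) can be sketched with the Python standard library. This is an alternative technique, not the pipeline from the answer: the function names strip_markup and extract and the SAMPLE data are hypothetical, and skipping a field when the attribute is absent (answer rows have no Title) is a choice made for this sketch.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collect character data and drop every tag."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_markup(fragment):
    """Return the text of an HTML fragment with whitespace runs collapsed."""
    parser = _TextOnly()
    parser.feed(fragment)
    parser.close()  # flush any text still buffered by the parser
    # str.split() with no argument splits on any whitespace run, which
    # mirrors what XPath normalize-space() does in the pipeline above.
    return " ".join("".join(parser.chunks).split())


def extract(xml_text, names=("Body", "Title")):
    """Yield 'Name=value' lines per row; absent attributes are skipped."""
    root = ET.fromstring(xml_text)
    for row in root.iterfind("row"):
        for name in names:
            # The XML parser has already turned &lt; and &#xA; back into
            # '<' and newline, so no separate unescape step is needed.
            value = row.get(name)
            if value is not None:
                yield f"{name}={strip_markup(value)}"


# Hypothetical two-row sample in the same shape as very_small_posts.xml.
SAMPLE = (
    '<posts>'
    '<row Id="4" Body="&lt;p&gt;Use a &lt;code&gt;double&lt;/code&gt;.&lt;/p&gt;'
    '&#xA;&#xA;&lt;p&gt;Done.&lt;/p&gt;&#xA;" Title="Convert Decimal to Double?" />'
    '<row Id="7" Body="&lt;p&gt;No title here.&lt;/p&gt;&#xA;" />'
    '</posts>'
)

if __name__ == "__main__":
    for line in extract(SAMPLE):
        print(line)
```

Because the tags are removed by an HTML parser rather than by re-parsing the unescaped text as XML, this variant does not require the Body content to be well-formed XML, which is the situation the caveat above is about.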

See the xmlstarlet user guide.

