Generating XML from a TXT file

Question

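I have the input txt file below and I am trying to generate the XML file below from it. I am trying to do this with awk, but I think I am reinventing the wheel. How would you suggest I do it? Thanks.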

Input txt file (sample; the real input may be larger)

Usw 1:1 Desktop
Usw 1:2 Netbooks
Usw 1:3 Servers, mainframes and supercomputers
Usw 1:4 Smart devices
Usw 1:5 Embedded devices
Usw 1:6 Gaming
Usw 1:7 Specialized uses
Usw 2:1 Precursors
Usw 2:2 Creation
Usw 2:5 Naming
Usw 2:6 Commercial and popular uptake
Usw 2:9 Current development
Des 1:1 User interface
Des 1:2 Video input infrastructure
Des 1:3 Hardware
Des 2:1 Community
Des 2:2 Programming on Linux

Required XML file

<?xml version="1.0" encoding="utf-8"?>

<XMLRT xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="SomeSchema.xsd" bename="The name" status="v" version="1.4" revision="1" type="x-rt">
<INTRO>
    <title>Some title</title>
    <creator>
    </creator>
    <subject>Some subject</subject>
    <description>Some description</description>
    <date>2010-05-12</date>
    <type>Some text</type>
</INTRO>
<RTBLOCK bname="Usw" bnumber="1" bsname="1U">
    <CTR cnumber="1">
    <ES vnumber="1">Desktop</ES>
    <ES vnumber="2">Netbooks</ES>
    <ES vnumber="3">Servers, mainframes and supercomputers</ES>
    <ES vnumber="4">Smart devices</ES>
    <ES vnumber="5">Embedded devices</ES>
    <ES vnumber="6">Gaming</ES>
    <ES vnumber="7">Specialized uses</ES>
    </CTR>
    <CTR cnumber="2">
    <ES vnumber="1">Precursors</ES>
    <ES vnumber="2">Creation</ES>
    <ES vnumber="5">Naming</ES>
    <ES vnumber="6">Commercial and popular uptake</ES>
    <ES vnumber="9">Current development</ES>
    </CTR>
</RTBLOCK>
<RTBLOCK bname="Des" bnumber="1" bsname="1D">
    <CTR cnumber="1">
    <ES vnumber="1">User interface</ES>
    <ES vnumber="2">Video input infrastructure</ES>
    <ES vnumber="3">Hardware</ES>
    </CTR>
    <CTR cnumber="2">
    <ES vnumber="1">Community</ES>
    <ES vnumber="2">Programming on Linux</ES>
    </CTR>
</RTBLOCK>
</XMLRT>
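Comparing the input with the required XML, each input line carries four pieces: the first field becomes the RTBLOCK `bname`, the number before the colon becomes the CTR `cnumber`, the number after the colon becomes the ES `vnumber`, and the rest of the line becomes the ES text. A minimal sketch of that split for a single line, using awk from a POSIX shell; `parse_line` is a hypothetical helper name used only for illustration:

```shell
#!/bin/sh
# parse_line: hypothetical helper that splits one input line into the
# pieces the XML needs and prints them on a single line.
parse_line() {
    printf '%s\n' "$1" | awk '{
        bname = $1              # first field -> RTBLOCK bname
        split($2, tmp, /:/)     # "1:3" -> tmp[1] = "1", tmp[2] = "3"
        cnum = tmp[1]           # number before the colon -> CTR cnumber
        vnum = tmp[2]           # number after the colon  -> ES vnumber
        text = $0
        # Remove the first two whitespace-separated fields; the rest of the
        # line (inner spaces and commas included) becomes the ES text.
        sub(/^[^[:space:]]+[[:space:]]+[^[:space:]]+[[:space:]]+/, "", text)
        printf "bname=%s cnum=%s vnum=%s text=%s\n", bname, cnum, vnum, text
    }'
}

parse_line 'Usw 1:3 Servers, mainframes and supercomputers'
# prints: bname=Usw cnum=1 vnum=3 text=Servers, mainframes and supercomputers
```

Stripping the first two fields from `$0`, rather than rejoining `$3` onwards, keeps the text exactly as it appeared in the input, including any runs of spaces.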

Tags: xml, awk, data-conversion

Answer


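Just to show that you don't need an XML-aware tool to generate the specific XML you need for a given purpose, here is one way to do it for your example: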

$ cat tst.awk
BEGIN {
    print    "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
    print    ""
    print    "<XMLRT xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" xsi:noNamespaceSchemaLocation=\"SomeSchema.xsd\" bename=\"The name\" status=\"v\" version=\"1.4\" revision=\"1\" type=\"x-rt\">"
    print    "<INTRO>"
    print    "    <title>Some title</title>"
    print    "    <creator>"
    print    "    </creator>"
    print    "    <subject>Some subject</subject>"
    print    "    <description>Some description</description>"
    print    "    <date>2010-05-12</date>"
    print    "    <type>Some text</type>"
    print    "</INTRO>"

    # printf formats for the parts that repeat per block, per CTR and per line
    rtBeg  = "<RTBLOCK bname=\"%s\" bnumber=\"1\" bsname=\"1%s\">\n"
    ctrBeg = "    <CTR cnumber=\"%d\">\n"
    esBody = "    <ES vnumber=\"%d\">%s</ES>\n"
    ctrEnd = "    </CTR>\n"
    rtEnd  = "</RTBLOCK>\n"
    xmlEnd = "</XMLRT>\n"
}
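{
    # Field 1 is the block name, e.g. "Usw" or "Des"
    bname = $1

    # Field 2 is "N:M"; N becomes the CTR cnumber, M the ES vnumber
    split($2,tmp,/:/)
    cnum = tmp[1]
    vnum = tmp[2]

    # Strip the first two fields so the rest of the line, inner spaces
    # included, becomes the ES text (spelled out instead of using {2} so
    # awks without regex interval support also accept it)
    text = $0
    sub(/^[^[:space:]]+[[:space:]]+[^[:space:]]+[[:space:]]+/,"",text)
}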

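# New block name: close any open CTR and RTBLOCK, then open a new RTBLOCK
# (assumes lines with the same bname are adjacent, as in the sample)
bname != prevBname {
    if (prevCnum  != "") printf ctrEnd
    if (prevBname != "") printf rtEnd
    printf rtBeg, bname, substr(bname,1,1)
    prevCnum = ""       # force the next rule to open a fresh CTR
    prevBname = bname
}

# New cnum within the current block: close the previous CTR, open a new one
cnum != prevCnum {
    if (prevCnum != "") printf ctrEnd
    printf ctrBeg, cnum
    prevCnum = cnum
}

# Every input line becomes one ES element
{ printf esBody, vnum, text }

# Close whatever is still open, then close the root element
END {
    if (prevCnum  != "") printf ctrEnd
    if (prevBname != "") printf rtEnd
    printf xmlEnd
}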


$ awk -f tst.awk file
<?xml version="1.0" encoding="utf-8"?>

<XMLRT xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="SomeSchema.xsd" bename="The name" status="v" version="1.4" revision="1" type="x-rt">
<INTRO>
    <title>Some title</title>
    <creator>
    </creator>
    <subject>Some subject</subject>
    <description>Some description</description>
    <date>2010-05-12</date>
    <type>Some text</type>
</INTRO>
<RTBLOCK bname="Usw" bnumber="1" bsname="1U">
    <CTR cnumber="1">
    <ES vnumber="1">Desktop</ES>
    <ES vnumber="2">Netbooks</ES>
    <ES vnumber="3">Servers, mainframes and supercomputers</ES>
    <ES vnumber="4">Smart devices</ES>
    <ES vnumber="5">Embedded devices</ES>
    <ES vnumber="6">Gaming</ES>
    <ES vnumber="7">Specialized uses</ES>
    </CTR>
    <CTR cnumber="2">
    <ES vnumber="1">Precursors</ES>
    <ES vnumber="2">Creation</ES>
    <ES vnumber="5">Naming</ES>
    <ES vnumber="6">Commercial and popular uptake</ES>
    <ES vnumber="9">Current development</ES>
    </CTR>
</RTBLOCK>
<RTBLOCK bname="Des" bnumber="1" bsname="1D">
    <CTR cnumber="1">
    <ES vnumber="1">User interface</ES>
    <ES vnumber="2">Video input infrastructure</ES>
    <ES vnumber="3">Hardware</ES>
    </CTR>
    <CTR cnumber="2">
    <ES vnumber="1">Community</ES>
    <ES vnumber="2">Programming on Linux</ES>
    </CTR>
</RTBLOCK>
</XMLRT>

The above will work efficiently, robustly and portably using any POSIX awk in any shell on every UNIX box.
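The script prints the text field as-is, which is fine for the sample input. If the real input could ever contain `&` or `<`, the generated file would no longer be well-formed XML. A minimal sketch of escaping the text first, assuming such characters can occur (the sample has none); `xml_escape` and `xmlesc` are hypothetical names, not part of the original answer:

```shell
#!/bin/sh
# xml_escape: hypothetical helper that escapes a string so it is safe to
# place inside an XML element.
xml_escape() {
    printf '%s\n' "$1" | awk '
    # "&" and "<" must always be escaped in XML character data; ">" is
    # escaped too (strictly required only in the sequence "]]>").
    # "&" goes first so the "&" inside "&lt;" and "&gt;" is not escaped again.
    # In a gsub replacement a bare "&" means "the matched text", so a
    # literal ampersand has to be written as "\\&".
    function xmlesc(s) {
        gsub(/&/, "\\&amp;", s)
        gsub(/</, "\\&lt;", s)
        gsub(/>/, "\\&gt;", s)
        return s
    }
    { print xmlesc($0) }'
}

xml_escape 'Tom & Jerry <b>'
# prints: Tom &amp; Jerry &lt;b&gt;
```

To use it in tst.awk, the `xmlesc` function could be pasted into the script and the last rule changed to `{ printf esBody, vnum, xmlesc(text) }`. Attribute values such as `bname` would additionally need `"` escaped as `&quot;`.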

