PowerShell XML filter

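Problem description

I am trying to delete the files in any folder that contains an XML file whose Modality element has an anyType value of "CT", but I quickly ran into problems filtering by XML content.

I can get some content back, but as soon as I apply any filter or try to drill further into the content, I get an empty result.

This is as deep as I can query while still getting content back from the XML file: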

$xmlfile = get-Content .\7.86.7.7053.61.159438.472144765.1719.XML
$xmlfile.ArrayOfPublicXMLElement.PublicXMLElement.ElementName

As soon as I try to drill any deeper, I get no results, for example:

$xmlfile.ArrayOfPublicXMLElement.PublicXMLElement.Elementname |where {$_.name -eq "Modality"}
$xmlfile.ArrayOfPublicXMLElement.PublicXMLElement.Elementname |where {$_.name -eq "anyType"}
$xmlfile.ArrayOfPublicXMLElement.PublicXMLElement.Elementname |where {$_.name -eq "CT"}
$xmlfile.ArrayOfPublicXMLElement.PublicXMLElement | where {$_.name -eq "00080060"}

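Here is a copy of the XML I am trying to filter. I assume my difficulties come from the format of the XML file, or from a big misunderstanding on my part of the XML format or of how PowerShell interacts with it?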

<?xml version="1.0" encoding="utf-8"?>
<ArrayOfPublicXMLElement xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
 <PublicXMLElement>
    <ElementName>Acquisition Time</ElementName>
    <Tag>00080032</Tag>
    <VR>TM</VR>
    <ElementData>
      <anyType xsi:type="xsd:string">105343</anyType>
    </ElementData>
  </PublicXMLElement>    <ElementName>Accession Number</ElementName>
    <Tag>00080050</Tag>
    <VR>SH</VR>
    <ElementData>
      <anyType xsi:type="xsd:string" />
    </ElementData>
  </PublicXMLElement>
  <PublicXMLElement>
    <ElementName>Modality</ElementName>
    <Tag>00080060</Tag>
    <VR>CS</VR>
    <ElementData>
      <anyType xsi:type="xsd:string">CT</anyType>
    </ElementData>
  </PublicXMLElement>
  <PublicXMLElement>
    <ElementName>Station Name</ElementName>
    <Tag>00081010</Tag>
    <VR>SH</VR>
    <ElementData>
      <anyType xsi:type="xsd:string">M_Source</anyType>
    </ElementData>
  </PublicXMLElement>
  <PublicXMLElement>
    <ElementName>Rescale Slope</ElementName>
    <Tag>00281053</Tag>
    <VR>DS</VR>
    <ElementData>
      <anyType xsi:type="xsd:string">1.0</anyType>
    </ElementData>
  </PublicXMLElement>
</ArrayOfPublicXMLElement>

Tags: xml, powershell, filtering

Solution


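If all you have is invalid XML, and if I understand correctly that you want to delete all files in which:

  • there is an <ElementName>Modality</ElementName> tag
  • there is an <ElementData> tag
  • which in turn contains an <anyType> tag holding the value CT

then you will have to resort to a regular expression.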

$regex = '(?s)<ElementName>Modality</ElementName>.*<ElementData>\s*<anyType[^>]*>CT</anyType>'
Get-ChildItem -Path 'D:\Test' -Filter '*.xml' -File -Recurse | ForEach-Object {
    $content = Get-Content -Path $_.FullName -Raw
    if ($content -match $regex) {
        $_ | Remove-Item -Force -WhatIf  # see below
    }
}

Once you are satisfied that the code would delete the correct files, remove the -WhatIf switch to actually delete them.
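If the files can be parsed as XML (the sample in the question cannot, because one <PublicXMLElement> opening tag is missing), a regex is not strictly required. The following is a minimal sketch under that assumption, not part of the original answer; the function name Test-ModalityIsCT is hypothetical, and the XPath assumes the element names shown in the question with no default namespace.

```powershell
# Hypothetical helper: returns $true when the XML text contains a
# PublicXMLElement named 'Modality' whose ElementData/anyType value is 'CT'.
function Test-ModalityIsCT {
    param([Parameter(Mandatory)][string]$XmlText)

    try {
        $doc = [xml]$XmlText   # throws if the text is not well-formed XML
    }
    catch {
        return $false          # unparsable input: use the regex approach instead
    }

    # The elements carry no default namespace, so a plain XPath is enough.
    $xpath = "/ArrayOfPublicXMLElement/PublicXMLElement[ElementName='Modality']/ElementData/anyType[.='CT']"
    return $null -ne $doc.SelectSingleNode($xpath)
}

# Possible usage (the path 'D:\Test' is taken from the answer; keep -WhatIf until verified):
# Get-ChildItem -Path 'D:\Test' -Filter '*.xml' -File -Recurse |
#     Where-Object { Test-ModalityIsCT (Get-Content -Path $_.FullName -Raw) } |
#     Remove-Item -Force -WhatIf
```

Files that fail to parse simply return $false here, so they are skipped rather than deleted; whether that is the desired behavior depends on how many of the real files are malformed.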

Regex details

(?s)                                    Dot matches line breaks
<ElementName>Modality</ElementName>     Match the character string “<ElementName>Modality</ElementName>” literally
.                                       Match any single character
   *                                    Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
<ElementData>                           Match the character string “<ElementData>” literally
\s                                      Match a single character that is a “whitespace character” (any Unicode separator, tab, line feed, carriage return, vertical tab, form feed, next line)
   *                                    Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
<anyType                                Match the character string “<anyType” literally
[^>]                                    Match any character that is NOT a “>”
   *                                    Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
>CT</anyType>                           Match the character string “>CT</anyType>” literally
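One caveat with this pattern: because .* is greedy and (?s) lets it cross line breaks, a match can start at the Modality name and end at a CT value that belongs to a later element. The sketch below illustrates this with a made-up sample and shows one possible stricter variant. The $strict pattern is an assumption added here, not part of the original answer, and it relies on each element ending with </PublicXMLElement>.

```powershell
# Original pattern from the answer
$regex = '(?s)<ElementName>Modality</ElementName>.*<ElementData>\s*<anyType[^>]*>CT</anyType>'

# Hypothetical stricter variant: (?:(?!</PublicXMLElement>).)* consumes characters
# only while the closing tag of the current element has not been reached.
$strict = '(?s)<ElementName>Modality</ElementName>(?:(?!</PublicXMLElement>).)*<ElementData>\s*<anyType[^>]*>CT</anyType>'

# Made-up sample: Modality is MR, but a later element happens to hold the value CT.
$sample = @'
<PublicXMLElement>
  <ElementName>Modality</ElementName>
  <ElementData>
    <anyType xsi:type="xsd:string">MR</anyType>
  </ElementData>
</PublicXMLElement>
<PublicXMLElement>
  <ElementName>Station Name</ElementName>
  <ElementData>
    <anyType xsi:type="xsd:string">CT</anyType>
  </ElementData>
</PublicXMLElement>
'@

$looseHit  = $sample -match $regex    # $true: the greedy .* reaches the later CT value
$strictHit = $sample -match $strict   # $false: the match may not leave the Modality element
```

Whether the stricter form is worth using depends on the data: if no other element can ever hold the value CT, the original pattern behaves the same.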
