
Parsing HTML without class names into variables

Problem description

I am trying to parse a server monitoring page whose elements have no class names. The HTML looks like this:

<div style="float:left;margin-right:50px"><div>Server:VIP Owner</div><div>Server Role:ACTIVE</div><div>Server State:AVAILABLE</div><div>Network State:GY</div>

How can I parse this HTML content into variables, for example:

$Server VIP Owner
$Server_Role Active
$Server_State Available

Since there are no class names, I am struggling to extract the values. This is what I have tried:

$htmlcontent.ParsedHtml.getElementsByTagName('div') | ForEach-Object {
    New-Variable -Name $_.className -Value $_.textContent
}

Tags: powershell, powershell-4.0

Solution


You have only shown us a small part of the HTML, and the full page most likely contains many more <div> tags.

Without an id attribute or anything else that uniquely identifies the div you are after, you can use a Where-Object clause to find the part you are looking for.

Try

$div = ($htmlcontent.ParsedHtml.getElementsByTagName('div') | Where-Object { $_.InnerHTML -like '<div>Server Name:*' }).outerText

# if you're on PowerShell version < 7.1, you need to replace the first colon on each line with an equal sign
$result = $div -replace '(?<!:.*):', '=' | ConvertFrom-StringData

# on PowerShell 7.1, you can use the `-Delimiter` parameter instead
#$result = $div | ConvertFrom-StringData -Delimiter ':'
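
To see what the `-replace` step does on its own, here is a minimal sketch that runs without a web page. The `$sampleText` here-string is a hypothetical stand-in for the div's outerText (one "Key:Value" pair per line), and the extra `Remarks` line is made up to show that colons inside a value are left alone.

```powershell
# Hypothetical sample standing in for the .outerText of the matching <div>
$sampleText = @'
Server Name:VIP Owner
Server Role:ACTIVE
Server State:AVAILABLE
Remarks:checked at 12:30
'@

# Replace only the first colon on each line with '='.
# The negative lookbehind (?<!:.*) rejects any colon that already has
# another colon somewhere before it on the same line.
$converted = $sampleText -replace '(?<!:.*):', '='

# ConvertFrom-StringData turns the "Key=Value" lines into a hashtable
$demo = $converted | ConvertFrom-StringData

$demo['Server Role']   # ACTIVE
$demo['Remarks']       # checked at 12:30
```

Because `.` does not match a newline in a .NET regex by default, the lookbehind only looks back to the start of the current line, which is why each line gets exactly one replacement.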

The resulting $result is a hashtable like this:

Name                           Value
----                           -----
Server Name                    VIP Owner
Server State                   AVAILABLE
Server Role                    ACTIVE
Network State                  GY
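
If you really do want separate variables such as `$Server_Role`, as in the question, one possible approach is to loop over the hashtable keys. This is only a sketch: `$serverInfo` is a hypothetical stand-in for the hashtable above, and replacing spaces with underscores is an assumption about the variable names you want.

```powershell
# Hypothetical stand-in for the hashtable produced by ConvertFrom-StringData
$serverInfo = @{
    'Server Name'   = 'VIP Owner'
    'Server Role'   = 'ACTIVE'
    'Server State'  = 'AVAILABLE'
    'Network State' = 'GY'
}

foreach ($key in $serverInfo.Keys) {
    # Variable names containing spaces are awkward to use,
    # so swap each run of whitespace for an underscore
    $variableName = $key -replace '\s+', '_'
    # Set-Variable creates the variable, or overwrites it if it already exists
    Set-Variable -Name $variableName -Value $serverInfo[$key]
}

$Server_Role    # ACTIVE
```

In many scripts it is simpler to skip this step and read the values straight from the hashtable, for example `$serverInfo['Server Role']`.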

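Of course, if the report contains more than one of these server blocks, you will have to loop over the divs with something like this: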

$result = ($htmlcontent.ParsedHtml.getElementsByTagName('div') | Where-Object { $_.InnerHTML -like '<div>Server Name:*' }) | Foreach-Object {
    $_.outerText -replace '(?<!:.*):', '=' | ConvertFrom-StringData
}

OK, so the original question did not show what we are really dealing with.
Apparently, your HTML contains divs like this:

  <div>=======================================</div>
  <div>Service Name:MysqlReplica</div>
  <div>Service Status:RUNNING</div>
  <div>Remarks:Change role completed in 1 ms</div>
  <div>=======================================</div>
  <div>Service Name:OCCAS</div>
  <div>Service Status:RUNNING</div>
  <div>Remarks:Change role completed in 30280 ms</div>

To process blocks like these, you need a completely different approach:

# create a List object to store the results
# (New-Object also works on PowerShell 4.0, where the ::new() syntax is not available)
$result  = New-Object 'System.Collections.Generic.List[object]'
# create a temporary ordered dictionary to build the resulting items
$svcHash = [ordered]@{}

foreach ($div in $htmlcontent.ParsedHtml.getElementsByTagName('div')) {
    switch -Regex ($div.InnerText) {
        '^=+' { 
            if ($svcHash.Count) {
                # add the completed object to the list
                $result.Add([PsCustomObject]$svcHash)
                $svcHash = [ordered]@{}
            }
        }
        '^(Service .+|Remarks):' { 
            # split into the property Name and its value
            $name, $value = ($_ -split ':',2).Trim() 
            $svcHash[$name] = $value 
        }
    }
}
if ($svcHash.Count) {
    # A final service block may still be pending here. This happens when the HTML
    # has no closing
    #   <div>=======================================</div>
    # after the last service, so we add that block to the result list as well
    $result.Add([PsCustomObject]$svcHash)
}

# output on screen
$result | Format-Table -AutoSize

# output to CSV file
$result | Export-Csv -Path 'X:\services.csv' -NoTypeInformation

With the example above, the on-screen output is:

Service Name Service Status Remarks                          
------------ -------------- -------                          
MysqlReplica RUNNING        Change role completed in 1 ms    
OCCAS        RUNNING        Change role completed in 30280 ms
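
The same separator-driven loop can be tried without a web page. The sketch below assumes each div's text is available as one string; `$sampleLines`, `$services` and `$current` are hypothetical names, and the sample data is copied from the HTML fragment above.

```powershell
# Sample lines standing in for the InnerText of each <div>
$sampleLines = @(
    '======================================='
    'Service Name:MysqlReplica'
    'Service Status:RUNNING'
    'Remarks:Change role completed in 1 ms'
    '======================================='
    'Service Name:OCCAS'
    'Service Status:RUNNING'
    'Remarks:Change role completed in 30280 ms'
)

# New-Object is used so the sketch also runs on PowerShell 4.0
$services = New-Object 'System.Collections.Generic.List[object]'
$current  = [ordered]@{}

foreach ($line in $sampleLines) {
    switch -Regex ($line) {
        '^=+' {
            # a separator closes the block collected so far, if there is one
            if ($current.Count) {
                $services.Add([PsCustomObject]$current)
                $current = [ordered]@{}
            }
        }
        '^(Service .+|Remarks):' {
            # split on the first colon only, so values may contain colons
            $name, $value = ($line -split ':', 2).Trim()
            $current[$name] = $value
        }
    }
}
# the last block has no trailing separator, so add it here
if ($current.Count) { $services.Add([PsCustomObject]$current) }

$services.Count                 # 2
$services[1].'Service Name'     # OCCAS
```

Each separator line flushes the dictionary built so far, so every service ends up as one object with the three properties in the order they appeared.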
