Xpath loop problem parsing a simple HTML table into a PHP array

Problem description

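My previous question, "PHP Convert html table to JSON", was quickly closed as a duplicate, and I am still struggling to get what I need. I think this is mainly a logic problem in my loops, and I need someone else to take a look at it.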

Take this table as an example:

<table id="Details" class="DATA_TABLE DATA_TABLE_WO_TOTAL">
  <tr>
    <th>Application</th>
    <th>Version number</th>
    <th>Virtual Administration Server</th>
    <th>Group</th>
    <th>Device</th>
    <th>Installed</th>
    <th>Last visible time</th>
    <th>Last connection to Administration Server</th>
    <th>IP address</th>
  </tr>
  <tr>
    <td class="sD">some text</td>
    <td class="sD">10.2.5.3201</td>
    <td class="sD"></td>
    <td class="sD">Thin PC</td>
    <td class="sD">PC#</td>
    <td class="sD">date</td>
    <td class="sD">date</td>
    <td class="sD">date</td>
    <td class="sD">ip address</td>
  </tr>
  <tr>
     <tr>
    <td class="sD">some more text</td>
    <td class="sD">10.2.5.3201</td>
    <td class="sD"></td>
    <td class="sD">Thin PC</td>
    <td class="sD">PC#</td>
    <td class="sD">date</td>
    <td class="sD">date</td>
    <td class="sD">date</td>
    <td class="sD">ip address</td>
  </tr>
</table>

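I need to create an array (which I can convert to JSON later) in which the th tags are the keys and the td tags inside each following tr are the data for those keys. I have the following PHP code: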

<?php
$dom = new DOMDocument;
$dom->loadHTML($cleantable2); //this is the table above
$xpath = new DOMXPath($dom);

foreach($xpath->query('//table/tr') as $tr){
        $tmp = [];
                foreach($xpath->query('//table/tr/th', $tr) as $th){
                        $key = $th->textContent;
                        foreach($xpath->query('td', $tr) as $td){
                                $tmp[$key] = trim($td->textContent);
                        }
                }
                $result[]=$tmp;
        }
var_dump($result);

?>

It does get the correct keys, but not the data. Sample output:

  [89]=>
  array(9) {
    ["Application"]=>
    string(13) "192.168.6.104"
    ["Version number"]=>
    string(13) "192.168.6.104"
    ["Virtual Administration Server"]=>
    string(13) "192.168.6.104"
    ["Group"]=>
    string(13) "192.168.6.104"
    ["Device"]=>
    string(13) "192.168.6.104"
    ["Installed"]=>
    string(13) "192.168.6.104"
    ["Last visible time"]=>
    string(13) "192.168.6.104"
    ["Last connection to Administration Server"]=>
    string(13) "192.168.6.104"
    ["IP address"]=>
    string(13) "192.168.6.104"
  }

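As you can see, it only picks up the IP address for every key and none of the other data. What am I doing wrong? Can someone help rather than just closing this as a duplicate? I have been trying to solve this for more than a day, and I'm fairly sure the problem is just that I'm not looping correctly, but I can't see it...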

Thanks

Tags: php, html, xpath

Solution


$strhtml='
<table id="Details" class="DATA_TABLE DATA_TABLE_WO_TOTAL">
  <tr>
    <th>Application</th>
    <th>Version number</th>
    <th>Virtual Administration Server</th>
    <th>Group</th>
    <th>Device</th>
    <th>Installed</th>
    <th>Last visible time</th>
    <th>Last connection to Administration Server</th>
    <th>IP address</th>
  </tr>
  <tr>
    <td class="sD">some text</td>
    <td class="sD">10.2.5.202</td>
    <td class="sD">Plato</td>
    <td class="sD">Thin PC</td>
    <td class="sD">PC#</td>
    <td class="sD">date a</td>
    <td class="sD">date b</td>
    <td class="sD">date c</td>
    <td class="sD">10.25.100.1</td>
  </tr>
  <tr>
     <tr>
    <td class="sD">some more text</td>
    <td class="sD">10.2.5.321</td>
    <td class="sD">Socrates</td>
    <td class="sD">Thick PC</td>
    <td class="sD">PC#</td>
    <td class="sD">date x</td>
    <td class="sD">date y</td>
    <td class="sD">date z</td>
    <td class="sD">10.25.100.2</td>
  </tr>
</table>';

Given the HTML snippet above, perhaps the following does what you need? The comments should help explain what I did.

libxml_use_internal_errors( true );
$dom=new DOMDocument;
$dom->loadHTML( $strhtml );
libxml_clear_errors();

$xp=new DOMXPath( $dom );
/* find the `th` elements */
$col = $xp->query( '//tr/th' );

/* temp arrays */
$tmp=$out=$keys=array();


if( $col->length > 0 ){
    /* get all headers as keys */
    foreach( $col as $node )$keys[]=$node->nodeValue;

    /* get all table cell data - store in single array */
    $col=$xp->query( '//tr/td[ @class="sD" ]' );
    foreach( $col as $node )$tmp[]=$node->nodeValue;

    /* split data into chunks according to number of columns */
    $rows=array_chunk( $tmp, count( $keys ) );

    /* combine keys and chunks */
    foreach( $rows as $row ){
        $tmp=array();
        foreach( $row as $i => $value ) $tmp[ $keys[ $i ] ]=$value;
        $out[]=$tmp;
    }

    echo json_encode( $out );
}

Output:

[
    {
        "Application":"some text",
        "Version number":"10.2.5.202",
        "Virtual Administration Server":"Plato",
        "Group":"Thin PC",
        "Device":"PC#",
        "Installed":"date a",
        "Last visible time":"date b",
        "Last connection to Administration Server":"date c",
        "IP address":"10.25.100.1"
    },
    {
        "Application":"some more text",
        "Version number":"10.2.5.321",
        "Virtual Administration Server":"Socrates",
        "Group":"Thick PC",
        "Device":"PC#",
        "Installed":"date x",
        "Last visible time":"date y",
        "Last connection to Administration Server":"date z",
        "IP address":"10.25.100.2"
    }
]
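
As for why the loop in the question repeats the IP address: for each th it walks every td in the row and assigns each one to the same $tmp[$key], so the last cell (the IP address) overwrites the earlier ones. Also, the path //table/tr/th starts with //, so it is evaluated from the document root and ignores the $tr context node. A minimal sketch of a per-row variant follows, which reads the headers once and pairs cells with headers by position. The shortened three-column table and the $html variable are assumptions made for illustration, not part of the original code.

```php
<?php
// Assumed sample input: a shortened version of the table above,
// including the stray empty <tr> from the original markup.
$html = '<table id="Details">
  <tr><th>Application</th><th>Group</th><th>IP address</th></tr>
  <tr><td class="sD">some text</td><td class="sD">Thin PC</td><td class="sD">10.25.100.1</td></tr>
  <tr>
  <tr><td class="sD">some more text</td><td class="sD">Thick PC</td><td class="sD">10.25.100.2</td></tr>
</table>';

libxml_use_internal_errors(true);   // suppress parser warnings about the imperfect markup
$dom = new DOMDocument;
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Read the header cells once, outside the row loop.
$keys = [];
foreach ($xpath->query('//table//tr/th') as $th) {
    $keys[] = trim($th->textContent);
}

$result = [];
// Only visit rows that have at least one <td> child.
foreach ($xpath->query('//table//tr[td]') as $tr) {
    $values = [];
    // The relative path 'td' is evaluated against the current row only.
    foreach ($xpath->query('td', $tr) as $td) {
        $values[] = trim($td->textContent);
    }
    // Skip rows whose cell count does not match the header count.
    if (count($values) === count($keys)) {
        $result[] = array_combine($keys, $values);
    }
}

echo json_encode($result, JSON_PRETTY_PRINT);
```

Compared with the array_chunk approach, this keeps each row's cells together, so a row with a missing cell is skipped rather than shifting every later value into the wrong column. It also does not depend on the sD class being present.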
