首页 > 解决方案 > 使用 DOMDocument 将表的内容添加到数组中

问题描述

我有$html

$html = '
<table id="myTable">
    <tbody>
        <tr>
            <td>08/20/18</td>
            <td> <a href="https://example.com/1a">Text 1 A</a> </td>
            <td> <a href="https://example.com/1b">Test 1 B</a> </td>
        </tr>
        <tr>
            <td>08/21/18</td>
            <td> <a href="https://example.com/2a">Text 2 A</a> </td>
            <td> <a href="https://example.com/2b">Test 2 B</a> </td>
        </tr>
    </tbody>
</table>
';

使用DOMDocument,我想将表的内容添加到多维中$array

$array = array(
    // tr 1
    array(
        array(
            'content' => '08/20/18'
        ),
        array(
            'content' => 'Text 1 A',
            'href' => 'https://example.com/1a'
        ),
        array(
            'content' => 'Text 1 B',
            'href' => 'https://example.com/1b'
        )
    ),
    // tr 2
    array(
        array(
            'content' => '08/21/18'
        ),
        array(
            'content' => 'Text 2 A',
            'href' => 'https://example.com/1a'
        ),
        array(
            'content' => 'Text 2 B',
            'href' => 'https://example.com/1b'
        )
    )
);

到目前为止我尝试过的

我设法获得了tableusing的内容xpath

// setup DOMDocument
$doc = new DOMDocument();
$doc->loadHTML('<?xml encoding="utf-8" ?>' . $html); 
$xpath = new DOMXPath($doc);
// target table using xpath
$results = $xpath->query("//*[@id='myTable']");

if ($results->length > 0) {
    var_dump($results->item(0));
    var_dump($results->item(0)->nodeValue);
}

测试一下。将每个内容放入的方法是tr什么$array

标签: phpxpathdomdocumentdomxpath

解决方案


<?php

$html = '
<table id="myTable">
    <tbody>
        <tr>
            <td>08/20/18</td>
            <td> <a href="https://example.com/1a">Text 1 A</a> </td>
            <td> <a href="https://example.com/1b">Test 1 B</a> </td>
        </tr>
        <tr>
            <td>08/21/18</td>
            <td> <a href="https://example.com/2a">Text 2 A</a> </td>
            <td> <a href="https://example.com/2b">Test 2 B</a> </td>
        </tr>
    </tbody>
</table>
';

$data = [];

$doc = new DOMDocument();
$doc->loadHTML('<?xml encoding="utf-8" ?>' . $html);
$xpath = new DOMXPath($doc);
$trs = $xpath->query("//*[@id='myTable']/tbody/tr");
foreach ($trs as $i => $tr) {
    /** @var DOMElement $td */
    foreach ($tr->childNodes as $td) {
        if ($td instanceof DOMElement) {
            /** @var DOMElement $a */
            $row = [];
            foreach ($td->childNodes as $a) {
                /** @var DOMAttr $attribute */
                $row['content'] = $td->nodeValue;
                if ($a->hasAttributes()) {
                    foreach ($a->attributes as $attribute) {
                        $row[$attribute->name] = $attribute->value;
                    }

                }

            }
            $data[$i][] = $row;
        }
    }
}

var_dump($data);

推荐阅读