Simple HTML DOM only finds the first table

Question

I am using Simple HTML DOM to read all the IDs in my tables.

My tables look like this:

<table><a name="Module-277409-Start_5f7bad2c-10af-4c88-afaf-6c960be2f547"></a><a name="Module-277409-Start"></a><span class="text_class2"><span>ID</span></span></td><td class="table_class4DeffCell cell_class9 cell_class8"><span class="text_class2"><span>Primary Text</span></span></td><td class="table_class4DeffCell cell_class11 cell_class10"><span class="text_class2"><span>SystemFeatures</span></span></td><td class="table_class4DeffCell cell_class13 cell_class12"><span class="text_class2"><span>Area of Relevance</span></span></td></tr><tr class="row_class14"><td class="table_class4DeffCell cell_class7 cell_class15"><a class="hyperlink_class16" href="">1</a></td><td class="table_class4DeffCell cell_class9 cell_class15"><span class="text_class2"><h1 class="paragraph_class17 1"><span>XXXXX</span></h1></span></td><td class="table_class4DeffCell cell_class11 cell_class15"><span class="text_class2"><span>XXXX</span></span></td><td class="table_class4DeffCell cell_class13 cell_class15"><span class="text_class2"><span>HEADING</span></span></td></tr><tr class="row_class14"><td class="table_class4DeffCell cell_class7 cell_class15"><a class="hyperlink_class18" href="">2</a></td><td class="table_class4DeffCell cell_class9 cell_class15"><span class="text_class2"><h2 class="paragraph_class19 2"><span>XXXXXX</span></h2></span></td><td class="table_class4DeffCell cell_class11 cell_class15"><span class="text_class2"><span>XXXX</span></span></td><td class="table_class4DeffCell cell_class13 cell_class15"><span class="text_class2"><span>XXXX</span></span></td></tr><tr class="row_class14"><td class="table_class4DeffCell cell_class7 cell_class15"><a class="hyperlink_class20" href="">3</a></td><td class="table_class4DeffCell cell_class9 cell_class15"><span class="text_class2"><p class="paragraph_class21"><span class="paragraph_class21 text_class22">xxxxx</span></p></span></td><td class="table_class4DeffCell cell_class11 cell_class15"><span 
class="text_class2"><span>XXXXXX</span></span></td><td class="table_class4DeffCell cell_class13 cell_class15"><span class="text_class2"><span>SW</span></span></td></tr><tr class="row_class14"><td class="table_class4DeffCell cell_class7 cell_class15"><a class="hyperlink_class23" href="">4</a></td><td class="table_class4DeffCell cell_class9 cell_class15"><span class="text_class2"><p class="paragraph_class21"><span class="paragraph_class21 text_class22">XXXXX</span></p></span></td><td class="table_class4DeffCell cell_class11 cell_class15"><span class="text_class2"><span>xxxxx</span></span></td><td class="table_class4DeffCell cell_class13 cell_class15"><span class="text_class2"><span>SW</span></span></td></tr>
</table>
<table><a name="Module-277409-Start_5f7bad2c-10af-4c88-afaf-6c960be2f547"></a><a name="Module-277409-Start"></a><span class="text_class2"><span>ID</span></span></td><td class="table_class4DeffCell cell_class9 cell_class8"><span class="text_class2"><span>Primary Text</span></span></td><td class="table_class4DeffCell cell_class11 cell_class10"><span class="text_class2"><span>SystemFeatures</span></span></td><td class="table_class4DeffCell cell_class13 cell_class12"><span class="text_class2"><span>Area of Relevance</span></span></td></tr><tr class="row_class14"><td class="table_class4DeffCell cell_class7 cell_class15"><a class="hyperlink_class16" href="">1</a></td><td class="table_class4DeffCell cell_class9 cell_class15"><span class="text_class2"><h1 class="paragraph_class17 1"><span>XXXXX</span></h1></span></td><td class="table_class4DeffCell cell_class11 cell_class15"><span class="text_class2"><span>XXXX</span></span></td><td class="table_class4DeffCell cell_class13 cell_class15"><span class="text_class2"><span>HEADING</span></span></td></tr><tr class="row_class14"><td class="table_class4DeffCell cell_class7 cell_class15"><a class="hyperlink_class18" href="">2</a></td><td class="table_class4DeffCell cell_class9 cell_class15"><span class="text_class2"><h2 class="paragraph_class19 2"><span>XXXXXX</span></h2></span></td><td class="table_class4DeffCell cell_class11 cell_class15"><span class="text_class2"><span>XXXX</span></span></td><td class="table_class4DeffCell cell_class13 cell_class15"><span class="text_class2"><span>XXXX</span></span></td></tr><tr class="row_class14"><td class="table_class4DeffCell cell_class7 cell_class15"><a class="hyperlink_class20" href="">3</a></td><td class="table_class4DeffCell cell_class9 cell_class15"><span class="text_class2"><p class="paragraph_class21"><span class="paragraph_class21 text_class22">xxxxx</span></p></span></td><td class="table_class4DeffCell cell_class11 cell_class15"><span 
class="text_class2"><span>XXXXXX</span></span></td><td class="table_class4DeffCell cell_class13 cell_class15"><span class="text_class2"><span>SW</span></span></td></tr><tr class="row_class14"><td class="table_class4DeffCell cell_class7 cell_class15"><a class="hyperlink_class23" href="">4</a></td><td class="table_class4DeffCell cell_class9 cell_class15"><span class="text_class2"><p class="paragraph_class21"><span class="paragraph_class21 text_class22">XXXXX</span></p></span></td><td class="table_class4DeffCell cell_class11 cell_class15"><span class="text_class2"><span>xxxxx</span></span></td><td class="table_class4DeffCell cell_class13 cell_class15"><span class="text_class2"><span>SW</span></span></td></tr>
</table>

This is my code for finding the IDs:

$html = file_get_html('../Test/reqID/htmlfileID.html');

$table = $html->find('table');

foreach($table->find('tr') as $row) {
    if (is_numeric($row->find('td',0)->plaintext)) {
        $reqIDs[] = $row->find('td',0)->plaintext; 
    }         
} 

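Here is a screenshot of the .html file opened in Chrome: (image: HTML from Chrome)

Why do I only get the IDs from the first table and not from the other tables? I have about 25 tables with IDs. The ID is always the first column of each table. Alternatively, as I do now, I find each table and then look for a numeric value in the first td of each row.

Edit:

Thanks, everyone.

For some reason it now stops near the end of table 4, even though I have about 10 more tables with many IDs.

This is how my tables are laid out: there are many divs, each containing a new table. (image: table)

Also, when I call sizeof($table), it reports that 9 tables were found, but I actually have 30.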

Tags: php, html, simple-html-dom

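Answer


The second argument passed to find() in $table = $html->find('table',0); tells the method to return only a single element: it is the index of the element in the result array, so index 0 means the first match. You are therefore asking find() to return only the first table. To get every table, omit the second argument. Something like this: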

$html = file_get_html('../Test/reqID/htmlfileID.html');

$reqIDs = [];

// without an index, find() returns an array of every matching table
$tables = $html->find('table');

// loop over every table, then over every row in that table
foreach ($tables as $table) {
    foreach ($table->find('tr') as $row) {
        // find('td', 0) is null for rows without a <td>, such as <th>-only rows
        $cell = $row->find('td', 0);
        if ($cell !== null && is_numeric(trim($cell->plaintext))) {
            $reqIDs[] = trim($cell->plaintext);
        }
    }
}
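
As a cross-check, the same loop-over-every-table idea can be sketched with PHP's bundled DOM extension (DOMDocument and DOMXPath) instead of Simple HTML DOM. This is a different parser, not the library used in the question, and it may repair imperfect markup differently. The function name collectFirstColumnIds and the sample markup are hypothetical, made up for illustration only.

```php
<?php
// Minimal sketch using PHP's built-in DOM extension (not Simple HTML DOM).
// collectFirstColumnIds() is a hypothetical helper name.

function collectFirstColumnIds(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally so imperfect markup does not emit notices.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $ids = [];
    // td[1] selects the first <td> child of each row, across every table.
    foreach ($xpath->query('//table//tr/td[1]') as $cell) {
        $text = trim($cell->textContent);
        if (is_numeric($text)) {
            $ids[] = $text;
        }
    }
    return $ids;
}

// Hypothetical sample: two tables, the second one wrapped in a div.
$sample = '<table><tr><td>ID</td><td>Primary Text</td></tr>'
        . '<tr><td><a href="">1</a></td><td>A</td></tr></table>'
        . '<div><table><tr><td>ID</td><td>Primary Text</td></tr>'
        . '<tr><td><a href="">2</a></td><td>B</td></tr></table></div>';

$ids = collectFirstColumnIds($sample);
print_r($ids); // $ids is ['1', '2'] for the sample above
```

With this approach, $xpath->query('//table')->length gives the number of tables the parser actually sees, which can be compared against the count reported by sizeof() on the Simple HTML DOM result.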
