php - Simple HTML DOM only finds the first table
Problem description
I am using Simple HTML DOM to read all the IDs in my tables.
My tables look like this:
<table><a name="Module-277409-Start_5f7bad2c-10af-4c88-afaf-6c960be2f547"></a><a name="Module-277409-Start"></a><span class="text_class2"><span>ID</span></span></td><td class="table_class4DeffCell cell_class9 cell_class8"><span class="text_class2"><span>Primary Text</span></span></td><td class="table_class4DeffCell cell_class11 cell_class10"><span class="text_class2"><span>SystemFeatures</span></span></td><td class="table_class4DeffCell cell_class13 cell_class12"><span class="text_class2"><span>Area of Relevance</span></span></td></tr><tr class="row_class14"><td class="table_class4DeffCell cell_class7 cell_class15"><a class="hyperlink_class16" href="">1</a></td><td class="table_class4DeffCell cell_class9 cell_class15"><span class="text_class2"><h1 class="paragraph_class17 1"><span>XXXXX</span></h1></span></td><td class="table_class4DeffCell cell_class11 cell_class15"><span class="text_class2"><span>XXXX</span></span></td><td class="table_class4DeffCell cell_class13 cell_class15"><span class="text_class2"><span>HEADING</span></span></td></tr><tr class="row_class14"><td class="table_class4DeffCell cell_class7 cell_class15"><a class="hyperlink_class18" href="">2</a></td><td class="table_class4DeffCell cell_class9 cell_class15"><span class="text_class2"><h2 class="paragraph_class19 2"><span>XXXXXX</span></h2></span></td><td class="table_class4DeffCell cell_class11 cell_class15"><span class="text_class2"><span>XXXX</span></span></td><td class="table_class4DeffCell cell_class13 cell_class15"><span class="text_class2"><span>XXXX</span></span></td></tr><tr class="row_class14"><td class="table_class4DeffCell cell_class7 cell_class15"><a class="hyperlink_class20" href="">3</a></td><td class="table_class4DeffCell cell_class9 cell_class15"><span class="text_class2"><p class="paragraph_class21"><span class="paragraph_class21 text_class22">xxxxx</span></p></span></td><td class="table_class4DeffCell cell_class11 cell_class15"><span 
class="text_class2"><span>XXXXXX</span></span></td><td class="table_class4DeffCell cell_class13 cell_class15"><span class="text_class2"><span>SW</span></span></td></tr><tr class="row_class14"><td class="table_class4DeffCell cell_class7 cell_class15"><a class="hyperlink_class23" href="">4</a></td><td class="table_class4DeffCell cell_class9 cell_class15"><span class="text_class2"><p class="paragraph_class21"><span class="paragraph_class21 text_class22">XXXXX</span></p></span></td><td class="table_class4DeffCell cell_class11 cell_class15"><span class="text_class2"><span>xxxxx</span></span></td><td class="table_class4DeffCell cell_class13 cell_class15"><span class="text_class2"><span>SW</span></span></td></tr>
</table>
<table><a name="Module-277409-Start_5f7bad2c-10af-4c88-afaf-6c960be2f547"></a><a name="Module-277409-Start"></a><span class="text_class2"><span>ID</span></span></td><td class="table_class4DeffCell cell_class9 cell_class8"><span class="text_class2"><span>Primary Text</span></span></td><td class="table_class4DeffCell cell_class11 cell_class10"><span class="text_class2"><span>SystemFeatures</span></span></td><td class="table_class4DeffCell cell_class13 cell_class12"><span class="text_class2"><span>Area of Relevance</span></span></td></tr><tr class="row_class14"><td class="table_class4DeffCell cell_class7 cell_class15"><a class="hyperlink_class16" href="">1</a></td><td class="table_class4DeffCell cell_class9 cell_class15"><span class="text_class2"><h1 class="paragraph_class17 1"><span>XXXXX</span></h1></span></td><td class="table_class4DeffCell cell_class11 cell_class15"><span class="text_class2"><span>XXXX</span></span></td><td class="table_class4DeffCell cell_class13 cell_class15"><span class="text_class2"><span>HEADING</span></span></td></tr><tr class="row_class14"><td class="table_class4DeffCell cell_class7 cell_class15"><a class="hyperlink_class18" href="">2</a></td><td class="table_class4DeffCell cell_class9 cell_class15"><span class="text_class2"><h2 class="paragraph_class19 2"><span>XXXXXX</span></h2></span></td><td class="table_class4DeffCell cell_class11 cell_class15"><span class="text_class2"><span>XXXX</span></span></td><td class="table_class4DeffCell cell_class13 cell_class15"><span class="text_class2"><span>XXXX</span></span></td></tr><tr class="row_class14"><td class="table_class4DeffCell cell_class7 cell_class15"><a class="hyperlink_class20" href="">3</a></td><td class="table_class4DeffCell cell_class9 cell_class15"><span class="text_class2"><p class="paragraph_class21"><span class="paragraph_class21 text_class22">xxxxx</span></p></span></td><td class="table_class4DeffCell cell_class11 cell_class15"><span 
class="text_class2"><span>XXXXXX</span></span></td><td class="table_class4DeffCell cell_class13 cell_class15"><span class="text_class2"><span>SW</span></span></td></tr><tr class="row_class14"><td class="table_class4DeffCell cell_class7 cell_class15"><a class="hyperlink_class23" href="">4</a></td><td class="table_class4DeffCell cell_class9 cell_class15"><span class="text_class2"><p class="paragraph_class21"><span class="paragraph_class21 text_class22">XXXXX</span></p></span></td><td class="table_class4DeffCell cell_class11 cell_class15"><span class="text_class2"><span>xxxxx</span></span></td><td class="table_class4DeffCell cell_class13 cell_class15"><span class="text_class2"><span>SW</span></span></td></tr>
</table>
Here is my code for finding the IDs:
$html = file_get_html('../Test/reqID/htmlfileID.html');
$table = $html->find('table',0);
foreach($table->find('tr') as $row) {
    if (is_numeric($row->find('td',0)->plaintext)) {
        $reqIDs[] = $row->find('td',0)->plaintext;
    }
}
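Here is a screenshot of the .html file opened in Chrome:
Why do I only get the IDs from the first table and not from the others? I have about 25 tables with IDs. In every table the IDs are in the first column, so my approach is to find each table and then look for a numeric value in the first td of each row.
Edit:
Thanks, everyone.
For some reason it now stops almost at the end of table 4. I still have about 10 more tables, with a lot of IDs.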
Solution
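The second argument passed to the find() method, as in $table = $html->find('table',0);, tells the method to return a single element: the second argument is the index of the element in the result array, so index 0 means the first match. You are basically asking find() to return only the first table. To avoid this, omit the second argument, so that find() returns an array of all matching tables, and loop over that array. Something like this: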
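// assumes the Simple HTML DOM library has already been included
$html = file_get_html('../Test/reqID/htmlfileID.html');
$reqIDs = [];
// $tables will be an array of all found tables
$tables = $html->find('table');
// you will need to also loop over all tables
foreach ($tables as $table) {
    foreach ($table->find('tr') as $row) {
        // first cell of the row; null when the row contains no <td>
        $cell = $row->find('td', 0);
        if ($cell !== null && is_numeric(trim($cell->plaintext))) {
            $reqIDs[] = trim($cell->plaintext);
        }
    }
}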
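The nested loop above can also be tried without the library. The following is a minimal sketch that swaps in PHP's bundled DOMDocument and DOMXPath classes in place of Simple HTML DOM, so it is not the method used in the answer. The function name collectFirstColumnIds and the $sample markup are hypothetical and exist only for illustration.

```php
<?php
// Sketch only: uses PHP's bundled DOM extension rather than Simple HTML DOM.
// collectFirstColumnIds() and $sample are hypothetical, illustrative names.

function collectFirstColumnIds(string $markup): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally so imperfect markup stays quiet.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($markup);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $ids = [];
    // Outer loop: every table in the document, not just the first one.
    foreach ($xpath->query('//table') as $table) {
        // Inner loop: every row inside the current table.
        foreach ($xpath->query('.//tr', $table) as $row) {
            // First <td> child of the row, or null if the row has none.
            $cell = $xpath->query('./td[1]', $row)->item(0);
            if ($cell === null) {
                continue;
            }
            $text = trim($cell->textContent);
            if (is_numeric($text)) {
                $ids[] = $text;
            }
        }
    }
    return $ids;
}

// Two small tables shaped like the ones in the question.
$sample = '<table><tr><td>ID</td><td>Primary Text</td></tr>'
        . '<tr><td><a href="">1</a></td><td>XXXXX</td></tr></table>'
        . '<table><tr><td>ID</td><td>Primary Text</td></tr>'
        . '<tr><td><a href="">2</a></td><td>XXXX</td></tr></table>';

print_r(collectFirstColumnIds($sample));
```

The header rows are skipped because "ID" is not numeric, and rows without any td are skipped by the null check. If the real file contains tables nested inside other tables, the inner rows would be visited once per enclosing table, so the result may need deduplication in that case.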