php - Whitespace in XPath (PHP)
Problem description
I am trying to write a "bot" that scrapes a forum to collect statistics.
Here is my code: https://pastebin.com/6zAaQ0fF
<?php
$ch = curl_init();
$timeout = 0; // set to zero for no timeout
curl_setopt($ch, CURLOPT_URL, 'http://m.jeuxvideo.com/forums/42-51-61922886-1-0-1-0-once-upon-time-in-hollywood.htm');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
$file_contents = curl_exec($ch);
curl_close($ch);

$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTML($file_contents);
$xpath = new DOMXPath($dom);

$posts = $xpath->query("//div[@class='who-post']");
$dates = $xpath->query("//div[@class='date-post']");
$contenus = $xpath->query("//div[@class='contenu']");

foreach ($posts as $post) {
    $nodes = $post->childNodes;
    foreach ($nodes as $node) {
        $value = trim($node->nodeValue);
        echo $node->nodeValue;
        $tab[] = $node->nodeValue;
    }
}

foreach ($dates as $date) {
    $nodes = $date->childNodes;
    foreach ($nodes as $node) {
        echo trim($node->nodeValue);
    }
}
?>
<pre>
<?php
print_r($tab);
?>
</pre>
I don't understand why I get whitespace entries in the array, while the output looks fine when I use echo...
Solution
You can select the <a> tag of each post directly:
$posts = $xpath->query("//div[@class='who-post']/a");
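To illustrate why this query helps, here is a minimal sketch run against a small inline HTML fragment. The markup is a hypothetical stand-in for the forum page (the real page may be structured differently). It compares iterating over the childNodes of each div, which also visits the whitespace-only text nodes around the <a> element, with selecting the <a> elements directly. The echo output probably looked fine because a browser collapses runs of whitespace in normal HTML flow, whereas print_r inside <pre> shows them verbatim.

```php
<?php
// Hypothetical markup mimicking the structure assumed in the question;
// the real forum page may differ.
$html = <<<HTML
<div class="who-post">
    <a href="#">Thewiitcheur</a>
</div>
<div class="who-post">
    <a href="#">Shaq24</a>
</div>
HTML;

$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Iterating childNodes of each div also visits the text nodes
// (line breaks and indentation) that surround the <a> element.
$viaChildNodes = [];
foreach ($xpath->query("//div[@class='who-post']") as $post) {
    foreach ($post->childNodes as $node) {
        $viaChildNodes[] = $node->nodeValue;
    }
}

// Selecting the <a> elements directly skips those text nodes.
$viaAnchor = [];
foreach ($xpath->query("//div[@class='who-post']/a") as $a) {
    $viaAnchor[] = $a->nodeValue;
}

var_dump($viaChildNodes); // includes whitespace-only strings
print_r($viaAnchor);      // only the user names
```

The first array contains extra whitespace-only entries, while the second holds just the link texts.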
Also, you never use the trimmed value (in the first loop):
$value = trim($node->nodeValue);
$tab[] = $node->nodeValue;
Change it to:
$value = trim($node->nodeValue);
$tab[] = $value;
Output:
Array
(
    [0] => Thewiitcheur
    [1] => Thewiitcheur
    [2] => Shaq24
    [3] => Downy-down
    [4] => LosyCITY
    [5] => DanaAndrews
    [6] => Racouske
    [7] => Gnagngan
    [8] => harvey-specter
    [9] => frivyhotasmr
    [10] => Jowst
    [11] => Thewiitcheur
    [12] => ChibreCarnivore
    [13] => pseudobanni5678
    [14] => Chimpanzee
    [15] => EncoreBan25
    [16] => spagetthivolant
    [17] => Chimpanzee
    [18] => JeromeGerber
    [19] => chopsueys
)
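If you would rather keep the original //div[@class='who-post'] query and loop over childNodes, trimming alone is probably not enough: whitespace-only text nodes become empty strings after trim() and would still end up in the array. The sketch below is one possible variant, not part of the original answer. It uses a hypothetical inline fragment instead of the real page and skips empty values before storing them.

```php
<?php
// Hypothetical fragment standing in for the forum markup.
$html = '<div class="who-post">
    <a href="#">  Thewiitcheur  </a>
</div>';

$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$tab = []; // initialise the array before appending to it
foreach ($xpath->query("//div[@class='who-post']") as $post) {
    foreach ($post->childNodes as $node) {
        $value = trim($node->nodeValue);
        // Whitespace-only text nodes trim down to '', so skip them.
        if ($value !== '') {
            $tab[] = $value;
        }
    }
}

print_r($tab);
```

Selecting the <a> elements in the XPath expression, as shown in the answer, remains the simpler choice; this variant is mainly useful when the child elements are not known in advance.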