Whitespace in XPath (PHP)

Problem description

I'm trying to write a "bot" that crawls a forum to collect statistics.

Here is my code: https://pastebin.com/6zAaQ0fF

<?php

$ch = curl_init();
$timeout = 0; // set to zero for no timeout
curl_setopt ($ch, CURLOPT_URL, 'http://m.jeuxvideo.com/forums/42-51-61922886-1-0-1-0-once-upon-time-in-hollywood.htm');
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
$file_contents = curl_exec($ch);
curl_close($ch);


$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTML($file_contents);

$xpath = new DOMXPath($dom);
$posts = $xpath->query("//div[@class='who-post']");
$dates = $xpath->query("//div[@class='date-post']");
$contenus = $xpath->query("//div[@class='contenu']");

foreach ($posts as $post) {
    $nodes = $post->childNodes;

    foreach ($nodes as $node) {
        $value = trim($node->nodeValue);
        echo $node->nodeValue;
        $tab[] = $node->nodeValue;
    }
}

foreach ($dates as $date) {
    $nodes = $date->childNodes;
    foreach ($nodes as $node) {
        echo trim($node->nodeValue);
    }
}

?>
<pre>
<?php 
print_r($tab);
?>
</pre>

I don't understand why my array contains whitespace entries, while the output looks fine when I use echo...

Thanks for your help!

Tags: php, xpath

Solution


You can select the <a> tags inside the posts directly:

$posts = $xpath->query("//div[@class='who-post']/a");
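A minimal sketch of that query, run against a small hypothetical snippet standing in for the forum page (the real markup may differ). Because the XPath expression returns the <a> elements themselves instead of every child of the div, the whitespace text nodes around each link are never visited:

```php
<?php
// Hypothetical markup standing in for the forum page; the real structure may differ.
$html = <<<HTML
<html><body>
<div class="who-post">
    <a href="#">Thewiitcheur</a>
</div>
<div class="who-post">
    <a href="#">Shaq24</a>
</div>
</body></html>
HTML;

$dom = new DOMDocument();
libxml_use_internal_errors(true); // silence warnings about imperfect real-world HTML
$dom->loadHTML($html);

$xpath = new DOMXPath($dom);

$authors = [];
// Select the <a> elements directly, so the whitespace text nodes inside the div are skipped.
foreach ($xpath->query("//div[@class='who-post']/a") as $link) {
    $authors[] = trim($link->nodeValue);
}

print_r($authors);
```

A who-post div without a link simply produces no entry.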

Also, you are not using the trimmed value (in the first loop):

$value = trim($node->nodeValue);
$tab[] = $node->nodeValue;

Change it to:

$value = trim($node->nodeValue);
$tab[] = $value;
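As an alternative to calling trim() in PHP, the XPath 1.0 function normalize-space() can return an already cleaned string: it strips leading and trailing whitespace and collapses internal runs of whitespace to a single space. A sketch, again on hypothetical markup, evaluating it relative to each who-post div:

```php
<?php
// Hypothetical markup; the real forum page may be structured differently.
$html = '<div class="who-post">
    <a href="#">  Thewiitcheur  </a>
</div>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // ignore warnings from imperfect HTML
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$tab = [];
foreach ($xpath->query("//div[@class='who-post']") as $post) {
    // normalize-space(a) takes the text of the first <a> child of this div
    // and returns it as a plain PHP string with the whitespace normalized.
    $tab[] = $xpath->evaluate('normalize-space(a)', $post);
}

print_r($tab);
```

If a div has no <a> child, normalize-space() returns an empty string, so you may still want to filter out empty entries.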

Output with both changes applied:

Array
(
    [0] => Thewiitcheur
    [1] => Thewiitcheur
    [2] => Shaq24
    [3] => Downy-down
    [4] => LosyCITY
    [5] => DanaAndrews
    [6] => Racouske
    [7] => Gnagngan
    [8] => harvey-specter
    [9] => frivyhotasmr
    [10] => Jowst
    [11] => Thewiitcheur
    [12] => ChibreCarnivore
    [13] => pseudobanni5678
    [14] => Chimpanzee
    [15] => EncoreBan25
    [16] => spagetthivolant
    [17] => Chimpanzee
    [18] => JeromeGerber
    [19] => chopsueys
)
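Why the blank entries appear: DOMNode::childNodes returns every child node, including the text nodes made of the newlines and indentation around the <a> tag, and the original loop stores their untrimmed nodeValue in $tab. When echoed into an HTML page, the browser collapses those whitespace runs, so the output looks clean; print_r inside <pre> keeps them visible. The sketch below, on hypothetical markup, shows the extra text nodes and one way to skip them if you keep iterating over childNodes:

```php
<?php
// Hypothetical markup: the newlines and spaces around <a> become separate text nodes.
$html = '<div class="who-post">
    <a href="#">Thewiitcheur</a>
</div>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // ignore warnings from imperfect HTML
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$raw = []; // what the original loop collects
$tab = []; // trimmed values, with whitespace-only text nodes dropped

foreach ($xpath->query("//div[@class='who-post']") as $post) {
    foreach ($post->childNodes as $node) {
        $raw[] = $node->nodeValue; // includes the whitespace-only text nodes
        $value = trim($node->nodeValue);
        if ($value !== '') { // skip nodes that were only whitespace
            $tab[] = $value;
        }
    }
}

var_dump(count($raw)); // int(3): text node, <a> element, text node
print_r($tab);
```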
