Better way to crawl a JSON file

Question

I have a function:

function personnesDispo($date){
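  $mnsDispos = mnsDispos(); // returns the decoded JSON data; called on every invocation
  $mnsToday = [$date => []]; // so array_unique() still gets an array when nothing matches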
  for($j=0;$j<sizeof($mnsDispos);$j++){
    if($mnsDispos[$j]['date'] == $date){
      $mnsToday[$date][] = $mnsDispos[$j]['id'];
    }
  }
  $final = array_values(array_unique($mnsToday[$date]));
  return $final;
}

This function crawls a JSON object, a big one. It takes so long to go through the JSON file that I mostly get 408 (Request Timeout) errors.

Here is my JSON data: https://files.olivierlam.fr/json.txt

I can't find a better way to make my function faster.

My second function (I think this is the one that takes too long):

function personneDispoToday($id,$date,$heure){
  $mnsDispos = mnsDispos();
  for($j=0;$j<sizeof($mnsDispos);$j++){
    if($mnsDispos[$j]['date'] == $date AND $mnsDispos[$j]['heure'] == $heure AND $mnsDispos[$j]['id'] == $id){
      $dispo = [
        'id' => $mnsDispos[$j]['id'],
        'dispo' => $mnsDispos[$j]['dispo']
      ];
      return $dispo; // return leaves the function, so no break is needed
    }
  }
  return null; // no matching entry
}

This function is used to display 1 or 2, but it is almost the same as the first one. I think merging the two functions into one might help.
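A minimal sketch of what merging the two could look like, assuming mnsDispos() returns a list of associative arrays with date, heure, id and dispo keys (as the code above implies): build one index in a single pass, then answer both questions with array lookups instead of scanning the whole list on every call. The function names and the tiny sample are hypothetical, for illustration only.

```php
<?php
// Hypothetical helper: one pass over the rows builds $index[date][id][heure] = dispo.
// Assumes rows are associative arrays (json_decode($json, true)) with
// 'date', 'heure', 'id' and 'dispo' keys, as in the question's code.
function buildDispoIndex(array $rows): array
{
    $index = [];
    foreach ($rows as $row) {
        $index[$row['date']][$row['id']][$row['heure']] = $row['dispo'];
    }
    return $index;
}

// Counterpart of personnesDispo(): unique ids available on $date, in first-seen order.
function idsForDate(array $index, string $date): array
{
    return array_keys($index[$date] ?? []);
}

// Counterpart of personneDispoToday(): ['id' => ..., 'dispo' => ...] or null.
// Note: 'id' echoes the argument, which equals the stored id under ==.
function dispoFor(array $index, $id, string $date, string $heure): ?array
{
    if (!isset($index[$date][$id][$heure])) {
        return null;
    }
    return ['id' => $id, 'dispo' => $index[$date][$id][$heure]];
}

// Tiny illustrative sample in the same shape as the JSON rows.
$rows = [
    ['date' => '25-05-2019', 'heure' => '06:00:00', 'id' => 1, 'dispo' => 1],
    ['date' => '25-05-2019', 'heure' => '07:00:00', 'id' => 1, 'dispo' => 1],
    ['date' => '25-05-2019', 'heure' => '24:00:00', 'id' => 51, 'dispo' => 1],
];
$index = buildDispoIndex($rows); // build once, then reuse for every lookup
print_r(idsForDate($index, '25-05-2019'));
print_r(dispoFor($index, 51, '25-05-2019', '24:00:00'));
```

In the real code the index would be built once per request, e.g. `$index = buildDispoIndex(mnsDispos());`, and then passed to both lookups.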

Tags: php, json

Solution


I don't see why your code takes so long; perhaps there is some processing you haven't shown us.

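This code takes well under a second to decode the JSON string, loop through the complete sample JSON, and collect a new array of the entries for the date passed as a parameter: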

function personnesDispo(&$j_array, $date){
    $today = [];
    foreach($j_array as $obj) {
        if ( $obj->date == $date ){
            $today[$date][] = $obj;
        }
    }
    return $today;
}

$startTime = microtime(true);

$j_array = json_decode($s); // $s was your sample JSON String

$result = personnesDispo($j_array, '25-05-2019');
print_r($result);

$endTime = microtime(true);
$execution_time = ($endTime - $startTime)/60;
echo '<b>Total Execution Time:</b> '.$execution_time.' Mins'.PHP_EOL;
echo 'Array objects of size = ' . count($j_array);

The result:

Array
(
    [25-05-2019] => Array
        (
            [0] => stdClass Object
                (
                    [prenom] => Guillaume
                    [nom] => HUART
                    [date] => 25-05-2019
                    [heure] => 06:00:00
                    [dispo] => 1
                    [id] => 1
                )

            [1] => stdClass Object
                (
                    [prenom] => Guillaume
                    [nom] => HUART
                    [date] => 25-05-2019
                    [heure] => 07:00:00
                    [dispo] => 1
                    [id] => 1
                )

            [2] => stdClass Object
                (
                    [prenom] => Guillaume
                    [nom] => HUART
                    [date] => 25-05-2019
                    [heure] => 08:00:00
                    [dispo] => 1
                    [id] => 1
                )

    ... lots more

            [111] => stdClass Object
                (
                    [prenom] => Charly
                    [nom] => PLAIGNAUD
                    [date] => 25-05-2019
                    [heure] => 23:00:00
                    [dispo] => 1
                    [id] => 51
                )

            [112] => stdClass Object
                (
                    [prenom] => Charly
                    [nom] => PLAIGNAUD
                    [date] => 25-05-2019
                    [heure] => 24:00:00
                    [dispo] => 1
                    [id] => 51
                )

        )

)
<b>Total Execution Time:</b> 0.00014141400655111 Mins
Array objects of size = 2244
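Note that this personnesDispo() returns the matching objects, while the question's version returned the unique ids for the date. If you need the ids, here is a minimal sketch over the same decoded objects; the name personnesDispoIds and the sample string are illustrative only.

```php
<?php
// Hypothetical variant: unique ids available on $date, like the question's
// personnesDispo(), but working on the stdClass objects from json_decode($s).
function personnesDispoIds(array $j_array, string $date): array
{
    $matches = array_filter($j_array, function ($obj) use ($date) {
        return $obj->date == $date;
    });
    // array_column() reads public object properties since PHP 7.0.
    return array_values(array_unique(array_column($matches, 'id')));
}

// Illustrative sample in the same shape as the real JSON.
$s = '[{"date":"25-05-2019","heure":"06:00:00","dispo":1,"id":1},
       {"date":"25-05-2019","heure":"07:00:00","dispo":1,"id":1},
       {"date":"25-05-2019","heure":"24:00:00","dispo":1,"id":51},
       {"date":"26-05-2019","heure":"06:00:00","dispo":1,"id":2}]';
print_r(personnesDispoIds(json_decode($s), '25-05-2019'));
```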

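Adding a slightly rewritten version of your second function barely changes the timing, as long as the JSON data is passed to each function as a parameter instead of being rebuilt inside each one: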

function personnesDispo(&$j_array, $date){
    $today = [];
    foreach($j_array as $a) {
        if ( $a->date == $date ){
            $today[$date][] = $a;
        }
    }
    return $today;
}

function personneDispoToday(&$j_array,$id,$date,$heure){
    foreach($j_array as $a) {
        if($a->date == $date AND $a->heure == $heure AND $a->id == $id){
            $dispo = [
                'id' => $a->id,
                'dispo' => $a->dispo
            ];
            return $dispo; // return exits both the loop and the function
        }
    }
    return null; // no matching entry
}

$startTime = microtime(true);

$j_array = json_decode($s);

$result = personnesDispo($j_array, '25-05-2019');
print_r($result);

$result2 = personneDispoToday($j_array,51,'25-05-2019','24:00:00');
print_r($result2);


$endTime = microtime(true);
echo "Start Time $startTime" . PHP_EOL;
echo "End Time $endTime" . PHP_EOL;

$execution_time = ($endTime - $startTime)/60;

echo "Execution Time $execution_time" . PHP_EOL;

echo '<b>Total Execution Time:</b> '.$execution_time.' Mins'.PHP_EOL;
echo 'Array objects of size = ' . count($j_array);

Result:

Same big array as before plus

Array
(
    [id] => 51
    [dispo] => 1
)
Start Time 1556899734.5787
End Time 1556899734.5871
Execution Time 0.00014118353525798
<b>Total Execution Time:</b> 0.00014118353525798 Mins
Array objects of size = 2244
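If you would rather keep each function calling the loader itself, a hedged alternative to passing the array around is to memoize the decoded data. This sketch assumes mnsDispos() downloads or decodes the file on every call, which the question does not show; mnsDisposCached() is a hypothetical name, and the default source is the data URL from the question.

```php
<?php
// Hypothetical replacement for mnsDispos(): reads and decodes a given source at most
// once per PHP request, then serves the cached array on later calls.
// Reading an http(s) URL with file_get_contents() requires allow_url_fopen.
function mnsDisposCached(string $source = 'https://files.olivierlam.fr/json.txt'): array
{
    static $cache = []; // lives only until the end of the current request
    if (!array_key_exists($source, $cache)) {
        $json = file_get_contents($source);
        if ($json === false) {
            throw new RuntimeException("Could not read $source");
        }
        $data = json_decode($json, true); // associative arrays, as the question's code expects
        $cache[$source] = is_array($data) ? $data : [];
    }
    return $cache[$source];
}
```

Each function can then start with `$mnsDispos = mnsDisposCached();`, so the download and decoding happen once per request rather than once per function call.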
