Server crashes when scraping with cURL and simple_html_dom

Problem description

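I have a strange problem. I have a script that works fine on localhost, but when I run it on the server it crashes after a few loops. The script uses cURL and simple_html_dom to scrape web pages.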

Here is the gist of the code:

    class updateController extends Controller{
        function __construct(){
            ini_set('max_execution_time', 0);
            set_time_limit(0);
            require_once 'simple_html_dom.php';
        }

        static public function ThemeforestLoopExisting(){
            $themes = Fulls::where('X','Y')->get();

            foreach($themes as $theme){
                $cURL = GeneralFunctions::cURLDom($theme['url']);
                // Here I search for specific parts on the web page using the "find" method on simple_html_dom
            }
        }
    }

GeneralFunctions.php:

    class GeneralFunctions {

        // Fetch $url with cURL; returns the body ('str') and the HTTP status code ('header')
        static public function cURL_scraping($url){
            $curl = curl_init();
            curl_setopt($curl, CURLOPT_URL, $url);
            curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
            curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 10);
            curl_setopt($curl, CURLOPT_MAXREDIRS, 10);
            curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
            curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.75.14 (KHTML, like Gecko) Version/7.0.3 Safari/7046A194A');
            curl_setopt($curl, CURLOPT_HTTPHEADER, array('Expect:'));
            curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
            curl_setopt($curl, CURLOPT_ENCODING, 'identity');

            $response['str'] = curl_exec($curl);   // false if the request fails
            $response['header'] = curl_getinfo($curl, CURLINFO_HTTP_CODE);

            curl_close($curl);
            return $response;
        }

        // Fetch $url and parse the body into a simple_html_dom object
        static public function cURLDom($url){
            $cURL_results  = self::cURL_scraping($url);
            $res['header'] = $cURL_results['header'];
            // Positional arguments: lowercase=false, forceTagsClosed=true, target charset,
            // stripRN=false, default BR text, default span text
            $res['str'] = str_get_html($cURL_results['str'], false, true, DEFAULT_TARGET_CHARSET, false, DEFAULT_BR_TEXT, DEFAULT_SPAN_TEXT);
            return $res['str'];
        }
    }

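The whole thing works for roughly the first 10, 20 or 30 runs, and then the server crashes. It runs perfectly on localhost. I talked to my web host, but they couldn't help.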
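Because the script runs fine for a while and then dies, it can help to record which item was being processed and what went wrong. A minimal sketch, assuming PHP 7.1 or later: the hypothetical helper `runEachLogged()` (not part of the question's code) wraps each iteration in `try`/`catch (Throwable)` and records `memory_get_usage()` after each item. It surfaces exceptions, such as a database error thrown while saving, and makes steady memory growth visible, but it cannot catch fatal errors like memory exhaustion or a process killed by the host.

```php
<?php
// Hypothetical helper (not from the question): runs $work on each item,
// catches exceptions per item and records memory usage after each one.
function runEachLogged(iterable $items, callable $work): array
{
    $log = [];
    foreach ($items as $key => $item) {
        $entry = ['key' => $key, 'ok' => true, 'error' => null];
        try {
            $work($item);
        } catch (Throwable $e) {
            // Exceptions (for example a database error) land here.
            // Fatal errors such as "Allowed memory size exhausted" are not catchable.
            $entry['ok'] = false;
            $entry['error'] = get_class($e) . ': ' . $e->getMessage();
        }
        $entry['memory'] = memory_get_usage();
        $log[] = $entry;
    }
    return $log;
}

// Usage with stand-in work items; in the controller, $items would be $themes
// and $work would fetch, parse and save one theme.
$log = runEachLogged(['a', 'b', 'c'], function (string $s): void {
    if ($s === 'b') {
        throw new RuntimeException("failed on $s");
    }
});

foreach ($log as $entry) {
    echo $entry['key'], ' ', $entry['ok'] ? 'ok' : $entry['error'], ' ', $entry['memory'], "\n";
}
```

On a real server the log entries would more usefully go to a file (for example with `error_log()`), since a crashed request may never deliver its output.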

Is there something I'm missing or not aware of here? Any help would be greatly appreciated. Thanks!

Tags: php, curl, screen-scraping, simple-html-dom

Solution


It turned out to be a database issue. I changed the collation to utf8mb4_general_ci and that fixed it.
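The answer does not say why the collation change helped. One common cause, offered here as an assumption rather than the author's diagnosis: MySQL's older `utf8` character set (an alias of `utf8mb3`) stores at most 3 bytes per character, so scraped text containing 4-byte characters such as emoji can fail to save (in strict SQL mode MySQL rejects it with an "Incorrect string value" error) until the column uses `utf8mb4`. The sketch below uses a hypothetical helper, `hasFourByteChars()`, to check whether a string contains such characters.

```php
<?php
// Hypothetical helper: true if $text contains characters outside the Basic
// Multilingual Plane (U+10000..U+10FFFF), which take 4 bytes in UTF-8.
// MySQL's legacy "utf8" (utf8mb3) cannot store these; utf8mb4 can.
function hasFourByteChars(string $text): bool
{
    // The /u modifier makes PCRE treat pattern and subject as UTF-8.
    // preg_match returns false on invalid UTF-8, so this returns false then too.
    return preg_match('/[\x{10000}-\x{10FFFF}]/u', $text) === 1;
}

$samples = [
    'plain ASCII',
    'accented: café',    // é is 2 bytes in UTF-8
    'CJK: 抓取',          // each character is 3 bytes
    "emoji: \u{1F600}",   // U+1F600 is 4 bytes
];

foreach ($samples as $s) {
    echo $s, ' => ', hasFourByteChars($s) ? 'needs utf8mb4' : 'fits in utf8mb3', "\n";
}
```

Running such a check over a few scraped pages is a quick way to see whether the data actually needs `utf8mb4` before altering the table.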

