How to extract JS values from PHP cURL output when the content is mixed like this

Problem description

In my PHP, I have this:

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body as a string instead of printing it
$result = curl_exec($ch); // holds the full content returned by cURL

From $result, I want to retrieve these 3 values from the JavaScript data passed to $.ajax({ ... }):

tracking: "PSIByN1JXRnQyQ2t0Y0lmMTkyZmRhZDQ1ODhmM2RjNyJ9",
timestamp: 1622805734,
hash: "8bb3c9cb42025cb49a19c2cd060c4b21"

and assign them to PHP variables like this:

$data_tracking = "PSIByN1JXRnQyQ2t0Y0lmMTkyZmRhZDQ1ODhmM2RjNyJ9";
$data_timestamp = 1622805734;
$data_hash = "8bb3c9cb42025cb49a19c2cd060c4b21";

I am struggling to work out how to retrieve the data in this situation.

This is the full content obtained from $result = curl_exec($ch);

<!DOCTYPE html>
<html lang="en">
    <head>
        <meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no, user-scalable=0">
<meta http-equiv="x-ua-compatible" content="ie=edge">

<title>hello</title>
<meta name="csrf-token" content="iJ5ISYftFCNpTaaNfgX1VaPPfYmBhoXbGM6LxEVp">
<link rel="search" type="application/opensearchdescription+xml" href="/opensearch.xml" />
<link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Open+Sans:300,400,600,700:latin">

<style type="text/css">
</style>

<script>
    (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
        (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
        m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
    })(window,document,'script','https://www.google-analytics.com/analytics.js','ga');

    ga('create', 'UA-8s4113-1', 'auto');
    ga('send', 'pageview');
</script>
<!-- Facebook Pixel Code -->
<script>
  !function(f,b,e,v,n,t,s)
  {if(f.fbq)return;n=f.fbq=function(){n.callMethod?
  n.callMethod.apply(n,arguments):n.queue.push(arguments)};
  if(!f._fbq)f._fbq=n;n.push=n;n.loaded=!0;n.version='2.0';
  n.queue=[];t=b.createElement(e);t.async=!0;
  t.src=v;s=b.getElementsByTagName(e)[0];
  s.parentNode.insertBefore(t,s)}(window, document,'script',
  'https://connect.facebook.net/en_US/fbevents.js');
  fbq('init', '10344570522916');
  fbq('track', 'PageView');
</script>
<noscript><img height="1" width="1" style="display:none"
  src="https://www.facebook.com/tr?id=1d445000522916&ev=PageView&noscript=1"
/></noscript>
<!-- End Facebook Pixel Code -->
        <style type="text/css">
            html, body {
              height: 100%;
            }

            /*body {
              display: -ms-flexbox;
              display: -webkit-box;
              display: flex;
              -ms-flex-align: center;
              -ms-flex-pack: center;
              -webkit-box-align: center;
              align-items: center;
              -webkit-box-pack: center;
              justify-content: center;
              padding-top: 40px;
              padding-bottom: 40px;
              background-color: #f5f5f5;
            }*/
        </style>
    </head>

    <body>

<div class="external-body">
        
    <content>

        <section class="detail">
            <div class="container">

                <div class="external-detail">
                    <img src="./img/empty.png" alt="Express" data-toggle="tooltip" title="Express">
                    <h5>#Detail</h5>
                    <div class="contact">
                        <li><span class="icon"><i class="fas fa-phone"></i></span> <a href="#" data-anchor-protocol="tel">Loading...</a></li>                       
                        <li><span class="icon"><i class="fas fa-envelope"></i></span> <a href="#" data-anchor-protocol="mailto">Loading...</a></li>                     <li><span class="icon"><i class="fas fa-globe"></i></span> <a href="http://www.poslaju.com.my/" target="_blank" rel="nofollow">http://www.poslaju.com.my/</a></li>                  </div>
                </div>

                <div id="result">
                    <div class="text-center status-pending">
                        <p class="status text-tight">pending</p>
                    </div>
                    <div class="list">
                        <p class="text-center my-4"><i class="fas fa-spinner fa-spin"></i> Please wait</p>
                    </div>
                </div>
            </div>
        </section>

    </content>

<script src="https://test.com/js/app.js"></script>

<script type="text/javascript">


$(document).ready(function(){
    
    var retry = 2, not_found = function(){
        $('.list').html('<div class="result-not-found"></div>');
    };

    $.ajax({
        url: '/api/getResult',
        method: 'POST',
        data: {
            tracking: "PSIByN1JXRnQyQ2t0Y0lmMTkyZmRhZDQ1ODhmM2RjNyJ9",
            timestamp: 1622805734,
            hash: "8bb3c9cb42025cb49a19c2cd060c4b21"
        },
        dataType: "json",
        success: function(data){

            if(data.error_code){
                                ga('send', {
                    hitType: 'event',
                    eventLabel: data.error_code
                });
                            }


            var html = '', result = '\n==================================\n';

            if(data['latest_time']){
                html += '<div class="alert alert-danger" role="alert"><i class="fa fa-warning"></i> result at '+data['latest_time']+'.</div>';
            }

            html += '<div class="text-center status-'+data['latest_status'].replace(/_/g, '')+'"><p class="status text-tight">'+data['latest_status'].replace(/_/g, ' ')+'</p></div><div class="list">';

            var faicon = {
                sponsored: 'heart',
                attemptfail: 'bolt',
                exception: 'exclamation',
                inforeceived: 'clipboard-list',
                intransit: 'circle'
            };



            copy_result += '==================================\n';

            $('#result').html(html);

            if(data['location']){

                html = '<div class="sidebox location"><h5>'+data['location']['name']+'</h5><div class="detail">';
                html += '<p><span class="icon"><i class="fas fa-map-marker"></i></span>'+data['location']['address']+'</p>'

                var array = [
                    { name: "tel", icon: "phone" },
                    { name: "mobile", icon: "mobile-alt" },
                    { name: "fax", icon: "fax" },
                    { name: "email", icon: "envelope" },
                    { name: "operation", icon: "clock" }
                ];

                for(x in array){
                    if(data['location'][array[x]['name']]){
                        if(array[x]['name'] == 'operation' && data['location']['operation_collection']){
                            array[x]['name'] = 'operation_collection';
                        }
                        html += '<p><span class="icon"><i class="fas fa-'+array[x]['icon']+'"></i></span>'+data['location'][array[x]['name']].split('/').join('<br>')+'</p>';
                    }
                }

                html += '</div>';
                $('#location').html(html);
            }


            new ClipboardJS('[data-copy-result]', {
                text: function(trigger) {
                    return copy_result;
                }
            });

            $('[data-copy-result], [data-copy-link]').show().tooltip({
                title: 'Copied',
                trigger: 'manual'
            }).click(function(event){
                event.preventDefault();
                var a = $(this);
                a.tooltip('show');
                setTimeout(function(){
                    a.tooltip('hide');
                }, 1500);
            });

        },
        error: function(data){

            ga('send', {
                hitType: 'event',
            });

            retry--;
            if(retry){
                $.ajax(this);
            }else{
                not_found();
            }
        }
    });
    
    
        ga('send', {
            hitType: 'event',
        });
    
});
</script>

    </body>
</html>

Tags: php, regex, curl, web-scraping

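Solution

You need to use regular expressions to extract this kind of information from the HTML code.

This is a good opportunity to start learning this powerful tool.

In your specific case the expressions are fairly simple; see the example below: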

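// Capture only the text between the double quotes; a bare (.*) would also
// grab the quotes and the trailing comma.
if (preg_match('/tracking:\s*"([^"]+)"/', $result, $m)) {
    $data_tracking = $m[1];
}

// The timestamp is an unquoted integer.
if (preg_match('/timestamp:\s*(\d+)/', $result, $m)) {
    $data_timestamp = (int) $m[1];
}

if (preg_match('/hash:\s*"([^"]+)"/', $result, $m)) {
    $data_hash = $m[1];
}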
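
Because the page mixes HTML, CSS and several scripts, a key such as hash: could in principle appear somewhere other than the $.ajax call. As a rough sketch of one way to guard against that, the helper below first isolates the data: { ... } object and only then reads the key/value pairs from it. The function name extract_ajax_data and its return shape are hypothetical, not part of the original answer, and it assumes the object contains no nested braces and only double-quoted strings or plain integers, as in the page shown in the question.

```php
<?php
// Hypothetical helper (name and return shape are illustrative only).
function extract_ajax_data(string $html): array
{
    // Isolate the body of the first "data: { ... }" object literal.
    // Assumption: the object has no nested braces.
    if (!preg_match('/\bdata:\s*\{([^{}]*)\}/', $html, $block)) {
        return [];
    }

    // Match pairs such as  tracking: "abc"  or  timestamp: 123
    preg_match_all(
        '/(\w+):\s*(?:"([^"]*)"|(\d+))/',
        $block[1],
        $pairs,
        PREG_SET_ORDER
    );

    $values = [];
    foreach ($pairs as $pair) {
        // Group 3 is non-empty only when the value is an unquoted integer.
        $isInteger = isset($pair[3]) && $pair[3] !== '';
        $values[$pair[1]] = $isInteger ? (int) $pair[3] : $pair[2];
    }

    return $values;
}
```

With the page from the question in $result, extract_ajax_data($result) should return an array keyed by tracking, timestamp and hash, which can then be assigned to $data_tracking, $data_timestamp and $data_hash. An empty array means the data block was not found.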
