How can I use a regular expression to filter image links collected from a website?

Problem description

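I am trying to scrape product information from a website, but when I collect the image links, some unnecessary image links get scraped along with the ones I want. Is it possible to remove those URLs with a regular expression or some other method?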

Code

<?php
    $ch = curl_init();
    $url = 'https://www.jbhifi.com.au/collections/computers-tablets/products.json';

    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.1; rv:19.0) Gecko/20100101 Firefox/19.0");
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

    $result = curl_exec($ch);
    $products = json_decode($result,true);
    // array_unique() dropped: it compares values as strings, which is not meaningful for nested arrays.
    $decodes = $products;
    
    $val = '';
    foreach($decodes['products'] as $decode){
        foreach($decode['images'] as $list){
            $val .= $list['src']."<br>";
        }
    }
    
    preg_match_all('/[-a-z0-9_\/:.]+[^-0-9|Embedded|]\.(jpg)\?v=(.*)/i', $val, $link);
    
    foreach($link as $data){
        echo "<pre>";
        print_r($data);
    }

?>

Output

https://cdn.shopify.com/s/files/1/0024/9803/5810/products/408547-Product-0-I.jpg?v=1596775087
https://cdn.shopify.com/s/files/1/0024/9803/5810/products/408547-Product-1-I.jpg?v=1596775087
https://cdn.shopify.com/s/files/1/0024/9803/5810/products/408547-Product-2-I.jpg?v=1596775087
https://cdn.shopify.com/s/files/1/0024/9803/5810/products/408547-Product-3-I.jpg?v=1596775087
https://cdn.shopify.com/s/files/1/0024/9803/5810/products/408547-Product-4-I.jpg?v=1596775087
https://cdn.shopify.com/s/files/1/0024/9803/5810/products/408547-Embedded-0-I_469f7427-72ac-4e53-a9d5-22aa75cab0ab.jpg?v=1596775087
https://cdn.shopify.com/s/files/1/0024/9803/5810/products/408547-Embedded-1-I-637232562116464314.jpg?v=1596775087
https://cdn.shopify.com/s/files/1/0024/9803/5810/products/408547-Embedded-2-I-637239451904172005_1d5986ba-482b-4476-9e62-519acff8d6ee.jpg?v=1596775087
https://cdn.shopify.com/s/files/1/0024/9803/5810/products/408547-Embedded-3-I-637324077909043788.jpg?v=1596775087
https://cdn.shopify.com/s/files/1/0024/9803/5810/products/473671-Product-0-I-637287628562987164.jpg?v=1596775805
https://cdn.shopify.com/s/files/1/0024/9803/5810/products/473671-Product-1-I-637287628268270786.jpg?v=1596775805
https://cdn.shopify.com/s/files/1/0024/9803/5810/products/473671-Product-2-I-637287628270458315.jpg?v=1596775805
https://cdn.shopify.com/s/files/1/0024/9803/5810/products/473671-Product-3-I-637287628346618995.jpg?v=1596775805
https://cdn.shopify.com/s/files/1/0024/9803/5810/products/473671-Product-4-I-637287628269989532.jpg?v=1596775805
https://cdn.shopify.com/s/files/1/0024/9803/5810/products/473671-Product-5-I-637287628492014484.jpg?v=1596775805
https://cdn.shopify.com/s/files/1/0024/9803/5810/products/473671-Product-6-I-637287628269364578.jpg?v=1596775805
https://cdn.shopify.com/s/files/1/0024/9803/5810/products/473671-Product-7-I-637287628268583312.jpg?v=1596775805
https://cdn.shopify.com/s/files/1/0024/9803/5810/products/473671-Product-8-I-637287628272199819.jpg?v=1596775805
https://cdn.shopify.com/s/files/1/0024/9803/5810/products/473671-Product-9-I-637287628269989532.jpg?v=1596775805
https://cdn.shopify.com/s/files/1/0024/9803/5810/products/473671-Product-10-I-637287628269520792.jpg?v=1596775805
https://cdn.shopify.com/s/files/1/0024/9803/5810/products/473671-Product-11-I-637287628269208317.jpg?v=1596775805
https://cdn.shopify.com/s/files/1/0024/9803/5810/products/473671-Product-12-I-637287628269520792.jpg?v=1596775805
https://cdn.shopify.com/s/files/1/0024/9803/5810/products/473671-Product-13-I-637287628463877466.jpg?v=1596775805
https://cdn.shopify.com/s/files/1/0024/9803/5810/products/473671-Product-14-I-637287628493108248.jpg?v=1596775805
https://cdn.shopify.com/s/files/1/0024/9803/5810/products/473671-Product-15-I-637287628522483291.jpg?v=1596775805
https://cdn.shopify.com/s/files/1/0024/9803/5810/products/473671-Product-16-I-637287628540145722.jpg?v=1596775805
https://cdn.shopify.com/s/files/1/0024/9803/5810/products/473671-Product-17-I-637287628550309519.jpg?v=1596775805    

Tags: php, image, filter, preg-match-all

Solution
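
One possible approach, offered as a minimal sketch rather than a definitive answer: the question does not say which links are unwanted, so this assumes the "Embedded" images are the ones to drop and the "Product" images are the ones to keep, as the attempted pattern suggests. The helper name filterProductImages and the assumed file-name shape are illustrative and not part of the original post.

```php
<?php
// Hypothetical helper (not from the original post): keep only image URLs
// whose file name marks them as "Product" shots, dropping "Embedded" ones.
function filterProductImages(array $urls): array
{
    // Assumed URL shape: .../products/<id>-Product-<n>-I[...].jpg?v=<version>
    $pattern = '~/products/\d+-Product-\d+-I[^/?]*\.jpg\?v=\d+$~i';

    // preg_grep() returns the entries that match; array_values() reindexes them.
    return array_values(preg_grep($pattern, $urls));
}

// Sample URLs copied from the output listed in the question.
$urls = [
    'https://cdn.shopify.com/s/files/1/0024/9803/5810/products/408547-Product-0-I.jpg?v=1596775087',
    'https://cdn.shopify.com/s/files/1/0024/9803/5810/products/408547-Embedded-1-I-637232562116464314.jpg?v=1596775087',
    'https://cdn.shopify.com/s/files/1/0024/9803/5810/products/473671-Product-17-I-637287628550309519.jpg?v=1596775805',
];

$productImages = filterProductImages($urls);
print_r($productImages);
```

To use this with the code in the question, collect each $list['src'] into an array instead of joining the links with "<br>", then pass that array to the helper. Filtering an array with preg_grep() avoids having to match URLs inside one long concatenated string.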

