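php - How can I use a regular expression to filter image links collected from a website?
Problem description
I am trying to scrape product information from a website, but when I collect the image links I also pick up some unnecessary ones. Is it possible to remove those URLs with a regular expression or some other method?
Code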
<?php
// Fetch the product listing as JSON.
$ch = curl_init();
$url = 'https://www.jbhifi.com.au/collections/computers-tablets/products.json';
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.1; rv:19.0) Gecko/20100101 Firefox/19.0");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$result = curl_exec($ch);
curl_close($ch);
$products = json_decode($result, true);

// Concatenate every image URL into one string, separated by <br>.
$val = '';
foreach ($products['products'] as $decode) {
    foreach ($decode['images'] as $list) {
        $val .= $list['src'] . "<br>";
    }
}

// Attempt to keep only the wanted .jpg links with a regular expression.
preg_match_all('/[-a-z0-9_\/:.]+[^-0-9|Embedded|]\.(jpg)\?v=(.*)/i', $val, $link);
foreach ($link as $data) {
    echo "<pre>";
    print_r($data);
    echo "</pre>";
}
?>
Output
https://cdn.shopify.com/s/files/1/0024/9803/5810/products/408547-Product-0-I.jpg?v=1596775087
https://cdn.shopify.com/s/files/1/0024/9803/5810/products/408547-Product-1-I.jpg?v=1596775087
https://cdn.shopify.com/s/files/1/0024/9803/5810/products/408547-Product-2-I.jpg?v=1596775087
https://cdn.shopify.com/s/files/1/0024/9803/5810/products/408547-Product-3-I.jpg?v=1596775087
https://cdn.shopify.com/s/files/1/0024/9803/5810/products/408547-Product-4-I.jpg?v=1596775087
https://cdn.shopify.com/s/files/1/0024/9803/5810/products/408547-Embedded-0-I_469f7427-72ac-4e53-a9d5-22aa75cab0ab.jpg?v=1596775087
https://cdn.shopify.com/s/files/1/0024/9803/5810/products/408547-Embedded-1-I-637232562116464314.jpg?v=1596775087
https://cdn.shopify.com/s/files/1/0024/9803/5810/products/408547-Embedded-2-I-637239451904172005_1d5986ba-482b-4476-9e62-519acff8d6ee.jpg?v=1596775087
https://cdn.shopify.com/s/files/1/0024/9803/5810/products/408547-Embedded-3-I-637324077909043788.jpg?v=1596775087
https://cdn.shopify.com/s/files/1/0024/9803/5810/products/473671-Product-0-I-637287628562987164.jpg?v=1596775805
https://cdn.shopify.com/s/files/1/0024/9803/5810/products/473671-Product-1-I-637287628268270786.jpg?v=1596775805
https://cdn.shopify.com/s/files/1/0024/9803/5810/products/473671-Product-2-I-637287628270458315.jpg?v=1596775805
https://cdn.shopify.com/s/files/1/0024/9803/5810/products/473671-Product-3-I-637287628346618995.jpg?v=1596775805
https://cdn.shopify.com/s/files/1/0024/9803/5810/products/473671-Product-4-I-637287628269989532.jpg?v=1596775805
https://cdn.shopify.com/s/files/1/0024/9803/5810/products/473671-Product-5-I-637287628492014484.jpg?v=1596775805
https://cdn.shopify.com/s/files/1/0024/9803/5810/products/473671-Product-6-I-637287628269364578.jpg?v=1596775805
https://cdn.shopify.com/s/files/1/0024/9803/5810/products/473671-Product-7-I-637287628268583312.jpg?v=1596775805
https://cdn.shopify.com/s/files/1/0024/9803/5810/products/473671-Product-8-I-637287628272199819.jpg?v=1596775805
https://cdn.shopify.com/s/files/1/0024/9803/5810/products/473671-Product-9-I-637287628269989532.jpg?v=1596775805
https://cdn.shopify.com/s/files/1/0024/9803/5810/products/473671-Product-10-I-637287628269520792.jpg?v=1596775805
https://cdn.shopify.com/s/files/1/0024/9803/5810/products/473671-Product-11-I-637287628269208317.jpg?v=1596775805
https://cdn.shopify.com/s/files/1/0024/9803/5810/products/473671-Product-12-I-637287628269520792.jpg?v=1596775805
https://cdn.shopify.com/s/files/1/0024/9803/5810/products/473671-Product-13-I-637287628463877466.jpg?v=1596775805
https://cdn.shopify.com/s/files/1/0024/9803/5810/products/473671-Product-14-I-637287628493108248.jpg?v=1596775805
https://cdn.shopify.com/s/files/1/0024/9803/5810/products/473671-Product-15-I-637287628522483291.jpg?v=1596775805
https://cdn.shopify.com/s/files/1/0024/9803/5810/products/473671-Product-16-I-637287628540145722.jpg?v=1596775805
https://cdn.shopify.com/s/files/1/0024/9803/5810/products/473671-Product-17-I-637287628550309519.jpg?v=1596775805
Solution
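One possible approach, offered as a sketch rather than a definitive answer: keep the URLs in an array instead of concatenating them into one string, then filter that array with preg_grep. The example assumes the unwanted links are the ones with "-Embedded-" in the file name, which is a guess based on the sample output and on the "Embedded" token in the attempted pattern. The helper name filterProductImages and the pattern itself are hypothetical and derived only from the URLs shown above.

```php
<?php
// Hypothetical helper: keeps URLs whose file name looks like
// "<id>-Product-<n>-I....jpg?v=<version>", which drops the "-Embedded-" images.
function filterProductImages(array $urls): array
{
    // Assumed pattern, based only on the sample output in the question.
    $pattern = '~/\d+-Product-\d+-I[^/?]*\.jpg\?v=\d+$~i';

    // preg_grep keeps the matching entries; array_values reindexes from 0.
    return array_values(preg_grep($pattern, $urls));
}

// Three URLs copied from the output in the question.
$urls = [
    'https://cdn.shopify.com/s/files/1/0024/9803/5810/products/408547-Product-0-I.jpg?v=1596775087',
    'https://cdn.shopify.com/s/files/1/0024/9803/5810/products/408547-Embedded-0-I_469f7427-72ac-4e53-a9d5-22aa75cab0ab.jpg?v=1596775087',
    'https://cdn.shopify.com/s/files/1/0024/9803/5810/products/473671-Product-0-I-637287628562987164.jpg?v=1596775805',
];

$filtered = filterProductImages($urls);
print_r($filtered);
```

In the scraping loop from the question, this would mean appending each $list['src'] to an array (for example $urls[] = $list['src'];) and passing that array to the helper. If only the "Embedded" links need to be excluded and everything else should be kept, calling preg_grep with a pattern such as '/-Embedded-/' and the PREG_GREP_INVERT flag is a simpler alternative.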