Extract data-src and data-srcset from img

Problem description

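I am trying to get the data-src and data-srcset attributes from a number of image strings in PHP. Both attributes are optional, so an image can have neither, only data-src, only data-srcset, or both. The regular expression I have is: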

<img(.*?)data-src=['\"](.*?)['\"].*?|(data-srcset=['\"](.*?)['\"])?\/>

The string I am testing against is:

<li class="blocks-gallery-item">
  <figure>
    <img data-src="http://localhost:3000/wp-content/uploads/2018/11/detektivhut.gif" alt="" data-id="1037" data-link="http://localhost:3000/detektivhut/" class="wp-image-1037"/>
  </figure>
</li>
<li class="blocks-gallery-item">
  <figure>
    <img data-src="http://localhost:3000/wp-content/uploads/2018/11/DSC04828.png" alt="" data-id="948" data-link="http://localhost:3000/dsc04828-2/" class="wp-image-948" data-srcset="//localhost:3000/wp-content/uploads/2018/11/DSC04828.png 1067w, //localhost:3000/wp-content/uploads/2018/11/DSC04828-200x300.png 200w, //localhost:3000/wp-content/uploads/2018/11/DSC04828-768x1152.png 768w, //localhost:3000/wp-content/uploads/2018/11/DSC04828-683x1024.png 683w, //localhost:3000/wp-content/uploads/2018/11/DSC04828-1000x1500.png 1000w" sizes="(max-width: 1067px) 100vw, 1067px" />
  </figure>
</li>
<li class="blocks-gallery-item">
  <figure>
    <img data-src="http://localhost:3000/wp-content/uploads/2018/11/DSC04831.png" alt="" data-id="883" data-link="http://localhost:3000/2018/11/13/single-page-style-1/dsc04831-2/" class="wp-image-883" data-srcset="//localhost:3000/wp-content/uploads/2018/11/DSC04831.png 1067w, //localhost:3000/wp-content/uploads/2018/11/DSC04831-200x300.png 200w, //localhost:3000/wp-content/uploads/2018/11/DSC04831-768x1152.png 768w, //localhost:3000/wp-content/uploads/2018/11/DSC04831-683x1024.png 683w, //localhost:3000/wp-content/uploads/2018/11/DSC04831-1000x1500.png 1000w" sizes="(max-width: 1067px) 100vw, 1067px" />
  </figure>
</li>

But it is too greedy. See here:

https://regex101.com/r/vDQE3C/1

Any help is much appreciated, including an explanation of the logic.

Tags: php, regex

Solution


Don't use a regular expression to parse HTML. It is better to use a DOM parser, like this:

$html = <<< EOF
<li class="blocks-gallery-item">
  <figure>
    <img data-src="http://localhost:3000/wp-content/uploads/2018/11/detektivhut.gif" alt="" data-id="1037" data-link="http://localhost:3000/detektivhut/" class="wp-image-1037"/>
  </figure>
</li>
<li class="blocks-gallery-item">
  <figure>
    <img data-src="http://localhost:3000/wp-content/uploads/2018/11/DSC04828.png" alt="" data-id="948" data-link="http://localhost:3000/dsc04828-2/" class="wp-image-948" data-srcset="//localhost:3000/wp-content/uploads/2018/11/DSC04828.png 1067w, //localhost:3000/wp-content/uploads/2018/11/DSC04828-200x300.png 200w, //localhost:3000/wp-content/uploads/2018/11/DSC04828-768x1152.png 768w, //localhost:3000/wp-content/uploads/2018/11/DSC04828-683x1024.png 683w, //localhost:3000/wp-content/uploads/2018/11/DSC04828-1000x1500.png 1000w" sizes="(max-width: 1067px) 100vw, 1067px" />
  </figure>
</li>
<li class="blocks-gallery-item">
  <figure>
    <img data-src="http://localhost:3000/wp-content/uploads/2018/11/DSC04831.png" alt="" data-id="883" data-link="http://localhost:3000/2018/11/13/single-page-style-1/dsc04831-2/" class="wp-image-883" data-srcset="//localhost:3000/wp-content/uploads/2018/11/DSC04831.png 1067w, //localhost:3000/wp-content/uploads/2018/11/DSC04831-200x300.png 200w, //localhost:3000/wp-content/uploads/2018/11/DSC04831-768x1152.png 768w, //localhost:3000/wp-content/uploads/2018/11/DSC04831-683x1024.png 683w, //localhost:3000/wp-content/uploads/2018/11/DSC04831-1000x1500.png 1000w" sizes="(max-width: 1067px) 100vw, 1067px" />
  </figure>
</li>
EOF;

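// loadHTML() is an instance method; calling it statically fails on PHP 8.
// libxml warnings about imperfect markup are collected internally instead of printed.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$images = $xpath->evaluate("//img");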

foreach ($images as $img) {
    // Both attributes are optional, so check each one before printing it.
    if (($el = $img->attributes->getNamedItem('data-src')) !== null) {
        echo 'data-src=' . $el->nodeValue . "\n";
    }
    if (($el = $img->attributes->getNamedItem('data-srcset')) !== null) {
        echo 'data-srcset=' . $el->nodeValue . "\n";
    }
}

Output:

data-src=http://localhost:3000/wp-content/uploads/2018/11/detektivhut.gif
data-src=http://localhost:3000/wp-content/uploads/2018/11/DSC04828.png
data-srcset=//localhost:3000/wp-content/uploads/2018/11/DSC04828.png 1067w, //localhost:3000/wp-content/uploads/2018/11/DSC04828-200x300.png 200w, //localhost:3000/wp-content/uploads/2018/11/DSC04828-768x1152.png 768w, //localhost:3000/wp-content/uploads/2018/11/DSC04828-683x1024.png 683w, //localhost:3000/wp-content/uploads/2018/11/DSC04828-1000x1500.png 1000w
data-src=http://localhost:3000/wp-content/uploads/2018/11/DSC04831.png
data-srcset=//localhost:3000/wp-content/uploads/2018/11/DSC04831.png 1067w, //localhost:3000/wp-content/uploads/2018/11/DSC04831-200x300.png 200w, //localhost:3000/wp-content/uploads/2018/11/DSC04831-768x1152.png 768w, //localhost:3000/wp-content/uploads/2018/11/DSC04831-683x1024.png 683w, //localhost:3000/wp-content/uploads/2018/11/DSC04831-1000x1500.png 1000w
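
If the values are needed for further processing rather than printing, the same DOM approach could be wrapped in a small function that returns one entry per image. The following is only a sketch under assumptions: the function name extractLazyImageAttributes and the shape of the returned array are hypothetical and not part of the original answer. It uses null to mark a missing attribute, which covers all four cases from the question (neither, only data-src, only data-srcset, both).

```php
<?php
// Hypothetical helper (name and return shape are assumptions for illustration).
// Returns one array per <img>, with null for any attribute that is absent.
function extractLazyImageAttributes(string $html): array
{
    // loadHTML() rejects an empty string, so return early in that case.
    if (trim($html) === '') {
        return [];
    }

    // Collect libxml parse warnings internally, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $result = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        $result[] = [
            'data-src'    => $img->hasAttribute('data-src') ? $img->getAttribute('data-src') : null,
            'data-srcset' => $img->hasAttribute('data-srcset') ? $img->getAttribute('data-srcset') : null,
        ];
    }

    return $result;
}

// Usage with a made-up three-image sample: one entry per image is returned.
$sample = '<img data-src="a.png"/><img data-srcset="b.png 2x" alt=""/><img alt=""/>';
$rows = extractLazyImageAttributes($sample);
```

Using hasAttribute() before getAttribute() keeps "attribute missing" distinct from "attribute present but empty", since getAttribute() alone returns an empty string in both cases.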
