How do I handle images with Goutte?

Question

I am trying to scrape hotel images from booking.com. To see all of a hotel's pictures, I can click any image on the page. Here is the HTML for one of the images:

<a data-id="88516347" data-thumb-url="https://cf.bstatic.com/images/hotel/max500/885/88516347.jpg" href="https://cf.bstatic.com/images/hotel/max1024x768/885/88516347.jpg" target="_blank" class="bh-photo-grid-item bh-photo-grid-photo3 active-image " style="background-image: url(https://cf.bstatic.com/images/hotel/max500/885/88516347.jpg);" onclick="return false;" title="The sunrise or sunset as seen from the apartment or nearby ">
<img src="https://cf.bstatic.com/images/hotel/max500/885/88516347.jpg" class="hide" alt="The sunrise or sunset as seen from the apartment or nearby ">
</a>

Clicking an image opens a modal with a slideshow. Does anyone know how to get these images with PHP Goutte?

I found a Python example that does this: https://github.com/basophobic/booking_scraper/blob/ec4382dd00970df0dab4b4df5d67143f9bbc2b21/web_scrapping.py

    # click on the first image to open the image carousel
    driver.find_element_by_class_name('bh-photo-grid-item').click()

    # find the number of images for every hotel
    tmp1 = driver.find_element_by_class_name('bh-photo-modal-caption-left').text
    tmp = tmp1.split()
    img_number = int(tmp[2])
    print(img_number)

    accommodation_fields['images'] = list()
    # loop through every image to save the link
    for image in range(img_number-1):
        img_href = driver.find_element_by_class_name('bh-photo-modal-image-element').find_element_by_tag_name('img').get_attribute('src')
        #print(img_href)
        accommodation_fields['images'].append(img_href)
        driver.find_element_by_class_name('bh-photo-modal-image-element').click()
    print("Total images are: " + str(img_number-1))

But I don't know how to convert this to PHP with Goutte. Thank you.


Tags: php, web-scraping, scrape, goutte

Answer


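The photo grid elements carry the class ".bh-photo-grid-item", and you can use it to get the images. In the HTML you posted, each such element is an <a> tag whose href points to the larger version of the image. Try the following code and check whether it works for your case: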

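// $crawler is assumed to be the Crawler returned by $client->request('GET', $url).
$images = [];
try {
    // Each grid item is an <a> element; resolve its href to an absolute image URL.
    $crawler->filter('.bh-photo-grid-item')->each(function ($node) use (&$images) {
        $images[] = $node->link()->getUri();
    });
} catch (\InvalidArgumentException $e) {
    // Report the problem without dumping the whole exception object.
    echo $e->getMessage();
}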
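
Note that Goutte does not execute JavaScript, so the click-through-the-carousel approach from the Selenium example has no direct equivalent. What you can do is read the URLs that are already present in the served HTML. The following is a minimal sketch of that idea, not a definitive implementation: it uses PHP's built-in DOM extension instead of Goutte so that it runs without any dependencies, the helper name extractGridImages is hypothetical, and it assumes that every photo you need appears as a ".bh-photo-grid-item" anchor in the markup (photos that are only loaded by JavaScript would be missed).

```php
<?php
// Sample markup taken from the question; a real page would contain one <a> per photo.
$html = <<<'HTML'
<div>
<a data-id="88516347" data-thumb-url="https://cf.bstatic.com/images/hotel/max500/885/88516347.jpg" href="https://cf.bstatic.com/images/hotel/max1024x768/885/88516347.jpg" class="bh-photo-grid-item bh-photo-grid-photo3 active-image ">
<img src="https://cf.bstatic.com/images/hotel/max500/885/88516347.jpg" class="hide" alt="">
</a>
</div>
HTML;

/**
 * Hypothetical helper: collects the href and data-thumb-url attributes
 * of every <a> element whose class list contains "bh-photo-grid-item".
 */
function extractGridImages(string $html): array
{
    $doc = new DOMDocument();
    // Silence libxml warnings caused by imperfect real-world HTML.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Match the class as a whole token, not as a substring of another class name.
    $query = '//a[contains(concat(" ", normalize-space(@class), " "), " bh-photo-grid-item ")]';

    $images = [];
    foreach ($xpath->query($query) as $anchor) {
        $images[] = [
            'full'  => $anchor->getAttribute('href'),
            'thumb' => $anchor->getAttribute('data-thumb-url'),
        ];
    }
    return $images;
}

$images = extractGridImages($html);
print_r($images);
```

With Goutte, the same attributes should be reachable inside the each() callback through $node->attr('href') and $node->attr('data-thumb-url').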
