首页 > 解决方案 > 为什么与浏览网站相比,python requests.get() 检索到不同的图像 src

问题描述

正如标题所暗示的:调用该requests.get()方法会给我一个不同的图像src链接,而不是手动浏览网站时。

我正在尝试为产品抓取一个网站并想要存储图像,但src我从该网站获得的是一个非常低质量的模糊图像。我将src它与网站上的进行了比较,它是不同的。不确定我是否需要在请求中传递一些东西来“强制”屏幕尺寸?

我的代码如下:

from requests import get
from lxml import html

def demo():
    params = {'page': 0}
    response = get('https://www.checkers.co.za/c-2256/All-Departments', params=params)
    tree = html.fromstring(response.content)
    images = tree.xpath('//a[@class="product-listening-click"]/img[@src]')
    images = ['https://www.checkers.co.za' + image.attrib['src'] for image in images]
    print(images)

列表中第一个链接与站点上原始图像的 src 差异

site src:
https://www.checkers.co.za/medias/10136669EA-20190726-Media-checkers300Wx300H?context=bWFzdGVyfGltYWdlc3wzNTA4MHxpbWFnZS9wbmd8aW1hZ2VzL2g1My9oZmYvODg1NzQ3ODYyNzM1OC5wbmd8YTM4YjE3YmMxYzJjMzI4MmIzMTQ0ZWU1MjlkYjBmNWZjZGFhYzYxYzAyZGMyNDhlNDE0MDhjYWQ0MjQxNmQ3NA

retrieved src:
https://www.checkers.co.za/medias/lqi-10136669EA-20190726-Media-checkers300Wx300H?context=bWFzdGVyfGltYWdlc3wxNDkwfGltYWdlL3BuZ3xpbWFnZXMvaDY1L2gwNC85MDgxNzU4NDgyNDYyLnBuZ3w0MmY3ZmMzNzJmYTU0MGIzNDk0ZjdmOTkyODYwMGI3N2I5YWJhZDRkOTljNzViYjIxMWQ3OWU2NDVjZGZhZTdm

编辑1:

我尝试User-agent使用包添加标题fake-useragent并遍历所有可能的标题。src结果没有改变。

编辑2:

似乎解析它而lxml.html不是bs4为图像提供不同的输出data-original-src。不知道为什么,但感谢@AmineBTG 帮助注意到这个问题。

注意:使用with而不是
访问该技巧。测试时,它的外观不需要标头。data-original-srclxml.html//a[@class="product-listening-click"]/img/@data-original-src'//a[@class="product-listening-click"]/img[@data-original-src]'

标签: htmlpython-3.xpython-requestssrclxml.html

解决方案


它在传递适当的标头并检索“data-original-src”属性而不是“src”属性时起作用。请看下面的代码(稍作修改)

import requests
from bs4 import BeautifulSoup

def demo():
    header ={
        "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
        "accept-language": "fr-FR,fr;q=0.9,en-US;q=0.8,en;q=0.7",
        "cache-control": "max-age=0",
        "sec-ch-ua": "\"Google Chrome\";v=\"87\", \" Not;A Brand\";v=\"99\", \"Chromium\";v=\"87\"",
        "sec-ch-ua-mobile": "?0",
        "sec-fetch-dest": "document",
        "sec-fetch-mode": "navigate",
        "sec-fetch-site": "none",
        "sec-fetch-user": "?1",
        "upgrade-insecure-requests": "1",
        "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36",
    }

    params = {'page': 0}

    r = requests.get('https://www.checkers.co.za/c-2256/All-Departments', headers = header, params=params)
    s = BeautifulSoup(r.content, "html.parser")

    products = s.find_all("div", {"class":"item-product__image"})
    images = ['https://www.checkers.co.za' + prod.find("img").attrs.get("data-original-src") for prod in products]

    return images

print(demo())

输出:

['https://www.checkers.co.za/medias/10136669EA-20190726-Media-checkers300Wx300H?context=bWFzdGVyfGltYWdlc3wzNTA4MHxpbWFnZS9wbmd8aW1hZ2VzL2g1My9oZmYvODg1NzQ3ODYyNzM1OC5wbmd8YTM4YjE3YmMxYzJjMzI4MmIzMTQ0ZWU1MjlkYjBmNWZjZGFhYzYxYzAyZGMyNDhlNDE0MDhjYWQ0MjQxNmQ3NA', 'https://www.checkers.co.za/medias/10136301EAV2-checkers300Wx300H?context=bWFzdGVyfGltYWdlc3w1ODA4OHxpbWFnZS9wbmd8aW1hZ2VzL2g2ZS9oMGIvOTA5NjUxNTA5MjUxMC5wbmd8ZTViYzUzY2FiOWIyNWNmYmY0OGQ0ZGY0ZmY2ZDQwMGI3Nzk4ODMwOGYzMWRhNjIxOGZmM2Y1ZTExNDgxZWZjMg', 'https://www.checkers.co.za/medias/10151456EA-20190726-Media-checkers300Wx300H?context=bWFzdGVyfGltYWdlc3w4NDE0NHxpbWFnZS9wbmd8aW1hZ2VzL2gzNC9oMjIvODg1NzgxMjU5ODgxNC5wbmd8OWMxYjE4Nzc0MjNkZTU2ZGI5ZDZmN2Q2M2FhMTdhZmM3Yjc4NDgwMzEwMjg1NmNiZTM1YWNjZjkxOTUyMzhmNQ', 'https://www.checkers.co.za/medias/10151458EA-20190726-Media-checkers300Wx300H?context=bWFzdGVyfGltYWdlc3w1MTc0MnxpbWFnZS9wbmd8aW1hZ2VzL2g2MS9oYmEvODg1Nzg5Nzk5MjIyMi5wbmd8ZDFhNTlkMmJiNGY3ZDA3YjU5NjkwZGMxMjY3ZTgzMDVjYzFkMDkxNzI4NzlmN2U2MjhjZGJmYjE1NDg5ZDU2ZA', 'https://www.checkers.co.za/medias/10143000EA-20190726-Media-checkers300Wx300H?context=bWFzdGVyfGltYWdlc3w0NjYyN3xpbWFnZS9wbmd8aW1hZ2VzL2g2MC9oZDIvODg1NzY3NDA4ODQ3OC5wbmd8ZDAzYzk5ZGFkY2Y1NjBiOTllZGJjOTVkZTUwNTg3NjBhYTM4NTk0OGFiYzk4OWRlNDQxZTdkNjQzNWM5YmU1NA', 'https://www.checkers.co.za/medias/10136298EAV2-checkers300Wx300H?context=bWFzdGVyfGltYWdlc3w1NzYzNXxpbWFnZS9wbmd8aW1hZ2VzL2g3MC9oZWEvOTA5NTI5NTY2NDE1OC5wbmd8NDVkZmY4YWU4NWY3MDliYjYyNTk4MWM1NzIyOTNlMjYwMzEwOGFiNGNiZTEzNGRhMmVkZjNiZTU0ZjNiYThiMw', 'https://www.checkers.co.za/medias/10151462EA-20190726-Media-checkers300Wx300H?context=bWFzdGVyfGltYWdlc3w2MzY1MHxpbWFnZS9wbmd8aW1hZ2VzL2g4Zi9oMjcvODg1NzgxNDU2NDg5NC5wbmd8ZTE5ZmViZTdjNDNkOTVjMjZkODYwMjA4YTczNTgwNjU5ZmViMmE4OTQ4YzUwYjgwMjI0ZGJkNzJkNTI5OGU0Mg', 'https://www.checkers.co.za/medias/10241929EA-20190726-Media-checkers300Wx300H?context=bWFzdGVyfGltYWdlc3w2MjMwOXxpbWFnZS9wbmd8aW1hZ2VzL2g0NS9oY2QvODg1ODQ2ODM4NDc5OC5wbmd8YzE1MGZkOWI2MjAyOWVlNzQ2YmRkMWM2MDNhZTk3ZGFkYWY4ZWMxNzA5Njc5OTMxNzY3OWEyNzg5MzczZmM1ZA', 'https://www.checkers.co.za/medias/10165121EAV2-checkers300Wx300H?context=bWFzdGVyfGltYWdlc3w2NTUzMXxpbWFnZS9wbmd8aW1hZ2VzL2gyZC9oYjcvODg2NDkyODM2NjYyMi5wbmd8NzEyODllNDlmZjE1NjJmMzEyMmU4MTU4NWQ4ZjRjYmM1Nzc2NWNjM2Y2YmFmZGQ1N2Y5ZjFmOTY0ZjBkMGE1OQ', 'https://www.checkers.co.za/medias/10145817EA-20190726-Media-checkers300Wx300H?context=bWFzdGVyfGltYWdlc3wyMzU0MHxpbWFnZS9wbmd8aW1hZ2VzL2gwMy9oZDIvODg1Nzc0ODc5OTUxOC5wbmd8NGQxZjA0OWNkZTVjY2JmNTI2ZTdlOGIwZjFiMmE5MGFhYjQ2NjZhMDBiYWMyNzVhYjMxNTI4YTZjYWU3OWZhZA', 'https://www.checkers.co.za/medias/10151065EAv2-checkers300Wx300H?context=bWFzdGVyfGltYWdlc3w2MDkyNHxpbWFnZS9wbmd8aW1hZ2VzL2g2Ny9oMDEvOTI3ODc2MDE4OTk4Mi5wbmd8YWQ1N2E4Y2ZmNTQ3YzA1ZDdmODcyZDlmZTg4ZGUwZGJhOWQ3ZGNiNWI5ZmE4OWFmOTVkZDgyYjEzOTUyZjlhYg', 'https://www.checkers.co.za/medias/10126789EA-20190726-Media-checkers300Wx300H?context=bWFzdGVyfGltYWdlc3wxMDYzMDh8aW1hZ2UvcG5nfGltYWdlcy9oNzcvaDAzLzg4NjA4NzYzMDg1MTAucG5nfDVhMTE1MmE5YmMxNjE0OGZmM2IwOTcxMWQzYWIyY2IxOTU2MmY1M2M2N2MzZjc5ZDE2YWFmNGFiZjdiOTI1YzY', 'https://www.checkers.co.za/medias/10241933EA-20190726-Media-checkers300Wx300H?context=bWFzdGVyfGltYWdlc3w2NDQ4NXxpbWFnZS9wbmd8aW1hZ2VzL2gzNy9oMTkvODg1ODQwMjg4MTU2Ni5wbmd8NDA5MDRlMDZkM2U3M2JiODUwNWVmYmE5YzM3NjQ3NDkxYTMzZmI0ZjY3OWFkNDZiODU2YjQ2NTRjOTQyNjI2MQ', 'https://www.checkers.co.za/medias/10147193EAV2-checkers300Wx300H?context=bWFzdGVyfGltYWdlc3w2MDk2M3xpbWFnZS9wbmd8aW1hZ2VzL2g4Zi9oNTUvODk1NjcwODIyNTA1NC5wbmd8YmFkMTgzZDFkNGRiY2ViMzU4MjNhMGY0ZmM1OTgyY2U4NTY5MGZiZWI5ZTMzNjE1ZTNkY2Q1YzAwY2JhZTgyYw', 'https://www.checkers.co.za/medias/10164636EA-20190726-Media-checkers300Wx300H?context=bWFzdGVyfGltYWdlc3w0MDM4NnxpbWFnZS9wbmd8aW1hZ2VzL2gxMS9oYWIvODg1ODA5NTU4MzI2Mi5wbmd8YTYxY2ZkNjAzOTg2NGVmNGMxODVjNmRkNTAyYmYzOWM2ZDU0MzgyYzk3YjM2YWUzOTRkMGEwOWE0NmVjMDQ0NA', 'https://www.checkers.co.za/medias/10136574EA-20190726-Media-checkers300Wx300H?context=bWFzdGVyfGltYWdlc3w1MzU0NXxpbWFnZS9wbmd8aW1hZ2VzL2hjMi9oNjEvODg1NzQzMjk4MTUzNC5wbmd8YTJmZGU0ZGVjNDU1NTIyMzU0NGM1ZTQzOTQ4OTUwNmEwN2I3MDc5MjliOWNkNmRlNTgzZDMxYjdkMmNjNTIyYw', 'https://www.checkers.co.za/medias/10604301EA-20190726-Media-checkers300Wx300H?context=bWFzdGVyfGltYWdlc3w1OTUxN3xpbWFnZS9wbmd8aW1hZ2VzL2hiZi9oZDQvODg1OTg2MzY0NjIzOC5wbmd8MTE5ZTQ4NzJkMzhjMzczMDk0MTE4YTZhZTllMTFlYjBiMTUyY2IzNjIyMmM4NzFlODA0MTU1Yzg3ZWNkMjMyNQ', 'https://www.checkers.co.za/medias/10145422EA-checkers300Wx300H?context=bWFzdGVyfGltYWdlc3w3OTc0MHxpbWFnZS9wbmd8aW1hZ2VzL2gxYy9oNTIvODk2MjM0ODIyMDQ0Ni5wbmd8MDc5NjY1YzY2NGE0NDFiNWRiN2NkNWZkMWJlODg5MDhlOWUyZWNhNDEzMWJiZTQ3MjM5MDYyZjgzZWYyYWM2Mg', 'https://www.checkers.co.za/medias/10136291EA-20190726-Media-checkers300Wx300H?context=bWFzdGVyfGltYWdlc3w5NjM3NnxpbWFnZS9wbmd8aW1hZ2VzL2hjNi9oMzUvODg1NzQzMDU1NjcwMi5wbmd8ZDQwZjEzMmU5Y2JkMDhkNGM2MGQ4ZTc1MWY0Y2Q5YTJhZWI2YmM2YmY5YjNiYWEyZjQ0YWQ5ZDgyMmE3ZWE2YQ', 'https://www.checkers.co.za/medias/10148833EAV2-checkers300Wx300H?context=bWFzdGVyfGltYWdlc3w3NTU3NnxpbWFnZS9wbmd8aW1hZ2VzL2g1OC9oNzIvODk1NjY5NjQ2MTM0Mi5wbmd8N2YyZmMxMjA5ZjkxZjkzZWExN2E2MGE1ZTZiZjI0M2FkMDcxZTVlMzY0ZjAzOTRjMjAzNzRjYWQ5Yzk4NjZkNQ'] 

推荐阅读