Can't use image IDs to build valid image links

Problem description

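I'm trying to scrape all the image links from this webpage using the requests module. When I use this link, I can only scrape the image links that appear before the rest of the content loads on scroll. However, if I use this link, I can get all the image IDs by incrementing the last number appended to the link. The problem is that I can't reuse these IDs to build complete image links.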
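
The pagination described above can be sketched as a loop that fills the page number into the API URL template and stops once a page comes back empty. This is a minimal sketch: iter_results and its fetch_json parameter are hypothetical names introduced so the loop can run without network access, and treating an empty results list as the last page is an assumption taken from the answer's code.

```python
from typing import Callable, Iterator

# API URL template from the question; the trailing number is the page index
API_URL = 'https://stocksnap.io/api/search-photos/phone/relevance/desc/{}'


def iter_results(fetch_json: Callable[[str], dict], start: int = 1) -> Iterator[dict]:
    """Yield every result item from consecutive pages until a page is empty.

    fetch_json is a hypothetical hook: it takes a URL and returns the decoded
    JSON body, e.g. lambda u: session.get(u).json() with a requests.Session.
    """
    page = start
    while True:
        data = fetch_json(API_URL.format(page))
        results = data.get('results') or []
        if not results:  # assumed end-of-pagination signal
            return
        yield from results
        page += 1
```

With requests, passing a lambda that calls session.get(url).json() inside a requests.Session block yields every item across all pages, each with its img_id.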

What I tried:

import requests

url = 'https://stocksnap.io/api/search-photos/phone/relevance/desc/1'

with requests.Session() as s:
    # Send a browser-like User-Agent with every request in this session
    s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.83 Safari/537.36'
    r = s.get(url)
    # Each result carries an image ID (plus metadata such as keywords), not a full image URL
    for item in r.json()['results']:
        print(item['img_id'])

How can I get all the image links from this site's landing page?

P.S. The first few sponsored image links should be ignored, since they aren't included in the API either.

Tags: python, python-3.x, web-scraping, python-requests

Solution


Inspecting the page shows that each image URL is built from the image ID plus the first two keywords returned by the API:

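import requests


url = 'https://stocksnap.io/api/search-photos/phone/relevance/desc/{}'

page = 1
while True:
    resp = requests.get(url.format(page))
    resp.raise_for_status()  # stop on HTTP errors instead of failing inside .json()
    data = resp.json()

    # An empty result list means we have gone past the last page
    if not data['results']:
        break

    for r in data['results']:
        # Photo page URL: <keyword1>-<keyword2>-<img_id>
        print('https://stocksnap.io/photo/{}-{}-{}'.format(r['keywords'][0], r['keywords'][1], r['img_id']))

    page += 1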

Prints:

...

https://stocksnap.io/photo/iphone-cellphone-LNXYMM77SS
https://stocksnap.io/photo/business-technology-OGLUHZAPGF
https://stocksnap.io/photo/samsung-android-7ZALGLUAAW
https://stocksnap.io/photo/apple-macbook-55A6840521
https://stocksnap.io/photo/woman-talking-54C3E9FE9D
https://stocksnap.io/photo/samsung-galaxy-BB3307280A
https://stocksnap.io/photo/parc-bench-3D99A31C0C
https://stocksnap.io/photo/iphone-cellphone-E2C541A7DC
https://stocksnap.io/photo/iphone-mockup-167A645BDC
https://stocksnap.io/photo/mac-keyboard-BA9AFFE0BF
https://stocksnap.io/photo/sony-android-EB939B3311
https://stocksnap.io/photo/iphone-cellphone-B962ABCAC7
https://stocksnap.io/photo/building-man-D49A8BB4AE
https://stocksnap.io/photo/technology-computer-C9B37875B9
https://stocksnap.io/photo/iphone-cellphone-381F0FD1EE
https://stocksnap.io/photo/work-bag-96E1A8F1CB
https://stocksnap.io/photo/iphone-phone-70FE8C00C9
https://stocksnap.io/photo/iphone-mockup-9FCDF4E1F5
https://stocksnap.io/photo/young-girl-BE8BA006E6
https://stocksnap.io/photo/young-girl-7174B21D56
https://stocksnap.io/photo/man-woman-6XELVX8KAN
https://stocksnap.io/photo/nexus-smartphones-UAXILBRNUL
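
The code above indexes r['keywords'][0] and r['keywords'][1] directly, so a result with fewer than two keywords would raise IndexError mid-loop. A small helper that isolates the URL rule makes that case explicit. This is a sketch: photo_page_url is a hypothetical name, and the URL rule is inferred from the site's page URLs rather than from any documented API contract.

```python
# Photo page URL rule inferred from the site:
#   https://stocksnap.io/photo/<keyword1>-<keyword2>-<img_id>
# This is an observed pattern, not a documented API guarantee.
PHOTO_TEMPLATE = 'https://stocksnap.io/photo/{}-{}-{}'


def photo_page_url(item: dict) -> str:
    """Build a stocksnap.io photo page URL from one API result dict."""
    keywords = item.get('keywords') or []
    if len(keywords) < 2:
        # Raise a descriptive error instead of a bare IndexError
        raise ValueError(f"need at least two keywords for {item.get('img_id')!r}")
    return PHOTO_TEMPLATE.format(keywords[0], keywords[1], item['img_id'])
```

Inside the pagination loop, print(photo_page_url(r)) replaces the inline format call; catching ValueError there lets you skip or log any result that doesn't fit the pattern.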

Edit: to get the .jpg links, the same approach works:

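import requests


url = 'https://stocksnap.io/api/search-photos/phone/relevance/desc/{}'

page = 1
while True:
    resp = requests.get(url.format(page))
    resp.raise_for_status()  # stop on HTTP errors instead of failing inside .json()
    data = resp.json()

    # An empty result list means we have gone past the last page
    if not data['results']:
        break

    for r in data['results']:
        # Thumbnail URL: <keyword1>-<keyword2>_<img_id>.jpg (underscore before the ID)
        print('https://cdn.stocksnap.io/img-thumbs/280h/{}-{}_{}.jpg'.format(r['keywords'][0], r['keywords'][1], r['img_id']))

    page += 1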

Prints:

...

https://cdn.stocksnap.io/img-thumbs/280h/iphone-cellphone_B962ABCAC7.jpg
https://cdn.stocksnap.io/img-thumbs/280h/building-man_D49A8BB4AE.jpg
https://cdn.stocksnap.io/img-thumbs/280h/technology-computer_C9B37875B9.jpg
https://cdn.stocksnap.io/img-thumbs/280h/iphone-cellphone_381F0FD1EE.jpg
https://cdn.stocksnap.io/img-thumbs/280h/work-bag_96E1A8F1CB.jpg
https://cdn.stocksnap.io/img-thumbs/280h/iphone-phone_70FE8C00C9.jpg
https://cdn.stocksnap.io/img-thumbs/280h/iphone-mockup_9FCDF4E1F5.jpg
https://cdn.stocksnap.io/img-thumbs/280h/young-girl_BE8BA006E6.jpg
https://cdn.stocksnap.io/img-thumbs/280h/young-girl_7174B21D56.jpg
https://cdn.stocksnap.io/img-thumbs/280h/man-woman_6XELVX8KAN.jpg
https://cdn.stocksnap.io/img-thumbs/280h/nexus-smartphones_UAXILBRNUL.jpg
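
The thumbnail rule differs from the photo page rule only in the CDN prefix and in the underscore (rather than hyphen) before the ID. A minimal sketch of that rule as a standalone function follows; thumb_url is a hypothetical name, and the 280h size segment is copied from the observed URLs, so whether other sizes exist is not established here.

```python
# Thumbnail URL rule observed in the output above:
#   https://cdn.stocksnap.io/img-thumbs/280h/<keyword1>-<keyword2>_<img_id>.jpg
# Note the underscore before the ID, versus the hyphen in photo page URLs.
THUMB_TEMPLATE = 'https://cdn.stocksnap.io/img-thumbs/280h/{}-{}_{}.jpg'


def thumb_url(item: dict) -> str:
    """Build the 280h thumbnail .jpg URL from one API result dict."""
    # Unpacking raises ValueError if the result has fewer than two keywords
    first, second = item['keywords'][:2]
    return THUMB_TEMPLATE.format(first, second, item['img_id'])
```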
