python - Python requests scrape returns image src in "data:image/" format
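Problem description
I am trying to scrape the first image from Google Images search results, because I do not want to do it by hand for 100 keywords.
I am using this code: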
from bs4 import BeautifulSoup
import requests
import json
query="koko"
url = "https://www.google.com/search?q=" + str(query) + "&source=lnms&tbm=isch"
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36"}
html = requests.get(url, headers=headers).text
soup = BeautifulSoup(html, 'html.parser')
images = soup.findAll("img")
images[0] is:
<img alt="Koko, the gorilla who knew sign language, dies at 46 - Chicago Tribune" class="rg_i Q4LuWd" data-deferred="1" data-iid="0" height="157" jsname="Q4LuWd" src="data:image/gif;base64,R0lGODlhAQABAIAAAP///////yH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==" width="200"/>
The returned src is a base64 data URI, which is not what I want; I want a regular image URL.
If I disable JavaScript in Chrome, navigate to https://www.google.com/search?q=koko&source=lnms&tbm=isch and view the page source, the img src is in the normal format I need.
I cannot get requests to return the same HTML that Chrome receives with JavaScript disabled.
I tried changing my User-Agent, including matching it to the one my Chrome sends, but the result did not change.
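The base64 src shown in the question can be decoded to check what it actually contains. The sketch below is only an illustration: the helper name `gif_size_from_data_uri` is hypothetical, and it assumes the value is a base64-encoded GIF data URI like the one above. It reads the width and height from the GIF header, which shows that the src is a 1x1 placeholder and not the search result image.

```python
import base64
import struct


def gif_size_from_data_uri(data_uri):
    """Return (width, height) of a base64-encoded GIF data URI.

    Hypothetical helper for illustration; it only handles GIF data URIs.
    """
    header, _, payload = data_uri.partition(",")
    if not header.startswith("data:image/gif") or not header.endswith(";base64"):
        raise ValueError("not a base64-encoded GIF data URI")
    raw = base64.b64decode(payload)
    # A GIF starts with a 6-byte signature ("GIF87a" or "GIF89a"),
    # followed by width and height as little-endian unsigned 16-bit integers.
    width, height = struct.unpack("<HH", raw[6:10])
    return width, height


# The src value copied from the <img> tag in the question.
src = (
    "data:image/gif;base64,"
    "R0lGODlhAQABAIAAAP///////yH5BAEKAAEALAAAAAABAAEAAAICTAEAOw=="
)
print(gif_size_from_data_uri(src))  # prints (1, 1)
```

The 200x157 size in the tag comes from its width and height attributes, not from the embedded image data.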
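Solution
To get all the images, set the content-type header (note that this request no longer sends the Chrome User-Agent from the question):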
from bs4 import BeautifulSoup
import requests

query = "koko"
url = "https://www.google.com/search?q=" + str(query) + "&source=lnms&tbm=isch"
# Only a content-type header is sent; no browser User-Agent.
HEADERS = {"content-type": "image/png"}
html = requests.get(url, headers=HEADERS).text
soup = BeautifulSoup(html, "html.parser")
# Print the src attribute of every <img> on the results page.
for img in soup.find_all("img"):
    print(img["src"])
Output:
/images/branding/searchlogo/1x/googlelogo_desk_heirloom_color_150x55dp.gif
https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcSYjINUgXtYyrUB4fKyaVxXCAkSyc_Q5b0QaeohUxmjdiIQwS_9CPXgWCXrUGQ&s
https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcR1UnMwOo_8tpFkm04yby_I0HdMbfh6-GnhVWnKhOF1qnSP4ogODEn3AAo7V0M&s
https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcSHwKA_l2i20z0yeGMr_imQcB-tffAfL0xcQAKmbFn1-NtVrHn8AtTv9aql2Q&s
https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcR6UNEOYT2BwMVrjXo8WW6CS0rUHC0QLIqA-GdO1CLGk7mxw8lhWgMyI-uW4A&s
https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcT0dQsIKidzCcvdpvL0FDIfZ4Q3WL8GUKCCbwnK4V7FJ6nCGDVNbFmhnD7eOJ8&s
https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcSeeYoW5maZW69VamkrN_vzjQoxIQl-RFrcZK58rCry1ZDpyIT6FVaG1IFsKw&s
https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcS3wy4vKh6ey8SAZHRxe-sKa1LEiBBdk6cbjELSGkoQn1YINb_YZSRanpOzR38&s
https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcSnw0tBokCloEzt0QDpnTVvJYJr1ZDngx7Znz6nLCbjZbq2Vn3g57iEUKordQ&s
https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcTZsq7Dy3-bT8miOPD_GE8_1X3isDl67A1ucNauliVlV4dIWgqleLY1OFyLjw&s
https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcQsjLjEmJ2kFrdoiU1O0CE_d2bazVxl4IPaHJy2Ea_PhI-B0_4jXcDcuLo2PQ&s
https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcRlvVOp05edZGkjz6q3QN8vqPsC-h-lIRlFyU16wYefNRG3zVlFQ2XeJRH3mMU&s
https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcT1QKRBEW1WOZs-bS15vTjzYutHLYNIis6Ji60bcJ_mXvA1tYjYYrD-Nk9cWMc&s
https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcQiqRS7ry4rNx8VNA4F6TUmm_ZaTtcp4iXokZF_WT-M7zEkF9YG7PpWKpPhSg&s
https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcSNtY7c7Qg-w9wXmKfhSHrop5b4tb2wCQoK5pLj_RA1eCPXAn4TNNtEVA8RG_U&s
https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcTH8zHuDssfuFW0PUpqNnQoG0yTkebQ194uy7auEzzodGuSAYqsF8flYTW3VAE&s
https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcSDowATaKwsMkiN1aQj9e6J2VfMUm6742KW3ifxqddk4UHWSX-WOWDeTDSi_w&s
https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcTaCZKWiYg2tEUNerLa1zcmUD25-ZVC0RCDY1E1iby3PnHIJOY7cFhTZd8Em8M&s
https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcTfg8euHcq0wcUrtIHxleulXlTzbuehiZBb1DgJTEs3GdiG5l5bTdRt0Ug-Qg&s
https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcRPbAOCCA3diC-W5CtqbmpegeWPw-ReQPxBDaHN2YPH6OIqWC16dj5uNbhXhw&s
https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcSnrICqNqL_KG42rZ2_B7nKdZr-INrqsdZqfzeAbFrJYsBez0GDvKtIrwJjP5U&s
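In the output above, the first entry is the Google logo (a relative path) and the rest are thumbnail URLs. Since the question only needs the first image per keyword, one possible way to pick it is sketched below. The helper names `build_search_url` and `first_thumbnail_url` and the sample HTML are hypothetical, and the sketch assumes the page layout matches the output shown here, which Google may change at any time.

```python
from urllib.parse import urlencode

from bs4 import BeautifulSoup


def build_search_url(query):
    """Build the image-search URL, percent-encoding the query string."""
    params = {"q": query, "source": "lnms", "tbm": "isch"}
    return "https://www.google.com/search?" + urlencode(params)


def first_thumbnail_url(html):
    """Return the first absolute http(s) <img> src in html, or None.

    Relative paths (such as the logo) and data: URIs are skipped.
    """
    soup = BeautifulSoup(html, "html.parser")
    for img in soup.find_all("img"):
        src = img.get("src", "")
        if src.startswith(("http://", "https://")):
            return src
    return None


# Hypothetical sample markup imitating the structure of the output above.
sample_html = """
<img src="/images/branding/logo.gif">
<img src="data:image/gif;base64,R0lGODlhAQABAIAAAP///////yH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==">
<img src="https://example.com/thumb1.jpg">
<img src="https://example.com/thumb2.jpg">
"""
print(build_search_url("koko the gorilla"))
print(first_thumbnail_url(sample_html))
```

For each keyword, the HTML passed to `first_thumbnail_url` would come from `requests.get(build_search_url(query), headers=HEADERS).text`, as in the answer's code. The URLs obtained this way are thumbnails, not the full-size originals.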