python - Scraping with bs4 in Python returns the wrong output
Question
I am trying to scrape the src attribute from the HTML below.
from bs4 import BeautifulSoup
soup = BeautifulSoup(data.text, 'html.parser')
title = soup.find_all(attrs={'class': 'main-image-class'})[0].get('src')
But the output is data:image/gif;base64 instead of the image URL. How can I get the src link?
HTML from the website:
<img src="https://en.aw-lab.com/dw/image/v2/BCLG_PRD/on/demandware.static/-/Sites-awlab-master-catalog/default/dwbf1e5118/images/bata/large-sport-shoe-8047751-0.jpg?sw=843" alt="NIKE AIR JORDAN 1 MID, GREEN" title="NIKE AIR JORDAN 1 MID, GREEN" class="main-image-class" srcset="https://en.aw-lab.com/dw/image/v2/BCLG_PRD/on/demandware.static/-/Sites-awlab-master-catalog/default/dwbf1e5118/images/bata/large-sport-shoe-8047751-0.jpg?sw=710 710w, https://en.aw-lab.com/dw/image/v2/BCLG_PRD/on/demandware.static/-/Sites-awlab-master-catalog/default/dwbf1e5118/images/bata/large-sport-shoe-8047751-0.jpg?sw=1420 710w, https://en.aw-lab.com/dw/image/v2/BCLG_PRD/on/demandware.static/-/Sites-awlab-master-catalog/default/dwbf1e5118/images/bata/large-sport-shoe-8047751-0.jpg?sw=556 556w, https://en.aw-lab.com/dw/image/v2/BCLG_PRD/on/demandware.static/-/Sites-awlab-master-catalog/default/dwbf1e5118/images/bata/large-sport-shoe-8047751-0.jpg?sw=1112 556w, https://en.aw-lab.com/dw/image/v2/BCLG_PRD/on/demandware.static/-/Sites-awlab-master-catalog/default/dwbf1e5118/images/bata/large-sport-shoe-8047751-0.jpg?sw=843 843w, https://en.aw-lab.com/dw/image/v2/BCLG_PRD/on/demandware.static/-/Sites-awlab-master-catalog/default/dwbf1e5118/images/bata/large-sport-shoe-8047751-0.jpg?sw=1686 843w" sizes="(max-width: 767px) 710px, (max-width: 1199px) and (min-width: 768px) 556px, (min-width: 1200px) 843px">
Solution
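One possible cause, offered as an assumption because the full page is not shown: the HTML returned to the script may contain another element with class main-image-class (for example a lazy-loading placeholder) whose src is an inline data: URI, and find_all(...)[0] picks that one first. The sketch below skips data: values and falls back to other attributes. The markup, the example.com URLs, the data-src attribute, and the helper name image_url are all hypothetical and chosen for illustration only.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed markup: a lazy-load placeholder followed by the real tag.
html = """
<img class="main-image-class" src="data:image/gif;base64,R0lGODlhAQABAAAAACw="
     data-src="https://example.com/images/shoe.jpg?sw=843">
<img class="main-image-class" src="https://example.com/images/shoe.jpg?sw=843"
     srcset="https://example.com/images/shoe.jpg?sw=710 710w, https://example.com/images/shoe.jpg?sw=843 843w">
"""


def image_url(tag):
    """Return the first URL on the tag that is not an inline data: URI, or None."""
    # data-src is an assumed lazy-loading attribute; the real site may use another name.
    candidates = [tag.get("src"), tag.get("data-src")]
    srcset = tag.get("srcset")
    if srcset:
        # Each srcset entry is "URL descriptor"; keep the URL of the first entry.
        # Splitting on "," is a simplification that breaks if a URL contains a comma.
        parts = srcset.split(",")[0].split()
        if parts:
            candidates.append(parts[0])
    for url in candidates:
        if url and not url.startswith("data:"):
            return url
    return None


soup = BeautifulSoup(html, "html.parser")
urls = [image_url(tag) for tag in soup.find_all("img", class_="main-image-class")]
print(urls)
```

If every candidate is still a data: URI, the real URL is probably filled in by JavaScript after the page loads, in which case the response text alone will not contain it and a browser-driven approach would be needed.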