Python scraping with bs4 gives the wrong output

Problem description

I'm trying to scrape the src attribute from this HTML code.

from bs4 import BeautifulSoup


soup = BeautifulSoup(data.text, 'html.parser')
title = soup.find_all(attrs={'class': 'main-image-class'})[0].get('src')

But the output is data:image/gif;base64. How can I get the real src link?

The site's HTML:

<img src="https://en.aw-lab.com/dw/image/v2/BCLG_PRD/on/demandware.static/-/Sites-awlab-master-catalog/default/dwbf1e5118/images/bata/large-sport-shoe-8047751-0.jpg?sw=843" alt="NIKE AIR JORDAN 1 MID, GREEN" title="NIKE AIR JORDAN 1 MID, GREEN" class="main-image-class" srcset="https://en.aw-lab.com/dw/image/v2/BCLG_PRD/on/demandware.static/-/Sites-awlab-master-catalog/default/dwbf1e5118/images/bata/large-sport-shoe-8047751-0.jpg?sw=710 710w, https://en.aw-lab.com/dw/image/v2/BCLG_PRD/on/demandware.static/-/Sites-awlab-master-catalog/default/dwbf1e5118/images/bata/large-sport-shoe-8047751-0.jpg?sw=1420 710w, https://en.aw-lab.com/dw/image/v2/BCLG_PRD/on/demandware.static/-/Sites-awlab-master-catalog/default/dwbf1e5118/images/bata/large-sport-shoe-8047751-0.jpg?sw=556 556w, https://en.aw-lab.com/dw/image/v2/BCLG_PRD/on/demandware.static/-/Sites-awlab-master-catalog/default/dwbf1e5118/images/bata/large-sport-shoe-8047751-0.jpg?sw=1112 556w, https://en.aw-lab.com/dw/image/v2/BCLG_PRD/on/demandware.static/-/Sites-awlab-master-catalog/default/dwbf1e5118/images/bata/large-sport-shoe-8047751-0.jpg?sw=843 843w, https://en.aw-lab.com/dw/image/v2/BCLG_PRD/on/demandware.static/-/Sites-awlab-master-catalog/default/dwbf1e5118/images/bata/large-sport-shoe-8047751-0.jpg?sw=1686 843w" sizes="(max-width: 767px) 710px, (max-width: 1199px) and (min-width: 768px) 556px, (min-width: 1200px) 843px">

Tags: python, beautifulsoup, screen-scraping

Solution
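
One plausible explanation, not confirmed for this site, is lazy loading: the HTML the server returns may carry an inline data:image/gif;base64 placeholder in src, and JavaScript swaps in the real URL later, which is why the browser inspector shows something different from what bs4 sees. Under that assumption, the real URL is often still present in another attribute. The sketch below uses a made-up HTML sample; the helper name real_image_url, the example.com URLs, and the data-src / data-srcset attribute names are illustrative assumptions, so check the raw data.text to see which attributes the page actually uses.

```python
from bs4 import BeautifulSoup

# Hypothetical sample of what the server might return before JavaScript runs.
# The data-src attribute and the example.com URLs are assumptions for illustration.
SAMPLE_HTML = '''
<img src="data:image/gif;base64,R0lGODlhAQABAAAAACw="
     data-src="https://example.com/images/shoe.jpg?sw=843"
     srcset="https://example.com/images/shoe.jpg?sw=710 710w, https://example.com/images/shoe.jpg?sw=843 843w"
     class="main-image-class">
'''


def real_image_url(img):
    """Return the first usable image URL from an <img> tag, or None."""
    # Prefer src (or data-src) when it holds a real URL, not a data: placeholder.
    for attr in ("src", "data-src"):
        value = img.get(attr)
        if value and not value.startswith("data:"):
            return value
    # Otherwise fall back to the first candidate listed in srcset / data-srcset.
    # Splitting on "," is a simplification that assumes the URLs contain no commas.
    for attr in ("srcset", "data-srcset"):
        value = img.get(attr)
        if value:
            first = value.split(",")[0].strip()
            if first:
                return first.split()[0]  # drop the width descriptor, e.g. "710w"
    return None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
img = soup.find("img", class_="main-image-class")
url = real_image_url(img)
print(url)
```

If none of these attributes contains a usable URL in the raw response, the image is probably injected entirely by JavaScript, and a plain HTTP fetch will not see it.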

