python - BeautifulSoup can't find the img src attribute
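Problem description
Hi, I've been scraping the ASOS fashion website. I can get all the elements, but after the 8th product I can't get the src attribute of the img tag.
The class of the img is made up of three names, or can one element belong to several classes? That looks a bit suspicious to me.
When I look at all the img tags, the 9th one has a very different class name and no src attribute.
My code: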
from helium import *
import time
from bs4 import BeautifulSoup
s = start_firefox(f"https://www.asos.com/men/shoes-boots-trainers/boots/cat/?cid=5774&currentpricerange=15-400&nlid=mw|shoes|shop%20by%20product|boots&refine=attribute_1046:8222,8629,10808&sort=priceasc", headless=True)
time.sleep(5)
for x in range(1, 2):
    scroll_down(num_pixels=10000)
for x in range(1, 3):
    click("LOAD MORE")
    time.sleep(5)
    scroll_down(num_pixels=10000)
soup = BeautifulSoup(s.page_source, "lxml")
All = soup.find_all("article", class_="_2qG85dG")
kill_browser()
def img(s):
    try:
        return s.find("img", class_="_2r9Zh0W")["src"]
    except:
        return s.find("img", class_="_2FC97Nq _2q4fCfJ _2r9Zh0W")['src']
for a in All:
    print(img(a))
    print()
Output:
//images.asos-media.com/products/asos-design-chelsea-boots-in-tan-faux-suede/12550524-1-tan?$n_480w$&wid=476&fit=constrain
//images.asos-media.com/products/asos-design-chelsea-boots-in-black-faux-suede/12550506-1-black?$n_480w$&wid=476&fit=constrain
//images.asos-media.com/products/asos-design-chelsea-boots-in-brown-suede-with-black-sole/14849004-1-brown?$n_480w$&wid=476&fit=constrain
//images.asos-media.com/products/asos-design-vegan-lace-up-boots-in-brown-faux-leather/12510724-1-brown?$n_480w$&wid=476&fit=constrain
//images.asos-media.com/products/asos-design-chelsea-boots-in-brown-leather-with-brown-sole/10278706-1-brown?$n_480w$&wid=476&fit=constrain
//images.asos-media.com/products/asos-design-cuban-heel-western-chelsea-boot-in-grey-faux-suede-with-square-toe-with-metal-cap/21031115-1-grey?$n_480w$&wid=476&fit=constrain
//images.asos-media.com/products/new-look-chelsea-boot-in-black-suede/21198040-1-black?$n_480w$&wid=476&fit=constrain
//images.asos-media.com/products/asos-design-wide-fit-chelsea-boots-in-black-faux-suede/12550515-1-black?$n_480w$&wid=476&fit=constrain
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-78-d9272492986c> in img(s)
9 try:
---> 10 return s.find("img",class_= "_2r9Zh0W")["src"]
11 except:
TypeError: 'NoneType' object is not subscriptable
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call last)
<ipython-input-79-51d9d651c40b> in <module>
3 #print(a.find("div",class_= "_3J74XsK").text.strip())
4 #print(price(a))
----> 5 print(img(a))
6 print()
<ipython-input-78-d9272492986c> in img(s)
10 return s.find("img",class_= "_2r9Zh0W")["src"]
11 except:
---> 12 return s.find("img",class_="_2FC97Nq _2q4fCfJ _2r9Zh0W")["src"]
13
14
TypeError: 'NoneType' object is not subscriptable
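Solution
What happens?
The images are lazy-loaded, meaning each one is only fetched once it scrolls into view. That is why you only get the src of the first 8.
For images that have not loaded yet, you get this placeholder instead: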
<img alt="" class="_1Jj-2sd" data-auto-id="productTileEmptyImage"/>
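To see why that placeholder leads to the TypeError in the question, here is a minimal sketch on static HTML. The two article snippets are hand-written stand-ins for a loaded and an unloaded tile (the src value is a made-up placeholder, not real ASOS markup), and html.parser is used only so the sketch does not depend on lxml.

```python
from bs4 import BeautifulSoup

# Hand-written stand-ins: one tile whose image has loaded, one that has not.
html = """
<article><img class="_2r9Zh0W" src="//example.invalid/loaded.jpg"/></article>
<article><img alt="" class="_1Jj-2sd" data-auto-id="productTileEmptyImage"/></article>
"""

soup = BeautifulSoup(html, "html.parser")
loaded, unloaded = soup.find_all("article")

# The loaded tile matches the class, so subscripting works.
loaded_src = loaded.find("img", class_="_2r9Zh0W")["src"]

# The placeholder has a different class, so find() returns None ...
missing = unloaded.find("img", class_="_2r9Zh0W")

# ... and subscripting None is what raises the TypeError in the traceback.
try:
    missing["src"]
    error_name = None
except TypeError as exc:
    error_name = type(exc).__name__

# Even when the placeholder tag itself is found, it has no src attribute;
# .get() returns None instead of raising KeyError.
placeholder_src = unloaded.img.get("src")

print(loaded_src, missing, error_name, placeholder_src)
```

Both lookups in the question's img() function search for classes the placeholder does not have, so the fallback in the except branch fails the same way as the first attempt.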
How to fix?
Don't scroll the whole way in one step. Scroll in smaller steps and wait for the images to load:
for x in range(1, 6):
    scroll_down(num_pixels=1800)
    time.sleep(3)
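Fixed step counts and sleeps are a guess at how long loading takes. As an alternative sketch, one could keep scrolling until no placeholder images remain in the page source. The helper names count_placeholders and scroll_until_loaded are hypothetical, and the get_html and scroll callables are assumptions standing in for s.page_source and scroll_down in a real helium session. The productTileEmptyImage marker is taken from the placeholder tag shown earlier.

```python
import time
from bs4 import BeautifulSoup


def count_placeholders(html):
    """Count <img> tags still marked as unloaded placeholders."""
    soup = BeautifulSoup(html, "html.parser")
    return len(soup.find_all("img", {"data-auto-id": "productTileEmptyImage"}))


def scroll_until_loaded(get_html, scroll, max_steps=20, pause=3.0):
    """Scroll step by step until no placeholders remain or max_steps is hit.

    get_html: callable returning the current page source (assumption).
    scroll:   callable that scrolls the page down by one step (assumption).
    Returns the number of scroll steps performed.
    """
    steps = 0
    while steps < max_steps and count_placeholders(get_html()) > 0:
        scroll()
        time.sleep(pause)  # give the lazy loader time to fetch the images
        steps += 1
    return steps
```

With helium this might be called as scroll_until_loaded(lambda: s.page_source, lambda: scroll_down(num_pixels=1800)). The max_steps cap keeps the loop from running forever if some placeholder never loads.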
I also think it is better and clearer to select the image by its data attribute rather than by its class or classes:
img_tag = a.find('img', {'data-auto-id': 'productTileImage'})
if img_tag:
    print(img_tag['src'])
else:
    # Image not loaded yet: print the placeholder tag for inspection
    print(a.img)
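The src values in the question's output are protocol-relative (they start with //). As a small sketch, the lookup above could be wrapped in a helper that returns an absolute URL, or None for tiles that have not loaded. The name image_src and the BASE_URL constant are hypothetical, and the HTML in the demo is a hand-written stand-in rather than real ASOS markup.

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

BASE_URL = "https://www.asos.com/"  # assumed base: the site being scraped


def image_src(article, base_url=BASE_URL):
    """Return the absolute image URL for a product tile, or None if unloaded."""
    img = article.find("img", {"data-auto-id": "productTileImage"})
    if img is None:
        return None
    src = img.get("src")
    # urljoin turns a protocol-relative "//host/path" into "https://host/path".
    return urljoin(base_url, src) if src else None


# Demo on a hand-written tile (not real ASOS markup).
tile = BeautifulSoup(
    '<article><img data-auto-id="productTileImage" '
    'src="//images.asos-media.com/products/example/1-black"/></article>',
    "html.parser",
).article
print(image_src(tile))
```

Returning None instead of printing the placeholder tag lets the caller decide whether to skip the tile or scroll and retry.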
Example
from helium import *
import time
from bs4 import BeautifulSoup

s = start_firefox("https://www.asos.com/men/shoes-boots-trainers/boots/cat/?cid=5774&currentpricerange=15-400&nlid=mw|shoes|shop%20by%20product|boots&refine=attribute_1046:8222,8629,10808&sort=priceasc", headless=False)
time.sleep(2)

# Make three passes: scroll through the page, then try to load more products
for t in range(1, 4):
    time.sleep(2)
    # Scroll in small steps so the lazy-loaded images come into view
    for x in range(1, 6):
        scroll_down(num_pixels=2000)
        time.sleep(3)
    try:
        click(Link('Load more'))
    except Exception:
        # No "Load more" link found on this pass; carry on
        continue

soup = BeautifulSoup(s.page_source, 'lxml')
for a in soup.find_all("article", {'data-auto-id': 'productTile'}):
    img_tag = a.find('img', {'data-auto-id': 'productTileImage'})
    if img_tag:
        print(img_tag['src'])
    else:
        # Image not loaded yet: print the placeholder tag for inspection
        print(a.img)