requests_html: find all img tags inside child elements

Problem description

For each of these divs (example):

<div class="khRVwd Y37F6d"><img id="rimg_sqVNYKK9FY7l-gSQlbP4CQ13" src="https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcTqjFYOHXAUsawFW2-4Js3VPmPHKXlgS28gx_qY-elNz1XPso9OLSzibeaCd0Nu&amp;s=10" alt=""></div>

After rendering the whole page with JavaScript, I am trying to scrape all of the movie images from this site. Examples of what I want:

<img id="rimg_sqVNYKK9FY7l-gSQlbP4CQ13" src="https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcTqjFYOHXAUsawFW2-4Js3VPmPHKXlgS28gx_qY-elNz1XPso9OLSzibeaCd0Nu&amp;s=10" alt="">

<img id="rimg_sqVNYKK9FY7l-gSQlbP4CQ15" src="https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcQNj-2DTTNT9flGEEOF2-KgGHuqD0pyRgZjoJ_1YMwDpBbPnVpXLsDD5GAo5gyb&amp;s=10" alt="" class="kUzFve CgpFtc">

What I tried:

from requests_html import HTMLSession

session = HTMLSession()

r = session.get("https://www.google.com/search?q=new+movies&oq=new+movies#wxpd=:true")
r.html.render(sleep=15)
print(r.html.find(".khRVwd"))

But the output I get is:

[<Element 'div' class=('khRVwd', 'Y37F6d')>, <Element 'div' class=('khRVwd', 'Y37F6d') style='height:162px'>, <Element 'div' class=('khRVwd', 'Y37F6d')>, <Element 'div' class=('khRVwd', 'Y37F6d') style='height:162px'>, <Element 'div' class=('khRVwd', 'Y37F6d')>, <Element 'div' class=('khRVwd', 'Y37F6d') style='height:162px'>, <Element 'div' class=('khRVwd', 'Y37F6d')>, <Element 'div' class=('khRVwd', 'Y37F6d') style='height:162px'>, <Element 'div' class=('khRVwd', 'Y37F6d')>, <Element 'div' class=('khRVwd', 'Y37F6d') style='height:162px'>, <Element 'div' class=('khRVwd', 'Y37F6d')>, <Element 'div' class=('khRVwd', 'Y37F6d') style='height:162px'>]

These are the parent elements of the elements I want to capture.

I am not sure how to get from the div with that class to its child <img> tag.

Edit (figured it out):

from requests_html import HTMLSession


session = HTMLSession()

r = session.get("https://www.google.com/search?q=new+movies&oq=new+movies#wxpd=:true")
r.html.render(sleep=6)
links = r.html.find(".khRVwd")
for link in links:
    list1 = link.find("img")
    for img in list1:
        print(img.attrs["src"])

r.close()
session.close()

Output:



...

Tags: python, python-3.x, python-requests-html

Solution


I figured it out:

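from requests_html import HTMLSession


session = HTMLSession()

r = session.get("https://www.google.com/search?q=new+movies&oq=new+movies#wxpd=:true")
# Render the page with JavaScript and wait 6 seconds before reading the DOM.
r.html.render(sleep=6)
# Each match is a parent div; find() can be called again on an element.
links = r.html.find(".khRVwd")
for link in links:
    # Search only inside this div for its <img> children.
    list1 = link.find("img")
    for img in list1:
        print(img.attrs["src"])

r.close()
session.close()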

This may help anyone else who is wondering about the same thing.
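The parent-then-child lookup above can also be checked offline, without hitting the network or rendering JavaScript. The following is only a minimal sketch using the standard library's html.parser rather than requests_html; the names ImgSrcCollector, extract_img_sources and SAMPLE_HTML, as well as the example.com URLs, are made up for illustration and are not part of the original post.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collect the src of every <img> nested inside a <div> with a given class.

    Hypothetical helper for illustration; it is not part of requests_html.
    """

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0      # div nesting depth inside a matching div (0 = outside)
        self.sources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            classes = (attrs.get("class") or "").split()
            # Enter a matching div, or go one level deeper inside one.
            if self.depth or self.target_class in classes:
                self.depth += 1
        elif tag == "img" and self.depth and attrs.get("src"):
            self.sources.append(attrs["src"])

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1


def extract_img_sources(html, target_class="khRVwd"):
    parser = ImgSrcCollector(target_class)
    parser.feed(html)
    parser.close()
    return parser.sources


# Assumed sample markup, shaped like the divs in the question.
SAMPLE_HTML = """
<div class="khRVwd Y37F6d"><img id="a" src="https://example.com/one.jpg?q=1&amp;s=10" alt=""></div>
<div class="other"><img src="https://example.com/skip.jpg"></div>
<div class="khRVwd Y37F6d" style="height:162px"><div><img src="https://example.com/two.jpg"></div></div>
"""

if __name__ == "__main__":
    for src in extract_img_sources(SAMPLE_HTML):
        print(src)
```

Note that the parser unescapes &amp; in attribute values, so the collected URLs contain a plain &. With requests_html itself, a descendant CSS selector such as r.html.find(".khRVwd img") should give the same images in a single call, since find() accepts CSS selectors; that variant is untested here.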

