In BeautifulSoup, how do I collect a photo link that does not appear in the parsed HTML?

Problem description

In Python 3, I want to get the links to the photos shown on certain pages, for example:

http://divulgacandcontas.tse.jus.br/divulga/#/candidato/2018/2022802018/AC/10000600209

http://divulgacandcontas.tse.jus.br/divulga/#/candidato/2018/2022802018/SP/250000627809

This is what I tried:

from urllib.request import urlopen
from bs4 import BeautifulSoup
import urllib.request, urllib.parse, urllib.error

html = urlopen('http://divulgacandcontas.tse.jus.br/divulga/#/candidato/2018/2022802018/AC/10000600209')
soup = BeautifulSoup(html, "html.parser")
link = soup.find("img", {"class": "img-thumbnail img-responsive dvg-cand-foto"})
print(link)
None

html = urlopen('http://divulgacandcontas.tse.jus.br/divulga/#/candidato/2018/2022802018/SP/250000627809')
soup = BeautifulSoup(html, "html.parser")
link = soup.find("img", {"class": "img-thumbnail img-responsive dvg-cand-foto"})
print(link)
None

I intend to collect a set of items that sit next to the photo, and I need some other strategy to get the exact src value, such as: http://divulgacandcontas.tse.jus.br/candidaturas/oficial/2018/BR/AC/2022802018/10000600209/foto_1532971768767.jpg

However, what Firefox's Inspect Element shows (img class='img-thumbnail img-responsive dvg-cand-foto') is different from the HTML that html.parser receives.

Does anyone know how I can collect this photo link from the site?

-/-

Using Selenium:

from selenium import webdriver
from selenium.common.exceptions import NoAlertPresentException
from selenium.webdriver.support.select import Select
from bs4 import BeautifulSoup

profile = webdriver.FirefoxProfile()
browser = webdriver.Firefox(profile)
browser.implicitly_wait(10)

browser.get('http://divulgacandcontas.tse.jus.br/divulga/#/candidato/2018/2022802018/SP/250000627809')

html = browser.page_source
soup = BeautifulSoup(html, "html.parser")
browser.close()

link = soup.find("img", {"class": "img-thumbnail img-responsive dvg-cand-foto"})['src']

print(link)
http://divulgacandcontas.tse.jus.br/candidaturas/oficial/2018/BR/SP/2022802018/250000627809/foto_1534447872273.jpg

Tags: python, beautifulsoup, src

Solution


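from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from bs4 import BeautifulSoup

url = 'http://divulgacandcontas.tse.jus.br/divulga/#/candidato/2018/2022802018/SP/250000627809'

# A real browser is used so that the page's JavaScript runs and adds the photo <img>.
browser = webdriver.Firefox()
try:
    browser.get(url)
    # Wait up to 10 seconds for the photo element, with a src attribute, to appear in the DOM.
    WebDriverWait(browser, 10).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, "img.dvg-cand-foto[src]"))
    )
    html = browser.page_source  # HTML after rendering, not the raw server response
finally:
    browser.quit()  # always shut the browser down, even if the wait times out

soup = BeautifulSoup(html, "html.parser")
# Matching on a single class avoids depending on the order of the class names.
link = soup.find("img", class_="dvg-cand-foto")["src"]

print(link)
# Output reported in the original post:
# http://divulgacandcontas.tse.jus.br/candidaturas/oficial/2018/BR/SP/2022802018/250000627809/foto_1534447872273.jpg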
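
A likely reason the urlopen attempt in the question prints None: everything after the # in these URLs is a fragment, and a fragment is not part of the HTTP request. The server therefore returns the same base page for every candidate, and the photo element is presumably added afterwards by JavaScript in the browser (an assumption based on the #/ style routing and on the fact that a real browser shows the image). The small sketch below only splits the URL with the standard library to show which part is actually requested; the variable names are illustrative.

```python
from urllib.parse import urlsplit, urldefrag

page_url = ("http://divulgacandcontas.tse.jus.br/divulga/"
            "#/candidato/2018/2022802018/SP/250000627809")

parts = urlsplit(page_url)

# Only the part before '#' is sent to the server in the HTTP request.
requested_url = urldefrag(page_url).url

# The fragment stays on the client; page scripts can read it to decide
# which candidate to load, but urlopen never transmits it.
client_side_route = parts.fragment

print(requested_url)
print(client_side_route)
```

Both example URLs in the question reduce to the same requested URL, which is consistent with soup.find returning None for each of them when the page is fetched without a browser.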
