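python - In BeautifulSoup, how do I collect a photo link that does not appear in the parsed HTML?
Problem description
In Python 3, I want to get the links of the photos on certain pages, for example: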
http://divulgacandcontas.tse.jus.br/divulga/#/candidato/2018/2022802018/AC/10000600209
http://divulgacandcontas.tse.jus.br/divulga/#/candidato/2018/2022802018/SP/250000627809
This is what I tried:
from urllib.request import urlopen
from bs4 import BeautifulSoup

# First candidate page (AC): download the HTML and parse it
html = urlopen('http://divulgacandcontas.tse.jus.br/divulga/#/candidato/2018/2022802018/AC/10000600209')
soup = BeautifulSoup(html, "html.parser")
# Look for the photo <img> by its class attribute
link = soup.find("img", {"class": "img-thumbnail img-responsive dvg-cand-foto"})
print(link)
# Output: None

# Second candidate page (SP): same steps, same result
html = urlopen('http://divulgacandcontas.tse.jus.br/divulga/#/candidato/2018/2022802018/SP/250000627809')
soup = BeautifulSoup(html, "html.parser")
link = soup.find("img", {"class": "img-thumbnail img-responsive dvg-cand-foto"})
print(link)
# Output: None
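I planned to collect a set of items next to the photo and then define another strategy to locate the exact src, for example: http://divulgacandcontas.tse.jus.br/candidaturas/oficial/2018/BR/AC/2022802018/10000600209/foto_1532971768767.jpg
But what Firefox's Inspect Element shows (img class='img-thumbnail img-responsive dvg-cand-foto') is different from the HTML that html.parser receives.
Does anyone know how I can collect this photo link from the site?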
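A likely reason for the None results can be checked with the standard library alone: everything after the # in these URLs is a fragment, which is kept on the client and not sent in the HTTP request, so both page URLs ask the server for the same base document. The sketch below shows that, and also splits the fragment into fields. The field names (year, election, state, candidate) and the photo_directory layout are assumptions inferred only from the example URLs in this question, and the foto_....jpg file name itself cannot be derived this way.

```python
from urllib.parse import urldefrag, urlsplit

PAGES = [
    "http://divulgacandcontas.tse.jus.br/divulga/#/candidato/2018/2022802018/AC/10000600209",
    "http://divulgacandcontas.tse.jus.br/divulga/#/candidato/2018/2022802018/SP/250000627809",
]


def requested_url(page_url):
    """Return the URL without its fragment, i.e. what the server is asked for."""
    return urldefrag(page_url).url


def candidate_fields(page_url):
    """Split the client-side route in the fragment into fields (names are guesses)."""
    fragment = urlsplit(page_url).fragment
    # The first segment is the literal route name 'candidato' and is discarded.
    _, year, election, state, candidate = fragment.strip("/").split("/")
    return {"year": year, "election": election, "state": state, "candidate": candidate}


def photo_directory(page_url):
    """Build the directory that appears to hold the photo (layout is an assumption)."""
    f = candidate_fields(page_url)
    return (
        "http://divulgacandcontas.tse.jus.br/candidaturas/oficial/"
        f"{f['year']}/BR/{f['state']}/{f['election']}/{f['candidate']}/"
    )


for page in PAGES:
    print(requested_url(page), candidate_fields(page), photo_directory(page))
```

Since both pages reduce to the same requested URL, the candidate-specific markup has to be produced in the browser, which is consistent with Inspect Element showing an img that the downloaded HTML lacks.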
-/-
Using Selenium:
from selenium import webdriver
from bs4 import BeautifulSoup

# Start Firefox; implicitly_wait sets how long element lookups may wait
browser = webdriver.Firefox()
browser.implicitly_wait(10)
# Load the page in a real browser so that its JavaScript runs
browser.get('http://divulgacandcontas.tse.jus.br/divulga/#/candidato/2018/2022802018/SP/250000627809')
# Parse the rendered DOM rather than the raw server response
html = browser.page_source
soup = BeautifulSoup(html, "html.parser")
browser.quit()
link = soup.find("img", {"class": "img-thumbnail img-responsive dvg-cand-foto"})['src']
print(link)
# Output:
http://divulgacandcontas.tse.jus.br/candidaturas/oficial/2018/BR/SP/2022802018/250000627809/foto_1534447872273.jpg
Solution
The #/candidato/... part of these URLs is a fragment, which is not sent to the server, so urlopen only receives the base page; the photo img is apparently added afterwards by JavaScript running in the browser. Loading the page with Selenium, as in the code above, and passing browser.page_source to BeautifulSoup yields the rendered HTML, from which the src can be read.