Scraping HTML content that only appears after clicking a button

Problem description

I'm trying to scrape the following page: https://www.blockchain.com/btc/tx/800ce197af8a1a277ec314daba9c0b59c3ceee0f5beec415f5b8d54a3a9db96c

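The items with the class "sc-19pxzmk-0 lhmncg" are essentially all the addresses in the given Bitcoin transaction. However, as you can see on the right side of the page, there is this element: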

<a class="sc-1r996ns-0 AqGqw sc-1tbyx6t-1 kXxRxe iklhnl-0 boNhIO" opacity="1">Load more outputs... (1 remaining)</a>

Clicking it reveals another address. How can I expand it dynamically? Here is what I've tried so far:

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By


output_class = 'sc-19pxzmk-0 lhmncg'
driver = webdriver.Chrome()
driver.get('https://www.blockchain.com/btc/tx/800ce197af8a1a277ec314daba9c0b59c3ceee0f5beec415f5b8d54a3a9db96c')
# Snapshot of the DOM right after page load, before anything is clicked
result = driver.execute_script("return document.documentElement.outerHTML")

soup = BeautifulSoup(result, 'lxml')
# Selenium 4 form of the older driver.find_elements_by_class_name(output_class)
element = driver.find_elements(By.CLASS_NAME, output_class)
inputs = soup.find_all('div', {'class': output_class})

Neither BeautifulSoup nor the driver returns the extra address.
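One detail in the attempt above, independent of the missing click: output_class holds two classes separated by a space. Selenium's class-name locator expects a single class name, so a compound value like this generally won't match what you intend. The usual workaround is a CSS selector that chains the classes with dots. A minimal sketch of that conversion (the helper name class_attr_to_css is hypothetical, and it assumes the class names need no CSS escaping):

```python
def class_attr_to_css(class_attr: str) -> str:
    """Turn a space-separated class attribute into a compound CSS class selector.

    Assumes every class name is a valid CSS identifier that needs no escaping.
    """
    # str.split() with no argument splits on any run of whitespace and drops empties
    classes = class_attr.split()
    if not classes:
        raise ValueError("class attribute is empty")
    # ".a.b" matches elements that carry both class a and class b
    return "".join("." + name for name in classes)


output_class = 'sc-19pxzmk-0 lhmncg'
print(class_attr_to_css(output_class))  # .sc-19pxzmk-0.lhmncg
```

The result can be passed to driver.find_elements(By.CSS_SELECTOR, ...) or to BeautifulSoup's soup.select(...). This only fixes the locator; the extra address still appears only after the link has been clicked.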

Tags: python, web-scraping, dynamic

Solution


If you are using Selenium, you don't need BeautifulSoup to get the data. Use element.click() to click the element directly, then read the results straight from the driver.

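from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Chrome()
driver.get('https://www.blockchain.com/btc/tx/800ce197af8a1a277ec314daba9c0b59c3ceee0f5beec415f5b8d54a3a9db96c')

# Wait until the "Load more outputs..." control is clickable, then click it.
# ".azsi2v-2" is an auto-generated class name and may no longer match the live page.
WebDriverWait(driver, 10).until(
    EC.element_to_be_clickable((By.CSS_SELECTOR, ".azsi2v-2"))
).click()

# Every output row carries the "sc-19pxzmk-0" class
result_list = driver.find_elements(By.CSS_SELECTOR, ".sc-19pxzmk-0")

for item in result_list:
    # The address link is the <a> inside each row
    print(item.find_element(By.CSS_SELECTOR, "a").get_attribute("href"))

driver.quit()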

This gives me:

https://www.blockchain.com/btc/address/1DC6cb6mFcTgJAwFDEB65Qn457BzDxs3Wh
https://www.blockchain.com/btc/address/3JRj8b1cngQ1nJHwVPRXj1NFXRVzhMDFTf
https://www.blockchain.com/btc/address/3PdareoJL1N8t2BQAnKcVqkS9cdQQo6gLY
https://www.blockchain.com/btc/address/3LhFL4QhhSdtwuPBK4rwD2Z7VwndGVeoKR
https://www.blockchain.com/btc/address/3Nyhd9vMKxep6QhquDSea7yPg9TpCAKTEF
https://www.blockchain.com/btc/address/12a5iTzFRJGZ4H3sZV6UZv6GrUTiwyKyR6
https://www.blockchain.com/btc/address/3K4Hh5LDyqdryj7Xd1FBNgheE2aQHee97X
https://www.blockchain.com/btc/address/3CeQRAViNuqXHH3AcjmdnArCEbRRAdyxCm
https://www.blockchain.com/btc/address/1KHfhqk78kaSf5t1eC48pyLuxPHYTDstcK
https://www.blockchain.com/btc/address/1QKfADjViFcwjCjkmwK84oPXVNNRRDY9VK
https://www.blockchain.com/btc/address/bc1qt0pa5a7j5ay5slqxeujjvxs6zyq7l5z0lf97flxge2std02pfdyqkwlhv4
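The link on this transaction reports only one remaining output, so a single click is enough. On transactions with more hidden outputs the link may need to be clicked several times, and the same idea can be wrapped in a loop that keeps clicking until the link is gone; afterwards each href can be reduced to the bare address. A minimal, Selenium-independent sketch; the function names are hypothetical, and find_button stands for whatever locates the "Load more outputs..." link on the live page:

```python
from typing import Callable, Optional, Protocol
from urllib.parse import urlparse


class Clickable(Protocol):
    """Anything with a click() method, such as a Selenium WebElement."""

    def click(self) -> None: ...


def click_until_exhausted(
    find_button: Callable[[], Optional[Clickable]], max_clicks: int = 50
) -> int:
    """Click a "load more" control until find_button() returns None.

    Returns the number of clicks made; max_clicks guards against a link
    that never disappears.
    """
    clicks = 0
    while clicks < max_clicks:
        button = find_button()
        if button is None:
            break  # no more "load more" control: everything is loaded
        button.click()
        clicks += 1
    return clicks


def address_from_href(href: str) -> str:
    """Return the last path segment of a URL such as
    https://www.blockchain.com/btc/address/<address>."""
    path = urlparse(href).path.rstrip("/")
    return path.rsplit("/", 1)[-1]
```

With Selenium 4, find_button could be something like lambda: next(iter(driver.find_elements(By.XPATH, "//a[starts-with(normalize-space(), 'Load more outputs')]")), None), which matches the link by its visible text rather than its generated class names; that XPath is an assumption about the page's current markup. Because new rows may render asynchronously, a short explicit wait between clicks may also be needed.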
