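python - How to correctly extract data that returns None with Python BeautifulSoup
Problem description
I am trying to improve an existing snippet that extracts data from https://bscscan.com/tx/0x1b6f00c8cd99e0daac5718c743ef9a51af40f95feae23bf29960ae1f66a1cff7, but I cannot work out how to extract certain fields because the lookup returns None.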
from bs4 import BeautifulSoup
from urllib import request
from urllib.request import Request, urlopen
req = Request('https://bscscan.com/tx/0x1b6f00c8cd99e0daac5718c743ef9a51af40f95feae23bf29960ae1f66a1cff7', headers={'User-Agent': 'Mozilla/5.0'})
webpage = urlopen(req).read()
soup = BeautifulSoup(webpage, 'html.parser')
val = soup.find('span', class_='u-label u-label--value u-label--secondary text-dark rounded mr-1').text
transfee = soup.find('span', id='ContentPlaceHolder1_spanTxFee').text
fromaddr = soup.find('span', id='spanFromAdd').text
token = soup.find('span', class_='hash-tag text-truncate hash-tag-custom-from tooltip-address').text
print ("From: \t\t ", fromaddr)
print ("Value: \t\t ", val)
print ("Transaction Fee: ", transfee)
print ("Tokens: ")
main_data=soup.find_all("ul", class_="list-unstyled mb-0")
for i in main_data:
    print ("%s" % i.find_all("a")[-1].get_text() + " %s" % "https://bscscan.com/token/"+i.find_all("a")[-1]['href'])
Current output:
From: 0x6bdfe0696aa4f81245325c7931c117f15459e07a
Value: 0.679753633258727619 BNB
Transaction Fee: 0.00059691 BNB ($0.18)
Tokens:
Binance: WBNB Token https://bscscan.com/token//address/0xbb4cdb9cbd36b01bd1cbaebf2de08d9173bc095c
FaraCrystal (FARA) https://bscscan.com/token//token/0xf4ed363144981d3a65f42e7d0dc54ff9eef559a1
Desired improvement: the additional data (the amounts) shown in front of each token below:
From: 0x6bdfe0696aa4f81245325c7931c117f15459e07a
Value: 0.679753633258727619 BNB
Transaction Fee: 0.00059691 BNB ($0.18)
Tokens:
0.679753633258727619 ($200.28) Binance: WBNB Token https://bscscan.com/token//address/0xbb4cdb9cbd36b01bd1cbaebf2de08d9173bc095c
95.834051318695064337 ($198.62) FaraCrystal (FARA) https://bscscan.com/token//token/0xf4ed363144981d3a65f42e7d0dc54ff9eef559a1
Solution
You can simply call the select_one method with a CSS selector inside the loop to extract the values.
from bs4 import BeautifulSoup
from urllib.request import Request, urlopen

# Send a browser-like User-Agent header with the request, as in the question.
req = Request('https://bscscan.com/tx/0x1b6f00c8cd99e0daac5718c743ef9a51af40f95feae23bf29960ae1f66a1cff7', headers={'User-Agent': 'Mozilla/5.0'})
webpage = urlopen(req).read()
soup = BeautifulSoup(webpage, 'html.parser')

# Index 1 picks the second list matched by the selector.
main_data = soup.select("div.row > div.col-md-9 > ul.list-unstyled.mb-0")[1]
# Loop over the <li> items only (assumes each transfer is an <li>);
# iterating the tag directly can also yield whitespace text nodes.
for i in main_data.find_all("li"):
    link = i.find_all("a")[-1]
    print(link.get_text())
    print("https://bscscan.com/token/" + link['href'])
    print(i.select_one("span.mr-1 > span").get_text())
Output:
Wrapped BNB (WBNB)
https://bscscan.com/token//token/0xbb4cdb9cbd36b01bd1cbaebf2de08d9173bc095c
0.679753633258727619 ($200.83)
FaraCrystal (FARA)
https://bscscan.com/token//token/0xf4ed363144981d3a65f42e7d0dc54ff9eef559a1
95.834051318695064337 ($198.34)
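The links printed above contain a doubled path segment ("/token//token/") because the href taken from the page already starts with a slash. A minimal sketch of one way to avoid that, assuming the hrefs are site-relative paths like the ones in the output; BASE_URL and absolute_link are hypothetical names introduced only for this illustration:

```python
from urllib.parse import urljoin

# Assumed site root; the hrefs in the output above are site-relative paths.
BASE_URL = "https://bscscan.com/"

def absolute_link(href):
    """Resolve a site-relative href against the site root."""
    return urljoin(BASE_URL, href)

print(absolute_link("/token/0xf4ed363144981d3a65f42e7d0dc54ff9eef559a1"))
print(absolute_link("/address/0xbb4cdb9cbd36b01bd1cbaebf2de08d9173bc095c"))
```

urljoin replaces the path of the base URL when the href begins with a slash, so both "/token/..." and "/address/..." links resolve without string concatenation.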
Or, for the prices only: the data sits inside span tags, so you can use the select method with a CSS selector.
price = main_data.select("span.mr-1 > span")
for p in price:
    print(p.get_text())
Output:
0.679753633258727619 ($200.83)
95.834051318695064337 ($198.34)
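Since find and select_one return None when nothing matches, calling .text or .get_text() on the result raises an AttributeError if the page layout differs from what the selector expects. The following is a rough sketch of a guard for that case. The HTML is made up to imitate the general shape of a token list and is not copied from bscscan.com, and text_or_default is a hypothetical helper, not part of BeautifulSoup:

```python
from bs4 import BeautifulSoup

# Made-up HTML that only imitates the shape of a token-transfer list.
SAMPLE_HTML = """
<ul class="list-unstyled mb-0">
  <li><span class="mr-1"><span>1.5 ($3.00)</span></span><a href="/token/0xabc">Token A (TKA)</a></li>
  <li><a href="/token/0xdef">Token B (TKB)</a></li>
</ul>
"""

def text_or_default(parent, css_selector, default="n/a"):
    """Return the stripped text of the first match, or `default` if nothing matches."""
    node = parent.select_one(css_selector)
    return node.get_text(strip=True) if node is not None else default

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
rows = [
    (text_or_default(li, "a"), text_or_default(li, "span.mr-1 > span"))
    for li in soup.select("ul.list-unstyled.mb-0 > li")
]
for name, amount in rows:
    print(name, amount)
```

The second list item has no amount span, so the helper falls back to the default value instead of failing.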