How to correctly extract data that returns None in Python BeautifulSoup

Problem description

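I am trying to improve an existing snippet that extracts data from https://bscscan.com/tx/0x1b6f00c8cd99e0daac5718c743ef9a51af40f95feae23bf29960ae1f66a1cff7, but I don't know how to extract some of the fields because the lookup returns None.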

from bs4 import BeautifulSoup
from urllib.request import Request, urlopen

req = Request('https://bscscan.com/tx/0x1b6f00c8cd99e0daac5718c743ef9a51af40f95feae23bf29960ae1f66a1cff7', headers={'User-Agent': 'Mozilla/5.0'})
webpage = urlopen(req).read()
soup = BeautifulSoup(webpage, 'html.parser')

val = soup.find('span', class_='u-label u-label--value u-label--secondary text-dark rounded mr-1').text
transfee = soup.find('span', id='ContentPlaceHolder1_spanTxFee').text
fromaddr = soup.find('span', id='spanFromAdd').text
token = soup.find('span', class_='hash-tag text-truncate hash-tag-custom-from tooltip-address').text

print ("From: \t\t ", fromaddr)
print ("Value: \t\t ", val)
print ("Transaction Fee: ", transfee)
print ("Tokens: ")

main_data=soup.find_all("ul", class_="list-unstyled mb-0")
for i in main_data:
    print ("%s" % i.find_all("a")[-1].get_text() + " %s" % "https://bscscan.com/token/"+i.find_all("a")[-1]['href'])

Current output:

From:             0x6bdfe0696aa4f81245325c7931c117f15459e07a
Value:            0.679753633258727619 BNB
Transaction Fee:  0.00059691 BNB  ($0.18) 
Tokens:           
   Binance: WBNB Token https://bscscan.com/token//address/0xbb4cdb9cbd36b01bd1cbaebf2de08d9173bc095c
   FaraCrystal (FARA)  https://bscscan.com/token//token/0xf4ed363144981d3a65f42e7d0dc54ff9eef559a1

Desired improvement: also print the additional data (the numeric values) at the start of each token line, as below:

From:         0x6bdfe0696aa4f81245325c7931c117f15459e07a
Value:        0.679753633258727619 BNB
Transaction Fee:  0.00059691 BNB  ($0.18) 
Tokens: 
    0.679753633258727619 ($200.28)  Binance: WBNB Token   https://bscscan.com/token//address/0xbb4cdb9cbd36b01bd1cbaebf2de08d9173bc095c
   95.834051318695064337 ($198.62)  FaraCrystal (FARA)    https://bscscan.com/token//token/0xf4ed363144981d3a65f42e7d0dc54ff9eef559a1

Tags: python, python-3.x, beautifulsoup, python-requests

Solution


You can simply use the select_one method with a CSS selector inside the loop to extract the value:

from bs4 import BeautifulSoup
from urllib.request import Request, urlopen

# Send a browser-like User-Agent header with the request
req = Request('https://bscscan.com/tx/0x1b6f00c8cd99e0daac5718c743ef9a51af40f95feae23bf29960ae1f66a1cff7', headers={'User-Agent': 'Mozilla/5.0'})
webpage = urlopen(req).read()
soup = BeautifulSoup(webpage, 'html.parser')

# Take the second list matched by the selector
main_data = soup.select("div.row > div.col-md-9 > ul.list-unstyled.mb-0")[1]
for i in main_data:
    link = i.find_all("a")[-1]  # last link in the row
    print(link.get_text())
    print("https://bscscan.com/token/" + link['href'])
    print(i.select_one("span.mr-1 > span").get_text())  # amount and USD value

Output:

Wrapped BNB (WBNB)
https://bscscan.com/token//token/0xbb4cdb9cbd36b01bd1cbaebf2de08d9173bc095c
0.679753633258727619 ($200.83)
FaraCrystal (FARA) 
https://bscscan.com/token//token/0xf4ed363144981d3a65f42e7d0dc54ff9eef559a1
95.834051318695064337 ($198.34)
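
The URLs in the output above contain a double slash (token//token/) because each href on the page already starts with "/token/", and the snippet prepends another "https://bscscan.com/token/". A minimal sketch of one way to avoid this, using urljoin from the standard library; the two hrefs are copied from the output above, and BASE_URL is a name introduced here for illustration:

```python
from urllib.parse import urljoin

# Illustrative constant (not part of the original snippet): the site root.
BASE_URL = "https://bscscan.com"

# Hrefs as implied by the output above: site-relative, with a leading slash.
hrefs = [
    "/token/0xbb4cdb9cbd36b01bd1cbaebf2de08d9173bc095c",
    "/token/0xf4ed363144981d3a65f42e7d0dc54ff9eef559a1",
]

# urljoin resolves each href against the base, so no slash is doubled.
token_urls = [urljoin(BASE_URL, href) for href in hrefs]

for url in token_urls:
    print(url)
```

In the loop from the answer, the same idea would be urljoin(BASE_URL, link['href']) in place of the string concatenation.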

Or, for the prices only: the data sits inside span tags, so you can use the select method with a CSS selector:

price=main_data.select("span.mr-1 > span")
for p in price:
    print(p.get_text())

Output:

0.679753633258727619 ($200.83)
95.834051318695064337 ($198.34)
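
On the None in the question title: find and select_one return None when nothing matches, so calling .text or .get_text() on the result raises AttributeError. The sketch below shows one way to guard against that. It runs against a small hypothetical HTML sample rather than the live page (the real markup differs and may change), and text_or_default is a helper name made up for this example.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the real page differs.
SAMPLE_HTML = """
<ul class="list-unstyled mb-0">
  <li><span class="mr-1"><span>0.5 ($100.00)</span></span><a href="/token/0xabc">Token A (TKA)</a></li>
  <li><a href="/token/0xdef">Token B (TKB)</a></li>
</ul>
"""


def text_or_default(node, default="N/A"):
    """Return the stripped text of a tag, or `default` if the lookup found nothing."""
    return node.get_text(strip=True) if node is not None else default


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

rows = []
for li in soup.select("ul.list-unstyled.mb-0 > li"):
    # select_one returns None for the second row, which has no amount span.
    amount = text_or_default(li.select_one("span.mr-1 > span"))
    name = text_or_default(li.select_one("a"))
    rows.append((amount, name))
    print(amount, name)
```

With a guard like this, a missing field shows up as a placeholder in the output instead of stopping the script, which also makes it easier to spot which selector no longer matches the page.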
