I have a few questions about a Python scraper

Question

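I am trying to write a Python script that fetches data from bscscan.com and displays it in the terminal.

The idea: when I enter a token's contract address, I want to get the details for that contract address. The code is below.

My first question is how to get the correct data for mcapa. mcapa shows the wrong value, for example $121,048 instead of $121,048,400.00.

My second question: totalbox shows "121,000,000 BUSD (CSupply: 95,562,821.092145)", which is correct, but I only want "121,000,000 BUSD".

I don't understand how to do this :/ I tried strip and split, but I didn't get the information I wanted.

Thanks in advance.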

import requests
from bs4 import BeautifulSoup

#print holders
cotractpage = requests.get("https://bscscan.com/token/0xe9e7cea3dedca5984780bafc599bd69add087d56")
soupa = BeautifulSoup(cotractpage.content, 'html.parser')
tokenholders = soupa.find(id='ContentPlaceHolder1_tr_tokenHolders').get_text()
tokenholdersa = "Holders: " + ((((tokenholders.strip()).strip("Holders:")).strip()).strip(" a ")).strip()
print(tokenholdersa)

#print decimal
decimal = soupa.find(id='ContentPlaceHolder1_trDecimals').get_text()
decimala = "Decimal: " + ((((decimal.strip()).strip("Decimals:")).strip()).strip()).strip()
print(decimala)

#print website
website = soupa.find(id='ContentPlaceHolder1_tr_officialsite_1').get_text()
websitea = "Website: " + ((((website.strip()).strip(" Official Site:")).strip()).strip()).strip()
print(websitea)

#print name
website = soupa.find('span', class_='text-secondary small').get_text()
tokename = "Name: " + website
print(tokename)

#printprice
price = soupa.find(id='ContentPlaceHolder1_tr_valuepertoken')
pricebox = price.find('span', class_='d-block').get_text()
print("Price: " + (pricebox).strip())

#print marketcap
mcap = soupa.find(id='ContentPlaceHolder1_tr_valuepertoken').get_text()
mcapa = ((((mcap.strip()).strip("Price")).strip(pricebox)).strip("Market Cap")).strip()
print("Market Cap: " + mcapa)

#print totalsupply
totalbox = soupa.find('div', class_='col-md-8').get_text()
print("Total Supply: " + totalbox)

Tags: python, python-3.x, beautifulsoup

Solution


Your script is almost correct; it only needs a few small adjustments:

import re
import requests
from bs4 import BeautifulSoup


#print holders
cotractpage = requests.get("https://bscscan.com/token/0xe9e7cea3dedca5984780bafc599bd69add087d56")
soupa = BeautifulSoup(cotractpage.content, 'html.parser')
tokenholders = soupa.find(id='ContentPlaceHolder1_tr_tokenHolders').get_text()
tokenholdersa = "Holders: " + ((((tokenholders.strip()).strip("Holders:")).strip()).strip(" a ")).strip()
print(tokenholdersa)

#print decimal
decimal = soupa.find(id='ContentPlaceHolder1_trDecimals').get_text()
decimala = "Decimal: " + ((((decimal.strip()).strip("Decimals:")).strip()).strip()).strip()
print(decimala)

#print website
website = soupa.find(id='ContentPlaceHolder1_tr_officialsite_1').get_text()
websitea = "Website: " + ((((website.strip()).strip(" Official Site:")).strip()).strip()).strip()
print(websitea)

#print name
website = soupa.find('span', class_='text-secondary small').get_text()
tokename = "Name: " + website
print(tokename)

#printprice
price = soupa.find(id='ContentPlaceHolder1_tr_valuepertoken')
pricebox = price.find('span', class_='d-block').get_text()
print("Price: " + (pricebox).strip())

#print marketcap
mcapa = soupa.find(id='ContentPlaceHolder1_tr_valuepertoken').get_text()
print("Market Cap: " + re.search(r'Market Cap.*?([$\d,.]+)', mcapa, flags=re.S).group(1))   # <--- we want only amount after "Market Cap"

#print totalsupply
totalbox = soupa.find('div', class_='col-md-8').get_text().split('(')[0].strip()   # <--- we want only first part of the string
print("Total Supply: " + totalbox)

Prints:

Holders: 8,040
Decimal: 18
Website: https://www.paxos.com/busd/
Name: Binance-Peg BUSD Token
Price: $1.0004 @ 0.032597 BNB (+0.02%)
Market Cap: $121,048,400.00
Total Supply: 121,000,000 BUSD
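
The two adjustments can be tried offline, without requesting the page. The sketch below is only an illustration: `row_text` is a made-up string shaped like what `get_text()` might return for the price row (the real page's labels and whitespace may differ), and the helper names `market_cap` and `supply_without_note` are hypothetical, not part of the script above.

```python
import re

# Hypothetical sample text; the live page may format this row differently.
row_text = "Price\n$1.0004 @ 0.032597 BNB (+0.02%)\nMarket Cap\n$121,048,400.00\n"


def market_cap(text):
    """Return the first run of '$', digits, commas and dots after 'Market Cap'."""
    # re.S lets '.' match newlines, so the label and amount may sit on separate lines.
    match = re.search(r"Market Cap.*?([$\d,.]+)", text, flags=re.S)
    return match.group(1) if match else None


def supply_without_note(text):
    """Keep only the part before the first '(' and trim surrounding whitespace."""
    return text.split("(")[0].strip()


print("Market Cap: " + market_cap(row_text))
print("Total Supply: " + supply_without_note("121,000,000 BUSD (CSupply: 95,562,821.092145)"))
```

Returning `None` when the pattern is missing is a design choice of this sketch; the original one-liner would raise an `AttributeError` on `.group(1)` in that case.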

Or, if you don't want to use a regular expression for mcap:

#print marketcap
mcapa = soupa.find(id='ContentPlaceHolder1_tr_valuepertoken').find(id='pricebutton').get_text(strip=True)
print("Market Cap: " + mcapa)
