python-3.x - soup.find(class_="") not working and returns NoneType: how to scrape the website in this case
Problem description
I am trying to scrape this website so as to download the names and links of all coins listed on the website: https://www.cryptocompare.com/ico/#/completed
I tried using the beautifulsoup module to locate the div class=coins-list and then the
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup
from requests_html import HTMLSession
session = HTMLSession()
url= "https://www.cryptocompare.com/ico/#/completed"
r=session.get(url)
soup = BeautifulSoup(r.text,'html.parser')
coin_name_list = soup.find("div",attrs= {"id":"coins-list"})
coin_name_list_items = coin_name_list.find_all('a')
The error message is as follows:
---> 14 coin_name_list_items = coin_name_list.find_all('a')
AttributeError: 'NoneType' object has no attribute 'find_all'
I don't understand why this isn't working. Is it something related to the website or to my code? It would be much appreciated if anyone could find a solution. Thanks!
Solution
The page is rendered by JavaScript. However, if you go to the Network tab, you will find the API below, which returns its output as JSON.
Import json to work with that JSON object.
import requests
import json
headers = {'User-Agent':
'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36'}
url="https://min-api.cryptocompare.com/data/subsWatchlist?fsyms=433,ABT,DRG,XTZ,FIL,BNT,SPICE,ALGO,SRN,PMA,SNT,SOLID,WAX,NTK,BKX,DRT,EVN,ONL,OIO,LDC,CEL,BUBO,SVD,DOR,ICX,INS,LAX,GAT,NGC,WPR,OKOIN,HBZ,BQTX,MTH,PAT,BAT,AST,OMX,INVC,C20,POWR,FUEL,REQ,CVC,AGI,AMB,LINK,BRD,PLBT,SYNCO,STORJ,MWAT,LUC,CRE,CMT,BMC,BERRY,redBUX,RAC,MCO,DLT,STX,SFU,TCOIN,RKT,MANA,OAK,SNM,FUN,WRC,BCAT,VRA,CFTY,TNT,PROPS,OMG,ANT,AE,EDO,SETH,WT,TIOX,CMCT,ENJ,EVX,KICK,COV,ZUP,PLR,VEE,BDG,KEY,STT,U42,UTK,AID,AVT,IMPCN,TFLEX,SLX,PLA,ISR,GARD,FLUZ,BAX,RVT,CSP,HOLO,MCAP,THO,VLR,TFD,WBBC,ELY,DBCCOIN,LA,OAX,DAV,ETH,OST,DMT,GLDR,IHT,FLP,CTR,ILCT,BEX,CTLX,ATOM,NAVI,CEDEX,EVC,TPAY,WAVES,WELL,WHN,LYK,eQUAD,HHEM,GENE,APPC,MASP,LAMB,ABYSS,QTUM,PHX,EKO,DTR,CPAY,COFI,CEEK,BIT,AIR,ATB,CFI,HQX,PRO,PIX,NEU,GES,ART,NIM,MYST,ICOS,SLY,ORS,HST,FOAM,DEB,OSA,TMT,GFT,LEV,COB,TKN,HGS,GNO,GLA,DAOC,NPX,BOS,XCD,RLC,VERI,ADB,LCK,LCS,EARTH,MOAT,BNTN,FSBT,SUB,WLK,HOLD,DTH,ELI,NRN,CIX100,PST,DNO,ACAT,ROX,VIB,FIII,ICN,MFG,CJT,BCNX,REAL,CSNO,XYO,SAN,PTOY,MTRC,XRL,TAU,POE,KRM,GBXT,BCNA,BCAP,IMVR,DIM,ESTATE,BON,ARAW,IND,VCN,ADM,EXY,DNT,AMO,CPY,ADX,GC,HAC,PPT,MTL,BITX,CAS,DRGN,PAN,CUZ,PKT,SEAL,SHPING,GNT,HVN,LED,YMZ,LTA,CTX,EBC,BITCAR,GAMB,GMA,STAC,DUSK,GENS,SRXIO,DTN,PBT,KEP,ADT,BBI,MNTP,CAN,TAAS,LWF,DAY,CCOS,DAT,SNGLS,CDT,KNW,AEN,WIZ,BFC,SHP,ZRC,EQUI,TIX,IXT,ERT,MLT,SPX,VTM,PROOF,ECHT,INDI,SCL,UBEX,HBT,TIE,MAID,WGR,LSK,GBE,ATL,ETRNT,ITM,ARR,LOCI,DLA,MIO,FNX,AZ,TFL,GJC,JVY,SCRM,MOLK,DGD,1ST,X8X,GUAR,TIME,AUC,EON,ORBIS,HGT,HMQ,XID,REP,MSP,VIN,STA,DRP,ALX,TAN,GUP,JOY,ACE,VZT,VNS,CAPP,XELS,AIC,MEM,REPUX,AMP,TRW,TRST,ARN,XBB,OXY,SRCH,PPY,ECOB,FUNDP,ETT,HETA,LYM,FLOT,DENT,DCT,QAU,XRM,PLC,SFT,COIN,SRNT,NVST,PINMO,CVCOIN,NEO,ADST,DTX,IPT,DIP,GBA,BTS,PKC,CHT,CHX,LUN,JOYT,GRO,CLIN,SKIN,MNRB,QRL,MOS,MOBU,NRX,COTI,YUM,MYDFS,WINS,H2O,MLN,IAG,VESTA,OSF,GBTC,MYB,TASH,CRON,SUR,T4M,CAP,EDG,ICOO,VANM,PSK,ALTOCAR,PING,ICE,FNP,DNN,ICST,LNT,PHM,UNITY,RCC,VRT,BCCOIN,OCEANT,MT,IFT,COS,LAT,CRL,ILT,SAT,STU,SNOV,PRIX,HMT,ARNA,COR,WINGS,LGD,CYBR,AXIS,KMD,ITT
,MTX,OMI,LYQD,GAI,ENCX,DDF,WRL,ETE,XCP,GMR,TOSS,ET4,VSL,ETKN,TKT,SNC,PYP,RYZ,FDX,ATFS,GEA,LNK,TJA,SND,BTRM,IGTT,AMBT,FILL,VOISE,AUDC,REA,AUK,ATTR,GIM,CRS,DNA,NEBL,CTW,BETHER,AUN,DHT,LKK,ETBS,FYN,INCNT,BAY,DTB,TRIP,ADL,ADUX,GUESS,MBI,PLU,DRC,ARK,SRC,EMT,HDG,SRX,KEX,FFCT,KVNT,PART,QVT,DGPT,COT,SREUR,ROCK,KAPU,TUT,ENK,PQT,CWEX,STRAT,RKC,NFN,SWARM,NBX,WAND,TFC,DTRC,SAF,OMNI,MAT,HBX,EMN,EJAC,FNTB,SPEND,FOXT,GOLOS,SJCX,DSLA,ATON,UMC,AVA,PRP,SDAO,NXC,KEN,VIA,TKR,CHK,VEGA,ORI,ZOPT,TWC,RIYA,CREA,GXC,ROCK2,JDC,DAR,DTCT,WEB,PERU,JSE,PGL,HELIOS,OTX,MC,ODMC,GGS,CFT,ABC,VTOS,CCT,FLLW,SGN,SIFT,CRTM,PRM,CO2,BOU,ITR,SNK,CZC,AHT,AGVC,XRX,SYC,3DES,LTCH,CPL,NIMFA,TRAVEL,SCOR,BNR,TKS,UET,XTRA,MNM,XLC,XCJ,PLMT,MEDI,CWX,IWT,PSB,SPORTG,XSPEC,ICOB,SMNX,KRP,NXT,EQ,ROK,AFCT,TRIBE,Z2,DFBT,OXY2,MIOTA,ZYM,ZUC,ZRX,ZNT,ZNAQ,ZNA,ZIX,ZILLA,ZEROB,ZEEW,ZCHN,ZCC1,ZAZA,ZAB,YSH,YPTO,YDY,YBT,YANU,YACHTCO,XUC,XTN,XSB,XRK,XRF,XRED,XRBT,XR,XPX,XPT,XPR,XPL,XOS,XNT,XIM,XGH,XEP,XDT,XCZ,XBX,XBOND,XBANK,XAL,WYR,WXT,WUG,WU,WTXH,WTT,WTL,WSH,WRT,WPT,WPP,WOWX,WORK,WOM,WMK,WMD,WMB,WLME,WIIX,WICC&tsym=USD"
r=requests.get(url,headers=headers)
data=json.loads(r.text)
print(data)
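The long query string above is easier to maintain if it is built from a list of symbols. The following is a minimal sketch, assuming the endpoint accepts the same `fsyms` and `tsym` parameters shown in the URL above; the helper names `build_watchlist_url` and `fetch_watchlist` are hypothetical, and it uses the standard library's `urllib` in place of `requests` so that it runs without extra packages.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Endpoint taken from the URL in the answer above.
BASE_URL = "https://min-api.cryptocompare.com/data/subsWatchlist"


def build_watchlist_url(symbols, tsym="USD"):
    """Build the request URL from a list of coin symbols (hypothetical helper)."""
    # safe="," keeps the commas between symbols unescaped, as in the original URL.
    query = urlencode({"fsyms": ",".join(symbols), "tsym": tsym}, safe=",")
    return f"{BASE_URL}?{query}"


def fetch_watchlist(symbols, tsym="USD"):
    """Request the endpoint and decode the JSON body (performs a network call)."""
    request = urllib.request.Request(
        build_watchlist_url(symbols, tsym),
        headers={"User-Agent": "Mozilla/5.0"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


# Only the URL is built here; no network request is made.
print(build_watchlist_url(["ETH", "BAT", "OMG"]))
```

Calling `fetch_watchlist(["ETH", "BAT", "OMG"])` would then return the decoded JSON, provided the endpoint is still available and responds as it did when the answer was written.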
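However, if you want to use your own code, you may not get all the values, because the page is rendered by JavaScript. Use Selenium or my solution above to get the expected result. To get all the links, just use soup.find_all('a').
Here is the code.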
from bs4 import BeautifulSoup
from requests_html import HTMLSession
session = HTMLSession()
url= "https://www.cryptocompare.com/ico/#/completed"
r=session.get(url)
soup = BeautifulSoup(r.text,'html.parser')
coin_name_list_items = soup.find_all('a')
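The AttributeError in the question arises because soup.find returns None when the element is not present in the downloaded HTML, which is what happens when the div is only added later by JavaScript. A minimal sketch of guarding against that case follows; the helper name `extract_links`, the two sample HTML strings, and the link path in them are made up for illustration and are not taken from the real page.

```python
from bs4 import BeautifulSoup


def extract_links(html, container_id="coins-list"):
    """Return (text, href) pairs for links inside the container, or [] if it is absent."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("div", attrs={"id": container_id})
    if container is None:
        # The div is not in this HTML (for example, it is added later by JavaScript).
        return []
    return [(a.get_text(strip=True), a.get("href")) for a in container.find_all("a")]


# Illustrative HTML only: what a JavaScript-rendered page might look like
# before and after rendering.
static_html = "<html><body><div id='app'></div></body></html>"
rendered_html = (
    "<html><body><div id='coins-list'>"
    "<a href='/coins/eth/overview'>Ethereum</a>"
    "</div></body></html>"
)

print(extract_links(static_html))
print(extract_links(rendered_html))
```

An empty result from the first call is a hint that the content is not in the static HTML, so a rendered page source (for example from Selenium) or the JSON API is needed instead.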