soup.find(class_="") not working and returns NoneType: how to scrape the website in this case

Problem description

I am trying to scrape this website so as to download the names and links of all coins listed on the website: https://www.cryptocompare.com/ico/#/completed

I tried using the beautifulsoup module to locate the div class=coins-list and then the

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup
from requests_html import HTMLSession

session = HTMLSession()
url= "https://www.cryptocompare.com/ico/#/completed"
r=session.get(url)
soup = BeautifulSoup(r.text,'html.parser')

coin_name_list = soup.find("div",attrs= {"id":"coins-list"})
coin_name_list_items = coin_name_list.find_all('a')

The error message is as follows: ---> 14 coin_name_list_items = coin_name_list.find_all('a')

AttributeError: 'NoneType' object has no attribute 'find_all'

I don't understand why this isn't working. Is it something related to the website or to my code? It would be much appreciated if anyone could find the solution. Thanks!

Tags: python-3.x, web-scraping, beautifulsoup

Solution


The page is rendered by JavaScript. However, if you open the Network tab, you will find the API below, which returns its output as JSON:

url="https://min-api.cryptocompare.com/data/subsWatchlist?fsyms=433,ABT,DRG,XTZ,FIL,BNT,SPICE,ALGO,SRN,PMA,SNT,SOLID,WAX,NTK,BKX,DRT,EVN,ONL,OIO,LDC,CEL,BUBO,SVD,DOR,ICX,INS,LAX,GAT,NGC,WPR,OKOIN,HBZ,BQTX,MTH,PAT,BAT,AST,OMX,INVC,C20,POWR,FUEL,REQ,CVC,AGI,AMB,LINK,BRD,PLBT,SYNCO,STORJ,MWAT,LUC,CRE,CMT,BMC,BERRY,redBUX,RAC,MCO,DLT,STX,SFU,TCOIN,RKT,MANA,OAK,SNM,FUN,WRC,BCAT,VRA,CFTY,TNT,PROPS,OMG,ANT,AE,EDO,SETH,WT,TIOX,CMCT,ENJ,EVX,KICK,COV,ZUP,PLR,VEE,BDG,KEY,STT,U42,UTK,AID,AVT,IMPCN,TFLEX,SLX,PLA,ISR,GARD,FLUZ,BAX,RVT,CSP,HOLO,MCAP,THO,VLR,TFD,WBBC,ELY,DBCCOIN,LA,OAX,DAV,ETH,OST,DMT,GLDR,IHT,FLP,CTR,ILCT,BEX,CTLX,ATOM,NAVI,CEDEX,EVC,TPAY,WAVES,WELL,WHN,LYK,eQUAD,HHEM,GENE,APPC,MASP,LAMB,ABYSS,QTUM,PHX,EKO,DTR,CPAY,COFI,CEEK,BIT,AIR,ATB,CFI,HQX,PRO,PIX,NEU,GES,ART,NIM,MYST,ICOS,SLY,ORS,HST,FOAM,DEB,OSA,TMT,GFT,LEV,COB,TKN,HGS,GNO,GLA,DAOC,NPX,BOS,XCD,RLC,VERI,ADB,LCK,LCS,EARTH,MOAT,BNTN,FSBT,SUB,WLK,HOLD,DTH,ELI,NRN,CIX100,PST,DNO,ACAT,ROX,VIB,FIII,ICN,MFG,CJT,BCNX,REAL,CSNO,XYO,SAN,PTOY,MTRC,XRL,TAU,POE,KRM,GBXT,BCNA,BCAP,IMVR,DIM,ESTATE,BON,ARAW,IND,VCN,ADM,EXY,DNT,AMO,CPY,ADX,GC,HAC,PPT,MTL,BITX,CAS,DRGN,PAN,CUZ,PKT,SEAL,SHPING,GNT,HVN,LED,YMZ,LTA,CTX,EBC,BITCAR,GAMB,GMA,STAC,DUSK,GENS,SRXIO,DTN,PBT,KEP,ADT,BBI,MNTP,CAN,TAAS,LWF,DAY,CCOS,DAT,SNGLS,CDT,KNW,AEN,WIZ,BFC,SHP,ZRC,EQUI,TIX,IXT,ERT,MLT,SPX,VTM,PROOF,ECHT,INDI,SCL,UBEX,HBT,TIE,MAID,WGR,LSK,GBE,ATL,ETRNT,ITM,ARR,LOCI,DLA,MIO,FNX,AZ,TFL,GJC,JVY,SCRM,MOLK,DGD,1ST,X8X,GUAR,TIME,AUC,EON,ORBIS,HGT,HMQ,XID,REP,MSP,VIN,STA,DRP,ALX,TAN,GUP,JOY,ACE,VZT,VNS,CAPP,XELS,AIC,MEM,REPUX,AMP,TRW,TRST,ARN,XBB,OXY,SRCH,PPY,ECOB,FUNDP,ETT,HETA,LYM,FLOT,DENT,DCT,QAU,XRM,PLC,SFT,COIN,SRNT,NVST,PINMO,CVCOIN,NEO,ADST,DTX,IPT,DIP,GBA,BTS,PKC,CHT,CHX,LUN,JOYT,GRO,CLIN,SKIN,MNRB,QRL,MOS,MOBU,NRX,COTI,YUM,MYDFS,WINS,H2O,MLN,IAG,VESTA,OSF,GBTC,MYB,TASH,CRON,SUR,T4M,CAP,EDG,ICOO,VANM,PSK,ALTOCAR,PING,ICE,FNP,DNN,ICST,LNT,PHM,UNITY,RCC,VRT,BCCOIN,OCEANT,MT,IFT,COS,LAT,CRL,ILT,SAT,STU,SNOV,PRIX,HMT,ARNA,COR,WINGS,LGD,CYBR,AXIS,KMD,ITT,MTX,OMI,LYQD,GAI,ENCX,DDF,WRL,ETE,XCP,GMR,TOSS,ET4,VSL,ETKN,TKT,SNC,PYP,RYZ,FDX,ATFS,GEA,LNK,TJA,SND,BTRM,IGTT,AMBT,FILL,VOISE,AUDC,REA,AUK,ATTR,GIM,CRS,DNA,NEBL,CTW,BETHER,AUN,DHT,LKK,ETBS,FYN,INCNT,BAY,DTB,TRIP,ADL,ADUX,GUESS,MBI,PLU,DRC,ARK,SRC,EMT,HDG,SRX,KEX,FFCT,KVNT,PART,QVT,DGPT,COT,SREUR,ROCK,KAPU,TUT,ENK,PQT,CWEX,STRAT,RKC,NFN,SWARM,NBX,WAND,TFC,DTRC,SAF,OMNI,MAT,HBX,EMN,EJAC,FNTB,SPEND,FOXT,GOLOS,SJCX,DSLA,ATON,UMC,AVA,PRP,SDAO,NXC,KEN,VIA,TKR,CHK,VEGA,ORI,ZOPT,TWC,RIYA,CREA,GXC,ROCK2,JDC,DAR,DTCT,WEB,PERU,JSE,PGL,HELIOS,OTX,MC,ODMC,GGS,CFT,ABC,VTOS,CCT,FLLW,SGN,SIFT,CRTM,PRM,CO2,BOU,ITR,SNK,CZC,AHT,AGVC,XRX,SYC,3DES,LTCH,CPL,NIMFA,TRAVEL,SCOR,BNR,TKS,UET,XTRA,MNM,XLC,XCJ,PLMT,MEDI,CWX,IWT,PSB,SPORTG,XSPEC,ICOB,SMNX,KRP,NXT,EQ,ROK,AFCT,TRIBE,Z2,DFBT,OXY2,MIOTA,ZYM,ZUC,ZRX,ZNT,ZNAQ,ZNA,ZIX,ZILLA,ZEROB,ZEEW,ZCHN,ZCC1,ZAZA,ZAB,YSH,YPTO,YDY,YBT,YANU,YACHTCO,XUC,XTN,XSB,XRK,XRF,XRED,XRBT,XR,XPX,XPT,XPR,XPL,XOS,XNT,XIM,XGH,XEP,XDT,XCZ,XBX,XBOND,XBANK,XAL,WYR,WXT,WUG,WU,WTXH,WTT,WTL,WSH,WRT,WPT,WPP,WOWX,WORK,WOM,WMK,WMD,WMB,WLME,WIIX,WICC&tsym=USD"

Import json to work with that JSON object.

import requests
import json
headers = {'User-Agent':
       'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36'}
# The URL is split into two adjacent string literals inside parentheses;
# Python joins them into one string, so no newline ends up inside the URL.
url = (
    "https://min-api.cryptocompare.com/data/subsWatchlist?fsyms=433,ABT,DRG,XTZ,FIL,BNT,SPICE,ALGO,SRN,PMA,SNT,SOLID,WAX,NTK,BKX,DRT,EVN,ONL,OIO,LDC,CEL,BUBO,SVD,DOR,ICX,INS,LAX,GAT,NGC,WPR,OKOIN,HBZ,BQTX,MTH,PAT,BAT,AST,OMX,INVC,C20,POWR,FUEL,REQ,CVC,AGI,AMB,LINK,BRD,PLBT,SYNCO,STORJ,MWAT,LUC,CRE,CMT,BMC,BERRY,redBUX,RAC,MCO,DLT,STX,SFU,TCOIN,RKT,MANA,OAK,SNM,FUN,WRC,BCAT,VRA,CFTY,TNT,PROPS,OMG,ANT,AE,EDO,SETH,WT,TIOX,CMCT,ENJ,EVX,KICK,COV,ZUP,PLR,VEE,BDG,KEY,STT,U42,UTK,AID,AVT,IMPCN,TFLEX,SLX,PLA,ISR,GARD,FLUZ,BAX,RVT,CSP,HOLO,MCAP,THO,VLR,TFD,WBBC,ELY,DBCCOIN,LA,OAX,DAV,ETH,OST,DMT,GLDR,IHT,FLP,CTR,ILCT,BEX,CTLX,ATOM,NAVI,CEDEX,EVC,TPAY,WAVES,WELL,WHN,LYK,eQUAD,HHEM,GENE,APPC,MASP,LAMB,ABYSS,QTUM,PHX,EKO,DTR,CPAY,COFI,CEEK,BIT,AIR,ATB,CFI,HQX,PRO,PIX,NEU,GES,ART,NIM,MYST,ICOS,SLY,ORS,HST,FOAM,DEB,OSA,TMT,GFT,LEV,COB,TKN,HGS,GNO,GLA,DAOC,NPX,BOS,XCD,RLC,VERI,ADB,LCK,LCS,EARTH,MOAT,BNTN,FSBT,SUB,WLK,HOLD,DTH,ELI,NRN,CIX100,PST,DNO,ACAT,ROX,VIB,FIII,ICN,MFG,CJT,BCNX,REAL,CSNO,XYO,SAN,PTOY,MTRC,XRL,TAU,POE,KRM,GBXT,BCNA,BCAP,IMVR,DIM,ESTATE,BON,ARAW,IND,VCN,ADM,EXY,DNT,AMO,CPY,ADX,GC,HAC,PPT,MTL,BITX,CAS,DRGN,PAN,CUZ,PKT,SEAL,SHPING,GNT,HVN,LED,YMZ,LTA,CTX,EBC,BITCAR,GAMB,GMA,STAC,DUSK,GENS,SRXIO,DTN,PBT,KEP,ADT,BBI,MNTP,CAN,TAAS,LWF,DAY,CCOS,DAT,SNGLS,CDT,KNW,AEN,WIZ,BFC,SHP,ZRC,EQUI,TIX,IXT,ERT,MLT,SPX,VTM,PROOF,ECHT,INDI,SCL,UBEX,HBT,TIE,MAID,WGR,LSK,GBE,ATL,ETRNT,ITM,ARR,LOCI,DLA,MIO,FNX,AZ,TFL,GJC,JVY,SCRM,MOLK,DGD,1ST,X8X,GUAR,TIME,AUC,EON,ORBIS,HGT,HMQ,XID,REP,MSP,VIN,STA,DRP,ALX,TAN,GUP,JOY,ACE,VZT,VNS,CAPP,XELS,AIC,MEM,REPUX,AMP,TRW,TRST,ARN,XBB,OXY,SRCH,PPY,ECOB,FUNDP,ETT,HETA,LYM,FLOT,DENT,DCT,QAU,XRM,PLC,SFT,COIN,SRNT,NVST,PINMO,CVCOIN,NEO,ADST,DTX,IPT,DIP,GBA,BTS,PKC,CHT,CHX,LUN,JOYT,GRO,CLIN,SKIN,MNRB,QRL,MOS,MOBU,NRX,COTI,YUM,MYDFS,WINS,H2O,MLN,IAG,VESTA,OSF,GBTC,MYB,TASH,CRON,SUR,T4M,CAP,EDG,ICOO,VANM,PSK,ALTOCAR,PING,ICE,FNP,DNN,ICST,LNT,PHM,UNITY,RCC,VRT,BCCOIN,OCEANT,MT,IFT,COS,LAT,CRL,ILT,SAT,STU,SNOV,PRIX,HMT,ARNA,COR,WINGS,LGD,CYBR,AXIS,KMD,ITT"
    ",MTX,OMI,LYQD,GAI,ENCX,DDF,WRL,ETE,XCP,GMR,TOSS,ET4,VSL,ETKN,TKT,SNC,PYP,RYZ,FDX,ATFS,GEA,LNK,TJA,SND,BTRM,IGTT,AMBT,FILL,VOISE,AUDC,REA,AUK,ATTR,GIM,CRS,DNA,NEBL,CTW,BETHER,AUN,DHT,LKK,ETBS,FYN,INCNT,BAY,DTB,TRIP,ADL,ADUX,GUESS,MBI,PLU,DRC,ARK,SRC,EMT,HDG,SRX,KEX,FFCT,KVNT,PART,QVT,DGPT,COT,SREUR,ROCK,KAPU,TUT,ENK,PQT,CWEX,STRAT,RKC,NFN,SWARM,NBX,WAND,TFC,DTRC,SAF,OMNI,MAT,HBX,EMN,EJAC,FNTB,SPEND,FOXT,GOLOS,SJCX,DSLA,ATON,UMC,AVA,PRP,SDAO,NXC,KEN,VIA,TKR,CHK,VEGA,ORI,ZOPT,TWC,RIYA,CREA,GXC,ROCK2,JDC,DAR,DTCT,WEB,PERU,JSE,PGL,HELIOS,OTX,MC,ODMC,GGS,CFT,ABC,VTOS,CCT,FLLW,SGN,SIFT,CRTM,PRM,CO2,BOU,ITR,SNK,CZC,AHT,AGVC,XRX,SYC,3DES,LTCH,CPL,NIMFA,TRAVEL,SCOR,BNR,TKS,UET,XTRA,MNM,XLC,XCJ,PLMT,MEDI,CWX,IWT,PSB,SPORTG,XSPEC,ICOB,SMNX,KRP,NXT,EQ,ROK,AFCT,TRIBE,Z2,DFBT,OXY2,MIOTA,ZYM,ZUC,ZRX,ZNT,ZNAQ,ZNA,ZIX,ZILLA,ZEROB,ZEEW,ZCHN,ZCC1,ZAZA,ZAB,YSH,YPTO,YDY,YBT,YANU,YACHTCO,XUC,XTN,XSB,XRK,XRF,XRED,XRBT,XR,XPX,XPT,XPR,XPL,XOS,XNT,XIM,XGH,XEP,XDT,XCZ,XBX,XBOND,XBANK,XAL,WYR,WXT,WUG,WU,WTXH,WTT,WTL,WSH,WRT,WPT,WPP,WOWX,WORK,WOM,WMK,WMD,WMB,WLME,WIIX,WICC&tsym=USD"
)
r=requests.get(url,headers=headers)
data=json.loads(r.text)
print(data)
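
A very long URL pasted by hand is easy to break, for example when a line break slips into the middle of the string. As a minimal sketch, the same query string can be assembled from a list of symbols with the standard library. The endpoint and the fsyms and tsym parameters are taken from the URL above; the helper name build_watchlist_url, the constant API_BASE, and the three-symbol list are only illustrative assumptions.

```python
from urllib.parse import urlencode

# Endpoint copied from the URL seen in the Network tab.
API_BASE = "https://min-api.cryptocompare.com/data/subsWatchlist"


def build_watchlist_url(symbols, quote_currency="USD"):
    """Return the subsWatchlist URL for the given coin symbols (hypothetical helper)."""
    # Drop blanks and stray whitespace or newlines so the query stays on one line.
    cleaned = [s.strip() for s in symbols if s.strip()]
    # safe="," keeps the commas literal, matching the form of the original URL.
    query = urlencode({"fsyms": ",".join(cleaned), "tsym": quote_currency}, safe=",")
    return f"{API_BASE}?{query}"


if __name__ == "__main__":
    # Illustrative subset of the symbols listed above.
    print(build_watchlist_url(["ETH", "BAT", "OMG"]))
```

The resulting string can be passed to requests.get(url, headers=headers) exactly as in the code above.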

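However, if you want to keep using your own code, you may not get all the values, because the page is rendered by JavaScript. Use selenium or my solution above to get the expected result. To get all the links, just use soup.find_all('a'). Here is the code.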

from bs4 import BeautifulSoup
from requests_html import HTMLSession

session = HTMLSession()
url= "https://www.cryptocompare.com/ico/#/completed"
r=session.get(url)
soup = BeautifulSoup(r.text,'html.parser')

coin_name_list_items = soup.find_all('a')
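
The AttributeError in the question comes from calling find_all on the None that soup.find returns when nothing matches. A small sketch of guarding against that case follows. The function name extract_links and both HTML snippets are made up for illustration, and the real markup of the page may well differ.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_links(html, base_url, container_id="coins-list"):
    """Return (text, absolute_url) pairs for the links inside the container div."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("div", attrs={"id": container_id})
    if container is None:
        # find() returns None when no element matches, which is what happens
        # when the list is filled in later by JavaScript.
        return []
    links = []
    # href=True skips anchors that have no href attribute.
    for a in container.find_all("a", href=True):
        links.append((a.get_text(strip=True), urljoin(base_url, a["href"])))
    return links


if __name__ == "__main__":
    # Hypothetical snippets: a bare shell page and a rendered page.
    shell = '<html><body><div id="app"></div></body></html>'
    rendered = '<div id="coins-list"><a href="/coins/eth/overview"> Ethereum </a></div>'
    base = "https://www.cryptocompare.com/ico/"
    print(extract_links(shell, base))
    print(extract_links(rendered, base))
```

An empty result here is a hint that the data is not in the downloaded HTML, and that the JSON API or selenium is the way to go.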
