beautifulsoup won't let me use the find_all() command

Problem description

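I'm working on a solo project and want to scrape all of the historical data for a cryptocurrency and store it in a Python pandas DataFrame. I've worked out the structure of the HTML page and have the following code: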

from bs4 import BeautifulSoup
import urllib3
import requests
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline


bitcoin_df = pd.DataFrame(columns = ['Date', 'Open', 'High', 'Low', 'Close', 'Volume', 'Market Cap'])

bitcoin_url = "https://coinmarketcap.com/currencies/bitcoin/historical-data/"
bitcoin_content = requests.get(bitcoin_url).text
bitcoin_soup = BeautifulSoup(bitcoin_content, "lxml")
#print(bitcoin_soup.prettify())

bitcoin_table = bitcoin_soup.find("table", attrs={"class": "h7vnx2-2 hLKazY cmc-table  "})
bitcoin_table_data = bitcoin_table.find_all("tr")

for tr in bitcoin_table_data:
    tds = tr.find_all("td")
    for td in tds:
        bitcoin_df.append({'Date': td[0].text, 'Open': td[1].text, 'High': td[2].text, 'Low': td[3].text, 'Close': td[4].text, 'Volume': td[5].text, 'Market Cap': td[6].text})

However, I get this error:

AttributeError                            Traceback (most recent call last)
<ipython-input-46-316341b6771b> in <module>
      7 
      8 bitcoin_table = bitcoin_soup.find("table", attrs={"class": "h7vnx2-2 hLKazY cmc-table  "})
----> 9 bitcoin_table_data = bitcoin_table.find_all("tr")
     10 
     11 #for tr in bitcoin_soup.find_all('tr'):
AttributeError: 'NoneType' object has no attribute 'find_all'

Tags: python, web-scraping, beautifulsoup

Solution


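You get that error because the .find() call returned None to indicate that it could not find the table, and None has no find_all() method. The table is created by JavaScript running inside the browser, so it is not present in the HTML that requests downloads.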
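The behaviour can be reproduced without touching the real site. The sketch below is only an illustration: the HTML string is a made-up stand-in for a page whose table is filled in later by JavaScript, and it uses the built-in "html.parser" rather than "lxml" so that it has no extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for a JavaScript-rendered page: the HTML that is
# downloaded contains only an empty container and no <table> element.
html = "<html><body><div id='app'></div></body></html>"
soup = BeautifulSoup(html, "html.parser")

# find() returns None when nothing matches, rather than raising an error
table = soup.find("table")
print(table)

# Check for None before chaining, so a missing element does not surface
# later as an AttributeError on NoneType
rows = table.find_all("tr") if table is not None else []
print(len(rows))
```

Checking the result of find() before calling find_all() on it does not make the table appear, but it turns the failure into something that is easy to diagnose.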

Rather than trying to parse the HTML, request the data directly from their API (just as the browser does). For example:

import pandas as pd
import requests
import time

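# Request a window ending now and starting 5270400 seconds (61 days) earlier
ts = int(time.time())
json_url = f"https://api.coinmarketcap.com/data-api/v3/cryptocurrency/historical?id=1&convertId=2781&timeStart={ts - 5270400}&timeEnd={ts}"
json_req = requests.get(json_url)
# Decode the JSON response body into a Python dict
json_data = json_req.json()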
                                                            
data = []

for quote in json_data['data']['quotes']:
    data.append([
        quote['quote']['timestamp'],
        quote['quote']['open'],
        quote['quote']['high'],
        quote['quote']['low'],
        quote['quote']['close'],
        quote['quote']['volume'],
        quote['quote']['marketCap'],
    ])
    
df = pd.DataFrame(data, columns = ['Date', 'Open', 'High', 'Low', 'Close', 'Volume', 'Market Cap'])
print(df)

This gives you a DataFrame that starts:

                        Date          Open          High           Low         Close        Volume    Market Cap
0   2021-09-13T23:59:59.999Z  46057.215327  46598.678985  43591.320785  44963.072633  4.096994e+10  8.459805e+11
1   2021-09-14T23:59:59.999Z  44960.049359  47218.125355  44752.331349  47092.493833  3.865215e+10  8.860953e+11
2   2021-09-15T23:59:59.999Z  47097.998123  48450.468466  46773.326543  48176.346393  3.048450e+10  9.065325e+11

This URL was found by watching the requests the browser makes, using its own developer tools. I suggest you print(json_data) to see what is returned.
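Because the endpoint was found by inspection, its response shape could change. As a rough sketch, the flattening step can be written defensively so that a missing key yields an empty result instead of a KeyError. The helper name quotes_to_rows and the FIELDS tuple are hypothetical, and the sample payload is hand-built from the first row of the output above (rounded as printed), not a real API response.

```python
# Keys read from each quote, in the same order as the DataFrame columns
FIELDS = ("timestamp", "open", "high", "low", "close", "volume", "marketCap")


def quotes_to_rows(payload):
    """Flatten a payload shaped like json_data into a list of rows.

    Missing keys give None (or an empty list) rather than raising.
    """
    rows = []
    for item in payload.get("data", {}).get("quotes", []):
        quote = item.get("quote", {})
        rows.append([quote.get(field) for field in FIELDS])
    return rows


# Hand-built sample mirroring the structure the answer's loop relies on
sample = {
    "data": {
        "quotes": [
            {
                "quote": {
                    "timestamp": "2021-09-13T23:59:59.999Z",
                    "open": 46057.215327,
                    "high": 46598.678985,
                    "low": 43591.320785,
                    "close": 44963.072633,
                    "volume": 4.096994e10,
                    "marketCap": 8.459805e11,
                }
            }
        ]
    }
}

rows = quotes_to_rows(sample)
print(rows)
print(quotes_to_rows({}))  # an unexpected payload yields no rows
```

The resulting list can be passed to pd.DataFrame with the same columns argument as in the answer's code.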

