How do I scrape data from the second table at a URL?

Problem description

Apologies if this is a really stupid question, but I did try a few things; my attempt is shown below.

from bs4 import BeautifulSoup
import requests

for page in range(1,5):
    r=requests.get('https://etfdb.com/screener/#tab=returns&page=' + page)
    data = r.text
    soup = BeautifulSoup(data, "html.parser")

    table = soup.find("table", {"class":"table table-bordered table-hover table-striped mm-mobile-table"})

    A=[]
    B=[]
    C=[]
    D=[]
    E=[]
    F=[]
    G=[]
    H=[]

    for row in table.findAll("tr"):
        for cell in row("td"):
            #print (cell.get_text().strip())
            A.append(cell[0].get_text().strip())
            B.append(cell[1].get_text().strip())
            C.append(cell[2].get_text().strip())
            D.append(cell[3].get_text().strip())
            E.append(cell[4].get_text().strip())
            F.append(cell[5].get_text().strip())
            G.append(cell[6].get_text().strip())
            H.append(cell[7].get_text().strip())

df=pd.DataFrame(A,columns=['Symbol'])
df['ETF_Name']=B
df['1_Week']=C
df['4_Week']=D
df['YTD']=E
df['1_Year']=F
df['3_Year']=G
df['5_Year']=H
df

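I believe the class of the table in question is "table table-bordered table-hover table-striped mm-mobile-table". The problem is that there seem to be several tables with the same class: my code gets data from the first one, but I want the data from a different one, which I think is the second. The table I want to download is the one on the "Returns" tab, not the "Overview" tab.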

Tags: python, python-3.x, beautifulsoup

Solution
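It may help to see first why looping over `page` in the question's URL cannot change the result. Everything after `#` in a URL is the fragment, which stays in the browser and is not sent to the server, so every iteration requests the same address. The sketch below uses only the standard library to show this; `url_sent_to_server` is a hypothetical helper name, not part of `requests`.

```python
from urllib.parse import urlsplit, urlunsplit


def url_sent_to_server(url: str) -> str:
    """Return the URL with its fragment removed (the part an HTTP client requests)."""
    parts = urlsplit(url)
    # Keep scheme, host, path and query; drop the fragment after '#'.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, parts.query, ""))


for page in range(1, 5):
    # str(page) is needed: concatenating a str and an int raises TypeError.
    url = "https://etfdb.com/screener/#tab=returns&page=" + str(page)
    print(url_sent_to_server(url))
```

All four iterations print `https://etfdb.com/screener/`, so the `tab` and `page` values never reach the server.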


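The data is loaded dynamically via JavaScript, so it is not present in the HTML that requests.get returns. You can use the requests module to load the data from the site's JSON endpoint instead, for example: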

import json
import requests


url = 'https://etfdb.com/api/screener/'
json_data = {"tab":"returns","page":1,"only":["meta","data",None]}
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:77.0) Gecko/20100101 Firefox/77.0', 'Accept':'application/json'}

for page in range(1, 5):  # <-- increase this to desired number of pages

    json_data['page'] = page
    data = requests.post(url, json=json_data, headers=headers).json()

    # uncomment this to print all data:
    # print(json.dumps(data, indent=4))

    # print some data to screen:
    for d in data['data']:
        print('{:<5}{:<50}{:>9}{:>9}{:>9}'.format(d['symbol']['text'], d['name']['text'], d['ytd'], d['one_week_return'], d['four_week_return']))

Prints:

SPY  SPDR S&P 500 ETF                                     -2.19%   -2.44%    7.19%
IVV  iShares Core S&P 500 ETF                             -2.22%   -2.46%    7.20%
VTI  Vanguard Total Stock Market ETF                      -2.58%   -2.45%    7.88%
VOO  Vanguard S&P 500 ETF                                 -2.28%   -2.49%    7.15%
QQQ  Invesco QQQ                                          14.47%   -0.18%    7.05%
AGG  iShares Core U.S. Aggregate Bond ETF                  5.85%    0.48%    0.82%
VEA  Vanguard FTSE Developed Markets ETF                 -10.34%   -2.46%    9.68%
IEFA iShares Core MSCI EAFE ETF                          -10.36%   -2.37%   10.05%
GLD  SPDR Gold Trust                                      13.54%    0.61%   -1.22%
VUG  Vanguard Growth ETF                                  10.29%   -0.73%    7.44%
VWO  Vanguard FTSE Emerging Markets ETF                  -11.31%   -2.48%    7.16%
BND  Vanguard Total Bond Market ETF                        5.84%    0.42%    0.82%
IWF  iShares Russell 1000 Growth ETF                       8.62%   -0.90%    6.99%

... etc.
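Since the question builds a DataFrame, here is a minimal sketch of turning one response into rows keyed by the question's column names. It assumes the payload has the shape the loop above reads (`data`, `symbol.text`, `name.text`, `ytd`, `one_week_return`, `four_week_return`). `rows_from_payload` is a hypothetical helper, and the sample dictionary is hand-written from the first printed row rather than a captured response.

```python
def rows_from_payload(payload):
    """Flatten one API response into a list of plain dicts, one per ETF."""
    rows = []
    for d in payload.get("data", []):
        rows.append({
            "Symbol": d["symbol"]["text"],
            "ETF_Name": d["name"]["text"],
            "1_Week": d["one_week_return"],
            "4_Week": d["four_week_return"],
            "YTD": d["ytd"],
        })
    return rows


# Hand-written sample (an assumption), shaped like the fields used above.
sample = {
    "data": [
        {
            "symbol": {"text": "SPY"},
            "name": {"text": "SPDR S&P 500 ETF"},
            "ytd": "-2.19%",
            "one_week_return": "-2.44%",
            "four_week_return": "7.19%",
        }
    ]
}

print(rows_from_payload(sample))
```

Inside the page loop, the rows can be accumulated with `all_rows.extend(rows_from_payload(data))` and then passed to `pd.DataFrame(all_rows)`. The keys for the 1-year, 3-year and 5-year returns are not shown above, so they are left out here; printing the full payload with `json.dumps(data, indent=4)` should reveal them.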
