How to scrape table data from an unloaded tab on a website using BeautifulSoup in Python

Problem description

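I am trying to scrape index data from this website. Specifically, I want the rollover data from the Index tab, but when I scrape the table, its contents look like this: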

<table cellspacing="0" class="derivatives_section table table-striped responsive dt-responsive nowrap derivatives_rollover_tbl" id="rollover_index_table" width="100%">
<thead>
<tr>
<th>Index</th>
<th>Future<br/> Price</th>
<th>% Price<br/> Chg.</th>
<th>% OI<br/> Chg.</th>
<th>No. of Shares<br/> Rolled</th>
<th>% Rollover</th>
<th id="ro_idx_1">% Chg Rollover <br/> Vs. 1 Month Avg.</th>
<th>% Rollover <br/>Cost </th>
<th id="ro_idx_2">% Chg Rollover Cost <br/> Vs. 1 Month Avg.</th>
</tr>
</thead>
<tbody>
<tr>
<td><div class="text-line loading"></div></td>
<td><div class="text-line loading"></div></td>
<td><div class="text-line loading"></div></td>
<td><div class="text-line loading"></div></td>
<td><div class="text-line loading"></div></td>
<td><div class="text-line loading"></div></td>
<td><div class="text-line loading"></div></td>
<td><div class="text-line loading"></div></td>
<td><div class="text-line loading"></div></td>
</tr>
<tr>

Here is the code that produces the result above:

import requests
import json
import time
from bs4 import BeautifulSoup

url = 'https://www.indiainfoline.com/markets/derivatives/rollover#derivatives_index'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.190 Safari/537.36'}

request = requests.get(url,headers=headers)
soup = BeautifulSoup(request.text,'html.parser')

table = soup.find('table',{'id':'rollover_index_table'})
tbody = table.find('tbody')
tr = tbody.find('tr')
td = tr.find_all('td')

print(td)

How can I scrape the data from the website's Index tab?

Tags: python, web-scraping, beautifulsoup, python-requests

Solution


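The table cells are filled in by JavaScript after the page loads, so the HTML that requests receives contains only the loading placeholders. The data itself comes from an API call that returns JSON. You can build a DataFrame from it as follows: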

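import requests
import pandas as pd

# JSON endpoint the page calls in the background to fill the table
url = 'https://www.indiainfoline.com/api/papi-call-api.php?url=/Derivative/Derivative.svc/FNO-Rollover/FUTSTK/?responsetype=json'
data = requests.get(url).json()

# The rows are nested several levels deep in the payload
df = pd.DataFrame(data['response']['data']['FNORollOverList']['FNORollOverdata'])
print(df)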
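
The chained key lookup above raises a KeyError if the API ever changes shape or returns an error payload. As a rough sketch only, the nested rows could be pulled out defensively before building the DataFrame. The helper name `extract_records`, the field names in the sample payload, and the idea that a single row might arrive as a dict instead of a list are all assumptions made for illustration; only the key path comes from the answer.

```python
from typing import Any, Dict, List, Sequence

# Key path copied from the answer's code; the live payload may differ.
ROLLOVER_PATH = ("response", "data", "FNORollOverList", "FNORollOverdata")


def extract_records(payload: Any,
                    path: Sequence[str] = ROLLOVER_PATH) -> List[Dict[str, Any]]:
    """Walk `path` through nested dicts and return the rows as a list.

    Returns an empty list when a key is missing or the final value is not
    a dict or list. A single row delivered as a dict (an assumption about
    how the API might behave) is wrapped in a list.
    """
    node = payload
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return []
        node = node[key]
    if isinstance(node, dict):
        return [node]
    if isinstance(node, list):
        return node
    return []


# Hypothetical payload: the field names "Symbol" and "RollOverPer" are invented.
sample = {"response": {"data": {"FNORollOverList": {"FNORollOverdata": [
    {"Symbol": "ABC", "RollOverPer": "71.2"},
]}}}}

records = extract_records(sample)
```

With the real response, the result can be passed straight to pandas, for example `pd.DataFrame(extract_records(requests.get(url, timeout=10).json()))`. An empty list simply produces an empty DataFrame. Note that the answer's URL contains `FUTSTK`; the request the page makes for the Index tab can be checked in the browser's network tab.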
