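python - Fetching data in Python from a web page with multiple tables
Problem description
I am trying to parse the web page below to get the names of the stocks that are currently at an all-time high or low on the exchange.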
https://www.bseindia.com/markets/equity/EQReports/HighLow.html?Flag=H#
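However, when I download the page with Beautiful Soup and inspect the data, only half of the stocks show up. The listing is split across 2 pages, with 25 stocks on one page and 25 on the other, and with this approach I can only parse the first page. Clicking through to the second page does not change the URL. How can I solve this?
Solution
The site has an API endpoint that returns the data in JSON format. You can fetch that JSON response and then normalize it to create a table. The response contains 2 tables, and I'm not sure whether you want the second one, so I store them separately and then append them together.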
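import pandas as pd
import requests

# JSON endpoint behind the High/Low page; the query string is supplied via `params` below.
url = 'https://api.bseindia.com/BseIndiaAPI/api/MktHighLowData/w'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.109 Safari/537.36'}
payload = {
    'Grpcode': '',
    'HLflag': 'H',
    'indexcode': '',
    'scripcode': '',
}

jsonObj = requests.get(url, headers=headers, params=payload).json()

# The response holds two tables; normalize each one into its own DataFrame.
# (pandas.io.json.json_normalize is deprecated in favor of pd.json_normalize.)
df_table = pd.json_normalize(jsonObj['Table'])
df_table1 = pd.json_normalize(jsonObj['Table1'])

# DataFrame.append was removed in pandas 2.0; pd.concat stacks the tables the same way.
df = pd.concat([df_table, df_table1])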
Output:
print(df)
ALLTimeHigh ... dt_tm
0 1019.95 ... 2019-02-25T16:00:03
1 263.00 ... 2019-02-25T16:00:03
2 24.00 ... 2019-02-25T16:00:03
3 35.90 ... 2019-02-25T16:00:03
4 29.75 ... 2019-02-25T16:00:03
5 43.00 ... 2019-02-25T16:00:03
6 140.40 ... 2019-02-25T16:00:03
7 15.39 ... 2019-02-25T16:00:03
8 724.00 ... 2019-02-25T16:00:03
9 1495.00 ... 2019-02-25T16:00:03
10 123.15 ... 2019-02-25T16:00:03
11 121.00 ... 2019-02-25T16:00:03
12 238.50 ... 2019-02-25T16:00:03
13 89.00 ... 2019-02-25T16:00:03
14 819.95 ... 2019-02-25T16:00:03
15 112.40 ... 2019-02-25T16:00:03
16 49.95 ... 2019-02-25T16:00:03
17 330.85 ... 2019-02-25T16:00:03
18 167.45 ... 2019-02-25T16:00:03
19 25.10 ... 2019-02-25T16:00:03
20 940.00 ... 2019-02-25T16:00:03
21 165.00 ... 2019-02-25T16:00:03
22 NaN ... 2019-02-25T16:00:03
23 239.00 ... 2019-02-25T16:00:03
24 151.55 ... 2019-02-25T16:00:03
25 34.35 ... 2019-02-25T16:00:03
26 256.15 ... 2019-02-25T16:00:03
27 49.75 ... 2019-02-25T16:00:03
28 103.25 ... 2019-02-25T16:00:03
29 50.50 ... 2019-02-25T16:00:03
.. ... ... ...
87 135.00 ... 2019-02-25T16:00:03
88 219.80 ... 2019-02-25T16:00:03
89 58.00 ... 2019-02-25T16:00:03
90 494.00 ... 2019-02-25T16:00:03
91 285.30 ... 2019-02-25T16:00:03
92 55.65 ... 2019-02-25T16:00:03
93 4.45 ... 2019-02-25T16:00:03
94 50.00 ... 2019-02-25T16:00:03
95 50.00 ... 2019-02-25T16:00:03
96 92.50 ... 2019-02-25T16:00:03
97 154.80 ... 2019-02-25T16:00:03
98 82.40 ... 2019-02-25T16:00:03
99 293.85 ... 2019-02-25T16:00:03
100 396.00 ... 2019-02-25T16:00:03
101 98.00 ... 2019-02-25T16:00:03
102 144.60 ... 2019-02-25T16:00:03
103 11.50 ... 2019-02-25T16:00:03
104 42.95 ... 2019-02-25T16:00:03
105 313.00 ... 2019-02-25T16:00:03
106 1120.00 ... 2019-02-25T16:00:03
107 87.00 ... 2019-02-25T16:00:03
108 82.00 ... 2019-02-25T16:00:03
109 214.00 ... 2019-02-25T16:00:03
110 505.00 ... 2019-02-25T16:00:03
111 1525.00 ... 2019-02-25T16:00:03
112 220.00 ... 2019-02-25T16:00:03
113 36.00 ... 2019-02-25T16:00:03
114 170.00 ... 2019-02-25T16:00:03
115 549.50 ... 2019-02-25T16:00:03
116 4990.00 ... 2019-02-25T16:00:03
[168 rows x 19 columns]
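
In the output above, the last index label is 116 even though there are 168 rows, because each table keeps its own index when the two are stacked. If a single running index and a record of which table each row came from would be useful, a minimal sketch along the following lines may help. It runs on made-up sample data rather than the live endpoint. The helper name `combine_tables`, the `source_table` column, and the `scrip` field are assumptions for illustration and are not part of the site's API.

```python
import pandas as pd

# Hypothetical stand-in for the JSON response. Only ALLTimeHigh and dt_tm
# appear in the output above; the "scrip" field is invented for this sketch.
sample = {
    "Table": [
        {"scrip": "AAA", "ALLTimeHigh": 1019.95, "dt_tm": "2019-02-25T16:00:03"},
        {"scrip": "BBB", "ALLTimeHigh": 263.00, "dt_tm": "2019-02-25T16:00:03"},
    ],
    "Table1": [
        {"scrip": "CCC", "ALLTimeHigh": 24.00, "dt_tm": "2019-02-25T16:00:03"},
    ],
}


def combine_tables(json_obj, keys=("Table", "Table1")):
    """Normalize each listed table and stack them under one running index."""
    frames = []
    for key in keys:
        records = json_obj.get(key)
        if not records:
            # Skip tables that are missing or empty in the response.
            continue
        frame = pd.json_normalize(records)
        frame["source_table"] = key  # remember which table the row came from
        frames.append(frame)
    if not frames:
        return pd.DataFrame()
    # ignore_index=True renumbers the rows 0..n-1 across both tables.
    return pd.concat(frames, ignore_index=True)


combined = combine_tables(sample)
print(combined)
```

With the real response, `combine_tables(jsonObj)` could be used in place of the separate normalize and concatenate steps, and filtering on `source_table` would recover either table on its own.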