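python-3.x - Unable to web scrape because the table is hidden
Problem description
I am trying to scrape this website: https://chanakyya.com/Election-Results?electionType=Assembly. However, I cannot reach the table through the browser's inspector in order to scrape it.
Solution
You can fetch the JSON data that populates the table and read it into a DataFrame.
Note: I used the choice package to prompt for a specific state: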
import requests
import pandas as pd
# pip install choice
import choice

# Fetch the list of states and let the user pick one from a menu
states = requests.get('https://chanakyya.com/Chanakya/states.json').json()
state_list = [x['stateDisplayName'] for x in states]
state_choice = choice.Menu(state_list).ask()
stateName = [x['stateName'] for x in states if x['stateDisplayName'] == state_choice][0]

# Fetch the JSON that populates the state-level tables
url = f'https://chanakyya.com/Chanakya/{stateName}/{stateName}.json'
resultsData = requests.get(url).json()

# Build one DataFrame per election (e.g. 2017_Assembly)
tables = {}
for key, value in resultsData['ELECTION_DATA']['stateLevelData'].items():
    table = pd.DataFrame(value)
    tables[key] = table

for key, table in tables.items():
    print(f'*** {key} ***')
    print(table, '\n\n')

# Assembly Data: pick one constituency and fetch its details
assembly_list = list(resultsData['ASSEMBLY_NAME_DATA'].keys())
assembly_choice = choice.Menu(assembly_list).ask()
# The stored value ends in '.json'; strip it to get the file stem
assembly_choice = resultsData['ASSEMBLY_NAME_DATA'][assembly_choice].split('.json')[0]
assembly_url = f'https://chanakyya.com/Chanakya/{stateName}/AssemblyData/{assembly_choice}_Details.json'
resultsAssemblyData = requests.get(assembly_url).json()
print(resultsAssemblyData)
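If you would rather script this without the interactive menu, the lookup and URL-building steps can be pulled into small helpers. The following is only a sketch: the function names are my own, and the sample records are made up to mirror the shape of states.json (the real stateName values and assembly file names may differ). It makes no network calls, so the resulting URLs would still need to be passed to requests.get(...).json().

```python
BASE_URL = 'https://chanakyya.com/Chanakya'


def resolve_state_name(states, display_name):
    """Map a stateDisplayName to its stateName (case-insensitive)."""
    wanted = display_name.strip().lower()
    for state in states:
        if state['stateDisplayName'].strip().lower() == wanted:
            return state['stateName']
    raise ValueError(f'Unknown state: {display_name!r}')


def state_url(state_name):
    """URL of the JSON file behind the state-level tables."""
    return f'{BASE_URL}/{state_name}/{state_name}.json'


def assembly_url(state_name, assembly_file):
    """URL of the details JSON for one assembly constituency."""
    assembly_id = assembly_file.split('.json')[0]
    return f'{BASE_URL}/{state_name}/AssemblyData/{assembly_id}_Details.json'


# Made-up records shaped like states.json; the real stateName values may differ.
sample_states = [
    {'stateName': 'UttarPradesh', 'stateDisplayName': 'Uttar Pradesh'},
    {'stateName': 'Kerala', 'stateDisplayName': 'Kerala'},
]

state_name = resolve_state_name(sample_states, 'uttar pradesh')
print(state_url(state_name))
# 'Agra.json' is a hypothetical value from ASSEMBLY_NAME_DATA
print(assembly_url(state_name, 'Agra.json'))
```

Raising ValueError for an unknown name avoids the IndexError that the `[...][0]` lookup would produce when nothing matches.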
Output of the interactive script:
Make a choice:
0: Andhra Pradesh
1: Assam
2: Bihar
3: Chhattisgarh
4: Goa
5: Gujarat
6: Haryana
7: Himachal Pradesh
8: Jammu & Kashmir
9: Jharkhand
Enter number or name; return for next page
?
0: Karnataka
1: Kerala
2: Madhya Pradesh
3: Maharashtra
4: New Delhi
5: Odisha
6: Punjab
7: Rajasthan
8: Tamilnadu
9: Telangana
Enter number or name; return for next page
?
0: Uttarakhand
1: Uttar Pradesh
2: West Bengal
Enter number or name; return for next page
? 1
*** 2009_Parliament ***
    partyName  numberOfSeatLeading  votePercentage
0         BSP                  100           27.42
1          SP                  118           23.26
2         INC                   95           18.26
3         BJP                   62           17.50
4         IND                    5            4.53
5         RLD                   21            3.27
6        PECP                    1            0.98
7          AD                    0            0.85
8        SBSP                    0            0.55
9       JD(U)                    0            0.30
10       RSBP                    1            0.28
11        JPS                    0            0.20
12        CPI                    0            0.16
13         MD                    0            0.12
14        IJP                    0            0.11
15       RTKP                    0            0.11
16     RPI(A)                    0            0.10
17       PMSP                    0            0.10
18        ASP                    0            0.07
19       EKSP                    0            0.05
*** 2012_Assembly ***
    partyName  numberOfSeatLeading  votePercentage
0          SP                  224           29.13
1         BSP                   80           25.91
2         BJP                   47           15.00
3         INC                   28           11.65
4         IND                    6            4.14
5        PECP                    4            2.35
6         RLD                    9            2.33
7          AD                    1            0.90
8         GED                    2            0.55
9         NCP                    1            0.33
10       IEMC                    1            0.25
11        CPI                    0            0.13
12       CPIM                    0            0.09
*** 2014_Parliament ***
    partyName  numberOfSeatLeading  votePercentage
0         BJP                  328           42.32
1          SP                   42           22.19
2         BSP                    9           19.63
3         INC                   15            7.48
4         IND                    0            1.75
5         AAP                    0            1.02
6          AD                    9            1.01
7         RLD                    0            0.86
8        NOTA                    0            0.74
9        PECP                    0            0.62
10        QED                    0            0.44
11       BMUP                    0            0.19
12        CPI                    0            0.16
13       SBSP                    0            0.14
14       AITC                    0            0.13
15        RPD                    0            0.12
16      JD(U)                    0            0.07
17        RUC                    0            0.07
18        SHS                    0            0.04
19         LD                    0            0.04
20        NAP                    0            0.04
21       BSCP                    0            0.04
22         MD                    0            0.03
*** 2017_Assembly ***
                              partyName  numberOfSeatLeading  votePercentage
0                                   BJP                  312           39.70
1                                   BSP                   19           22.20
2                                    SP                   47           21.80
3                                   INC                    7            6.20
4                                   IND                    3            2.60
5                                   RLD                    1            1.80
6                                    AD                    9            1.00
7                                  SPSP                    4            0.71
8  Nirbal Indian Shoshit Hamara Aam Dal                    1            0.60
9                                  PECP                    0            0.30
*** 2019_Parliament ***
   partyName  numberOfSeatLeading  votePercentage
0        BJP                  275           49.80
1        BSP                   65           19.40
2         SP                   40           18.10
3        INC                    8            6.30
4        RLD                    4            1.71
5       ADAL                    9            1.20
6        JDL                    2            0.20
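Once the state-level data is loaded, summarising it is straightforward. As a rough sketch, the helper below (leading_party is a name I made up) picks the party with the most seats in each election. It assumes each entry of stateLevelData is a list of records with the partyName, numberOfSeatLeading and votePercentage keys seen in the column headers of the output; the sample rows are copied from the first three lines of two of the printed tables, and int() is applied in case the JSON stores the seat counts as strings.

```python
def leading_party(records):
    """Return (partyName, seats) for the record with the most seats."""
    top = max(records, key=lambda r: int(r['numberOfSeatLeading']))
    return top['partyName'], int(top['numberOfSeatLeading'])


# First three rows of two tables from the output; the record layout is an assumption.
sample = {
    '2012_Assembly': [
        {'partyName': 'SP', 'numberOfSeatLeading': 224, 'votePercentage': 29.13},
        {'partyName': 'BSP', 'numberOfSeatLeading': 80, 'votePercentage': 25.91},
        {'partyName': 'BJP', 'numberOfSeatLeading': 47, 'votePercentage': 15.00},
    ],
    '2017_Assembly': [
        {'partyName': 'BJP', 'numberOfSeatLeading': 312, 'votePercentage': 39.70},
        {'partyName': 'BSP', 'numberOfSeatLeading': 19, 'votePercentage': 22.20},
        {'partyName': 'SP', 'numberOfSeatLeading': 47, 'votePercentage': 21.80},
    ],
}

winners = {key: leading_party(rows) for key, rows in sample.items()}
print(winners)
# {'2012_Assembly': ('SP', 224), '2017_Assembly': ('BJP', 312)}
```

With the real response, sample would be replaced by resultsData['ELECTION_DATA']['stateLevelData'], provided the records have the layout assumed here.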