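javascript - Python scraping: building a valid payload for cnmv.es and rendering JavaScript
Problem description
I am sending a request to https://www.cnmv.es/portal/Consultas/BusquedaPorEntidad.aspx with a payload and the search text aaa, but I get a JavaScript response. So I need to render the JavaScript, but I don't want to use Selenium. I am also not sure whether my payload is correct.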
import requests
from bs4 import BeautifulSoup
url = 'https://www.cnmv.es/portal/Consultas/BusquedaPorEntidad.aspx'
search_text = 'aaa'
r = requests.get('https://www.cnmv.es/portal/Consultas/BusquedaPorEntidad.aspx')
soup = BeautifulSoup(r.content, 'html.parser')
VIEWSTATE = soup.find(id="__VIEWSTATE")['value'] + '%3D&'
VIEWSTATEGENERATOR = '__VIEWSTATEGENERATOR=' + soup.find(id="__VIEWSTATEGENERATOR")['value']
EVENTVALIDATION = '&__EVENTVALIDATION' + soup.find(id="__EVENTVALIDATION")['value']
SEARCH = "&ctl00%24wBusqueda%24txtBusqueda=&ctl00%24ContentPrincipal%24txtBusqueda={0}&ctl00%24ContentPrincipal%24btnBuscar=Search".format(search_text)
payload = '__EVENTTARGET=&__EVENTARGUMENT=&__VIEWSTATE=' + VIEWSTATE + VIEWSTATEGENERATOR + EVENTVALIDATION + SEARCH
headers = {
    'Connection': 'keep-alive',
    'Cache-Control': 'max-age=0',
    'Upgrade-Insecure-Requests': '1',
    'Origin': 'https://www.cnmv.es',
    'Content-Type': 'application/x-www-form-urlencoded',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'Sec-Fetch-Site': 'same-origin',
    'Sec-Fetch-Mode': 'navigate',
    'Sec-Fetch-User': '?1',
    'Sec-Fetch-Dest': 'document',
    'Referer': 'https://www.cnmv.es/portal/Consultas/BusquedaPorEntidad.aspx',
    'Accept-Language': 'en-US,en;q=0.9',
}
response = requests.request("POST", url, headers=headers, data = payload)
print(response.text.encode('utf8'))
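Solution
I didn't test your payload, but I don't know why you append %3D to __VIEWSTATE.
I use a dictionary, which requests automatically URL-encodes into a string, so I don't have to add & manually. I also don't have to use %24 in place of $ in names such as ctl00$wBusqueda$txtBusqueda.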
payload = {
    '__EVENTTARGET': '',
    '__EVENTARGUMENT': '',
    '__VIEWSTATE': soup.find(id="__VIEWSTATE")['value'],
    '__VIEWSTATEGENERATOR': soup.find(id="__VIEWSTATEGENERATOR")['value'],
    '__EVENTVALIDATION': soup.find(id="__EVENTVALIDATION")['value'],
    'ctl00$wBusqueda$txtBusqueda': '',
    'ctl00$ContentPrincipal$txtBusqueda': search_text,
    'ctl00$ContentPrincipal$btnBuscar': 'Buscar',
}
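To see what the automatic conversion does, the encoding step can be reproduced offline. This is a minimal sketch using urllib.parse.urlencode from the standard library, which produces the same kind of form-encoded string that requests builds from a dictionary passed as data. The __VIEWSTATE value below is made up for illustration and is not taken from the real page.

```python
from urllib.parse import urlencode, parse_qs

# Made-up sample values; a real __VIEWSTATE is a long base64 string
# that often ends in '=' padding and may contain '/' and '+'.
payload = {
    '__VIEWSTATE': 'abc/def+ghi==',
    'ctl00$ContentPrincipal$txtBusqueda': 'aaa',
}

# urlencode escapes '$', '=', '/' and '+' and joins the pairs with '&'.
body = urlencode(payload)
print(body)

# Decoding the body gives back the original values unchanged.
decoded = parse_qs(body)
print(decoded['__VIEWSTATE'][0])
```

Since the trailing = characters are already escaped as %3D by the encoder, appending another %3D by hand would change the value the server receives.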
BTW: the code works for me without the headers, but I keep them as comments.
import requests
from bs4 import BeautifulSoup
url = 'https://www.cnmv.es/portal/Consultas/BusquedaPorEntidad.aspx'
search_text = 'aaa'
r = requests.get('https://www.cnmv.es/portal/Consultas/BusquedaPorEntidad.aspx')
soup = BeautifulSoup(r.content, 'html.parser')
payload = {
    '__EVENTTARGET': '',
    '__EVENTARGUMENT': '',
    '__VIEWSTATE': soup.find(id="__VIEWSTATE")['value'],
    '__VIEWSTATEGENERATOR': soup.find(id="__VIEWSTATEGENERATOR")['value'],
    '__EVENTVALIDATION': soup.find(id="__EVENTVALIDATION")['value'],
    'ctl00$wBusqueda$txtBusqueda': '',
    'ctl00$ContentPrincipal$txtBusqueda': search_text,
    'ctl00$ContentPrincipal$btnBuscar': 'Buscar',
}
headers = {
    # 'Connection': 'keep-alive',
    # 'Cache-Control': 'max-age=0',
    # 'Upgrade-Insecure-Requests': '1',
    # 'Origin': 'https://www.cnmv.es',
    # 'Content-Type': 'application/x-www-form-urlencoded',
    # 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36',
    # 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    # 'Sec-Fetch-Site': 'same-origin',
    # 'Sec-Fetch-Mode': 'navigate',
    # 'Sec-Fetch-User': '?1',
    # 'Sec-Fetch-Dest': 'document',
    # 'Referer': 'https://www.cnmv.es/portal/Consultas/BusquedaPorEntidad.aspx',
    # 'Accept-Language': 'en-US,en;q=0.9',
}
r = requests.post(url, headers=headers, data=payload)
# print(r.text)
soup = BeautifulSoup(r.content, 'html.parser')
# the search results are read from the <option> elements of the response
for item in soup.find_all('option'):
    print(item['value'], '|', item.text)
Result:
CLP3846 | AAA TRADE LTD
V85543155 | DWS DINERO GOBIERNOS AAA, FI
V85263911 | EUROVALOR DEUDA PUBLICA EUROPEA AAA, FI
9686 | WWW.AAARATEDBOND.COM
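The hidden fields (__VIEWSTATE, __VIEWSTATEGENERATOR, __EVENTVALIDATION) can also be collected in one pass instead of being looked up one by one. The following is only a sketch under assumptions: it uses the standard library html.parser instead of BeautifulSoup so that it runs without extra packages, the helper names HiddenInputCollector and hidden_fields are hypothetical, and sample_html is a made-up fragment, not the real cnmv.es page.

```python
from html.parser import HTMLParser


class HiddenInputCollector(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != 'input':
            return
        attrs = dict(attrs)
        is_hidden = (attrs.get('type') or '').lower() == 'hidden'
        if is_hidden and attrs.get('name'):
            # A missing value attribute is treated as an empty string.
            self.fields[attrs['name']] = attrs.get('value') or ''


def hidden_fields(html):
    """Return a dict of all hidden form fields found in the HTML text."""
    collector = HiddenInputCollector()
    collector.feed(html)
    return collector.fields


# Made-up fragment that only imitates the shape of an ASP.NET form.
sample_html = '''
<form>
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="abc==" />
<input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="xyz" />
<input type="text" name="ctl00$ContentPrincipal$txtBusqueda" />
</form>
'''

payload = hidden_fields(sample_html)
payload['ctl00$ContentPrincipal$txtBusqueda'] = 'aaa'
print(payload)
```

With the real page, the text passed to hidden_fields would be r.text from the initial GET request, and the resulting dictionary would be sent as data in the POST request after adding the search field and the button field.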