Python scraping: building the payload for cnmv.es and rendering JavaScript

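Problem description

I send a request to https://www.cnmv.es/portal/Consultas/BusquedaPorEntidad.aspx with a payload and the search text aaa, but the response I get back is JavaScript. So I need to render the JavaScript, but I don't want to use Selenium. I'm also not sure whether my payload is correct.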

    url = 'https://www.cnmv.es/portal/Consultas/BusquedaPorEntidad.aspx'
    search_text = 'aaa'
    r = requests.get('https://www.cnmv.es/portal/Consultas/BusquedaPorEntidad.aspx')
    soup = BeautifulSoup(r.content, 'html.parser')

    VIEWSTATE  = soup.find(id="__VIEWSTATE")['value'] + '%3D&'
    VIEWSTATEGENERATOR = '__VIEWSTATEGENERATOR=' + soup.find(id="__VIEWSTATEGENERATOR")['value']

    EVENTVALIDATION = '&__EVENTVALIDATION' + soup.find(id="__EVENTVALIDATION")['value']
    SEARCH = "&ctl00%24wBusqueda%24txtBusqueda=&ctl00%24ContentPrincipal%24txtBusqueda={0}&ctl00%24ContentPrincipal%24btnBuscar=Search".format(search_text)
    

    payload = '__EVENTTARGET=&__EVENTARGUMENT=&__VIEWSTATE=' + VIEWSTATE + VIEWSTATEGENERATOR + EVENTVALIDATION + SEARCH


    headers = {
    'Connection': 'keep-alive',
    'Cache-Control': 'max-age=0',
    'Upgrade-Insecure-Requests': '1',
    'Origin': 'https://www.cnmv.es',
    'Content-Type': 'application/x-www-form-urlencoded',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'Sec-Fetch-Site': 'same-origin',
    'Sec-Fetch-Mode': 'navigate',
    'Sec-Fetch-User': '?1',
    'Sec-Fetch-Dest': 'document',
    'Referer': 'https://www.cnmv.es/portal/Consultas/BusquedaPorEntidad.aspx',
    'Accept-Language': 'en-US,en;q=0.9',
    }

    response = requests.request("POST", url, headers=headers, data = payload)

    print(response.text.encode('utf8'))

Tags: javascript, python, web-scraping, request

Solution


I didn't test your payload, but I don't see why you append %3D to __VIEWSTATE.
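To illustrate why the hand-built string is risky, here is a minimal offline sketch. It assumes the server decodes the body the way the standard library's `parse_qs` does, and the `__VIEWSTATE` and `__VIEWSTATEGENERATOR` values are hypothetical placeholders, not real values from the site. A string passed as `data` is sent as written, so the literal `%3D` decodes to an extra `=` and an unescaped `+` decodes to a space. Separately, the string in the question also appears to omit the `=` after `&__EVENTVALIDATION`.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical Base64-like value; a real __VIEWSTATE is much longer.
viewstate = 'ab+c/d=='

# Hand-built body in the style of the question: raw value plus a literal '%3D'.
manual = '__VIEWSTATE=' + viewstate + '%3D&' + '__VIEWSTATEGENERATOR=ABCD1234'
manual_value = parse_qs(manual)['__VIEWSTATE'][0]

# Body produced by the standard form encoder from a dictionary.
encoded = urlencode({'__VIEWSTATE': viewstate, '__VIEWSTATEGENERATOR': 'ABCD1234'})
encoded_value = parse_qs(encoded)['__VIEWSTATE'][0]

print(repr(manual_value))   # 'ab c/d===' : '+' became a space, one '=' too many
print(repr(encoded_value))  # 'ab+c/d=='  : round-trips unchanged
```

Only the encoder-built body round-trips to the original value in this sketch.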

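I use a dictionary, which requests automatically converts to a string, so I don't have to add & manually. I also don't have to use %24 in place of $ in names such as ctl00$wBusqueda$txtBusqueda.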

    payload = {
        '__EVENTTARGET': '',
        '__EVENTARGUMENT': '',
        '__VIEWSTATE': soup.find(id="__VIEWSTATE")['value'],
        '__VIEWSTATEGENERATOR': soup.find(id="__VIEWSTATEGENERATOR")['value'],
        '__EVENTVALIDATION': soup.find(id="__EVENTVALIDATION")['value'],
        'ctl00$wBusqueda$txtBusqueda': '',
        'ctl00$ContentPrincipal$txtBusqueda': search_text,
        'ctl00$ContentPrincipal$btnBuscar': 'Buscar',
    }
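As a rough illustration of what the dictionary turns into, the sketch below uses the standard library's `urlencode` as an offline stand-in for the form encoding that requests applies when `data` is a dict. The exact bytes requests sends are assumed to be equivalent, not verified here, and only a subset of the fields is shown.

```python
from urllib.parse import urlencode

# A subset of the form fields, with '$' written literally in the names.
fields = {
    '__EVENTTARGET': '',
    'ctl00$wBusqueda$txtBusqueda': '',
    'ctl00$ContentPrincipal$txtBusqueda': 'aaa',
    'ctl00$ContentPrincipal$btnBuscar': 'Buscar',
}

# urlencode joins the pairs with '&' and percent-encodes '$' as '%24'.
body = urlencode(fields)
print(body)
```

The printed string has the same shape as the hand-written SEARCH fragment in the question, without any manual escaping.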

By the way, the code works for me without the headers, but I keep them as comments.

    import requests
    from bs4 import BeautifulSoup

    url = 'https://www.cnmv.es/portal/Consultas/BusquedaPorEntidad.aspx'
    search_text = 'aaa'

    # Fetch the form first to read the hidden ASP.NET state fields
    r = requests.get(url)
    soup = BeautifulSoup(r.content, 'html.parser')

    payload = {
        '__EVENTTARGET': '',
        '__EVENTARGUMENT': '',
        '__VIEWSTATE': soup.find(id="__VIEWSTATE")['value'],
        '__VIEWSTATEGENERATOR': soup.find(id="__VIEWSTATEGENERATOR")['value'],
        '__EVENTVALIDATION': soup.find(id="__EVENTVALIDATION")['value'],
        'ctl00$wBusqueda$txtBusqueda': '',
        'ctl00$ContentPrincipal$txtBusqueda': search_text,
        'ctl00$ContentPrincipal$btnBuscar': 'Buscar',
    }

    headers = {
    #    'Connection': 'keep-alive',
    #    'Cache-Control': 'max-age=0',
    #    'Upgrade-Insecure-Requests': '1',
    #    'Origin': 'https://www.cnmv.es',
    #    'Content-Type': 'application/x-www-form-urlencoded',
    #    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36',
    #    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    #    'Sec-Fetch-Site': 'same-origin',
    #    'Sec-Fetch-Mode': 'navigate',
    #    'Sec-Fetch-User': '?1',
    #    'Sec-Fetch-Dest': 'document',
    #    'Referer': 'https://www.cnmv.es/portal/Consultas/BusquedaPorEntidad.aspx',
    #    'Accept-Language': 'en-US,en;q=0.9',
    }

    # Submit the search form; requests encodes the dictionary itself
    r = requests.post(url, headers=headers, data=payload)
    #print(r.text)

    # Each <option> element in the response is one search result
    soup = BeautifulSoup(r.content, 'html.parser')
    for item in soup.find_all('option'):
        print(item['value'], '|', item.text)

Result:

    CLP3846 | AAA TRADE LTD
    V85543155 | DWS DINERO GOBIERNOS AAA, FI
    V85263911 | EUROVALOR DEUDA PUBLICA EUROPEA AAA, FI
    9686 | WWW.AAARATEDBOND.COM
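If the results are needed as data rather than printed lines, a small helper along these lines might be useful. This is only a sketch: the surrounding `<select>` markup and the placeholder option without a `value` attribute are assumptions for illustration (only the four values and labels come from the output above), and `extract_options` is a hypothetical name. It uses `item.get('value')` because `item['value']` raises `KeyError` for an option that has no such attribute.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; only the option values and labels are taken from the result above.
sample_html = """
<select>
  <option>-- placeholder without a value --</option>
  <option value="CLP3846">AAA TRADE LTD</option>
  <option value="V85543155">DWS DINERO GOBIERNOS AAA, FI</option>
  <option value="V85263911">EUROVALOR DEUDA PUBLICA EUROPEA AAA, FI</option>
  <option value="9686">WWW.AAARATEDBOND.COM</option>
</select>
"""

def extract_options(html):
    """Return (value, label) pairs for every <option> that has a value attribute."""
    soup = BeautifulSoup(html, 'html.parser')
    results = []
    for item in soup.find_all('option'):
        value = item.get('value')
        if value is None:
            # Skip options without a value instead of raising KeyError.
            continue
        results.append((value, item.get_text(strip=True)))
    return results

options = extract_options(sample_html)
for value, label in options:
    print(value, '|', label)
```

With the live site, the same function could be called on `r.text` from the POST response, assuming the results are still delivered as `<option>` elements.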
