javascript - Python: Parsing JavaScript with BeautifulSoup
Problem description
I am trying to parse content within JavaScript. I have an idea of how to do it, but I am not entirely sure. I have read up on some examples, and I am thinking that using the re library might be the way to go.
Here is my code so far:
import requests
import json
import re
from bs4 import BeautifulSoup
url = r'https://login.live.com/login.srf?wa=wsignin1.0&rpsnv=13&rver=6.7.6643.0&wp=MBI_SSL&wreply=https:%2f%2faccount.xbox.com%2fen-us%2faccountcreation%3freturnUrl%3dhttps:%252f%252fwww.xbox.com:443%252fen-US%252f%26pcexp%3dtrue%26uictx%3dme%26rtc%3d1&lc=1033&id=292543&aadredir=1'
s = requests.Session()
soup = BeautifulSoup(s.get(url).content, 'html.parser')
print(soup.find_all("script", type="text/javascript")[5].prettify())
Here is a snippet of the parsed content. I am trying to extract data from it, in particular the contents of the "value" attribute:
<input type="hidden" name="PPFT" id="i0327" value="Dd**Lkp2L3EKDvGi3u6PEweEQUhvW*1jPrA3FgGSdeYoY8FERluiTqDef6QF3V5NkN*4yPg7vvxI3jo5oKPRelhfU3rYGFkxbxyvSBssiwFA!8LwocAbVDtrDq11Wk3F4LzRBQck3H4ca5r3Qhv8b0h4CxcEZgAnGAkcWE7fExGn1dBwGoY8sZVL2!ZBMjnJEanidLF!Yi975frkQ6Cys2oUb863xoLxdvZGuLQRxRLjjKubaCHlWQbD0b*Wzq49EA$$"/>
I appreciate all responses in advance. Thanks!
Solution
from bs4 import BeautifulSoup as bs
import requests
import re
url = 'https://login.live.com/login.srf?wa=wsignin1.0&rpsnv=13&rver=6.7.6643.0&wp=MBI_SSL&wreply=https:%2f%2faccount.xbox.com%2fen-us%2faccountcreation%3freturnUrl%3dhttps:%252f%252fwww.xbox.com:443%252fen-US%252f%26pcexp%3dtrue%26uictx%3dme%26rtc%3d1&lc=1033&id=292543&aadredir=1'
page = requests.get(url)
html = bs(page.text, 'lxml')
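# The sixth <script> block contains the hidden <input> markup as a JavaScript string.
# Named script_text rather than input to avoid shadowing the built-in input().
script_text = html.find_all('script', type="text/javascript")[5].prettify()
# Capture only the text between the quotes; [^"]+ stops at the closing quote,
# so no string replacements are needed afterwards.
match = re.search(r'value="([^"]+)"', script_text)
value = match.group(1) if match else None
print(value)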
Output:
DVSXQahhtomXS2Y4k2itS5MPP52mJgUkC7LH!W*1DmjHiWk*npajBfgXK5yp3*!bu3Wuvvs7xavleUV3nIbjLZHckj73QMe8wipwXhCqpXuUZQ2wnJvNYAVNCg9XxKPuIovp7!sLbumrufuYefyzM6UQLkMb5c7MuImDofVhLlKxpI7Pohe8sO2x8r63TtFCTDphWzqXKJE3B8DRK*AhMbFsmdP0sj2CXMZ7dyTfLJSr1zWBlaHTqJPLvhgzLSiaEg$$
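The extraction step can also be tried offline, without requesting the page. The following is a minimal sketch, not a definitive implementation: `SAMPLE_SCRIPT` is a made-up stand-in for the script text (its surrounding JavaScript and shortened token are assumptions), and `extract_input_value` is a hypothetical helper name. It also assumes the `name` attribute appears before `value` inside the tag, as it does in the snippet from the question.

```python
import re

# Hypothetical sample standing in for the text of the <script> block;
# the real page embeds a much longer token.
SAMPLE_SCRIPT = (
    'var ServerData = {sFTTag:\'<input type="hidden" name="PPFT" '
    'id="i0327" value="Dd**Lkp2$$"/>\', other:"x"};'
)


def extract_input_value(script_text, name="PPFT"):
    """Return the value attribute of the named <input>, or None if absent."""
    # Anchor on the input's name so an unrelated value="..." is not matched.
    pattern = (
        r'<input[^>]*\bname="' + re.escape(name) + r'"[^>]*\bvalue="([^"]*)"'
    )
    match = re.search(pattern, script_text)
    return match.group(1) if match else None


print(extract_input_value(SAMPLE_SCRIPT))
```

Matching on the input's name makes the lookup less dependent on the script's position in the page (index 5 above), which may change if the page layout changes.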