python - How to scrape a number from this web page (in Python)
Question
I used to get that number with this code (Python 3):
import requests
import bs4
res_bonbast = requests.get('https://bonbast.com/')
soup_bonbast = bs4.BeautifulSoup(res_bonbast.text,"lxml")
int(float(soup_bonbast.select('#usd1_top')[0].getText()))
But recently they seem to have changed something, and it no longer works.
Answer
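Your problem is that this value is not populated until after the page has loaded. As your script shows you, the HTML for this element really is blank. When you load the site in a browser (you can confirm this by opening the dev tools and watching the Network tab), you first receive HTML in which this element is blank. Later, a call to https://bonbast.com/json returns the values used to populate the element.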
What you need to do is make the request to bonbast.com/json yourself and pull the value you want out of the JSON, rather than parsing HTML. The key you are looking for is usd1.
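As a minimal sketch of the extraction step, assuming you already have the JSON text in hand: the helper name `extract_rate` and the shortened `sample` payload are made up for illustration, and the values in it are copied from the response captured in this answer.

```python
import json


def extract_rate(json_text, key="usd1"):
    """Return the value stored under `key` in the JSON payload as an int."""
    rates = json.loads(json_text)
    # Values arrive as strings such as "28420". Going through float first,
    # as the original snippet did, also handles values like "1904324.2".
    return int(float(rates[key]))


# Shortened, hypothetical payload using values from the captured response.
sample = '{"usd1": "28420", "usd2": "28320", "eur1": "32725"}'
print(extract_rate(sample))  # 28420
```

A missing key raises `KeyError` here, which is probably what you want: it tells you the site changed its field names instead of silently returning a wrong number.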
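The bonbast.com/json endpoint may require additional details in the headers. I captured the curl request below by opening the Network tab of my dev tools (in Chrome, Ctrl+Shift+I >> Network), visiting bonbast.com, and finding the request to bonbast.com/json. I then right-clicked it and chose "Copy as cURL".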
curl 'https://bonbast.com/json' \
-H 'authority: bonbast.com' \
-H 'sec-ch-ua: "Chromium";v="95", ";Not A Brand";v="99"' \
-H 'accept: application/json, text/javascript, */*; q=0.01' \
-H 'content-type: application/x-www-form-urlencoded; charset=UTF-8' \
-H 'x-requested-with: XMLHttpRequest' \
-H 'sec-ch-ua-mobile: ?0' \
-H 'user-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36' \
-H 'sec-ch-ua-platform: "Linux"' \
-H 'origin: https://bonbast.com' \
-H 'sec-fetch-site: same-origin' \
-H 'sec-fetch-mode: cors' \
-H 'sec-fetch-dest: empty' \
-H 'referer: https://bonbast.com/' \
-H 'accept-language: en-US,en;q=0.9' \
-H 'cookie: st_bb=0; _gid=GA1.2.587414378.1636538685; __gads=ID=2f6e05bb70db575d-2208cfa441cc00d3:T=1636538685:RT=1636538685:S=ALNI_MaKL18-XZaWbbhlmh2h3RGvYmVKRw; _ga_PZF6SDPF22=GS1.1.1636562265.2.0.1636562265.0; _ga=GA1.2.633937873.1636538685; _gat_gtag_UA_35412804_1=1' \
--data-raw 'data=0d7e26d17fde20e86b760b00127132d4%2CfTtTZ%2C2021-11-10-16-38-37&webdriver=false' \
--compressed
The result is:
{ "try1": "2890",
"month": 8,
"emami1": "12450000",
"afn2": "309",
"afn1": "311",
"rub2": "397",
"azadi1_22": "6250000",
"bhd2": "74870",
"azn1": "16730",
"bhd1": "75370",
"azadi1g": "2350000",
"bourse": "1904324.2",
"try2": "2870",
"cny1": "4450",
"cny2": "4430",
"cad1": "22860",
"cad2": "22760",
"jpy1": "2495",
"thb1": "865",
"usd1": "28420",
"usd2": "28320",
"thb2": "860",
"azn2": "16630",
"dkk1": "4400",
"amd2": "590",
"day": 19,
"minute": "41",
"amd1": "595",
"bitcoin": "68616.85",
"hour": "20",
"sar2": "7545",
"rub1": "400",
"azadi1g2": "2250000",
"azadi12": "12000000",
"eur1": "32725",
"eur2": "32575",
"emami12": "12250000",
"second": "45",
"omr1": "73825",
"year": 1400,
"chf2": "30855",
"chf1": "31005",
"azadi1_42": "3700000",
"jpy2": "2485",
"kwd2": "93795",
"kwd1": "94195",
"sek1": "3280",
"gbp2": "38090",
"gbp1": "38290",
"sek2": "3265",
"myr1": "6850",
"myr2": "6820",
"omr2": "73525",
"azadi1": "12350000",
"azadi1_2": "6400000",
"aud2": "20805",
"azadi1_4": "3800000",
"aud1": "20905",
"dkk2": "4380",
"inr2": "380",
"inr1": "382",
"last_modified": "November 10, 2021 16:00",
"aed2": "7715",
"aed1": "7735",
"iqd2": "1935",
"qar1": "7805",
"qar2": "7775",
"iqd1": "1945",
"hkd2": "3620",
"hkd1": "3650",
"sar1": "7575",
"created": "November 10, 2021 00:01",
"sgd2": "20930",
"sgd1": "21030",
"ounce": "1854.31",
"weekday": "Wednesday",
"mithqal": "5416000",
"gol18": "1250288",
"nok1": "3305",
"nok2": "3290"
}
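For reference, here is a rough Python translation of the curl command, offered as a sketch rather than a tested client. The function names `build_json_request` and `fetch_usd1` are hypothetical, the header subset is a guess (this answer does not establish which headers the server actually checks), and `data_token` must be a fresh value taken from your own browser session.

```python
from urllib.parse import urlencode

JSON_URL = "https://bonbast.com/json"


def build_json_request(data_token, cookie=None):
    """Return (url, headers, body) mirroring part of the captured curl call."""
    # Assumption: this subset of the captured headers is enough.
    headers = {
        "accept": "application/json, text/javascript, */*; q=0.01",
        "content-type": "application/x-www-form-urlencoded; charset=UTF-8",
        "x-requested-with": "XMLHttpRequest",
        "origin": "https://bonbast.com",
        "referer": "https://bonbast.com/",
        "user-agent": (
            "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
            "(KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36"
        ),
    }
    if cookie:
        headers["cookie"] = cookie
    # Same form fields as curl's --data-raw; urlencode escapes the commas.
    body = urlencode({"data": data_token, "webdriver": "false"})
    return JSON_URL, headers, body


def fetch_usd1(data_token, cookie=None):
    """POST to the JSON endpoint and return the usd1 value as an int."""
    # Imported here so build_json_request works without requests installed.
    import requests

    url, headers, body = build_json_request(data_token, cookie)
    response = requests.post(url, headers=headers, data=body, timeout=10)
    response.raise_for_status()
    return int(float(response.json()["usd1"]))
```

Calling `fetch_usd1(...)` with the `data` value and cookie string copied from your own "Copy as cURL" output should behave like the curl command, for as long as those values remain valid.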
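There is bad news for you, though. The parameters in the curl request appear to expire after a while. I believe what is happening is that when you visit the site you receive a cookie. That cookie is your permission to make requests to the JSON endpoint, but it expires after a short time.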
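One thing you can check for yourself is what the `data` form field in the captured request contains. The sketch below URL-decodes the `--data-raw` body and splits it. Reading the third part as a timestamp is my assumption based on its shape, not something the site documents, and `split_data_field` is a hypothetical helper name.

```python
from datetime import datetime
from urllib.parse import parse_qs

# The --data-raw body from the captured curl command.
RAW_BODY = (
    "data=0d7e26d17fde20e86b760b00127132d4%2CfTtTZ%2C2021-11-10-16-38-37"
    "&webdriver=false"
)


def split_data_field(raw_body):
    """Decode the form body and split the `data` field on commas."""
    fields = parse_qs(raw_body)
    token, tag, stamp = fields["data"][0].split(",")
    # Assumption: the third part is a timestamp in year-month-day-h-m-s form.
    issued = datetime.strptime(stamp, "%Y-%m-%d-%H-%M-%S")
    return token, tag, issued


token, tag, issued = split_data_field(RAW_BODY)
print(token, tag, issued.isoformat())
# 0d7e26d17fde20e86b760b00127132d4 fTtTZ 2021-11-10T16:38:37
```

If that reading is right, the request carries its own issue time, which would fit with captured requests going stale after a while.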
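Scraping this page reliably takes a bit of work, more than fits in a StackOverflow question/answer. If you would like to talk more about how to do it, feel free to email me (the address is in my profile).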