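python - Unable to read a web page in Python
Question
I am trying to read a number shown at the top of the page at https://roobet.com/ using page=requests.get('https://roobet.com/'), but the number is not in the response. Why does this happen, and what do I need to do? The number I want to read is labelled "Wagers: XXXXXXX", but when I use requests.get() I don't see anything like it.
PS: When I use view source on the page, I still don't see the number or the label. How can I read that number and import it into my program?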
import requests
page=requests.get("https://roobet.com")
text_page=page.text
print(text_page)
Out:
<!DOCTYPE html>\n<html lang="en">\n\n <head>\n <!-- Google Tag Manager -->\n <script>(function(w,d,s,l,i){w[l]=w[l]||[];w[l].push({\'gtm.start\':\n new Date().getTime(),event:\'gtm.js\'});var f=d.getElementsByTagName(s)[0],\n j=d.createElement(s),dl=l!=\'dataLayer\'?\'&l=\'+l:\'\';j.async=true;j.src=\n \'https://www.googletagmanager.com/gtm.js?id=\'+i+dl;f.parentNode.insertBefore(j,f);\n })(window,document,\'script\',\'dataLayer\',\'GTM-563FCQS\');</script>\n <!-- End Google Tag Manager -->\n <meta charset="UTF-8">\n <meta name="viewport" content="width=device-width, initial-scale=1">\n <link rel="preconnect" href="https://fonts.googleapis.com/" crossorigin>\n <title>Roobet | Crypto\'s Fastest Growing Casino</title>\n <meta name="description" content="Roobet, crypto\'s fastest growing casino. Hop on in, chat to others and play exciting games - Come and join the fun!">\n <base href="/">\n <meta name="theme-color" content="#191b31" />\n <link rel="icon" type="image/png" href="images/favicon.png">\n <link rel="manifest" href="/manifest.json" />\n <script src="https://cdn.onesignal.com/sdks/OneSignalSDK.js" async ></script>\n <script src="https://maps.googleapis.com/maps/api/js?key=AIzaSyCXI19SE-ZWv_ZyW7gGMzCTf4TGfOA3Sdk&libraries=places"></script>\n <script src="https://tekhou5-dk2.pragmaticplay.net/gs2c/common/js/lobby/GameLib.js" />\n <script>\n var OneSignal = window.OneSignal || [];\n OneSignal.push(function() {\n OneSignal.init({\n appId: "29c72f64-e7e6-408c-99b2-d86a84c6a9cb",\n notifyButton: {\n enable: false,\n autoResubscribe: true,\n },\n welcomeNotification: {\n disable: true\n }\n });\n });\n </script>\n <link href="0.20c4e82d288213005850.css" rel="stylesheet"><link href="app.20c4e82d288213005850.css" rel="stylesheet"></head>\n <body>\n <!-- Google Tag Manager (noscript) -->\n <noscript><iframe src="https://www.googletagmanager.com/ns.html?id=GTM-563FCQS"\n height="0" width="0" style="display:none;visibility:hidden"></iframe></noscript>\n <!-- End Google 
Tag Manager (noscript) -->\n <div id="root"></div>\n <div id="modalRoot"></div>\n <div id="loader">\n <div class="loaderLogo">\n <img src="/images/logo.svg" />\n </div>\n </div>\n <script type="text/javascript" src="vendors.bundle.js?v=1272961ec29bf316a891"></script><script type="text/javascript" src="locale.bundle.js?v=f09f53a5cbf99ec0cac6"></script><script type="text/javascript" src="app.bundle.js?v=9f19f2ed821de8c93f9c"></script></body>\n <script>(function(){var w=window;var ic=w.Intercom;if(typeof ic==="function"){ic(\'reattach_activator\');ic(\'update\',intercomSettings);}else{var d=document;var i=function(){i.c(arguments)};i.q=[];i.c=function(args){i.q.push(args)};w.Intercom=i;function l(){var s=d.createElement(\'script\');s.type=\'text/javascript\';s.async=true;s.src=\'https://widget.intercom.io/widget/gcr7bzde\';var x=d.getElementsByTagName(\'script\')[0];x.parentNode.insertBefore(s,x);}if(w.attachEvent){w.attachEvent(\'onload\',l);}else{w.addEventListener(\'load\',l,false);}}})()</script>\n <script src="https://intaggr.softswiss.net/public/sg.js"></script>\n <script type="text/javascript" src="https://www.google.com/recaptcha/api.js?render=6LdG97YUAAAAAHMcbX2hlyxQiHsWu5bY8_tU-2Y_"></script>\n <script type="text/javascript">\n if (typeof window.grecaptcha !== \'undefined\') {\n grecaptcha.ready(function() {\n grecaptcha.execute(\'6LdG97YUAAAAAHMcbX2hlyxQiHsWu5bY8_tU-2Y_\', {action: \'homepage\'});\n })\n }\n </script>\n</html>\n'
On the live page, you can see that it does show the all-time wagers and other figures.
Solution
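The response contains only an empty <div id="root"></div> and a few script bundles, so the number is most likely filled in by JavaScript after the page loads, and requests does not execute JavaScript. You should use a web scraping tool that renders the page in a real browser. Since you know Python, Selenium may be an option.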
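Before switching tools, it can help to confirm that the static HTML really has nothing to parse. The following is a minimal sketch using only the standard library; the names MountPointChecker and is_empty_mount_point are hypothetical helpers, and the id "root" is taken from the output shown in the question.

```python
from html.parser import HTMLParser


class MountPointChecker(HTMLParser):
    """Records whether the element with a given id has any text or child tags."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0            # > 0 while the parser is inside the target element
        self.found = False        # True once the target element has been seen
        self.has_content = False  # True if the target holds text or child tags

    def handle_starttag(self, tag, attrs):
        if self.depth > 0:
            # Any tag opened inside the target counts as content.
            self.has_content = True
            self.depth += 1
        elif not self.found and dict(attrs).get("id") == self.target_id:
            self.found = True
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0 and data.strip():
            self.has_content = True


def is_empty_mount_point(html, target_id="root"):
    """Return True if the element with id=target_id exists and is empty."""
    checker = MountPointChecker(target_id)
    checker.feed(html)
    return checker.found and not checker.has_content


# A shortened stand-in for the response in the question (assumption: same structure).
static_html = '<body><div id="root"></div><script src="app.bundle.js"></script></body>'
print(is_empty_mount_point(static_html))
```

Calling is_empty_mount_point(text_page) on the text from the question should report the same thing if the container is empty, which would suggest the content is rendered client-side and needs a browser to appear.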
Here is a small Selenium snippet to help you get started.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome()
# Open the webpage
driver.get('https://roobet.com/')
elem_xpath = '//div[contains(text(), "Wagers All Time")]/following-sibling::div'
try:
    # Wait up to 10 seconds until the element is visible
    elem = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.XPATH, elem_xpath)))
    print(elem.text)
finally:
    # Always close the browser, even if the wait times out
    driver.quit()
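The snippet above prints the element text as a string. If you need it as a number, something like the following sketch may work. It assumes the text contains a whole number, possibly with comma thousands separators (the real format on the page is not shown in the question), and parse_wager_count is a hypothetical helper name.

```python
import re


def parse_wager_count(text):
    """Return the first whole number in text as an int, ignoring comma separators.

    Assumption: the value looks like "1,234,567" or "Wagers: 1234567".
    Decimal parts, if any, are not handled.
    """
    match = re.search(r"\d[\d,]*", text)
    if match is None:
        raise ValueError(f"no number found in {text!r}")
    return int(match.group().replace(",", ""))


# Example with made-up text; in the Selenium snippet you would pass elem.text.
print(parse_wager_count("Wagers: 1,234,567"))
```

If the page formats the value differently (currency symbols, decimals, abbreviations such as "1.2M"), the regular expression would need to be adjusted.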