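python - How to scrape hidden text in an accordion with Python
Problem description
I have written a simple script that returns specific information from an Australian betting website.
It works well, but I am having a lot of trouble opening each accordion dropdown automatically. My script is below.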
from selenium import webdriver
import time

chrome_path = r"C:\Users\Tom\Desktop\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
driver.get("https://pointsbet.com.au/basketball/NBA")
time.sleep(2)
driver.find_element_by_xpath("""/html/body/div[1]/div[2]/sport-competition-component/div[1]/div[2]/div[1]/div/event-list/div[1]/event/div/header/div[1]/h2/a""").click()
time.sleep(2)
posts = driver.find_elements_by_class_name("market")
for post in posts:
    print(post.text)
    with open('output.xls', mode='a') as f:
        f.write(post.text)
        f.write('\n')
driver.quit()
The script prints all the visible text contained in elements with the class name "market".
The output looks like this:
HEAD TO HEAD
Brooklyn Nets
1.29
Atlanta Hawks
3.78
LINE
Brooklyn Nets -8.0
1.95
Atlanta Hawks +8.0
1.89
TOTAL POINTS
Over 227.0
1.91
Under 227.0
1.91
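My problem is that there is hidden text under the accordions (see screenshot).
- For example, I cannot scrape the data under the "Double Result" heading.
Once the accordion has been clicked, the script works fine.
I have written some code to click the accordions automatically, but unfortunately the XPath changes with every match.
Does anyone know how to click all the accordions at once automatically (without knowing the element details), or does anyone have an alternative solution?
Any help is welcome, thanks.
Update: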
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
import time

chrome_path = r"C:\Users\Tom\Desktop\chromedriver.exe"
d = webdriver.Chrome(chrome_path)
d.get("https://pointsbet.com.au/basketball/NCAA-March-Madness")
time.sleep(2)
d.find_element_by_xpath("""/html/body/div[1]/div[2]/sport-competition-component/div[1]/div[2]/div[1]/div/event-list/div[1]/event/div/header/div[1]/h2/a""").click()
time.sleep(2)
expandable = WebDriverWait(d, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, ".h2.accordion-toggle.event-name")))
expandables = d.find_elements_by_css_selector('.h2.accordion-toggle.event-name')
for item in expandables:
    item.click()
posts = d.find_elements_by_class_name("market")
for post in posts:
    print(post.text)
    with open('output.xls', mode='a') as f:
        f.write(post.text)
        f.write('\n')
d.quit()
Error:
Traceback (most recent call last):
  File "C:\Users\Tom\Desktop\Python test\points1 - Copy.py", line 21, in <module>
    item.click()
  File "C:\Users\Tom\AppData\Roaming\Python\Python37\site-packages\selenium\webdriver\remote\webelement.py", line 80, in click
    self._execute(Command.CLICK_ELEMENT)
  File "C:\Users\Tom\AppData\Roaming\Python\Python37\site-packages\selenium\webdriver\remote\webelement.py", line 633, in _execute
    return self._parent.execute(command, params)
  File "C:\Users\Tom\AppData\Roaming\Python\Python37\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "C:\Users\Tom\AppData\Roaming\Python\Python37\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.ElementNotVisibleException: Message: element not interactable
  (Session info: chrome=73.0.3683.86)
  (Driver info: chromedriver=2.43.600210 (68dcf5eebde37173d4027fa8635e332711d2874a),platform=Windows NT 10.0.17134 x86_64)
Solution
You can use a CSS class selector to get the collection of dropdowns and click each one by iterating over the collection. Example page:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

d = webdriver.Chrome()
d.get("https://pointsbet.com.au/basketball/NBA/58738")
# Wait until the first accordion toggle is clickable before collecting them all.
WebDriverWait(d, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, ".h2.accordion-toggle.event-name")))
# find_elements(By.CSS_SELECTOR, ...) is available in both Selenium 3 and Selenium 4.
expandables = d.find_elements(By.CSS_SELECTOR, ".h2.accordion-toggle.event-name")
for item in expandables:
    item.click()
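The "element not interactable" traceback in the question suggests that at least one element matched by the selector was not visible when it was clicked. One possible workaround, sketched here under that assumption, is to skip any toggle for which `is_displayed()` returns false. The helper names `click_visible` and `scrape_markets` are hypothetical, the selector and class name are copied from the posts above, and the sketch has not been run against the live site.

```python
def click_visible(elements):
    """Click each element that reports itself as displayed; return how many were clicked."""
    clicked = 0
    for element in elements:
        # Hidden toggles are skipped instead of clicked, on the assumption
        # that they are what triggers "element not interactable".
        if element.is_displayed():
            element.click()
            clicked += 1
    return clicked


def scrape_markets(url, selector=".h2.accordion-toggle.event-name"):
    """Open url, expand the visible accordions, and return the text of each market."""
    # Selenium is imported here so click_visible can be used without a browser.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Wait for the toggles to exist in the DOM (not necessarily visible).
        WebDriverWait(driver, 10).until(
            EC.presence_of_all_elements_located((By.CSS_SELECTOR, selector))
        )
        click_visible(driver.find_elements(By.CSS_SELECTOR, selector))
        return [m.text for m in driver.find_elements(By.CLASS_NAME, "market")]
    finally:
        # Always close the browser, even if a lookup or click fails.
        driver.quit()
```

Calling `scrape_markets("https://pointsbet.com.au/basketball/NBA/58738")` would then return one string per "market" element. A toggle that is displayed but covered by another element could still raise an exception on click, so this filter may not be sufficient on every page.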