selenium - How to scrape data from the cricinfo website to get the first-innings commentary for each match, changing the filter with Selenium and Python
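Problem description
Hey guys, I have been trying to scrape some data from the cricinfo website to get the commentary for each match. I am able to get the full data for the second innings, but not for the first innings: when I inspect the source, the dropdown does not appear to have option elements or anything like a select class. It would be great if someone could suggest some options for doing this. This is the URL of the page: https://www.espncricinfo.com/series/8048/commentary/1181768/mumbai-indians-vs-chennai-super-kings-final-indian-premier-league-2019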
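Solution
To scrape the first-innings commentary for a match from the cricinfo website by changing the filter with Selenium, you need to induce WebDriverWait for element_to_be_clickable() on the dropdown and on the innings option, and then for visibility_of_element_located() on the commentary text.
You can use the following locator strategy: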
Using XPATH:

    # -*- coding: utf-8 -*-
    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC

    options = webdriver.ChromeOptions()
    options.add_argument("start-maximized")
    # One call with both switches: a second call with the same key would overwrite the first
    options.add_experimental_option("excludeSwitches", ["enable-logging", "enable-automation"])
    options.add_experimental_option('useAutomationExtension', False)
    # Selenium 3 signature; Selenium 4 takes service=Service(path) instead of executable_path
    driver = webdriver.Chrome(options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
    driver.get('https://www.espncricinfo.com/series/8048/commentary/1181768/mumbai-indians-vs-chennai-super-kings-final-indian-premier-league-2019')

    # Open the innings dropdown in the commentary header
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//div[@class='comment-container-head']/div/div/div/div"))).click()
    # Pick the first-innings entry by its visible label
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//div[contains(@class, 'ci-dd__menu')]/div[contains(@class, 'ci-dd__menu-list')]/div[contains(@class, 'ci-dd__option') and text()='MI Innings']"))).click()
    # Print the text of the first long commentary block once it is visible
    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='match-comment-long-text match-comment-padder']/span"))).text)
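The option label 'MI Innings' is hard-coded in the answer's XPath. As a rough sketch, the same two clicks could be wrapped so the label becomes a parameter. This assumes the page still uses the class names from the answer; `xpath_literal`, `innings_option_xpath` and `select_innings` are hypothetical helper names, not part of Selenium.

```python
# XPath of the dropdown toggle, copied from the answer
DROPDOWN_TOGGLE = "//div[@class='comment-container-head']/div/div/div/div"


def xpath_literal(text):
    """Quote text as an XPath 1.0 string literal (XPath has no quote escaping)."""
    if "'" not in text:
        return "'" + text + "'"
    if '"' not in text:
        return '"' + text + '"'
    # Both quote characters occur: split on ' and join the pieces with concat()
    parts = text.split("'")
    return "concat(" + ", \"'\", ".join("'" + p + "'" for p in parts) + ")"


def innings_option_xpath(label):
    """Build the answer's option locator for an arbitrary visible label."""
    return (
        "//div[contains(@class, 'ci-dd__menu')]"
        "/div[contains(@class, 'ci-dd__menu-list')]"
        "/div[contains(@class, 'ci-dd__option') and text()="
        + xpath_literal(label)
        + "]"
    )


def select_innings(driver, label, timeout=20):
    """Open the dropdown, then click the option whose text equals label."""
    # Imported here so the XPath helpers above work without Selenium installed
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    wait = WebDriverWait(driver, timeout)
    wait.until(EC.element_to_be_clickable((By.XPATH, DROPDOWN_TOGGLE))).click()
    wait.until(
        EC.element_to_be_clickable((By.XPATH, innings_option_xpath(label)))
    ).click()
```

With a driver already on the commentary page, `select_innings(driver, "MI Innings")` would perform the same two clicks as the answer. Labels for other teams or matches would need to be read from the opened dropdown, since the source only shows this one.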
Console output:
9.16pm Another ravishing innings from Pollard against CSK in an IPL final. But will 150 be enough on this ground? Mumbai's innings was a stop-start one, with regular wickets ensuring they could never really accelerate. Deepak Chahar was excellent in his final three overs too, but Mumbai have two epic fast bowlers as well. Which team will win their fourth IPL title? We'll find out with Shashank Kishore when the second innings gets underway in a few minutes. Shardul Thakur: "Final game, best two teams in the IPL. We knew some hard cricket was going to happen. I feel Powerplay is where you can attack and take wicket. If you bowl defensively in the Powerplay, you will still get hit for fours and sixes. In the last game, I wanted to get early wickets but there was some good cricket played by Dhawan. But tonight, ball was swinging a bit. Rohit did hit me for a six, but idea wasn't to go away from my plan." Raja: "@Vignesh That team did not have Dhoni as CAPTAIN" Vignesh: "@Husen well , MI defended an even more low total in the same ground in 2017 finals against a team that had Dhoni ;)" Satyam: "Think MI are 20-25 runs short here. At least 15 more would have been more defendable." Divya: "Last 12 balls : 3 fours 3 wickets 6 dots 1 singles" Mustafa Moudi: "If anyone feels this is a below-par score then let me remind everyone that MI defended 137 on this same ground and that too by a massive 40 runs and defeated the Home Team in this season !!" Husen: "@Moustafa - That team did not have a Dhoni "
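The output above is the text of a single element, because visibility_of_element_located() returns only the first match for the locator. If every matching commentary block is wanted, one possible approach is to wait with presence_of_all_elements_located() and read each element's text. The sketch below assumes the same XPath as the answer; `clean_texts` and `all_long_comments` are hypothetical helper names.

```python
# Locator for the long commentary blocks, copied from the answer
COMMENT_XPATH = "//div[@class='match-comment-long-text match-comment-padder']/span"


def clean_texts(elements):
    """Return the stripped, non-empty .text of each element, in page order."""
    texts = []
    for element in elements:
        text = (element.text or "").strip()
        if text:
            texts.append(text)
    return texts


def all_long_comments(driver, timeout=20):
    """Wait until at least one block is in the DOM, then return all their texts."""
    # Imported here so clean_texts() can be used without Selenium installed
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    elements = WebDriverWait(driver, timeout).until(
        EC.presence_of_all_elements_located((By.XPATH, COMMENT_XPATH))
    )
    return clean_texts(elements)
```

This returns only the blocks present in the DOM at the moment of the call. Whether the page loads more commentary on scroll is not covered by the answer.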
Note: you must add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC