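python - Scraping text from PubMed: find_element_by_css_selector vs. visibility_of_all_elements_located
Problem description
I am trying to get the abstract of an article from PubMed. If I go straight to the article's link with the code below, I get the abstract I want.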
driver = webdriver.Chrome(executable_path="../drivers/chromedriver.exe")
driver.get("https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6268174/")
time.sleep(randint(1, 5))
abstract = driver.find_element_by_css_selector("div#ABS1 p").text
However, I have a list of more than a thousand articles whose abstracts I need, so I wrote the following automation script:
import time
from random import randint
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
# Define article name, i.e. Artificial intelligence in radiology
name = "Artificial intelligence in radiology"
# Invoke Chrome and go to PubMed website
driver = webdriver.Chrome(executable_path="../drivers/chromedriver.exe")
driver.get("https://pubmed.ncbi.nlm.nih.gov")
print("Accessing " + driver.title)
print(driver.current_url)
# Enter research article
time.sleep(randint(1, 5))
driver.find_element_by_css_selector("input[type='search']").send_keys(name)
# Click search
time.sleep(randint(1, 5))
driver.find_element_by_css_selector("span[class='usa-search-submit-text']").click()
# Click on the article link
time.sleep(randint(1, 5))
driver.find_element_by_css_selector("a[class='docsum-title']").click()
# Click to navigate to full text
time.sleep(randint(1, 5))
driver.find_element_by_css_selector("a[data-ga-action='PMC']").click()
# Get abstract
time.sleep(randint(1, 5))
abstract = WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div#ABS1 p")))
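I use the same selector as in the earlier code, div#ABS1 p, but it does not work and raises a timeout exception. What causes this difference, and how can I fix it?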
Solution
That depends on which line raises the timeout exception.
Either way, replace the time.sleep lines with explicit waits:
# Enter research article
# Wait up to 10 seconds for the search box to be present in the DOM
inputWait = EC.presence_of_element_located((By.CSS_SELECTOR, "input[type='search']"))
WebDriverWait(driver, 10).until(inputWait)
driver.find_element_by_css_selector("input[type='search']").send_keys(name)
# Click search
# Wait up to 10 seconds for the submit button text to be present in the DOM
spanWait = EC.presence_of_element_located((By.CSS_SELECTOR, "span[class='usa-search-submit-text']"))
WebDriverWait(driver, 10).until(spanWait)
driver.find_element_by_css_selector("span[class='usa-search-submit-text']").click()
# ...and so on for the remaining steps
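To show why an explicit wait is more reliable than a fixed time.sleep, here is a minimal sketch of the polling idea behind it. This is not Selenium's implementation: wait_until and element_appears are hypothetical names, and the "page" is simulated by a counter so the example runs without a browser. The 0.5 second default poll interval mirrors WebDriverWait's default.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Raises TimeoutError if no truthy value is returned within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Hypothetical stand-in for a page whose element only appears on the third check.
attempts = {"count": 0}


def element_appears():
    attempts["count"] += 1
    return "abstract text" if attempts["count"] >= 3 else None


found = wait_until(element_appears, timeout=2.0, poll_interval=0.01)
print(found, "after", attempts["count"], "checks")
```

The wait returns as soon as the condition holds, so a fast page costs little time and a slow page gets the full timeout. A fixed sleep is either too long or too short.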
In general, I prefer XPath to CSS selectors. You may also be able to do this with requests and BeautifulSoup instead of Selenium.
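As a rough sketch of the non-browser approach, the snippet below pulls the text of every p element inside div#ABS1 out of an HTML string. To stay runnable without extra installs it uses the standard library's html.parser in place of BeautifulSoup, and it parses a made-up sample_html instead of downloading a page. AbstractParser and sample_html are hypothetical; the ABS1 id is taken from the selector in the question, and whether the real page can be fetched without a browser is not checked here.

```python
from html.parser import HTMLParser


class AbstractParser(HTMLParser):
    """Collect the text of <p> elements nested inside <div id="ABS1">."""

    def __init__(self):
        super().__init__()
        self.div_depth = 0         # <div> nesting depth, counted from the target div
        self.in_paragraph = False  # True while inside a <p> under the target div
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            # Start counting at the target div, then track any nested divs.
            if self.div_depth > 0 or dict(attrs).get("id") == "ABS1":
                self.div_depth += 1
        elif tag == "p" and self.div_depth > 0:
            self.in_paragraph = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth > 0:
            self.div_depth -= 1
        elif tag == "p":
            self.in_paragraph = False

    def handle_data(self, data):
        if self.in_paragraph:
            self.paragraphs[-1] += data


# Made-up markup for illustration only; the real page layout may differ.
sample_html = """
<html><body>
<div id="other"><p>Not the abstract.</p></div>
<div id="ABS1"><h2>Abstract</h2><p>First <i>abstract</i> paragraph.</p>
<div><p>Second paragraph.</p></div></div>
<p>Footer text.</p>
</body></html>
"""

parser = AbstractParser()
parser.feed(sample_html)
abstract = " ".join(p.strip() for p in parser.paragraphs)
print(abstract)
```

With BeautifulSoup installed, the class could probably be replaced by a single CSS selector query on the parsed document.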