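python - XPath to retrieve Google search result links
Problem description
I am writing a Python Selenium script that tries to extract LinkedIn profile URLs from a Google search, but I am having trouble narrowing my XPath so that it returns only the search result links on Google.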
linkedin_urls = driver.find_elements_by_xpath('//div[@class="yuRUbf"]//a[@href]')
for linkedin_url in linkedin_urls:
    url = linkedin_url.get_attribute("href")
    print(url)
    driver.get(url)
    sleep(5)
The result of linkedin_urls gives me:
https://uk.linkedin.com/in/roxana-andreea-popescu
https://uk.linkedin.com/in/tunjijabitta
https://www.google.com/search?source=hp&ei=bxjhX4uGC4_ykgXl9pu4Bw&q=site%3Alinkedin.com%2Fin%2F+AND+%22Software+Developer%22+AND+%22London%22&oq=site%3Alinkedin.com%2Fin%2F+AND+%22Software+Developer%22+AND+%22London%22&gs_lcp=CgZwc3ktYWIQDFDMZFjhZmCwZ2gAcAB4AIABLogBsAGSAQE0mAEAoAEBqgEHZ3dzLXdpeg&sclient=psy-ab&ved=0ahUKEwjL-dn4huDtAhUPuaQKHWX7BncQ4dUDCA0#
https://www.google.com/search?q=related:https://uk.linkedin.com/in/tunjijabitta&sa=X&ved=2ahUKEwji3qP_huDtAhWAZxUIHTyfAO4QHzABegQIBhAH
https://uk.linkedin.com/in/janomer
https://uk.linkedin.com/in/josephcoker
https://uk.linkedin.com/in/sebemin
https://uk.linkedin.com/in/vicki-marshall-b7433827
https://www.google.com/search?source=hp&ei=bxjhX4uGC4_ykgXl9pu4Bw&q=site%3Alinkedin.com%2Fin%2F+AND+%22Software+Developer%22+AND+%22London%22&oq=site%3Alinkedin.com%2Fin%2F+AND+%22Software+Developer%22+AND+%22London%22&gs_lcp=CgZwc3ktYWIQDFDMZFjhZmCwZ2gAcAB4AIABLogBsAGSAQE0mAEAoAEBqgEHZ3dzLXdpeg&sclient=psy-ab&ved=0ahUKEwjL-dn4huDtAhUPuaQKHWX7BncQ4dUDCA0#
https://www.google.com/search?q=related:https://uk.linkedin.com/in/vicki-marshall-b7433827&sa=X&ved=2ahUKEwji3qP_huDtAhWAZxUIHTyfAO4QHzAFegQIARAH
https://uk.linkedin.com/in/andreibodnar
https://www.google.com/search?q=related:https://uk.linkedin.com/in/andreibodnar&sa=X&ved=2ahUKEwji3qP_huDtAhWAZxUIHTyfAO4QHzAGegQIBxAH
https://uk.linkedin.com/in/dmrlawson
https://uk.linkedin.com/in/jack-gilbert-541a251b
https://www.google.com/search?source=hp&ei=bxjhX4uGC4_ykgXl9pu4Bw&q=site%3Alinkedin.com%2Fin%2F+AND+%22Software+Developer%22+AND+%22London%22&oq=site%3Alinkedin.com%2Fin%2F+AND+%22Software+Developer%22+AND+%22London%22&gs_lcp=CgZwc3ktYWIQDFDMZFjhZmCwZ2gAcAB4AIABLogBsAGSAQE0mAEAoAEBqgEHZ3dzLXdpeg&sclient=psy-ab&ved=0ahUKEwjL-dn4huDtAhUPuaQKHWX7BncQ4dUDCA0#
https://www.google.com/search?q=related:https://uk.linkedin.com/in/jack-gilbert-541a251b&sa=X&ved=2ahUKEwji3qP_huDtAhWAZxUIHTyfAO4QHzAIegQICxAH
https://uk.linkedin.com/in/eren-batu-999068185
I am trying to find a way to narrow the results down to LinkedIn links only.
Solution
To restrict the results to LinkedIn links, you need to induce WebDriverWait for visibility_of_all_elements_located(), and you can use either of the following Locator Strategies:
Using CSS_SELECTOR:
print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div.yuRUbf a[href^='https://uk.linkedin.com/in']")))])
Using XPATH:
print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='yuRUbf']//a[starts-with(@href, 'https://uk.linkedin.com/in')]")))])
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
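Both locators above hard-code the https://uk.linkedin.com/in prefix, so profiles served from another subdomain would be missed. As an alternative, the hrefs already collected in the question could be filtered in plain Python after the fact. The following is only a minimal sketch under that assumption: the helper names is_linkedin_profile and filter_linkedin_profiles are hypothetical and are not part of Selenium or of the original answer.

```python
from urllib.parse import urlparse


def is_linkedin_profile(url):
    """Return True if url points at a LinkedIn /in/ profile page.

    Hypothetical helper: it checks the host and path only, so Google's own
    "related:" and search links are rejected even though they contain a
    LinkedIn URL inside their query string.
    """
    parts = urlparse(url)
    host = (parts.hostname or "").lower()
    on_linkedin = host == "linkedin.com" or host.endswith(".linkedin.com")
    return on_linkedin and parts.path.startswith("/in/")


def filter_linkedin_profiles(urls):
    """Keep LinkedIn profile URLs, preserving order and dropping duplicates."""
    seen = set()
    kept = []
    for url in urls:
        if is_linkedin_profile(url) and url not in seen:
            seen.add(url)
            kept.append(url)
    return kept
```

With the loop from the question, this would be applied to the list of href strings, for example filter_linkedin_profiles([e.get_attribute("href") for e in linkedin_urls]), before visiting any of the pages.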