Clicking the "Show more" button on Google Scholar search results

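Problem description

I am trying to scrape a Google Scholar page, but I can only get the first 20 results that are displayed. I am trying to use Selenium to click "Show more" so that I can get the remaining results. This is what I have, but it does not work (the URL is stored in a variable):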

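import time
from selenium import webdriver

# `url` holds the Google Scholar page address and is defined elsewhere
driver = webdriver.Chrome(executable_path ="/Applications/chromedriver84")
driver.get(url)
time.sleep(5)
element = driver.find_element_by_tag_name('button')
element.click()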

Any suggestions? Thanks in advance.

Tags: python, selenium, web-scraping

Solution


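You can pass pagination parameters in the request URL.

pagesize: defines the number of results to return (for example, 20 (the default) returns 20 results, 40 returns 40 results, and so on). The maximum number of results returned is 100.

cstart: defines the result offset. It skips the given number of results and is used for pagination (for example, 0 (the default) is the first page of results, 20 is the second page, 40 is the third page, and so on).

So the URL for the next 100 results should look like this: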

https://scholar.google.com/citations?user=VjJm3zYAAAAJ&hl=en&cstart=100&pagesize=100
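As a minimal sketch of how the two parameters combine, the snippet below builds one URL per page from the cstart and pagesize values described above. The helper name page_urls and its total argument are hypothetical, introduced only for illustration; the snippet builds the URLs and does not fetch anything.

```python
from urllib.parse import urlencode

BASE_URL = "https://scholar.google.com/citations"


def page_urls(user_id, total, pagesize=100, hl="en"):
    """Yield one profile URL per page, enough to cover `total` results.

    `total` is an assumption supplied by the caller: the number of
    results you expect the profile to have.
    """
    if not 1 <= pagesize <= 100:
        # The answer above states that 100 is the maximum page size.
        raise ValueError("pagesize must be between 1 and 100")
    for cstart in range(0, total, pagesize):
        query = urlencode(
            {"user": user_id, "hl": hl, "cstart": cstart, "pagesize": pagesize}
        )
        yield f"{BASE_URL}?{query}"


urls = list(page_urls("VjJm3zYAAAAJ", total=250))
for u in urls:
    print(u)
```

If the total is not known in advance, one option is to keep increasing cstart by pagesize until a page comes back with fewer than pagesize results.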

You can also use a third-party solution such as SerpApi to do this for you. It is a paid API with a free trial.

Example Python code (also available in other libraries):

from serpapi import GoogleSearch

params = {
  "api_key": "secret_api_key",
  "engine": "google_scholar_author",
  "hl": "en",
  "author_id": "VjJm3zYAAAAJ",
  "num": "100",
  "start": "100"
}

search = GoogleSearch(params)
results = search.get_dict()

Example JSON output:

"articles": [
  {
    "title": "Comparison of meta-heuristic algorithms for clustering rectangles",
    "link": "https://scholar.google.com/citations?view_op=view_citation&hl=en&user=VjJm3zYAAAAJ&cstart=100&pagesize=100&citation_for_view=VjJm3zYAAAAJ:-f6ydRqryjwC",
    "citation_id": "VjJm3zYAAAAJ:-f6ydRqryjwC",
    "authors": "E Burke, G Kendall",
    "publication": "Computers and Industrial Engineering 37 (1), 383-386, 1999",
    "cited_by": {
      "value": 39,
      "link": "https://scholar.google.com/scholar?oi=bibs&hl=en&cites=17215057896442932540",
      "serpapi_link": "https://serpapi.com/search.json?cites=17215057896442932540&engine=google_scholar&hl=en",
      "cites_id": "17215057896442932540"
    },
    "year": "1999"
  },
  {
    "title": "Geometrical insights into the dendritic cell algorithm",
    "link": "https://scholar.google.com/citations?view_op=view_citation&hl=en&user=VjJm3zYAAAAJ&cstart=100&pagesize=100&citation_for_view=VjJm3zYAAAAJ:kz9GbA2Ns4gC",
    "citation_id": "VjJm3zYAAAAJ:kz9GbA2Ns4gC",
    "authors": "T Stibor, R Oates, G Kendall, JM Garibaldi",
    "publication": "Proceedings of the 11th Annual conference on Genetic and evolutionary …, 2009",
    "cited_by": {
      "value": 38,
      "link": "https://scholar.google.com/scholar?oi=bibs&hl=en&cites=641480971970954670",
      "serpapi_link": "https://serpapi.com/search.json?cites=641480971970954670&engine=google_scholar&hl=en",
      "cites_id": "641480971970954670"
    },
    "year": "2009"
  },
  ...
]
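To show how such a response might be consumed, here is a small sketch that pulls the title, year, and citation count out of each entry. It assumes the dictionary has the same shape as the output above; the summarize helper is hypothetical, and the sample data is a trimmed copy of the two entries shown.

```python
# Trimmed copy of the two entries shown above, in the shape of the
# dictionary that the output is assumed to be parsed into.
results = {
    "articles": [
        {
            "title": "Comparison of meta-heuristic algorithms for clustering rectangles",
            "year": "1999",
            "cited_by": {"value": 39},
        },
        {
            "title": "Geometrical insights into the dendritic cell algorithm",
            "year": "2009",
            "cited_by": {"value": 38},
        },
    ]
}


def summarize(results):
    """Return (title, year, citation count) tuples, tolerating missing keys."""
    rows = []
    for article in results.get("articles", []):
        cited = article.get("cited_by", {}).get("value")
        rows.append((article.get("title"), article.get("year"), cited))
    return rows


rows = summarize(results)
for title, year, cited in rows:
    print(year, cited, title)
```

Using .get() throughout keeps the loop from failing on an entry that lacks a citation count or a year.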

Check the documentation for more details.

Disclaimer: I work at SerpApi.

