Scraping content from a page whose pagination links are javascript:void()

Question

I want to crawl the first ten pages of https://www.gotouniversity.com/course/index. So far I have only been able to get the content of the first page.

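from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Selenium 4 style: the chromedriver path is passed through a Service object
driver = webdriver.Chrome(service=Service('/Users/xx/Desktop/chromedriver'))
driver.get('https://www.gotouniversity.com/course/index')

# Collect the text of every element with class "university-name"
elements = driver.find_elements(By.CLASS_NAME, 'university-name')
university_name = [element.text for element in elements]
print(university_name)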

['Loyola University Chicago',
 'Queens University',
  ...
 'Yale University']

The pagination links are javascript:void(), so I don't know how to get the content of each page in turn.


<div class="pagination">
  <div aria-live="polite" role="status" style="float:left; height:14px; padding:8px">Showing 1 to 20 of 143981 entries</div>
  <div style="float:right;">
    <ul class="pagination" id="pagin_count">
      <li class="active" p="1"><a>1</a></li>
      <li p="2"><a href="javascript:void()" onclick="pagingcustom(2);">2</a></li>
      <li p="3"><a href="javascript:void()" onclick="pagingcustom(3);">3</a></li>
      <li p="4"><a href="javascript:void()" onclick="pagingcustom(4);">4</a></li>
      <li p="5"><a href="javascript:void()" onclick="pagingcustom(5);">5</a></li>
      <li p="6"><a href="javascript:void()" onclick="pagingcustom(6);">6</a></li>
      <li p="7"><a href="javascript:void()" onclick="pagingcustom(7);">7</a></li>
      <li p="8"><a href="javascript:void()" onclick="pagingcustom(8);">8</a></li>
      <li p="9"><a href="javascript:void()" onclick="pagingcustom(9);">9</a></li>
      <li p="10"><a href="javascript:void()" onclick="pagingcustom(10);">10</a></li>
      <li p="1"><a href="javascript:void()" onclick="pagingcustom(1);">Next</a></li>
    </ul>
  </div>
</div>
</div>
<script>
function fn_advcount(id){
    $.ajax({
            url: 'https://www.gotouniversity.com/site/advertisement-count',
            data: { id : id },
            success: function(result){
    }});
  }
</script>
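
In the pagination markup above, the href is only a placeholder; the page number is carried by the onclick handler, which calls pagingcustom(n). The following is a minimal sketch, using only the Python standard library, of reading those page numbers out of such markup. The names PaginationParser, page_numbers and sample are made up for illustration, and sample is a shortened copy of the snippet above rather than a live response.

```python
import re
from html.parser import HTMLParser


class PaginationParser(HTMLParser):
    """Collect page numbers from onclick="pagingcustom(N);" attributes."""

    def __init__(self):
        super().__init__()
        self.pages = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # attrs is a list of (name, value) pairs; value may be None
        onclick = dict(attrs).get("onclick") or ""
        match = re.search(r"pagingcustom\((\d+)\)", onclick)
        if match:
            self.pages.append(int(match.group(1)))


def page_numbers(html):
    """Return the page numbers referenced by the pagination links."""
    parser = PaginationParser()
    parser.feed(html)
    return parser.pages


# Shortened copy of the pagination markup shown above (illustrative only)
sample = (
    '<ul class="pagination" id="pagin_count">'
    '<li class="active" p="1"><a>1</a></li>'
    '<li p="2"><a href="javascript:void()" onclick="pagingcustom(2);">2</a></li>'
    '<li p="3"><a href="javascript:void()" onclick="pagingcustom(3);">3</a></li>'
    '</ul>'
)
print(page_numbers(sample))  # [2, 3]
```

The active page has no onclick attribute, so it does not appear in the result.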

The relevant content I want to get:

<a href="/university/loyola-university-chicago" target="_blank" title="University">
<p class="university-name" title="Loyola University Chicago">Loyola University Chicago</p>
</a>

I have read some related questions, but I still cannot find a solution.


I also tested bs4; it can scrape the content of the first page:

import bs4
import requests
bowl = requests.get('https://www.gotouniversity.com/course/index') 
soup = bs4.BeautifulSoup(bowl.text, 'html.parser')
UniversityName = [i.text for i in soup.find_all('p', attrs={'class': 'university-name'})]

Tags: python, selenium, beautifulsoup, web-crawler

Answer


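Using BeautifulSoup, the following script prints the university names and links from the first 10 pages. It requests each page by sending a POST to the same URL with a page form field: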

import requests
from bs4 import BeautifulSoup

url = 'https://www.gotouniversity.com/course/index'

params = {'page': 1}

for page in range(1, 11):
    print('Page no.{}...'.format(page))
    print('-' * 120)
    print()

    params['page'] = page
    soup = BeautifulSoup( requests.post(url, data=params).text, 'html.parser' )

    for a in soup.select('a[title="University"]'):
        print('{: <60}{}'.format(a.get_text(strip=True), a['href']))

    print()

Output:

Page no.1...
------------------------------------------------------------------------------------------------------------------------

Loyola University Chicago                                   /university/loyola-university-chicago
Queens University                                           /university/queens-university
University of Wollongong                                    /university/university-of-wollongong
Nanyang Technological University                            /university/nanyang-technological-university
Kaunas University of Technology                             /university/kaunas-university-of-technology
University of Bristol                                       /university/university-of-bristol
University of Victoria                                      /university/university-of-victoria
National University of Singapore NUS                        /university/national-university-of-singapore-nus
Duke University                                             /university/duke-university
Queens University                                           /university/queens-university
New Jersey Institute of Technology                          /university/new-jersey-institute-of-technology
Swinburne University of Technology                          /university/swinburne-university-of-technology
University of Alberta                                       /university/university-of-alberta
Cardiff University                                          /university/cardiff-university
St Clair College                                            /university/st-clair-college
Stanford University                                         /university/stanford-university
McGill University                                           /university/mcgill-university
Arizona State University Tempe                              /university/arizona-state-university-tempe
University of North Carolina Greensboro                     /university/university-of-north-carolina-greensboro
Yale University                                             /university/yale-university

Page no.2...
------------------------------------------------------------------------------------------------------------------------

Cambrian College                                            /university/cambrian-college
Simon Fraser University Burnaby                             /university/simon-fraser-university-burnaby
University of Bologna                                       /university/university-of-bologna
Memorial University of Newfoundland                         /university/memorial-university-of-newfoundland
Centennial College                                          /university/centennial-college
University of Groningen                                     /university/university-of-groningen
Griffith University Gold Coast Campus                       /university/griffith-university-gold-coast-campus
Texas A and M University College Station                    /university/texas-a-and-m-university-college-station
University of Calgary                                       /university/university-of-calgary
University of Melbourne                                     /university/university-of-melbourne
Fanshawe College                                            /university/fanshawe-college
Zurich Swiss Federal Institute of Technology ETH            /university/zurich-swiss-federal-institute-of-technology-eth
Northeastern University                                     /university/northeastern-university
Adelphi University                                          /university/adelphi-university
Heriot Watt University Dubai                                /university/heriot-watt-university-dubai
University of Ottawa                                        /university/university-of-ottawa
University of Regina                                        /university/university-of-regina
University of Regina                                        /university/university-of-regina
Humber College North Campus                                 /university/humber-college-north-campus
Seneca College                                              /university/seneca-college

...and so on.
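
Since the question started from Selenium, here is a rough sketch of the same loop driven through the browser instead: rather than clicking the javascript:void() links, it calls the page's own pagingcustom() function, whose name is taken from the onclick attributes in the question. The function scrape_pages and its delay parameter are hypothetical. The sketch assumes that the driver has already loaded the index page, that pagingcustom is reachable as a global function, and that the new results are in place after the fixed delay; none of this is confirmed by the source.

```python
import time


def scrape_pages(driver, last_page, delay=2.0):
    """Return {page_number: [university names]} for pages 1..last_page.

    `driver` is expected to be a Selenium WebDriver that has already
    loaded the course index page (assumption, not verified here).
    """
    results = {}
    for page in range(1, last_page + 1):
        if page > 1:
            # Call the site's own pagination function instead of clicking
            # the javascript:void() link.
            driver.execute_script("pagingcustom(arguments[0]);", page)
            # Crude fixed wait for the new results to appear.
            time.sleep(delay)
        # "class name" is the locator strategy that By.CLASS_NAME stands for
        elements = driver.find_elements("class name", "university-name")
        results[page] = [element.text for element in elements]
    return results
```

With a real driver this would be called as scrape_pages(driver, 10) after driver.get(...). The fixed sleep keeps the sketch short; an explicit wait such as Selenium's WebDriverWait is likely a more reliable choice in practice.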
