Web scraping Analytics Vidhya to get courses, their names, and total review counts

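Problem description

I am scraping the Analytics Vidhya website to get their courses, the course names, and the total number of reviews for each course. Getting the courses works, but I cannot scrape the course names and their total reviews.

Here is my code: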


    import requests
    from bs4 import BeautifulSoup

    for page in range(1,5):
        url = "https://courses.analyticsvidhya.com/collections?category=courses&page="+str(page)
        page_request = requests.get(url)
        data = page_request.content
        soup = BeautifulSoup(data,"html.parser")
        for courses in soup.find_all('div', {'class': 'collections__product-cards collections__product-cards___0b9ab'}):
            for course_name in soup.find_all('ul', {'class': 'products__list'}):
                for names in soup.find_all('li', {'class': 'products__list-item'}):
                    for divs in soup.find_all('div', {'class':'course-card__body'}):
                        for revs in soup.find_all('div', {'class': 'course-card__reviews'}):
                            reviews = soup.find('span', {'class': 'review__stars-count'})
                    title = soup.find('h3')
                    review = reviews.text
                    course_title = title.text
                    print(course_title + " "+str(review) +" "+ "https://courses.analyticsvidhya.com"+ names.find('a')['href'])

The problem when running this Python script is that it keeps printing the same "course_title" (course name) and the same review count.

Tags: python, web-scraping, beautifulsoup

Solution


    import requests
    from bs4 import BeautifulSoup

    BASE_URL = "https://courses.analyticsvidhya.com"

    for page in range(1, 6):
        url = BASE_URL + "/collections?category=courses&page=" + str(page)
        page_request = requests.get(url)
        data = page_request.content
        soup = BeautifulSoup(data, "html.parser")
        # Each inner search runs on the parent element, not on soup,
        # so every course card yields its own title and review count.
        for courses in soup.find_all('div', {'class': 'collections__product-cards collections__product-cards___0b9ab'}):
            for names in courses.find_all('li', {'class': 'products__list-item'}):
                for divs in names.find_all('div', {'class': 'course-card__body'}):
                    titles = divs.find_all('h3')
                    rev = []  # reset so a card without reviews does not reuse the previous card's value
                    for revs in divs.find_all('div', {'class': 'course-card__reviews'}):
                        rev = revs.find_all('span', {'class': 'review__stars-count'})
                    # zip pairs each title with its review count; cards without reviews print nothing
                    for i, j in zip(titles, rev):
                        print(i.text + " " + j.text + " " + BASE_URL + names.find('a')['href'])

I made some edits to the code, and it now scrapes the course name, the reviews, and the link.
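
The edit that matters is calling `find` / `find_all` on the current element rather than on `soup`: a search on `soup` always starts from the top of the page, so it returns the first course every time. Below is a minimal sketch of that idea which runs offline. The `SAMPLE_HTML` markup and the `extract_courses` helper are made up for illustration and only mimic the class names from the question; the real page structure may differ.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://courses.analyticsvidhya.com"

# Hypothetical, simplified markup that reuses the class names from the question.
SAMPLE_HTML = """
<ul class="products__list">
  <li class="products__list-item">
    <a href="/courses/course-a">
      <div class="course-card__body">
        <h3>Course A</h3>
        <div class="course-card__reviews">
          <span class="review__stars-count">(10)</span>
        </div>
      </div>
    </a>
  </li>
  <li class="products__list-item">
    <a href="/courses/course-b">
      <div class="course-card__body">
        <h3>Course B</h3>
      </div>
    </a>
  </li>
</ul>
"""


def extract_courses(html, base_url=BASE_URL):
    """Return a (title, review_count, link) tuple for every course card in html."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for item in soup.find_all("li", class_="products__list-item"):
        # Search inside `item`, not `soup`, so each card yields its own data.
        title = item.find("h3")
        review = item.find("span", class_="review__stars-count")
        link = item.find("a", href=True)
        if title is None or link is None:
            continue  # not a complete course card
        results.append((
            title.get_text(strip=True),
            review.get_text(strip=True) if review else None,  # None when a card has no reviews
            urljoin(base_url, link["href"]),
        ))
    return results


for title, review, link in extract_courses(SAMPLE_HTML):
    print(title, review, link)
```

Returning `None` for a card without reviews keeps that course in the output instead of silently dropping it, and `urljoin` builds the absolute link whether the `href` is relative or already absolute.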

