首页 > 解决方案 > 如何通过触发“阅读更多”按钮来抓取数据

问题描述

我正在尝试使用 Python 中的 BeautifulSoup从https://www.mouthshut.com/product-reviews/ICICI-Lombard-Auto-Insurance-reviews-925641018中抓取评论。

实际上评论内容有一个“阅读更多...”按钮。如何触发该按钮以获取全部内容?

我发现单击按钮时会触发 XHR 请求。如何使用 python 模拟它?

此外,在检查了“阅读更多...”按钮后,我得到了这个:

<a style="cursor:pointer" onclick="bindreviewcontent('2836986',this,false,'I found this review of ICICI Lombard Auto Insurance pretty useful',925641018,'.jpg','I found this review of ICICI Lombard Auto Insurance pretty useful %23WriteShareWin','https://www.mouthshut.com/review/ICICI-Lombard-Auto-Insurance-review-rmlrrturotn','ICICI Lombard Auto Insurance',' 1/5','rmlrrturotn');">Read More</a>

如何使用 python 触发 onclick 事件?

标签: pythonweb-scrapingbeautifulsoup

解决方案


提取带有评分和链接的所有评论

import requests
from bs4 import BeautifulSoup as bs
import pandas as pd


def add_reviews(s, soup, results):
    for review in soup.select('.review-article'):
        info = review.select_one('a')
        identifier = review.select_one('[reviewid]')['reviewid']
        data['reviewid'] = identifier
        title = info.text
        link = info['href']
        rating = len(review.select('.rated-star'))
        r = s.post('https://www.mouthshut.com/review/CorporateResponse.ashx', data)
        soup2 = bs(r.content, 'lxml')
        review = ' '.join([i.text for i in soup2.select('p')])
        row = [title, link, rating, review]
        results.append(row)

url = 'https://www.mouthshut.com/product-reviews/ICICI-Lombard-Auto-Insurance-reviews-925641018-page-{}'
data = {'type': 'review', 'reviewid': '', 'catid': '925641018', 'corp': 'false', 'catname': ''}
results = []

with requests.Session() as s:
    r = s.get('https://www.mouthshut.com/product-reviews/ICICI-Lombard-Auto-Insurance-reviews-925641018')
    soup = bs(r.content, 'lxml')
    pages = int(soup.select('#spnPaging .btn-link')[-1].text)
    add_reviews(s, soup, results)
    if pages > 1:
        for page in range(2, pages + 1):
            r = s.get(url.format(page))
            soup = bs(r.content, 'lxml')
            add_reviews(s, soup, results)

df = pd.DataFrame(results, columns = ['Title', 'Link', 'Rating', 'Review'])
print(df)      

推荐阅读