BeautifulSoup: accessing more reviews

Question

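I am trying to scrape the reviews from an IMDb movie page and extract the username of each reviewer. I only get 25 usernames, because that is all the page shows until you click "Load More". I need a way to access all of the reviews. Is there a way to do this other than Selenium? For some reason I get an SSL certificate error when I try to import it.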

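import requests
import bs4

url = 'https://www.imdb.com/title/tt0068646/reviews?ref_=tt_urv'
# verify=False disables TLS certificate verification for this request
response = requests.get(url, verify=False)
print(response)

soup = bs4.BeautifulSoup(response.content, 'html5lib')
# Each reviewer's name sits in a <span class="display-name-link">
name = soup.find_all('span', class_='display-name-link')
print(len(name))  # only the 25 names on the initial page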

Tags: python, selenium, beautifulsoup, request

Solution


To scrape all of the usernames (4041 in total), send GET requests that simulate clicking the button:

import requests
from bs4 import BeautifulSoup

main_url = "https://www.imdb.com/title/tt0068646/reviews?ref_=tt_urv"
# Endpoint requested in place of clicking the button; {} takes the pagination key
ajax_url = "https://www.imdb.com/title/tt0068646/reviews/_ajax?ref_=undefined&paginationKey={}"
soup = BeautifulSoup(requests.get(main_url).content, "html5lib")

while True:
    # Print every username on the current page of results
    for tag in soup.select(".display-name-link"):
        print(tag.text)
    print("-" * 30)

    # No button means there are no more pages to load
    button = soup.select_one(".load-more-data")
    if not button:
        break

    # The button's data-key attribute is the pagination key for the next page
    key = button["data-key"]
    soup = BeautifulSoup(requests.get(ajax_url.format(key)).content, "html5lib")

Output:

CalRhys
gogoschka-1
SJ_1
andrewburgereviews
alexkolokotronis
MR_Heraclius
b-a-h TNT-6
danielfeerst
mattrochman
Godz365
winnantonio
Trevizolga
DaveDiggler
ks4
...
... All the way until
Steven Bray
Castor-5
BLDJ
pinky67
dean keaton
rejoefrankel
Timothy
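
If you would rather collect the names into a list than print them, the key-following loop can be separated from the HTTP and parsing details. The following is only a minimal sketch of that control flow: `iter_usernames`, `load_page`, `max_pages`, and the `FAKE_PAGES` data are hypothetical names introduced here for illustration, and the demo runs on in-memory stand-in pages rather than real IMDb responses.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# One parsed page: the usernames found on it, plus the key for the next
# page (None when there is no "load more" button left).
Page = Tuple[List[str], Optional[str]]


def iter_usernames(
    load_page: Callable[[Optional[str]], Page],
    max_pages: int = 1000,
) -> Iterator[str]:
    """Yield usernames page by page, following the pagination key.

    load_page(None) must return the first page; load_page(key) must return
    the page that the given key points to. max_pages is a hypothetical
    safety cap so that a repeated key cannot cause an endless loop.
    """
    key: Optional[str] = None
    for _ in range(max_pages):
        names, key = load_page(key)
        yield from names
        if key is None:
            return


# Stand-in data for demonstration only; these are not real IMDb responses.
FAKE_PAGES = {
    None: (["user_a", "user_b"], "key1"),
    "key1": (["user_c"], "key2"),
    "key2": (["user_d", "user_e"], None),
}

usernames = list(iter_usernames(FAKE_PAGES.__getitem__))
print(len(usernames), usernames)
```

To use this with the code above, `load_page` would request `main_url` when the key is `None` and `ajax_url.format(key)` otherwise, then return the text of every `.display-name-link` tag together with the `data-key` of the `.load-more-data` button (or `None` if the button is missing).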
