How do I iterate through IMDb reviews?

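Problem description

I want to download some movie reviews from IMDb so that I can use them for my LDA model (for school).

But the default reviews page only contains 25 reviews (for example https://www.imdb.com/title/tt0111161/reviews/?ref_=tt_ql_urv). If I want more, I have to click the "Load More" button at the bottom of the page, which gives me another 25 reviews.

I don't know how to automate this in Python. I can't simply go to https://www.imdb.com/title/tt0111161/reviews/?ref_=tt_ql_urv/2 or append a ?page=2 parameter.

How can I automatically iterate through the pages of the IMDb reviews site with Python?

Also, was this made so difficult on purpose?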

Tags: python, web-scraping

Solution


When I click Load More with DevTools open in Chrome/Firefox (tab: Network, filter: XHR), I see a request to a link like this:

https://www.imdb.com/title/tt0111161/reviews/_ajax?ref_=undefined&paginationKey=g4xolermtiqhejcxxxgs753i36t52q343mpt34pjada6qpye4w6qtalmfyy7wfxcwfzuwsyh

It has paginationKey=g4x...

I see something similar in the HTML, <div ... data-key="g4x...", so I use this data-key to build the link for the next page.
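That single step can be tried offline on a hand-written HTML fragment instead of a live response. This is only a sketch: the fragment, the shortened key and the helper name `next_page_url` are made up for illustration, and only the `data-key` attribute and the `_ajax` URL pattern come from the observation above.

```python
from urllib.parse import urlencode

from bs4 import BeautifulSoup

# Hypothetical fragment standing in for the real page; the key is shortened.
html = '<div data-key="g4x-example-key"></div>'


def next_page_url(html_text, title_id="tt0111161"):
    """Return the _ajax URL for the next page, or None if no data-key is found."""
    soup = BeautifulSoup(html_text, "html.parser")
    # Match any <div> that carries a data-key attribute, whatever its value.
    div = soup.find("div", {"data-key": True})
    if div is None:
        return None
    query = urlencode({"ref_": "undefined", "paginationKey": div["data-key"]})
    return f"https://www.imdb.com/title/{title_id}/reviews/_ajax?{query}"


print(next_page_url(html))
```

Returning `None` when no `data-key` is present gives the caller a simple way to detect the last page.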


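Example code.

First, I get the HTML from the normal URL and extract the titles of the reviews. Next, I take the data-key and build the URL for the new reviews. I repeat this in a for-loop to get 3 pages, but you could use a while True loop and keep repeating for as long as a data-key is still present.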

import requests
from bs4 import BeautifulSoup 

s = requests.Session()
#s.headers['User-Agent'] = 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:93.0) Gecko/20100101 Firefox/93.0'

# get first/full page

url = 'https://www.imdb.com/title/tt0111161/reviews/?ref_=tt_ql_urv'

r = s.get(url)
soup = BeautifulSoup(r.text, 'html.parser')

items = soup.find_all('a', {'class': 'title'})
for number, title in enumerate(items, 1):
    print(number, '>', title.text.strip())
    
# get next page(s)

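for _ in range(3):

    # the element with data-key holds the pagination key for the next page
    div = soup.find('div', {'data-key': True})
    if div is None:  # no data-key left, so there are no more pages
        break
    print('---', div['data-key'], '---')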

    url = 'https://www.imdb.com/title/tt0111161/reviews/_ajax'
    
    payload = {
        'ref_': 'tt_ql_urv',
        'paginationKey': div['data-key']
    }

    #headers = {'X-Requested-With': 'XMLHttpRequest'}
    
    r = s.get(url, params=payload) #, headers=headers)
    soup = BeautifulSoup(r.text, 'html.parser')
    
    items = soup.find_all('a', {'class': 'title'})
    for number, title in enumerate(items, 1):
        print(number, '>', title.text.strip())
    

Result:

1 > Enthralling, fantastic, intriguing, truly remarkable!
2 > "I Had To Go To Prison To Learn To Be A Crook"
3 > Masterpiece
4 > All-time prison film classic
5 > Freeman gives it depth
6 > impressive
7 > Simply a great story that is moving and uplifting
8 > An incredible movie. One that lives with you.
9 > "I'm a convicted murderer who provides sound financial planning".
10 > IMDb and the Greatest Film of All Time
11 > never give up hope
12 > The Shawshank Redemption
13 > Brutal Anti-Bible Bigotry Prevails Again
14 > Time and Pressure.
15 > A classic
16 > An extraordinary and unforgettable film about a bank veep who is convicted of murders and sentenced to the toughest prison
17 > A genre picture, but a satisfying one...
18 > Why it is ranked so highly.
19 > Exceptional
20 > Shawshank Redemption- Prison Film is Redeemed by Quality ****
21 > A Classic Film On Hope And Redemption
22 > Compelling masterpiece
23 > Relentless Storytelling
24 > Some birds aren't meant to be caged.
25 > Good , But It Is Overrated By Some
--- g4xolermtiqhejcxxxgs753i36t52q343mpt34pjada6qpye4w6qtalmfyy7wfxcwfzuwsyh ---
1 > Stephen King's prison tale with a happy ending...
2 > Breaking Big Rocks Into Little Rocks
3 > Over Rated
4 > Terrific stuff!
5 > Highly Overrated But Still Good
6 > Superb
7 > Beautiful movie
8 > Tedious, overlong, with "hope" being the second word spoken in just about every sentence... who cares?
9 > Excellent Stephen King adaptation; flawless Robbins & Freeman
10 > Good for the spirit
11 > Entertaining Prison Movie Isn't Nearly as Good as Its Rabid Fan Base Would Lead You to Believe
12 > Observations...
13 > Why can't they make films like this anymore?
14 > Shawshank Redemption Comes Out Clean
15 > Hope Springs Eternal:Rita Hayworth And The Shawshank Redemption.
16 > Redeeming.
17 > You don't understand! I'm not supposed to be here!
18 > A Story Of Hope & Resilence
19 > Salvation lies within....
20 > Pretty good movie...one of those that you do not really need to watch from beginning to end.
21 > A film of Eloquence
22 > A great film of a helping hand leading to end-around justice
23 > about freedom
24 > Reputation notwithstanding, this is powerful stuff
25 > The best film ever made!
--- g4uorermtiqhejcxxxgs753i36t52q343eod34plapeoqp27z6b2lhdevccn5wyrz2vmgufh ---
1 > A Sort of Secular Redemption
2 > In virus times, we need this hope.
3 > The placement isn't an exaggeration
4 > A true story of friendship and hard times
5 > Escape from Shawshank
6 > Great Story Telling
7 > moving and emotionally impactful(if you liked "The Green Mile" you will like this movie)
8 > Super Good - Highly Recommended
9 > I can see why this is rated Number 1 on IMDb.

# ...
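The `while True` variant mentioned earlier could look roughly like the sketch below. The function name `iter_review_titles` and the `max_pages` and `delay` parameters are hypothetical additions, not part of the original answer. The session is passed in by the caller, so any object with a `requests`-style `get(url, params=...)` method works. It assumes the page markup is the same as in the example code (`a.title` links and a `div` carrying `data-key`).

```python
import time

from bs4 import BeautifulSoup


def iter_review_titles(session, title_id, max_pages=None, delay=1.0):
    """Yield review titles page by page until no data-key is left."""
    base = f"https://www.imdb.com/title/{title_id}/reviews/"
    response = session.get(base)  # first/full page
    pages = 0
    while True:
        soup = BeautifulSoup(response.text, "html.parser")
        for link in soup.find_all("a", {"class": "title"}):
            yield link.text.strip()
        pages += 1

        # Stop when the page offers no further key or the page limit is hit.
        div = soup.find("div", {"data-key": True})
        if div is None or (max_pages is not None and pages >= max_pages):
            break

        time.sleep(delay)  # pause between requests; the default is arbitrary
        payload = {"ref_": "tt_ql_urv", "paginationKey": div["data-key"]}
        response = session.get(base + "_ajax", params=payload)


# Possible usage (performs real requests, so it is left commented out):
# import requests
# for title in iter_review_titles(requests.Session(), "tt0111161", max_pages=3):
#     print(title)
```

Writing it as a generator lets the caller stop early or collect everything with `list(...)`, and `max_pages` keeps a test run from downloading every review.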
