Trying to do a one-time web scrape of a movie site's watchlist (Mubi)

Problem description


I tried to adapt code I found here for a similar situation, but the output is just an empty list.

# Import modules
from bs4 import BeautifulSoup
import requests

# Website URL
url = 'https://mubi.com/users/9167878/watchlist'

r = requests.get(url)

soup = BeautifulSoup(r.content, "lxml")

g_data = soup.find_all('h3', {"class": "film-title", 'lang': 'en'})

print(g_data)

Tags: python, web-scraping, beautifulsoup

Solution


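The data is loaded by JavaScript from an external source, so it is not in the HTML that requests downloads and BeautifulSoup has nothing to find. This example shows how to load it directly with the requests/json modules: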

import json
import requests


url = "https://mubi.com/users/9167878/watchlist"
api_url = (
    "https://mubi.com/services/api/wishes?user_id={user_id}&page=1&per_page=24"
)
# Take the numeric user id from the profile URL (here '9167878')
user_id = url.split("/")[-2]

data = requests.get(api_url.format(user_id=user_id)).json()

# uncomment this to print all data
# print(json.dumps(data, indent=4))

for d in data:
    print("{:<50} {}".format(d["film"]["title"], d["film"]["canonical_url"]))

Prints:

An Affair of Love                                  http://mubi.com/films/an-affair-of-love
I Am Self-Sufficient                               http://mubi.com/films/i-am-self-sufficient
Le Donne della Vucciria                            http://mubi.com/films/le-donne-della-vucciria
Golden Dreams                                      http://mubi.com/films/golden-dreams
Ecce bombo                                         http://mubi.com/films/ecce-bombo
The Solitude of Prime Numbers                      http://mubi.com/films/the-solitude-of-prime-numbers
De Djess                                           http://mubi.com/films/de-djess
Salon Kitty                                        http://mubi.com/films/salon-kitty
The Believer's Heaven                              http://mubi.com/films/the-believer-s-heaven
Hot Thrills and Warm Chills                        http://mubi.com/films/hot-thrills-and-warm-chills
Shanty Tramp                                       http://mubi.com/films/shanty-tramp
Emerald Cities                                     http://mubi.com/films/emerald-cities
Wild Guitar                                        http://mubi.com/films/wild-guitar
The Burning Hell                                   http://mubi.com/films/the-burning-hell
Satan in High Heels                                http://mubi.com/films/satan-in-high-heels
The Nest of the Cuckoo Birds                       http://mubi.com/films/the-nest-of-the-cuckoo-birds
Guilty Bystander                                   http://mubi.com/films/guilty-bystander
Orgy of the Dead                                   http://mubi.com/films/orgy-of-the-dead
Spring Night, Summer Night                         http://mubi.com/films/spring-night-summer-night
The Maidens of Fetish Street                       http://mubi.com/films/the-girls-on-f-street
Night Tide                                         http://mubi.com/films/night-tide
The Guilty                                         http://mubi.com/films/the-guilty-2018
I Am Not a Witch                                   http://mubi.com/films/i-am-not-a-witch
The Wayward Girl                                   http://mubi.com/films/the-wayward-girl
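The output stops at 24 titles, which matches per_page=24 in the request, so it is probably only the first page of the watchlist. Below is a minimal sketch for collecting every page. It assumes the endpoint still works as in the answer, returns a JSON list of entries with film.title and film.canonical_url, and returns an empty (or short) list past the last page; none of that is confirmed here. fetch_page and fetch_all are hypothetical helper names.

```python
# Endpoint taken from the answer above; it may have changed, so treat it as an assumption.
API_URL = "https://mubi.com/services/api/wishes"


def fetch_page(user_id, page, per_page=24):
    """Fetch one page of a user's watchlist (assumed to be a JSON list)."""
    import requests  # imported here so fetch_all can be exercised without network access

    params = {"user_id": user_id, "page": page, "per_page": per_page}
    resp = requests.get(API_URL, params=params, timeout=10)
    resp.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
    return resp.json()


def fetch_all(user_id, fetch=fetch_page, per_page=24):
    """Request page 1, 2, 3, ... until a page comes back empty or short."""
    films = []
    page = 1
    while True:
        batch = fetch(user_id, page, per_page)
        if not batch:
            break  # assumed: an empty list means we are past the last page
        films.extend(batch)
        if len(batch) < per_page:
            break  # assumed: a short page is the last one, so skip the extra request
        page += 1
    return films
```

Usage: `for d in fetch_all("9167878"): print("{:<50} {}".format(d["film"]["title"], d["film"]["canonical_url"]))`. The page fetcher is passed in as a parameter so the paging loop can be checked with a fake function, without hitting the site.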
