python - Attempting a one-off web scrape of a film site watchlist (Mubi)
Question
I'm trying to adapt code I found here for a similar case, but the output is just an empty list.
# Import Module
from bs4 import BeautifulSoup
import requests
# Website URL
url = 'https://mubi.com/users/9167878/watchlist'
r = requests.get(url)
soup = BeautifulSoup(r.content, "lxml")
g_data = soup.find_all('h3', {"class": "film-title", 'lang': 'en'})
print(g_data)
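Solution
The data is loaded via JavaScript from an external source, so it is not in the HTML that requests receives. This example shows how to load it with the requests/json modules: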
import json
import requests
url = "https://mubi.com/users/9167878/watchlist"
api_url = (
    "https://mubi.com/services/api/wishes?user_id={user_id}&page=1&per_page=24"
)
user_id = url.split("/")[-2]
data = requests.get(api_url.format(user_id=user_id)).json()
# uncomment this to print all data
# print(json.dumps(data, indent=4))
for d in data:
    print("{:<50} {}".format(d["film"]["title"], d["film"]["canonical_url"]))
Prints:
An Affair of Love http://mubi.com/films/an-affair-of-love
I Am Self-Sufficient http://mubi.com/films/i-am-self-sufficient
Le Donne della Vucciria http://mubi.com/films/le-donne-della-vucciria
Golden Dreams http://mubi.com/films/golden-dreams
Ecce bombo http://mubi.com/films/ecce-bombo
The Solitude of Prime Numbers http://mubi.com/films/the-solitude-of-prime-numbers
De Djess http://mubi.com/films/de-djess
Salon Kitty http://mubi.com/films/salon-kitty
The Believer's Heaven http://mubi.com/films/the-believer-s-heaven
Hot Thrills and Warm Chills http://mubi.com/films/hot-thrills-and-warm-chills
Shanty Tramp http://mubi.com/films/shanty-tramp
Emerald Cities http://mubi.com/films/emerald-cities
Wild Guitar http://mubi.com/films/wild-guitar
The Burning Hell http://mubi.com/films/the-burning-hell
Satan in High Heels http://mubi.com/films/satan-in-high-heels
The Nest of the Cuckoo Birds http://mubi.com/films/the-nest-of-the-cuckoo-birds
Guilty Bystander http://mubi.com/films/guilty-bystander
Orgy of the Dead http://mubi.com/films/orgy-of-the-dead
Spring Night, Summer Night http://mubi.com/films/spring-night-summer-night
The Maidens of Fetish Street http://mubi.com/films/the-girls-on-f-street
Night Tide http://mubi.com/films/night-tide
The Guilty http://mubi.com/films/the-guilty-2018
I Am Not a Witch http://mubi.com/films/i-am-not-a-witch
The Wayward Girl http://mubi.com/films/the-wayward-girl
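The request above pins page=1 and per_page=24, which matches the 24 titles printed. A longer watchlist would presumably need further pages. The following is a minimal sketch of one way to collect them, not a confirmed description of the Mubi API: it assumes the endpoint accepts higher page values and returns an empty list past the last page. The names fetch_page, fetch_all, and max_pages are hypothetical helpers introduced here for illustration.

```python
API_URL = "https://mubi.com/services/api/wishes"  # endpoint taken from the answer above


def fetch_page(user_id, page, per_page=24):
    """Fetch one page of a user's watchlist and return the parsed JSON."""
    # Imported here so the paging logic below can be used without requests.
    import requests

    params = {"user_id": user_id, "page": page, "per_page": per_page}
    response = requests.get(API_URL, params=params, timeout=10)
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.json()


def fetch_all(user_id, get_page=fetch_page, max_pages=100):
    """Collect entries page by page until a page comes back empty.

    Assumption: the API returns an empty list past the last page.
    max_pages is a safety cap in case that assumption does not hold.
    """
    entries = []
    for page in range(1, max_pages + 1):
        batch = get_page(user_id, page)
        if not batch:
            break
        entries.extend(batch)
    return entries
```

Calling fetch_all("9167878") and looping over the result with the same print statement as above should list the whole watchlist, provided the paging assumption holds. The get_page parameter lets the loop be exercised with a stand-in function instead of a live request.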