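python - BeautifulSoup does not return all the text on a web page
Problem description
I am trying to scrape a website, but BeautifulSoup does not return all of the text that is visible when viewing the page in a browser. See the code below: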
import requests
from bs4 import BeautifulSoup

url = "https://www.hiltongrandvacations.com/en/resorts-and-destinations"
response = requests.get(url)
soup = BeautifulSoup(response.text, "html5lib")

# Write the parsed document to a file for inspection.
with open("data.txt", "w", encoding="utf-8") as f:
    f.write(str(soup))
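For example, the following text is visible on the page but is not returned by BeautifulSoup (it is missing from the text file): Grand Pacific Palisades Resort
I tried different parsers (html, lxml) but still did not get it. Also, the text does not seem to be generated by JavaScript, though I may be wrong.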
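One quick way to test the JavaScript guess is to look for the string in the raw response body, before any parser touches it. The sketch below is only an illustration: `appears_in_static_html` is a hypothetical helper name, and `sample_html` is an invented stand-in for `response.text`. A plain substring check is coarse (it misses text split across tags), but it is usually enough for a first look.

```python
def appears_in_static_html(html: str, needle: str) -> bool:
    """Return True if `needle` occurs in the raw HTML text, ignoring case."""
    return needle.casefold() in html.casefold()


# Invented stand-in for response.text: a page whose content is filled in by a script.
sample_html = "<html><body><div id='app'></div><script src='app.js'></script></body></html>"

print(appears_in_static_html(sample_html, "Grand Pacific Palisades"))  # False
print(appears_in_static_html(sample_html, "APP.JS"))  # True
```

If the check returns False for `response.text` while the text is visible in the browser, the content is most likely added after the initial page load.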
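Solution
The data you see is loaded dynamically via JavaScript, so it is not part of the HTML that requests receives. You can use this example to load the data from the API the page calls: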
import json
import requests

payload = {"locations":[],"amenities":[],"vacationTypes":[],"page":1,"pageSize":9}
api_url = 'https://www.hiltongrandvacations.com/sitecore/api/ssc/apps/PropertySearch'
data = requests.put(api_url, json=payload).json()

# uncomment this to print all data:
# print(json.dumps(data, indent=4))

# print some info on screen:
for card in data['Cards']:
    print(card['Title'])
    print(card['Description'])
    print('-' * 80)
Prints:
Sunrise Lodge, a Hilton Grand Vacations Club
Revel in the peak of adventure
--------------------------------------------------------------------------------
The District by Hilton Club
A capital experience in the capital city
--------------------------------------------------------------------------------
The Central at 5th by Hilton Club
At the heart of city life
--------------------------------------------------------------------------------
The Hilton Club – New York
Make a break for the Big Apple.
--------------------------------------------------------------------------------
The Residences by Hilton Club
Wake up in the city that never sleeps.
--------------------------------------------------------------------------------
Grand Pacific Palisades Vacation Resort
A window to the Pacific Ocean.
--------------------------------------------------------------------------------
Carlsbad Seapointe Resort
A quintessentially Californian vacation
--------------------------------------------------------------------------------
Hilton Grand Vacations Chicago Downtown/Magnificent Mile
A sky-high sanctuary amidst the big-city bustle
--------------------------------------------------------------------------------
Hilton Grand Vacations Club at Trump International Hotel Las Vegas
--------------------------------------------------------------------------------
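The payload above asks for page 1 with 9 results per page, so a single request returns only the first batch. A minimal sketch of walking through further pages follows. It assumes, without confirmation from the source, that the endpoint honours increasing "page" values and returns an empty "Cards" list past the last page. `iter_cards` and `fake_fetch` are hypothetical names, and the fake data is invented so the example runs offline.

```python
from typing import Callable, Iterator


def iter_cards(
    fetch_page: Callable[[dict], dict],
    page_size: int = 9,
    max_pages: int = 50,
) -> Iterator[dict]:
    """Yield cards page by page until a page comes back empty."""
    for page in range(1, max_pages + 1):
        payload = {
            "locations": [],
            "amenities": [],
            "vacationTypes": [],
            "page": page,
            "pageSize": page_size,
        }
        # Assumption: a missing or empty "Cards" list means there are no more results.
        cards = fetch_page(payload).get("Cards") or []
        if not cards:
            break
        yield from cards


def fake_fetch(payload: dict) -> dict:
    """Offline stand-in for the API: two pages of invented data, then nothing."""
    pages = {1: [{"Title": "A"}, {"Title": "B"}], 2: [{"Title": "C"}]}
    return {"Cards": pages.get(payload["page"], [])}


print([card["Title"] for card in iter_cards(fake_fetch)])  # ['A', 'B', 'C']
```

Against the real endpoint, `fetch_page` could be a small wrapper around the same `requests.put(api_url, json=payload).json()` call used above. The `max_pages` cap is there so the loop stops even if the server never returns an empty page.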