python - How to scrape data from a JavaScript-based web page using Python requests and JSON?
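Question
I am new to Python.
I am trying to extract general information from this page: https://www.sunnxt.com/tamil-movie/detail/8168, such as the movie name, year, and language shown on the main page.
I tried the code below without success, because the response does not contain the fully rendered HTML page.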
import requests
from bs4 import BeautifulSoup

url = 'https://www.sunnxt.com/telugu-movie/detail/31257'
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36',
}
# Fetch the page, then parse and pretty-print the HTML the server returned
response = requests.get(url, headers=headers)
html = BeautifulSoup(response.content, 'html.parser').prettify()
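Answer
The page source contains a script element of type application/ld+json.
That element holds the data you are looking for, embedded as JSON in the HTML the server sends, so no JavaScript has to run.
All you have to do is read the data from there. The element looks like this: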
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@type": "VideoObject",
"name" : "Yaaradi Nee Mohini",
"description": "Vasu falls in love with Keerthi.When he expresses his feelings to her,she rejects him saying that her marriage has been fixed.Later,he learns that she is about to marry his close friend Cheenu. Vasu's father meets Keerthi to talk about his son, but she insults him in front of her coworkers. The very same night, Vasu's father dies and Vasu is all alone except for his close friends Cheenu and Ganesh. Cheenu forces Vasu that he should come with him to native for his wedding with Keerthi. Will Vasu be able to hold himself together while Keerthi gets married to Cheenu?",
"url": "https://www.sunnxt.com/tamil-movie/detail/8168",
"embedUrl": "https://www.sunnxt.com/tamil-movie/detail/8168",
"contentUrl": "https://www.sunnxt.com/tamil-movie/detail/8168",
"uploadDate" : "2017-03-31T00:00:00.000Z",
"image": ["/images/logo.png", "https://sund-images.sunnxt.com/8168/500x750_ccd804fa-f7df-4ddf-8be7-c7752ea75bf0.jpg"],
"thumbnailUrl": "https://sund-images.sunnxt.com/8168/500x750_ccd804fa-f7df-4ddf-8be7-c7752ea75bf0.jpg",
"duration":"P0DT2H38M0S",
"requiresSubscription": {
"@type": "MediaSubscription",
"name": "SUNNXT guest user",
"authenticator": {
"@type": "Organization",
"name": "SUNNXT"
}
}
}
</script>
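As a minimal sketch of how that element could be read, the snippet below collects every `application/ld+json` script from an HTML string and decodes it with `json.loads`. It uses only the standard library's `html.parser` so it runs offline against a trimmed copy of the markup above; `LdJsonParser`, `extract_ld_json`, and `sample_html` are names made up for this illustration, not part of any library.

```python
import json
from html.parser import HTMLParser


class LdJsonParser(HTMLParser):
    """Collects the text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_ld_json = False
        self._chunks = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        # Start collecting when a JSON-LD script element opens
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_ld_json = True
            self._chunks = []

    def handle_data(self, data):
        # Script content may arrive in several pieces, so gather them all
        if self._in_ld_json:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_ld_json:
            self.blocks.append("".join(self._chunks))
            self._in_ld_json = False


def extract_ld_json(html):
    """Return a list of dicts, one per JSON-LD block found in the HTML text."""
    parser = LdJsonParser()
    parser.feed(html)
    parser.close()
    return [json.loads(block) for block in parser.blocks]


# Trimmed-down copy of the markup shown above (hypothetical sample for offline use)
sample_html = """
<html><head>
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "VideoObject",
  "name": "Yaaradi Nee Mohini",
  "uploadDate": "2017-03-31T00:00:00.000Z",
  "duration": "P0DT2H38M0S"
}
</script>
</head><body></body></html>
"""

movie = extract_ld_json(sample_html)[0]
print(movie["name"])      # Yaaradi Nee Mohini
print(movie["duration"])  # P0DT2H38M0S
```

Against the live site you would presumably pass `requests.get(url, headers=headers).text` to `extract_ld_json` instead of `sample_html`. Note that the block shown above has no explicit release-year or language field: `uploadDate` is an upload date, and the language only appears in the URL path (`tamil-movie`), so those two values may need to come from elsewhere.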