How do I scrape data from a Java-based web page using Python requests and JSON?

Problem description

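I am new to Python.

I am trying to extract general information from this page: https://www.sunnxt.com/tamil-movie/detail/8168, such as the movie name, year, and language shown on the main page.

I tried the code below without success, because the response does not contain the complete HTML page.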

import requests
from bs4 import BeautifulSoup

url = 'https://www.sunnxt.com/telugu-movie/detail/31257'

headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36',
}

# Fetch the page and pretty-print the HTML that the server returns
response = requests.get(url, headers=headers)
html = BeautifulSoup(response.content, 'html.parser').prettify()

Tags: python

Solution


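The page source contains a script element of type application/ld+json.
This element contains the data you are looking for.
All you have to do is read the data from it.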

<script type="application/ld+json">
        {
            "@context": "https://schema.org",
            "@type": "VideoObject",
            "name" : "Yaaradi Nee Mohini",
            "description": "Vasu falls in love with Keerthi.When he expresses his feelings to her,she rejects him saying that her marriage has been fixed.Later,he learns that she is about to marry his close friend Cheenu. Vasu&#039;s father meets Keerthi to talk about his son, but she insults him in front of her coworkers. The very same night, Vasu&#039;s father dies and Vasu is all alone except for his close friends Cheenu and Ganesh.  Cheenu forces Vasu that he should come with him to native for his wedding with Keerthi. Will Vasu be able to hold himself together while Keerthi gets married to Cheenu?",
            "url": "https://www.sunnxt.com/tamil-movie/detail/8168",
            "embedUrl": "https://www.sunnxt.com/tamil-movie/detail/8168",
            "contentUrl": "https://www.sunnxt.com/tamil-movie/detail/8168",
            "uploadDate" : "2017-03-31T00:00:00.000Z",
            "image": ["/images/logo.png", "https://sund-images.sunnxt.com/8168/500x750_ccd804fa-f7df-4ddf-8be7-c7752ea75bf0.jpg"],
            "thumbnailUrl": "https://sund-images.sunnxt.com/8168/500x750_ccd804fa-f7df-4ddf-8be7-c7752ea75bf0.jpg",
            "duration":"P0DT2H38M0S",
            "requiresSubscription": {
                "@type": "MediaSubscription",
                "name": "SUNNXT guest user",
                "authenticator": {
                    "@type": "Organization",
                    "name": "SUNNXT"
                }
            }
        }
    </script>
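
One way to read that element is sketched below. It uses only the standard library so that it runs offline; `SAMPLE_HTML` is a trimmed copy of the script element above, and the names `JsonLdParser`, `extract_json_ld`, and `summarize` are made up for this example. Note that the JSON shown has no language field, so taking the language from the first URL path segment is an assumption, and `uploadDate` is the upload date, which is not necessarily the release year.

```python
import json
from html import unescape
from html.parser import HTMLParser
from urllib.parse import urlparse

# Trimmed copy of the script element from the page source (for offline use).
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{
    "@context": "https://schema.org",
    "@type": "VideoObject",
    "name": "Yaaradi Nee Mohini",
    "description": "Vasu&#039;s father meets Keerthi to talk about his son.",
    "url": "https://www.sunnxt.com/tamil-movie/detail/8168",
    "uploadDate": "2017-03-31T00:00:00.000Z",
    "duration": "P0DT2H38M0S"
}
</script>
</head><body></body></html>
"""


class JsonLdParser(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json">."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self._chunks = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._inside = True
            self._chunks = []

    def handle_data(self, data):
        if self._inside:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._inside:
            self.blocks.append("".join(self._chunks))
            self._inside = False


def extract_json_ld(html):
    """Return the decoded JSON-LD objects found in an HTML string."""
    parser = JsonLdParser()
    parser.feed(html)
    parser.close()
    return [json.loads(block) for block in parser.blocks]


def summarize(movie):
    """Pick out a few fields from one decoded JSON-LD object."""
    # Assumption: the first URL path segment ("tamil-movie") names the language.
    first_segment = urlparse(movie.get("url", "")).path.strip("/").split("/")[0]
    return {
        "name": movie.get("name"),
        "upload_date": movie.get("uploadDate"),
        "duration": movie.get("duration"),
        "language": first_segment.split("-")[0] or None,
        # The page stores apostrophes as HTML entities such as &#039;
        "description": unescape(movie.get("description", "")),
    }


if __name__ == "__main__":
    for movie in extract_json_ld(SAMPLE_HTML):
        print(summarize(movie))
```

For the live page, the same function should work on the text of the response from the question's code, for example `extract_json_ld(requests.get(url, headers=headers).text)`, provided the server includes the script element in that response. With BeautifulSoup, `soup.find("script", {"type": "application/ld+json"})` locates the same element, and its text can be passed to `json.loads`.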
