Getting the XHR document that is loaded when a page is visited

Problem description

I'm trying to get the elements we can see below the photo on the following site (or any equivalent site):

[image]

But I can't get it from the page source. It must be loaded dynamically by a JavaScript script. In fact, it seems to be in an XHR document:

[image]

So how can I get the XHR document that is downloaded when the page is visited?

I tried:

url = "https://www.nosetime.com/xiangshui/350870-oulong-atelier-cologne-oolang-infini.html"

r = requests.post(url, headers=headers)
data = r.json()

print(data)

But it returns:

---------------------------------------------------------------------------
JSONDecodeError                           Traceback (most recent call last)
<ipython-input-8-e72156ddb336> in <module>()
      2 
      3 r = requests.post(url, headers=headers)
----> 4 data = r.json()
      5 
      6 print(data)

3 frames
/usr/lib/python3.6/json/decoder.py in raw_decode(self, s, idx)
    355             obj, end = self.scan_once(s, idx)
    356         except StopIteration as err:
--> 357             raise JSONDecodeError("Expecting value", s, err.value) from None
    358         return obj, end

JSONDecodeError: Expecting value: line 1 column 1 (char 0)
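"Expecting value: line 1 column 1 (char 0)" means the very first character of the body is not valid JSON. That is what you get when the URL serves an HTML page (which starts with "<") or an empty body: the product URL above returns HTML, while the data below the photo is fetched separately by the page's JavaScript. A minimal stdlib sketch reproducing the error, using made-up sample bodies rather than real responses from the site (decode_body is a hypothetical helper name):

```python
import json


def decode_body(body):
    """Try to decode a response body as JSON.

    Returns (data, None) on success, or (None, error_message) when the body
    is not JSON, for example an HTML page or an empty string.
    """
    try:
        return json.loads(body), None
    except json.JSONDecodeError as err:
        return None, str(err)


# Hypothetical sample bodies, not real responses from nosetime.com
html_page = "<!DOCTYPE html><html><head><title>...</title></head></html>"
empty_body = ""
xhr_json = '{"id": 350870, "isscore": 8.6}'

for body in (html_page, empty_body, xhr_json):
    data, error = decode_body(body)
    print(data if error is None else f"not JSON -> {error}")
```

With requests you would pass r.text to such a helper, or look at r.headers.get("Content-Type") before calling r.json().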

Tags: python-3.x, ajax, web-scraping, xmlhttprequest

Solution


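Just send a GET request to the endpoint the page's XHR calls (https://www.nosetime.com/app/item.php?id=...) instead of the HTML page, add the right headers, and you have the data.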

import requests


# Browser-like headers; the referer is the product page itself
headers = {
    "referer": "https://www.nosetime.com/xiangshui/350870-oulong-atelier-cologne-oolang-infini.html",
    "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.111 Safari/537.36",
}
# GET the JSON endpoint directly instead of the HTML product page
response = requests.get("https://www.nosetime.com/app/item.php?id=350870", headers=headers).json()

print(response["id"], response["isscore"], response["brandid"])

For some reason I can't paste the whole JSON output because SO thinks it's spam... oO Anyway, this should get you the JSON response.

This prints:

350870 8.6 10091761
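The full response is too long to paste, but to see what else it contains you can list its keys and pretty-print it locally. A minimal stdlib sketch; the dict is a hypothetical stand-in holding only the three fields printed above plus a made-up Chinese string field ("example_cn"), so the real response will have more (and possibly differently typed) fields. preview is a hypothetical helper name:

```python
import json


def preview(data, max_chars=300):
    """Return pretty-printed JSON, truncated to max_chars characters."""
    # ensure_ascii=False keeps Chinese text readable instead of \uXXXX escapes
    text = json.dumps(data, ensure_ascii=False, indent=2)
    if len(text) <= max_chars:
        return text
    return text[:max_chars] + "\n..."


# Hypothetical stand-in for the item.php response (not real data):
# "example_cn" is a made-up field to show non-ASCII handling.
response = {"id": 350870, "isscore": 8.6, "brandid": 10091761, "example_cn": "香水"}

print(sorted(response))   # top-level keys: which fields are available
print(preview(response))  # readable, truncated dump to inspect or share
```

With the answer's code, call preview(response) on the decoded .json() result.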

Edit:

If you have more products, you can simply loop over the product URLs and extract what you need from the JSON. For example,

import requests

product_urls = [
    "https://www.nosetime.com/xiangshui/947895-oulong-xuecheng-atelier-cologne-orange.html",
    "https://www.nosetime.com/xiangshui/705357-pomelo-paradis.html",
    "https://www.nosetime.com/xiangshui/592260-cl-mentine-california.html",
    "https://www.nosetime.com/xiangshui/612353-oulong-atelier-cologne-trefle.html",
    "https://www.nosetime.com/xiangshui/911317-oulong-nimingmeigui-atelier-cologne.html",
]


for product_url in product_urls:
    headers = {
        "referer": product_url,
        "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.111 Safari/537.36",
    }
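    # The ID is the leading number of the last path segment,
    # e.g. 947895 from 947895-oulong-xuecheng-atelier-cologne-orange.html
    product_id = product_url.split("/")[-1].split("-")[0]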
    response = requests.get(
        f"https://www.nosetime.com/app/item.php?id={product_id}",
        headers=headers,
    ).json()
    print(f"Product name: {response['enname']} | Rating: {response['isscore']}")

Output:

Product name: Atelier Cologne Orange Sanguine, 2010 | Rating: 8.9
Product name: Atelier Cologne Pomelo Paradis, 2015 | Rating: 8.8
Product name: Atelier Cologne Clémentine California, 2016 | Rating: 8.6
Product name: Atelier Cologne Trefle Pur, 2010 | Rating: 8.6
Product name: Atelier Cologne Rose Anonyme, 2012 | Rating: 7.7
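The split-based extraction relies on the ID being the first dash-separated chunk of the last path segment, so a trailing slash yields an empty ID and non-product links pass through silently. A slightly stricter stdlib sketch, assuming every product page follows the /xiangshui/<id>-<slug>.html pattern of the URLs above (PRODUCT_PATH, ITEM_ENDPOINT, product_id_from_url and item_api_url are names made up for illustration):

```python
import re
from urllib.parse import urlencode, urlparse

# Assumption: product pages look like the examples above,
# https://www.nosetime.com/xiangshui/<numeric id>-<slug>.html
PRODUCT_PATH = re.compile(r"^/xiangshui/(\d+)(?:-[^/]*)?(?:\.html)?/?$")
ITEM_ENDPOINT = "https://www.nosetime.com/app/item.php"


def product_id_from_url(url):
    """Return the numeric product ID of a product page URL, or None if it doesn't match."""
    match = PRODUCT_PATH.match(urlparse(url).path)
    return match.group(1) if match else None


def item_api_url(url):
    """Build the item.php URL that the answer requests for a product page."""
    product_id = product_id_from_url(url)
    if product_id is None:
        raise ValueError(f"not a product page URL: {url!r}")
    return f"{ITEM_ENDPOINT}?{urlencode({'id': product_id})}"


# The query string here is hypothetical; urlparse keeps it out of the path.
print(item_api_url("https://www.nosetime.com/xiangshui/705357-pomelo-paradis.html?from=list"))
```

In the loop above, product_id = product_id_from_url(product_url) followed by a skip when it is None would replace the split call.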
