Can't find element with Beautiful Soup

Problem description

I'm still learning to code in Python. I really need help scraping an element from this website:

https://www.tokopedia.com/craftdale/crossback-apron-hijau-army?src=topads

I want to get the review data (the review time) from the Review (Ulasan) container.


This is the HTML from the website:

<p disabled="" data-testid="txtDateGivenReviewFilter0" class="css-oals0c-unf-heading e1qvo2ff8">1 bulan lalu</p>

I tried to get the element with this code:

review = soup.findAll('p',class_='css-oals0c-unf-heading e1qvo2ff8') 

or

review= soup.findAll('p',id_='txtDateGivenReviewFilter0') 

But both return an empty result.

Can anyone help me solve this? Thank you very much.

Tags: python, beautifulsoup

Solution


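If you inspect the site's network traffic, you will see that the page makes AJAX calls to load different pieces of information, so the reviews are most likely not part of the initial HTML that BeautifulSoup parses. To fetch the review data, the page sends an AJAX POST request with a JSON payload to a specific endpoint: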

import requests, json

# Batched GraphQL operations sent by the product page; 353506414 is this product's ID.
# Each query string must stay on a single line (a raw line break inside a string literal is a syntax error).
payload = [
    {"operationName": "PDPReviewRatingQuery", "variables": {"productId": 353506414}, "query": "query PDPReviewRatingQuery($productId: Int!) {\n  ProductRatingQuery(productId: $productId) {\n    ratingScore\n    totalRating\n    totalRatingWithImage\n    detail {\n      rate\n      totalReviews\n      percentage\n      __typename\n    }\n    __typename\n  }\n}\n"},
    {"operationName": "PDPReviewImagesQuery", "variables": {"productID": 353506414, "page": 1}, "query": "query PDPReviewImagesQuery($page: Int, $productID: Int!) {\n  ProductReviewImageListQuery(page: $page, productID: $productID) {\n    detail {\n      reviews {\n        reviewer {\n          fullName\n          profilePicture\n          __typename\n        }\n        reviewId\n        message\n        rating\n        updateTime\n        isReportable\n        __typename\n      }\n      images {\n        imageAttachmentID\n        description\n        uriThumbnail\n        uriLarge\n        reviewID\n        __typename\n      }\n      __typename\n    }\n    __typename\n  }\n}\n"},
    {"operationName": "PDPReviewHelpfulQuery", "variables": {"productID": 353506414}, "query": "query PDPReviewHelpfulQuery($productID: Int!) {\n  ProductMostHelpfulReviewQuery(productId: $productID) {\n    shop {\n      shopId\n      __typename\n    }\n    list {\n      reviewId\n      message\n      productRating\n      reviewCreateTime\n      reviewCreateTimestamp\n      isReportable\n      isAnonymous\n      imageAttachments {\n        attachmentId\n        imageUrl\n        imageThumbnailUrl\n        __typename\n      }\n      user {\n        fullName\n        image\n        url\n        __typename\n      }\n      likeDislike {\n        totalLike\n        likeStatus\n        __typename\n      }\n      __typename\n    }\n    __typename\n  }\n}\n"},
    {"operationName": "PDPReviewListQuery", "variables": {"page": 1, "rating": 0, "withAttachment": 0, "productID": 353506414, "perPage": 10}, "query": "query PDPReviewListQuery($productID: Int!, $page: Int!, $perPage: Int!, $rating: Int!, $withAttachment: Int!) {\n  ProductReviewListQuery(productId: $productID, page: $page, perPage: $perPage, rating: $rating, withAttachment: $withAttachment) {\n    shop {\n      shopId\n      name\n      image\n      url\n      __typename\n    }\n    list {\n      reviewId\n      message\n      productRating\n      reviewCreateTime\n      reviewCreateTimestamp\n      isReportable\n      isAnonymous\n      imageAttachments {\n        attachmentId\n        imageUrl\n        imageThumbnailUrl\n        __typename\n      }\n      reviewResponse {\n        message\n        createTime\n        __typename\n      }\n      likeDislike {\n        totalLike\n        likeStatus\n        __typename\n      }\n      user {\n        userId\n        fullName\n        image\n        url\n        __typename\n      }\n      __typename\n    }\n    __typename\n  }\n}\n"},
]

res = requests.post("https://gql.tokopedia.com/", json=payload)

data = res.json()

with open("data.json", "w") as f:
    json.dump(data, f)

The script above saves the review information to a file, data.json, in JSON format.
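The review time the question asks for should be in the same response: the `PDPReviewListQuery` operation requests a `reviewCreateTime` field for every item in `list`. Here is a minimal sketch, assuming the response is a list of `{"data": {...}}` entries that mirror the shape of the queries. The names `extract_review_times` and `sample_response` are hypothetical, and the sample values are placeholders, not real output from the endpoint.

```python
def extract_review_times(data):
    """Collect reviewCreateTime for every review in a batched response."""
    times = []
    for entry in data:
        # Assumed shape: each batch entry looks like {"data": {<QueryName>: {...}}}.
        query_result = (entry.get("data") or {}).get("ProductReviewListQuery")
        if not query_result:
            continue  # this entry belongs to one of the other operations
        for review in query_result.get("list") or []:
            times.append(review.get("reviewCreateTime"))
    return times


# Hypothetical stand-in for the parsed response; values are placeholders.
sample_response = [
    {"data": {"ProductRatingQuery": {"ratingScore": "0"}}},
    {"data": {"ProductReviewListQuery": {"list": [
        {"reviewId": 1, "reviewCreateTime": "1 bulan lalu"},
        {"reviewId": 2, "reviewCreateTime": "2 bulan lalu"},
    ]}}},
]

print(extract_review_times(sample_response))
```

With the real response from the script, the call would be `extract_review_times(data)`. Looking the operation up by name instead of by position avoids depending on the order in which the endpoint returns the batched results.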

To get the rating score:

print(data[0]['data']['ProductRatingQuery']['ratingScore'])
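A side note on the second attempt in the question: `txtDateGivenReviewFilter0` is the value of the `data-testid` attribute, not an `id`. As far as I know, BeautifulSoup treats `id_` as a literal attribute name (only `class_` gets the trailing-underscore treatment), so that call would not match even on fully rendered HTML. Below is a minimal sketch that uses the question's HTML snippet as a stand-in for a rendered page. It assumes the other reviews use the same `data-testid` prefix with a different numeric suffix, which the source does not confirm.

```python
import re

from bs4 import BeautifulSoup

# The <p> element from the question, standing in for rendered page markup.
html = (
    '<p disabled="" data-testid="txtDateGivenReviewFilter0" '
    'class="css-oals0c-unf-heading e1qvo2ff8">1 bulan lalu</p>'
)

soup = BeautifulSoup(html, "html.parser")

# Match on the data-testid attribute; the numeric suffix is assumed to vary per review.
pattern = re.compile(r"^txtDateGivenReviewFilter\d+$")
review_times = [
    p.get_text(strip=True)
    for p in soup.find_all("p", attrs={"data-testid": pattern})
]

print(review_times)  # ['1 bulan lalu']
```

This only helps once the rendered markup is available. With a plain `requests.get` of the product page, the review elements are probably still missing, which is why the answer queries the endpoint directly.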