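python - Cannot find element with Beautiful Soup
Problem description
I am still learning to code in Python. I really need help scraping an element from this website:
https://www.tokopedia.com/craftdale/crossback-apron-hijau-army?src=topads
I want to get the review data (the review time) from the Review (Ulasan) container.
This is the HTML from the website: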
<p disabled="" data-testid="txtDateGivenReviewFilter0" class="css-oals0c-unf-heading e1qvo2ff8">1 bulan lalu</p>
I tried to get the element with this code:
review = soup.findAll('p',class_='css-oals0c-unf-heading e1qvo2ff8')
or
review= soup.findAll('p',id_='txtDateGivenReviewFilter0')
But both return an empty result.
Can anyone solve this? Thank you very much.
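Solution
If you analyze the site, you will see that it makes AJAX calls to retrieve the different pieces of information shown on the page, which is most likely why the review elements are missing from the HTML that Beautiful Soup parses. To get the review information, the site makes an AJAX call to a specific endpoint with a JSON payload.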
import requests, json
payload = [
    {"operationName": "PDPReviewRatingQuery", "variables": {"productId": 353506414}, "query": "query PDPReviewRatingQuery($productId: Int!) {\n ProductRatingQuery(productId: $productId) {\n ratingScore\n totalRating\n totalRatingWithImage\n detail {\n rate\n totalReviews\n percentage\n __typename\n }\n __typename\n }\n}\n"},
    {"operationName": "PDPReviewImagesQuery", "variables": {"productID": 353506414, "page": 1}, "query": "query PDPReviewImagesQuery($page: Int, $productID: Int!) {\n ProductReviewImageListQuery(page: $page, productID: $productID) {\n detail {\n reviews {\n reviewer {\n fullName\n profilePicture\n __typename\n }\n reviewId\n message\n rating\n updateTime\n isReportable\n __typename\n }\n images {\n imageAttachmentID\n description\n uriThumbnail\n uriLarge\n reviewID\n __typename\n }\n __typename\n }\n __typename\n }\n}\n"},
    {"operationName": "PDPReviewHelpfulQuery", "variables": {"productID": 353506414}, "query": "query PDPReviewHelpfulQuery($productID: Int!) {\n ProductMostHelpfulReviewQuery(productId: $productID) {\n shop {\n shopId\n __typename\n }\n list {\n reviewId\n message\n productRating\n reviewCreateTime\n reviewCreateTimestamp\n isReportable\n isAnonymous\n imageAttachments {\n attachmentId\n imageUrl\n imageThumbnailUrl\n __typename\n }\n user {\n fullName\n image\n url\n __typename\n }\n likeDislike {\n totalLike\n likeStatus\n __typename\n }\n __typename\n }\n __typename\n }\n}\n"},
    {"operationName": "PDPReviewListQuery", "variables": {"page": 1, "rating": 0, "withAttachment": 0, "productID": 353506414, "perPage": 10}, "query": "query PDPReviewListQuery($productID: Int!, $page: Int!, $perPage: Int!, $rating: Int!, $withAttachment: Int!) {\n ProductReviewListQuery(productId: $productID, page: $page, perPage: $perPage, rating: $rating, withAttachment: $withAttachment) {\n shop {\n shopId\n name\n image\n url\n __typename\n }\n list {\n reviewId\n message\n productRating\n reviewCreateTime\n reviewCreateTimestamp\n isReportable\n isAnonymous\n imageAttachments {\n attachmentId\n imageUrl\n imageThumbnailUrl\n __typename\n }\n reviewResponse {\n message\n createTime\n __typename\n }\n likeDislike {\n totalLike\n likeStatus\n __typename\n }\n user {\n userId\n fullName\n image\n url\n __typename\n }\n __typename\n }\n __typename\n }\n}\n"},
]
res = requests.post("https://gql.tokopedia.com/", json=payload)
data = res.json()
with open("data.json", "w") as f:
    json.dump(data, f)
The script above saves the review information to a file in JSON format.
To get the rating score:
print(data[0]['data']['ProductRatingQuery']['ratingScore'])
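The question asked for the review time, not the rating, so here is a minimal sketch of how it might be pulled out of the same response. It assumes each entry in the response mirrors the fields requested in the payload, so that the entry for ProductReviewListQuery holds a list of reviews with a reviewCreateTime field. The helper name extract_review_times and the sample data are hypothetical. The real response may be shaped differently, so inspect data.json first.

```python
def extract_review_times(data, operation="ProductReviewListQuery"):
    """Collect (reviewCreateTime, message) pairs from a batched response.

    Assumes each entry looks like {"data": {<query name>: {"list": [...]}}},
    which mirrors the fields requested in the payload.
    """
    times = []
    for entry in data:
        # Skip entries that belong to a different query or carry no data.
        result = (entry.get("data") or {}).get(operation)
        if not result:
            continue
        for review in result.get("list") or []:
            times.append((review.get("reviewCreateTime"), review.get("message")))
    return times


# Hypothetical sample standing in for the real response of res.json().
sample = [
    {"data": {"ProductRatingQuery": {"ratingScore": "4.9"}}},
    {"data": {"ProductReviewListQuery": {"list": [
        {"reviewCreateTime": "1 bulan lalu", "message": "Bagus"},
        {"reviewCreateTime": "2 bulan lalu", "message": "Sesuai pesanan"},
    ]}}},
]

for created, message in extract_review_times(sample):
    print(created, "-", message)
```

With the real response, call extract_review_times(data) instead of passing the sample. Looking entries up by query name rather than by position avoids depending on the order in which the server returns the batched results.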