BeautifulSoup scraping packtpub HTML element returns empty item

Question

I have been trying to scrape the following tag, which represents the next-page button, from https://www.packtpub.com/all-products/all-books:

<li class="ais-pagination--item ais-pagination--item__next">

I am using the following code:

import requests
import bs4


# Browser-like User-Agent, passed to the request below.
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36"}
start_url = "https://www.packtpub.com/all-products/all-books"
req = requests.get(start_url, headers=headers)
soup = bs4.BeautifulSoup(req.content, "lxml")
# Look up the next-page button by its class attribute.
next_page_li = soup.find("li", class_="ais-pagination--item ais-pagination--item__next")

I get None in return, please help.

Tags: python, beautifulsoup

Solution


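The reason is that this website loads its content dynamically. Once the page has loaded, it sends an XHR POST request to a back-end API, which returns the data as JSON.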

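So, to check whether a site's content is static or dynamic, view the page source and search for the element you want. If it is there, the content is static. If it is not, you are dealing with dynamic content and have to trace where it is loaded from.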
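The "search the page source" check can also be done programmatically. The following is a minimal sketch, not part of the original answer: it uses the standard-library html.parser instead of BeautifulSoup so it runs without extra packages, and the helper names ClassFinder and in_static_html, as well as both sample pages, are hypothetical.

```python
from html.parser import HTMLParser


class ClassFinder(HTMLParser):
    """Record whether any <tag> carries all of the wanted CSS classes."""

    def __init__(self, tag, classes):
        super().__init__()
        self.tag = tag
        self.wanted = set(classes.split())
        self.found = False

    def handle_starttag(self, tag, attrs):
        if tag != self.tag:
            return
        # A class attribute may be missing or have no value, hence the "or".
        present = set((dict(attrs).get("class") or "").split())
        if self.wanted <= present:
            self.found = True


def in_static_html(html, tag, classes):
    """Return True if the raw (unrendered) HTML already contains the element."""
    finder = ClassFinder(tag, classes)
    finder.feed(html)
    return finder.found


NEXT_CLASSES = "ais-pagination--item ais-pagination--item__next"

# Hypothetical samples: a server-rendered page vs. an empty JavaScript mount point.
static_page = '<ul><li class="' + NEXT_CLASSES + '">Next</li></ul>'
dynamic_page = '<div id="app"></div><script src="bundle.js"></script>'

print(in_static_html(static_page, "li", NEXT_CLASSES))
print(in_static_html(dynamic_page, "li", NEXT_CLASSES))
```

In practice you would pass the text of the HTTP response (for example requests.get(url).text) as the html argument. A False result suggests the element is added later by JavaScript.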

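Note that viewing the page source is not the same as using Inspect Element, because Inspect Element shows the element in both cases. You will, however, notice an event flag next to any dynamic content.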

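Below is a direct call to the API. If you want to loop over all results, use hitsPerPage, which can return up to 1000 hits per call, and then iterate over the pages. Since the site contains 5660 items, you have to loop over range(6), because the site's pagination starts from 0.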
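The page arithmetic can be sketched as follows. This is an illustration under assumptions, not the answer's code: the helper page_payloads is hypothetical, and the query string is reduced to query, hitsPerPage and page, leaving out the facet and filter parameters of the full request. Whether the API returns the same listing without those filters is not verified here.

```python
import math
from urllib.parse import urlencode

# Index name as used in the answer's request; everything else here is illustrative.
INDEX = "store_prod_us_products_packt_rank_asc"


def page_payloads(total_items, hits_per_page=1000):
    """Yield one JSON payload per page, with pages numbered from 0."""
    pages = math.ceil(total_items / hits_per_page)  # 5660 items -> 6 pages
    for page in range(pages):
        params = urlencode({"query": "", "hitsPerPage": hits_per_page, "page": page})
        yield {"requests": [{"indexName": INDEX, "params": params}]}


payloads = list(page_payloads(5660))
print(len(payloads))                          # 6
print(payloads[-1]["requests"][0]["params"])  # query=&hitsPerPage=1000&page=5
```

Each payload could then be sent with a POST request in the same way as the single call in the answer.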

import requests


def main(url):
    with requests.Session() as req:

        params = {
            "x-algolia-agent": "Algolia for vanilla JavaScript (lite) 3.27.0;instantsearch.js 2.10.2;Magento2 integration (1.13.3);JS Helper 2.26.0",
            "x-algolia-application-id": "VIVZZXFQG1",
            "x-algolia-api-key": "MjBiNTIwZWM0MmE4MWQ0MDQwNzIxY2Q5ZTQ0ZjE0ZDNkMzI4ZDVkZWJiYzcxNGI1NjA2MWYzNmUyNTQxY2ViZnRhZ0ZpbHRlcnM9"
        }

        data = {"requests": [{"indexName": "store_prod_us_products_packt_rank_asc", "params": "query=&hitsPerPage=24&maxValuesPerFacet=10&page=0&ruleContexts=%5B%22%22%2C%22magento-category-7164%22%5D&clickAnalytics=true&facets=%5B%22product_type_filter%22%2C%22released%22%2C%22language%22%2C%22concept%22%2C%22tool%22%2C%22vendor%22%2C%22categories.level0%22%2C%22categories.level1%22%2C%22categories.level2%22%2C%22categories.level0%22%2C%22categories.level1%22%2C%22categories.level2%22%5D&tagFilters=&facetFilters=%5B%5B%22released%3AAvailable%22%5D%2C%5B%22categories.level1%3AAll%20Products%20%2F%2F%2F%20All%20Books%22%5D%5D&numericFilters=%5B%22visibility_catalog%3D1%22%5D"}, {"indexName": "store_prod_us_products_packt_rank_asc", "params": "query=&hitsPerPage=1&maxValuesPerFacet=10&page=0&ruleContexts=%5B%22%22%2C%22magento-category-7164%22%5D&clickAnalytics=false&attributesToRetrieve=%5B%5D&attributesToHighlight=%5B%5D&attributesToSnippet=%5B%5D&tagFilters=&analytics=false&facets=released&numericFilters=%5B%22visibility_catalog%3D1%22%5D&facetFilters=%5B%5B%22categories.level1%3AAll%20Products%20%2F%2F%2F%20All%20Books%22%5D%5D"}, {
            "indexName": "store_prod_us_products_packt_rank_asc", "params": "query=&hitsPerPage=1&maxValuesPerFacet=10&page=0&ruleContexts=%5B%22%22%2C%22magento-category-7164%22%5D&clickAnalytics=false&attributesToRetrieve=%5B%5D&attributesToHighlight=%5B%5D&attributesToSnippet=%5B%5D&tagFilters=&analytics=false&facets=%5B%22categories.level0%22%2C%22categories.level1%22%5D&numericFilters=%5B%22visibility_catalog%3D1%22%5D&facetFilters=%5B%5B%22released%3AAvailable%22%5D%2C%5B%22categories.level0%3AAll%20Products%22%5D%5D"}, {"indexName": "store_prod_us_products_packt_rank_asc", "params": "query=&hitsPerPage=1&maxValuesPerFacet=10&page=0&ruleContexts=%5B%22%22%2C%22magento-category-7164%22%5D&clickAnalytics=false&attributesToRetrieve=%5B%5D&attributesToHighlight=%5B%5D&attributesToSnippet=%5B%5D&tagFilters=&analytics=false&facets=%5B%22categories.level0%22%5D&numericFilters=%5B%22visibility_catalog%3D1%22%5D&facetFilters=%5B%5B%22released%3AAvailable%22%5D%5D"}]}
        # POST the multi-query payload; the first result set holds the book listing.
        r = req.post(url, params=params, json=data)
        r.raise_for_status()
        for x in r.json()['results'][0]['hits']:
            print(x['name'])


if __name__ == "__main__":
    main('https://vivzzxfqg1-dsn.algolia.net/1/indexes/*/queries')

Output:

C# 9 and .NET 5 – Modern Cross-Platform Development - Fifth Edition
40 Algorithms Every Programmer Should Know
Machine Learning for Algorithmic Trading - Second Edition
Learning C# by Developing Games with Unity 2020 - Fifth Edition
Solutions Architect's Handbook
Python Machine Learning - Third Edition
The Python Workshop
Kubernetes and Docker - An Enterprise Guide
Django 3 By Example - Third Edition
Full-Stack React, TypeScript, and Node
Responsive Web Design with HTML5 and CSS - Third Edition
Learn Python Programming - Second Edition
CompTIA Security+: SY0-601 Certification Guide - Second Edition
Hands-On Quantum Information Processing with Python
Node.js Design Patterns - Third Edition
ASP.NET Core 5 for Beginners
Learning Tableau 2020 - Fourth Edition
AWS Penetration Testing
Python 3 Object-Oriented Programming - Third Edition
Hands-On Unity 2020 Game Development
Software Architecture with C# 9 and .NET 5 - Second Edition
Mastering Blockchain - Third Edition
The Docker Workshop
Data Engineering with Python
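When several pages are fetched, the names can be gathered from each decoded response in the same way. The sketch below assumes every response has the results[0]['hits'][i]['name'] shape used in the code above. The helper collect_names and the two sample responses are hypothetical, and the titles are borrowed from the output shown here.

```python
def collect_names(responses):
    """Flatten the book names from a sequence of decoded JSON responses."""
    names = []
    for response in responses:
        # The first result set is the listing; the others are facet queries.
        for hit in response["results"][0]["hits"]:
            names.append(hit["name"])
    return names


# Hypothetical stand-ins for two paginated API responses.
sample_responses = [
    {"results": [{"hits": [{"name": "The Python Workshop"},
                           {"name": "The Docker Workshop"}]}]},
    {"results": [{"hits": [{"name": "Data Engineering with Python"}]}]},
]

print(collect_names(sample_responses))
```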
