How do I get only the Model # when it is one of several inline li tags?

  • Tags: HTML, web scraping
  • Problem description

    <ul class="item-features">
                    <li><strong>Max Resolution:</strong> 7680 x 4320</li>
                    <li><strong>DisplayPort:</strong> 3 x DisplayPort 1.4</li>
                    <li><strong>HDMI:</strong> 1 x HDMI 2.1</li>
                    <li><strong>Card Dimensions (L x H):</strong> 9.13" x 4.88"</li>
                    <li><strong>Model #: </strong>RTX3060TiVENTUS2XOC</li>
                    <li><strong>Item #: </strong>N82E16814137612</li>
                    <li><strong>Return Policy: </strong><a href="https://kb.newegg.com/Article/Index/12/3?id=1167#54" target="_blank" title="Extended Holiday Replacement-Only Return Policy(New Window)">Extended Holiday Replacement-Only Return Policy</a></li>
                </ul>
    

    As you can see, there are several inline li tags. How would I extract only the Model #?

    I tried accessing it by index:

    containers = page_soup.findAll("ul",{"class":"item-features"})
    containers.li[4]
    

    But this raises an error. Any help would be greatly appreciated.

    Tags: html, python-3.x, web-scraping, beautifulsoup

    Solution


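    findAll returns a list-like ResultSet, which has no .li attribute, so containers.li fails. Use find to get the single ul element, then take the li child at index 4. The result is the whole li element; calling .get_text() on it returns its text, label included: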

    container = page_soup.find("ul",{"class":"item-features"})
    model = container.findChildren('li')[4]
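
    Indexing depends on the li order staying the same across product pages. As a hedged alternative, the sketch below looks the value up by its label instead. The helper name get_feature and the trimmed sample HTML are assumptions made for illustration, not part of the original answer.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question, used only as sample input.
html = """
<ul class="item-features">
    <li><strong>Max Resolution:</strong> 7680 x 4320</li>
    <li><strong>Model #: </strong>RTX3060TiVENTUS2XOC</li>
    <li><strong>Item #: </strong>N82E16814137612</li>
</ul>
"""


def get_feature(soup, label):
    """Return the text that follows the <strong> label, or None if not found.

    get_feature is a hypothetical helper, not a BeautifulSoup API.
    """
    container = soup.find("ul", {"class": "item-features"})
    if container is None:
        return None
    for li in container.find_all("li"):
        strong = li.find("strong")
        if strong is None:
            continue
        label_text = strong.get_text(" ", strip=True)
        if label_text.startswith(label):
            # Full text of the <li>, minus the leading label text.
            full_text = li.get_text(" ", strip=True)
            return full_text[len(label_text):].strip()
    return None


page_soup = BeautifulSoup(html, "html.parser")
print(get_feature(page_soup, "Model #"))
```

    Matching on the label keeps working if the site adds or reorders features, at the cost of depending on the exact label wording.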
    
