Grab specific child elements

Problem description

I need two lists, like this:

ListA = ["Driver Convenience","Exterior Features"]

ListB = ["2 key fob;Collision mitigation braking system;","Body coloured plastic front bumper;Boulder grey exterior door handle;Boulder grey exterior door mirrorn;"]

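ListA should contain the text inside each h4 tag, and ListB should contain the text of the li tags that follow each h4, up to the next h4 tag, with each item followed by ";".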

Here is a sample of the HTML:

<ul class="c-list-table">   
    <h4 class="c-list-table__section-heading">Driver Convenience</h4>
<li class="c-list-table__item" rel="2-key-fob"><span class="c-list-table__title"> 2 key fob </span></li>
<li class="c-list-table__item" rel="collision-mitigation-braking-system">Collision mitigation braking system</li>
    <h4 class="c-list-table__section-heading">Exterior Features</h4>
<li class="c-list-table__item" rel="body-coloured-plastic-front-bumper">Body coloured plastic front bumper</li>
<li class="c-list-table__item" rel="boulder-grey-exterior-door-handle">Boulder grey exterior door handle</li>
<li class="c-list-table__item" rel="boulder-grey-exterior-door-mirror">Boulder grey exterior door mirrorn</li>
</ul>
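Since h4 is not a valid direct child of ul, it is worth checking how the parser actually builds the tree before relying on sibling navigation. This quick sketch assumes Python's built-in html.parser and a trimmed copy of the sample above (with the span end tag closed). It lists the tag names of the ul's direct children:

```python
from bs4 import BeautifulSoup

# Trimmed copy of the sample HTML (the "</span" typo is closed properly here)
html = '''
<ul class="c-list-table">
<h4 class="c-list-table__section-heading">Driver Convenience</h4>
<li class="c-list-table__item" rel="2-key-fob"><span class="c-list-table__title"> 2 key fob </span></li>
<li class="c-list-table__item" rel="collision-mitigation-braking-system">Collision mitigation braking system</li>
<h4 class="c-list-table__section-heading">Exterior Features</h4>
<li class="c-list-table__item" rel="body-coloured-plastic-front-bumper">Body coloured plastic front bumper</li>
</ul>'''

soup = BeautifulSoup(html, 'html.parser')
ul = soup.find('ul', class_='c-list-table')

# recursive=False returns only direct child tags; whitespace strings are not tags
child_names = [child.name for child in ul.find_all(recursive=False)]
print(child_names)  # ['h4', 'li', 'li', 'h4', 'li']
```

Because the headings and items come out interleaved as siblings of one another, sibling lookups such as find_next_siblings and find_previous_sibling can be used on them.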

The HTML on the real page has the same structure as this :) I've tried many things but can't get it to work.

Tags: python, beautifulsoup

Solution


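Use find_next_siblings('li') to find the li tags after each h4. Then use find_previous_sibling('h4') to check that the nearest h4 before each li is the current heading, and add only the matching items to the list.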
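To see why the second check is needed, this small sketch (simplified HTML without the class attributes, parsed with the built-in html.parser) shows what the two calls return for the first heading. It compares headings by object identity (is) rather than by text, which is a choice made for this illustration:

```python
from bs4 import BeautifulSoup

html = '''
<ul>
<h4>Driver Convenience</h4>
<li>2 key fob</li>
<li>Collision mitigation braking system</li>
<h4>Exterior Features</h4>
<li>Body coloured plastic front bumper</li>
</ul>'''

soup = BeautifulSoup(html, 'html.parser')
first_h4 = soup.find('h4')

# find_next_siblings('li') returns every later <li> sibling,
# including the ones that belong to the next heading
all_after = [li.text for li in first_h4.find_next_siblings('li')]
print(all_after)
# ['2 key fob', 'Collision mitigation braking system', 'Body coloured plastic front bumper']

# Keep only the <li> tags whose closest preceding <h4> is this heading
own_items = [li.text for li in first_h4.find_next_siblings('li')
             if li.find_previous_sibling('h4') is first_h4]
print(own_items)
# ['2 key fob', 'Collision mitigation braking system']
```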

from bs4 import BeautifulSoup
data='''     
<ul class="c-list-table">   
<h4 class="c-list-table__section-heading">Driver Convenience</h4>
<li class="c-list-table__item" rel="2-key-fob"><span class="c-list-table__title"> 2 key fob </span></li>
<li class="c-list-table__item" rel="collision-mitigation-braking-system">Collision mitigation braking system</li>
<h4 class="c-list-table__section-heading">Exterior Features</h4>
<li class="c-list-table__item" rel="body-coloured-plastic-front-bumper">Body coloured plastic front bumper</li>
<li class="c-list-table__item" rel="boulder-grey-exterior-door-handle">Boulder grey exterior door handle</li>
<li class="c-list-table__item" rel="boulder-grey-exterior-door-mirror">Boulder grey exterior door mirrorn</li>
</ul>'''

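ListA = []
ListB = []
soup = BeautifulSoup(data, 'lxml')  # requires the lxml package
for item in soup.find_all('h4'):
    lifinal = ""
    ListA.append(item.text)  # heading text goes into ListA
    # every <li> after this heading, including those under later headings
    nextlis = item.find_next_siblings('li')
    for li in nextlis:
        # keep only the <li> tags whose nearest preceding <h4> is this heading
        if li.find_previous_sibling('h4') is item:
            lifinal = lifinal + li.text.strip() + ";"
    ListB.append(lifinal)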

print(ListA)
print(ListB)

Output

['Driver Convenience', 'Exterior Features']
['2 key fob;Collision mitigation braking system;', 'Body coloured plastic front bumper;Boulder grey exterior door handle;Boulder grey exterior door mirrorn;']
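An alternative sketch, not part of the original answer, that follows the question's wording more literally: for each h4, walk forward through next_siblings and stop as soon as the next h4 appears. Each li is then visited only by its own heading's walk instead of being re-checked for every heading. It assumes the built-in html.parser and the sample HTML with the span end tag closed. Tag is imported from bs4.element only for the isinstance check that skips the whitespace strings between tags.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

html = '''
<ul class="c-list-table">
<h4 class="c-list-table__section-heading">Driver Convenience</h4>
<li class="c-list-table__item" rel="2-key-fob"><span class="c-list-table__title"> 2 key fob </span></li>
<li class="c-list-table__item" rel="collision-mitigation-braking-system">Collision mitigation braking system</li>
<h4 class="c-list-table__section-heading">Exterior Features</h4>
<li class="c-list-table__item" rel="body-coloured-plastic-front-bumper">Body coloured plastic front bumper</li>
<li class="c-list-table__item" rel="boulder-grey-exterior-door-handle">Boulder grey exterior door handle</li>
<li class="c-list-table__item" rel="boulder-grey-exterior-door-mirror">Boulder grey exterior door mirrorn</li>
</ul>'''

soup = BeautifulSoup(html, 'html.parser')

ListA = []
ListB = []
for h4 in soup.find_all('h4'):
    ListA.append(h4.get_text(strip=True))
    items = []
    # Walk forward through the siblings that follow this heading
    for sib in h4.next_siblings:
        if not isinstance(sib, Tag):
            continue  # skip whitespace/text nodes between tags
        if sib.name == 'h4':
            break  # the next section starts here
        if sib.name == 'li':
            items.append(sib.get_text(strip=True))
    # Same "item;item;" format as in the question
    ListB.append("".join(item + ";" for item in items))

print(ListA)
print(ListB)
```

It should print the same two lists as the output above.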
