How to get the innerHTML of all first child elements as a list or dataframe

Problem description

I would like to get the innerHTML of every direct child element of each tag whose class name is "list-group", using Selenium or BeautifulSoup in Python.

HTML code:

<div id="history_1" class="list-group">
        <div>
                <p>a</p>
        </div>
        <div>
                <p>b</p>
        </div>
        <div>
                <p>c</p>
        </div>
        <p>
                d
        </p>
</div>
<div>
....
</div>
<div id="history_2" class="list-group">
        <div>
                <p>e</p>
        </div>
        <div>
                <p>f</p>
        </div>
        <div>
                <p>g</p>
        </div>
        <p>
                h
        </p>
</div>

I want a result like the one below:

result[0] = "<div><p>a</p></div>"

result[1] = "<div><p>b</p></div>"

result[2] = "<div><p>c</p></div>"

result[3] = "<p>d</p>"

result[4] = "<div><p>e</p></div>"

result[5] = "<div><p>f</p></div>"

result[6] = "<div><p>g</p></div>"

result[7] = "<p>h</p>"

Any help is appreciated.

Tags: python, selenium, beautifulsoup

Solution


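Judging by the result you want, you are after the outerHTML, not the innerHTML.

Use //* to query all nodes, and the parent axis to keep only those whose parent has the class "list-group", as shown below: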

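from selenium.webdriver.common.by import By

# Assumes `driver` is a WebDriver with the page loaded; Selenium 4 removed find_elements_by_xpath.
elements = driver.find_elements(By.XPATH, "//*[parent::*[@class='list-group']]")
for element in elements:
    print(element.get_attribute('outerHTML'))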
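
Since the question also mentions BeautifulSoup and asks for a list, here is a minimal sketch of the same idea without a browser. It is not part of the original answer. The `html` string is a shortened stand-in for the markup in the question, and the CSS selector `.list-group > *` is used in place of the XPath above. Note that `.list-group` matches any element whose class list contains "list-group", whereas the XPath compares the whole class attribute.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the markup in the question (an assumption for illustration).
html = """
<div id="history_1" class="list-group">
    <div><p>a</p></div>
    <p>d</p>
</div>
<div>....</div>
<div id="history_2" class="list-group">
    <div><p>e</p></div>
    <p>h</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# ".list-group > *" selects every direct child element of a list-group element,
# in document order; str(tag) serializes the tag itself, i.e. its outer HTML.
result = [str(child) for child in soup.select(".list-group > *")]

for item in result:
    print(item)
```

With the indented markup from the question, each string would keep the whitespace and line breaks found inside the element, so some extra stripping would be needed to match the expected output exactly. If a dataframe is wanted, the list can be passed to pandas.DataFrame.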
