How to select the first child of each element in a list in Beautiful Soup

Problem description

I want to get the text from the first inner div in each outer div:

<body>
    <div class="outer">
        <div class="inner">text1</div> 
        <div class="inner">text2</div>
        <div class="inner">text3</div>
    </div>
    <div class="outer">
        <div class="inner">text4</div>
    </div>
    <div class="outer">
        <div class="inner">text5</div>
        <div class="inner">text6</div>
    </div>
</body>

This means retrieving text1, text4, and text5.

I've experimented with the code shown below:

outers = soup.select('body > .outer')
for outer in outers:
    inners = outer.select_one('.inner')
    for inner in inners:
        print(inner.text)

But I can't get it to work.

Tags: python, html, beautifulsoup

Solution


This may work:

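from bs4 import BeautifulSoup

# `text` is assumed to hold the HTML shown in the question
soup = BeautifulSoup(text, 'html.parser')
for outer in soup.find_all('div', class_='outer'):
    # find() returns only the first matching descendant, as a single Tag
    first_inner = outer.find('div', class_='inner')
    # Iterating over a Tag yields its children; here that is the one text node
    for child in first_inner:
        print(child)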


# Output:
#           text1
#           text4
#           text5

Or you can do it this way, which prints the text of the first inner div directly:

soup = BeautifulSoup(text, 'html.parser')
for outer in soup.find_all('div', class_='outer'):
    # find() returns a single Tag (the first match), so no inner loop is needed
    first_inner = outer.find('div', class_='inner')
    print(first_inner.get_text())
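
Since the question uses CSS selectors, the same idea can also be sketched with select() and select_one(). select_one() returns a single Tag (or None when nothing matches) rather than a list, so there is nothing to loop over for each outer div. This is only a minimal sketch: the html string reproduces the markup from the question, and the names results and first_inner are illustrative, not part of the original answer.

```python
from bs4 import BeautifulSoup

# Markup copied from the question
html = """
<body>
    <div class="outer">
        <div class="inner">text1</div>
        <div class="inner">text2</div>
        <div class="inner">text3</div>
    </div>
    <div class="outer">
        <div class="inner">text4</div>
    </div>
    <div class="outer">
        <div class="inner">text5</div>
        <div class="inner">text6</div>
    </div>
</body>
"""

soup = BeautifulSoup(html, "html.parser")

results = []
for outer in soup.select("body > .outer"):
    # select_one() returns the first match as a Tag, or None if there is none
    first_inner = outer.select_one(".inner")
    if first_inner is not None:
        results.append(first_inner.get_text(strip=True))

print(results)
```

The None check guards against an outer div that has no inner div at all, which would otherwise raise an AttributeError on get_text().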
