BeautifulSoup: getting 'span' contents next to each other

Problem description

Part of the HTML is shown below. I want to extract the contents of the 'span' tags:

from bs4 import BeautifulSoup
data = """
<section><h2>Team</h2><ul><li><ul><li><span>J36</span>—<span>John</span></li><li><span>B56</span>—<span>Bratt</span></li><li><span>K3</span>—<span>Kate</span></li></ul></li></ul></section>
"""
soup = BeautifulSoup(data, "html.parser")

classification = soup.find_all('section')[0].find_all('span')

for c in classification:
    print(c.text)

The result is:

J36
John
B56
Bratt
K3
Kate

But what I want is:

J36-John
B56-Bratt
K3-Kate

Other than the following, what is the proper BeautifulSoup way to extract this content? Thank you.

contents = [c.text for c in classification]

l = contents[0::2]
ll = contents[1::2]

for a in zip(l, ll):
    print('-'.join(a))

Tags: python, parsing, web-scraping, beautifulsoup

Solution


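You can look at the next sibling of each tag. If there is one (here, the dash that follows the first 'span'), it is printed together with the text and without a trailing newline; otherwise only the text is printed, which ends the line. The dash in the output is whatever character appears in the HTML.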

for c in classification:
    if c.next_sibling:
        print(c.text + str(c.next_sibling), end='')
    else:
        print(c.text)
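
As an alternative to walking siblings, the spans can be grouped by their parent 'li', which also lets you choose the separator yourself. The following is a minimal sketch under the assumption that each innermost 'li' directly contains the spans for exactly one record; span_pairs is a hypothetical helper name, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

data = """
<section><h2>Team</h2><ul><li><ul><li><span>J36</span>—<span>John</span></li><li><span>B56</span>—<span>Bratt</span></li><li><span>K3</span>—<span>Kate</span></li></ul></li></ul></section>
"""
soup = BeautifulSoup(data, "html.parser")


def span_pairs(soup, sep="-"):
    """Hypothetical helper: join the direct 'span' children of each 'li'."""
    lines = []
    for li in soup.find("section").find_all("li"):
        # recursive=False keeps only spans that are direct children of this li,
        # so the outer li (which only wraps the nested ul) yields nothing.
        spans = li.find_all("span", recursive=False)
        if spans:
            lines.append(sep.join(s.get_text(strip=True) for s in spans))
    return lines


for line in span_pairs(soup):
    print(line)
```

With the sample HTML this prints J36-John, B56-Bratt and K3-Kate on separate lines. Because the separator is supplied by the code rather than read from the document, the result does not depend on which dash character the page uses.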
