How can I get the img elements and the text from a span block?

Problem description

I have a span block like this:

<span class="selectable-text invisible-space copyable-text" dir="ltr">
     some text
     <img alt="" class="b61 emoji wa selectable-text invisible-space copyable-text" data-plain-text="" src="URL" style="background-position: -20px -20px;"/>
     more some text
     <img alt="" class="b62 emoji wa selectable-text invisible-space copyable-text" data-plain-text="" src="URL" style="background-position: -40px -40px;"/>
     blah-blah-blah
     <img alt="" class="b76 emoji wa selectable-text invisible-space copyable-text" data-plain-text="" src="URL" style="background-position: 0px -20px;"/>
</span>
soup.find('span', {'class': 'selectable-text invisible-space copyable-text'}).get_text()

This code gives me only the text; the emoji <img> elements are lost.
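For context, get_text() only concatenates the text nodes inside a tag, so the <img> elements contribute nothing. Below is a minimal sketch of one workaround, assuming it is acceptable to modify the parsed tree: replace each emoji <img> with a text placeholder before calling get_text(). The markup is the question's with some attributes trimmed for brevity, and "<emoji>" is the placeholder the question asks for.

```python
from bs4 import BeautifulSoup

# The question's markup, with some attributes trimmed for brevity
html = """
<span class="selectable-text invisible-space copyable-text" dir="ltr">
     some text
     <img alt="" class="b61 emoji wa" src="URL"/>
     more some text
     <img alt="" class="b62 emoji wa" src="URL"/>
     blah-blah-blah
     <img alt="" class="b76 emoji wa" src="URL"/>
</span>
"""

soup = BeautifulSoup(html, "html.parser")
# class_ with a single name matches any tag whose class list contains it
span = soup.find("span", class_="copyable-text")

# Swap each emoji <img> for a plain-text placeholder (this mutates the tree)
for img in span.find_all("img", class_="emoji"):
    img.replace_with("<emoji>")

# get_text() now sees the placeholders; collapse the markup's newlines/indentation
text = " ".join(span.get_text().split())
print(text)  # some text <emoji> more some text <emoji> blah-blah-blah <emoji>
```

Because replace_with() edits soup in place, parse a fresh copy if you still need the original images afterwards.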

The best I've come up with so far:

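import re

# select() takes a CSS selector; a dict passed second is treated as namespaces, not a class filter
span = soup.select('span.selectable-text.invisible-space.copyable-text')
for item in span:
    # re.search scans the whole string; re.match('.*emoji') fails once a newline precedes 'emoji'
    if re.search('emoji', str(item)):
        ...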

Now I have a string like this:

<span class="selectable-text invisible-space copyable-text" dir="ltr">some text <img alt="" class="b61 emoji wa selectable-text invisible-space copyable-text" data-plain-text="" src="URL" style="background-position: -20px -20px;"/>more some text<img alt="" class="b62 emoji wa selectable-text invisible-space copyable-text" data-plain-text="" src="URL" style="background-position: -40px -40px;"/> blah-blah-blah  <img alt="" class="b76 emoji wa selectable-text invisible-space copyable-text" data-plain-text="" src="URL" style="background-position: 0px -20px;"/></span>

As I see it, the next step is to use a regular expression to extract the elements I need.
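If you do take the regular-expression route on the flattened string, a minimal sketch could look like the following. It assumes the markup stays as uniform as in the question (double-quoted attributes, no > inside attribute values, emoji images marked by an emoji class). Regexes over HTML break easily when those assumptions do not hold, which is why a parser-based approach is usually safer.

```python
import re

# The flattened string from the question, with some attributes trimmed
s = ('<span class="selectable-text invisible-space copyable-text" dir="ltr">'
     'some text <img alt="" class="b61 emoji wa" src="URL"/>more some text'
     '<img alt="" class="b62 emoji wa" src="URL"/> blah-blah-blah  '
     '<img alt="" class="b76 emoji wa" src="URL"/></span>')

# 1. Drop every tag that is not an <img> (here: the opening and closing <span>)
s = re.sub(r'<(?!img\b)[^>]*>', '', s)
# 2. Replace each <img> whose class attribute contains the word "emoji"
s = re.sub(r'<img\b[^>]*\bclass="[^"]*\bemoji\b[^"]*"[^>]*>', ' <emoji> ', s)
# 3. Collapse runs of whitespace into single spaces
text = ' '.join(s.split())
print(text)  # some text <emoji> more some text <emoji> blah-blah-blah <emoji>
```

The non-img tags are removed first so that the later placeholders are not mistaken for tags.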

Is there another way to get a string like this:

some text <emoji> more some text <emoji> blah-blah-blah <emoji>
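A parser-based alternative that leaves the tree untouched is to walk the span's contents in document order, keeping text nodes and substituting a placeholder for each emoji image. The helper name span_to_text and its placeholder parameter are hypothetical, chosen for this sketch. The markup is the question's with some attributes trimmed.

```python
from bs4 import BeautifulSoup, Comment, NavigableString, Tag

# The question's markup, with some attributes trimmed for brevity
html = """
<span class="selectable-text invisible-space copyable-text" dir="ltr">
     some text
     <img alt="" class="b61 emoji wa" src="URL"/>
     more some text
     <img alt="" class="b62 emoji wa" src="URL"/>
     blah-blah-blah
     <img alt="" class="b76 emoji wa" src="URL"/>
</span>
"""


def span_to_text(span, placeholder="<emoji>"):
    """Hypothetical helper: text of `span`, with each emoji <img> replaced by `placeholder`."""
    parts = []
    # .descendants yields every nested Tag and string in document order
    for node in span.descendants:
        if isinstance(node, Comment):
            continue  # Comment subclasses NavigableString; skip it
        if isinstance(node, NavigableString):
            parts.append(str(node))
        elif isinstance(node, Tag) and node.name == "img" and "emoji" in node.get("class", []):
            parts.append(placeholder)
    # Join the pieces, then collapse the markup's newlines and indentation
    return " ".join(" ".join(parts).split())


soup = BeautifulSoup(html, "html.parser")
span = soup.find("span", class_="copyable-text")
print(span_to_text(span))  # some text <emoji> more some text <emoji> blah-blah-blah <emoji>
```

Because the function only reads the tree, soup still contains the original <img> elements afterwards.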

Tags: python, html, python-3.x, beautifulsoup

Solution


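If you want to extract each piece of text together with the img that follows it, and wrap each pair in its own span, the code below should work. It assumes the span's children strictly alternate between a text node and an <img>.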

from bs4 import BeautifulSoup as bs

stra = """
<span class="selectable-text invisible-space copyable-text" dir="ltr">
     some text
     <img alt="" class="b61 emoji wa selectable-text invisible-space copyable-text" data-plain-text="" src="URL" style="background-position: -20px -20px;"/>
     more some text
     <img alt="" class="b62 emoji wa selectable-text invisible-space copyable-text" data-plain-text="" src="URL" style="background-position: -40px -40px;"/>
     blah-blah-blah
     <img alt="" class="b76 emoji wa selectable-text invisible-space copyable-text" data-plain-text="" src="URL" style="background-position: 0px -20px;"/>
</span>
"""
soup = bs(stra, 'html.parser')

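# Direct children of the span: text node, <img>, text node, <img>, ...
ch = list(soup.find('span', {'class': 'selectable-text invisible-space copyable-text'}).children)

# Pair each text node (even index) with the <img> that follows it (odd index)
for i in zip(ch[::2], ch[1::2]):
    print('<span>{}{}</span>'.format(*i))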
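The slicing ch[::2] / ch[1::2] only works while the children alternate strictly between text and <img>. Two images with no text between them shift every later pair, and zip() silently drops an unpaired last element such as trailing text. Below is a minimal sketch of a more tolerant grouping, assuming each group should end at an <img> and that whitespace-only text nodes can be ignored. The markup is the question's with some attributes trimmed.

```python
from bs4 import BeautifulSoup, NavigableString, Tag

# The question's markup, with some attributes trimmed for brevity
html = """
<span class="selectable-text invisible-space copyable-text" dir="ltr">
     some text
     <img alt="" class="b61 emoji wa" src="URL"/>
     more some text
     <img alt="" class="b62 emoji wa" src="URL"/>
     blah-blah-blah
     <img alt="" class="b76 emoji wa" src="URL"/>
</span>
"""

soup = BeautifulSoup(html, "html.parser")
span = soup.find("span", class_="copyable-text")

groups = []   # each group is a list of nodes that ends with an <img>
current = []
for child in span.children:
    if isinstance(child, NavigableString) and not child.strip():
        continue  # skip whitespace-only text between tags
    current.append(child)
    if isinstance(child, Tag) and child.name == "img":
        groups.append(current)  # an image closes the current group
        current = []
if current:
    groups.append(current)  # keep trailing text that has no image after it

for group in groups:
    print("<span>{}</span>".format("".join(str(node) for node in group)))
```

On the question's markup this yields the same three text/image groupings as the loop above.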

Output:

<span>
     some text
     <img alt="" class="b61 emoji wa selectable-text invisible-space copyable-text" data-plain-text="" src="URL" style="background-position: -20px -20px;"/></span>
<span>
     more some text
     <img alt="" class="b62 emoji wa selectable-text invisible-space copyable-text" data-plain-text="" src="URL" style="background-position: -40px -40px;"/></span>
<span>
     blah-blah-blah
     <img alt="" class="b76 emoji wa selectable-text invisible-space copyable-text" data-plain-text="" src="URL" style="background-position: 0px -20px;"/></span>
