Extracting HTML from outside a tag

Problem description

I am trying to extract the parts of the HTML that sit above and below the <table> tag, for example from the sample HTML below:

sample_html = """
<html>
<title><b>Main Title</b></Title>
<b>more</b>
<b>stuff</b>
<b>in here!</b>
<table class="softwares" border="1" cellpadding="0" width="99%">
    <thead style="background-color: #ededed">
        <tr>
            <td colspan="5"><b>Windows</b></td>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td><b>Type</b></td>
            <td><b>Issue</b></td>
            <td><b>Restart</b></td>
            <td><b>Severity</b></td>  
            <td><b>Impact</b></td>  
        </tr>
        <tr>
            <td>some item</td>
            <td><a href="some website">some website</a><br></td>
            <td>Yes<br></td>
            <td>Critical<br></td>
            <td>stuff<br></td>
        </tr>    
        <tr>
            <td>some item</td>
            <td><a href="some website">some website</a><br></td>
            <td>Yes<br></td>
            <td>Important<br></td>
            <td>stuff<br></td>    
        </tr>
    </tbody>
</table>
<b>AGAIN</b>
<b>more</b>
<b>stuff</b>
<b>down here!</b>
</html>
"""

I would like to end up with something like this:

top_html = """
<html>
<title><b>Main Title</b></Title>
<b>more</b>
<b>stuff</b>
<b>in here!</b>
</html>
"""

bottom_html = """
<html>
<b>AGAIN</b>
<b>more</b>
<b>stuff</b>
<b>down here!</b>
</html>
"""

Or already in plain-text form, for example:

top_html = 'Main Title more stuff in here!'

bottom_html = 'AGAIN more stuff down here!'

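So far I have been able to extract the <table> part from the full HTML and process it (I split it into rows <tr> and columns <td> to pull out the values I need), using the following code: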

from bs4 import BeautifulSoup

soup = BeautifulSoup(input_html, "html.parser")
table = soup.find('table')
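
The row and column splitting mentioned above is not shown in the question. A minimal sketch of what it might look like follows. The trimmed-down `input_html` and the `rows` variable are assumptions made for illustration, not the asker's actual code.

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened stand-in for the question's input_html
input_html = """
<table>
  <tr><td><b>Type</b></td><td><b>Severity</b></td></tr>
  <tr><td>some item</td><td>Critical<br></td></tr>
  <tr><td>some item</td><td>Important<br></td></tr>
</table>
"""

soup = BeautifulSoup(input_html, "html.parser")
table = soup.find("table")

# Build one list of cell texts per <tr>; strip=True trims surrounding whitespace
rows = [
    [td.get_text(strip=True) for td in tr.find_all("td")]
    for tr in table.find_all("tr")
]
print(rows)
```

Each inner list holds the text of one row's cells, so the header row and the data rows can be handled separately by index.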

Tags: html, python-3.x, beautifulsoup

Solution


This solution makes little use of BeautifulSoup, but it works: convert the soup to a string, find the indices of the opening and closing table tags, and slice out the text before and after them.

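from bs4 import BeautifulSoup

soup = BeautifulSoup(sample_html, "html.parser")

def extract_top_and_bottom(html):
    # html is the document as a plain string, not a BeautifulSoup object
    index_of_opening_tag = html.index("<table")
    index_of_closing_tag = html.index("</table>")

    # Keep everything before "<table" and everything after "</table>"
    top_html = html[:index_of_opening_tag]
    bottom_html = html[index_of_closing_tag + len("</table>"):]
    return top_html, bottom_html

top_html, bottom_html = extract_top_and_bottom(str(soup))
print(top_html)
print(bottom_html)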
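
If the plain-text form from the question is what is needed, one possible alternative is to stay inside BeautifulSoup and collect the table's sibling tags. The sketch below assumes the table is a direct sibling of the surrounding content, as in the sample, and uses a simplified stand-in document; the names `sample`, `top_text`, and `bottom_text` are illustrative only.

```python
from bs4 import BeautifulSoup

# Simplified stand-in for sample_html (assumption: table and <b> tags are siblings)
sample = """<html>
<title>Main Title</title>
<b>more</b>
<b>stuff</b>
<b>in here!</b>
<table><tr><td>cell</td></tr></table>
<b>AGAIN</b>
<b>more</b>
<b>stuff</b>
<b>down here!</b>
</html>"""

soup = BeautifulSoup(sample, "html.parser")
table = soup.find("table")

# find_previous_siblings() walks backwards from the table, so reverse the result
above = list(reversed(table.find_previous_siblings()))
below = table.find_next_siblings()

# Join the text of each sibling tag with single spaces
top_text = " ".join(tag.get_text(" ", strip=True) for tag in above)
bottom_text = " ".join(tag.get_text(" ", strip=True) for tag in below)
print(top_text)
print(bottom_text)
```

Unlike string slicing, this does not depend on the exact spelling of the tags in the serialized HTML, but it would miss content that is nested at a different level from the table.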
