Python BS4 - NameError: name 'tagID' is not defined

Problem description

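I am new to Python. I am building an application that parses and cleans up HTML generated by MS Word. In the code below, I pass the content in as a BS4 object and try to update specific span tags with new attributes.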

content = """<html>
    <head></head>
    <body>
    <span style="background: #c0c0c0">Table 1</span>
    <span style="background: #c0c0c0">Figure 1</span>
    </body>
    </html>"""

def clean_table_figure_id_tags(content):
    for element in content.findAll('span', style='background: #c0c0c0'):
        # inspect the existing tag to determine table or figure
        if 'Table' in element.string:
            tagID = 'TableId'
        elif 'Figure' in element.string:
            tagID = 'FigureId'
        # tagID = content(elementString)
        newTag = Tag(builder=content.builder, name='span', attrs={'id': tagID, 'class': 'variable'})
        newTag.string = element.string
        element.replace_with(newTag)
    return content

However, I get the following error: NameError: name 'tagID' is not defined. Any help is greatly appreciated.

Tags: python-3.x, beautifulsoup

Solution


If I understand you correctly, you are looking for something like this:

from bs4 import BeautifulSoup as bs

content = """[your html above]"""
soup = bs(content, 'lxml')  # the 'lxml' parser requires the lxml package

# Select the highlighted spans and set an id based on their text
for elem in soup.select('span[style="background: #c0c0c0"]'):
    if "Table" in elem.text:
        elem.attrs['id'] = 'TableId'
    if "Figure" in elem.text:
        elem.attrs['id'] = 'FigureId'
print(soup.prettify())

Output:

<html>
 <head>
 </head>
 <body>
  <span id="TableId" style="background: #c0c0c0">
   Table 1
  </span>
  <span id="FigureId" style="background: #c0c0c0">
   Figure 1
  </span>
 </body>
</html>
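
As for the error itself: the most likely cause is that the if/elif chain in the question has no else branch, so when a matching span contains neither "Table" nor "Figure", tagID is never assigned before it is used. If you would rather keep your original approach of replacing each span with a new tag, a minimal sketch might look like the one below. It makes a few assumptions that are not in the question: the third "Note 1" span is a made-up example of a non-matching span, "html.parser" is swapped in for lxml so that no extra package is needed, and soup.new_tag() is used in place of constructing Tag directly.

```python
from bs4 import BeautifulSoup

# Sample input; the "Note 1" span is a hypothetical extra case that
# matches the style filter but contains neither keyword.
HTML = """<html>
<head></head>
<body>
<span style="background: #c0c0c0">Table 1</span>
<span style="background: #c0c0c0">Figure 1</span>
<span style="background: #c0c0c0">Note 1</span>
</body>
</html>"""


def clean_table_figure_id_tags(soup):
    """Replace highlighted Table/Figure spans with clean, id-tagged spans."""
    # find_all returns a list-like snapshot, so replacing tags while
    # looping over it is safe.
    for element in soup.find_all("span", style="background: #c0c0c0"):
        text = element.get_text()
        if "Table" in text:
            tag_id = "TableId"
        elif "Figure" in text:
            tag_id = "FigureId"
        else:
            # Neither keyword matched: skip this span instead of
            # falling through with tag_id unassigned.
            continue
        new_tag = soup.new_tag("span", attrs={"id": tag_id, "class": "variable"})
        new_tag.string = text
        element.replace_with(new_tag)
    return soup


soup = clean_table_figure_id_tags(BeautifulSoup(HTML, "html.parser"))
print(soup.body)
```

Using get_text() rather than .string also avoids a separate problem: .string is None for a span that has nested child tags, which would make the `in` test raise a TypeError.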
