How to fix ParseError: not well-formed (invalid token): line 1, column 0 in Python

Problem description

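I am using some code that is supposed to detect road damage in images using trained models. The part below checks the dataset statistics by counting the total number of images and labels. It fails with an error raised from xml.etree.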

from xml.etree import ElementTree
from xml.dom import minidom
import collections
import os
import matplotlib.pyplot as plt
import matplotlib as matplot
import seaborn as sns
%matplotlib inline



cls_names = []
total_images = 0
for gov in govs:

    file_list = os.listdir(base_path + gov + '/Annotations/')

    for file in file_list:

        total_images = total_images + 1
        if file =='.DS_Store':
            pass
        else:
            infile_xml = open(base_path + gov + '/Annotations/' +file)
            tree = ElementTree.parse(infile_xml)
            root = tree.getroot()
            for obj in root.iter('object'):
                cls_name = obj.find('name').text
                cls_names.append(cls_name)
print("total")
print("# of images:" + str(total_images))
print("# of labels:" + str(len(cls_names)))

I expect it to print the number of images and the number of labels.

Tags: python

Solution


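The exception means that one of the files you are trying to load is not well-formed XML. Because the error is reported at line 1, column 0, the parser is rejecting the very first character of that file, which suggests the file is probably not XML at all. Wrap the ElementTree.parse() call in a try...except block and print the file name so you can see which file is at fault.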

Update

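from xml.etree import ElementTree
# ParseError is defined in the xml.etree.ElementTree module, not in the xml.etree package
from xml.etree.ElementTree import ParseError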
from xml.dom import minidom
import collections
import os
import matplotlib.pyplot as plt
import matplotlib as matplot
import seaborn as sns

cls_names = []
total_images = 0
for gov in govs:

    file_list = os.listdir(base_path + gov + '/Annotations/')

    for file in file_list:

        if file == '.DS_Store':
            # Skip macOS metadata files; they are not annotations
            continue
        total_images = total_images + 1
        path = base_path + gov + '/Annotations/' + file
        try:
            # Passing the path lets ElementTree open and close the file itself
            tree = ElementTree.parse(path)
            root = tree.getroot()
            for obj in root.iter('object'):
                cls_name = obj.find('name').text
                cls_names.append(cls_name)
        except ParseError as err:
            # Report the offending file and continue with the rest
            print("Parse error with %s: %s" % (file, err))
print("total")
print("# of images:" + str(total_images))
print("# of labels:" + str(len(cls_names)))
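
Once the loop reports a file name, it can help to look at what the file actually contains. The following is a minimal sketch, not part of the original answer: the helper name find_bad_xml is hypothetical, and it assumes the annotations sit directly in a single directory. It collects every file that fails to parse, along with the parser's message and the first few raw bytes of the file, so that non-XML files (for example hidden metadata files written by the operating system) are easy to recognize.

```python
import os
from xml.etree import ElementTree
from xml.etree.ElementTree import ParseError


def find_bad_xml(directory):
    """Return (filename, error message, first bytes) for each file that fails to parse.

    Hypothetical helper for illustration only.
    """
    bad = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue  # ignore subdirectories
        try:
            ElementTree.parse(path)
        except ParseError as err:
            # Read a few raw bytes so the real file type can be inspected
            with open(path, 'rb') as fh:
                head = fh.read(16)
            bad.append((name, str(err), head))
    return bad
```

With the variables from the question, it could be called as find_bad_xml(base_path + gov + '/Annotations/') inside the loop over govs, printing each returned tuple. If the reported bytes do not start with '<' (optionally preceded by whitespace or a byte order mark), the file is most likely not an annotation and can be skipped or removed.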
