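python - Gather_Nd error when fine-tuning EfficientDet on a custom dataset in Google Colab
Problem description
I am trying to fine-tune EfficientDet for multi-object detection on a custom dataset using Google Colab (free tier). I am new to tf, so I tried to copy and modify an existing notebook (this one: https://colab.research.google.com/drive/1iOydvFQVE-syG-ixEyam04X3E40Lx7NA?usp=sharing)
Here is the problem. The following error occurs during training: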
(0) Invalid argument: indices[2] = [2] does not index into param shape [1,1], node name: parser/GatherNd_1
[[{{node parser/GatherNd_1}}]]
[[IteratorGetNext]]
[[IteratorGetNext/_4303]]
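I suspect it comes from the TFRecord files, but I cannot work out where exactly. My training dataset consists of png images (resized to 256x256) and the associated bounding-box metadata. Here is how I generate the tfrecord file: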
import hashlib
import os

import tensorflow as tf


def create_tf_example(filepath, df_label):
    # Read the raw PNG bytes; they are stored in the record unchanged.
    with open(filepath, "rb") as f:
        encoded_image_data = f.read()
    key = hashlib.sha256(encoded_image_data).hexdigest()
    filename = os.path.basename(filepath)
    image_name = filename.replace(".png", "")
    # Original (pre-resize) image dimensions, used to normalize the boxes.
    height0 = df_label["height0"].loc[df_label["id"]==image_name].iloc[0]
    width0 = df_label["width0"].loc[df_label["id"]==image_name].iloc[0]
    image_format = b'png'
    width = 256
    height = 256
    # One entry per bounding box, normalized to [0, 1].
    xmins = [x / width0 for x in df_label["xmins0"].loc[df_label["id"]==image_name].iloc[0]]
    xmaxs = [x / width0 for x in df_label["xmaxs0"].loc[df_label["id"]==image_name].iloc[0]]
    ymins = [x / height0 for x in df_label["ymins0"].loc[df_label["id"]==image_name].iloc[0]]
    ymaxs = [x / height0 for x in df_label["ymaxs0"].loc[df_label["id"]==image_name].iloc[0]]
    classes_text = ["opacity".encode("utf-8")]
    classes = [1]
    tf_example = tf.train.Example(features=tf.train.Features(feature={
        'image/height': tf.train.Feature(int64_list=tf.train.Int64List(value=[height])),
        'image/width': tf.train.Feature(int64_list=tf.train.Int64List(value=[width])),
        "image/filename": tf.train.Feature(bytes_list=tf.train.BytesList(value=[filename.encode("utf-8")])),
        "image/source_id": tf.train.Feature(bytes_list=tf.train.BytesList(value=['0'.encode("utf-8")])),  # problem with image names worked around with this hack
        "image/key/sha256": tf.train.Feature(bytes_list=tf.train.BytesList(value=[key.encode("utf-8")])),
        "image/encoded": tf.train.Feature(bytes_list=tf.train.BytesList(value=[encoded_image_data])),
        "image/format": tf.train.Feature(bytes_list=tf.train.BytesList(value=["png".encode("utf-8")])),
        "image/object/bbox/xmin": tf.train.Feature(float_list=tf.train.FloatList(value=xmins)),
        "image/object/bbox/xmax": tf.train.Feature(float_list=tf.train.FloatList(value=xmaxs)),
        "image/object/bbox/ymin": tf.train.Feature(float_list=tf.train.FloatList(value=ymins)),
        "image/object/bbox/ymax": tf.train.Feature(float_list=tf.train.FloatList(value=ymaxs)),
        "image/object/class/text": tf.train.Feature(bytes_list=tf.train.BytesList(value=classes_text)),
        "image/object/class/label": tf.train.Feature(int64_list=tf.train.Int64List(value=classes)),
    }))
    return tf_example


writer_train = tf.io.TFRecordWriter('/content/drive/MyDrive/siim-covid19-detection/TFRecords/train/train.tfrecord')
for filepath in train_filepaths:
    tf_example = create_tf_example(filepath, df_train)
    writer_train.write(tf_example.SerializeToString())
writer_train.close()
The code for val.tfrecord is the same.
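As an aside on the coordinate arithmetic in create_tf_example: the box columns are divided by the original dimensions (width0, height0) rather than by 256, which gives fractions in [0, 1] that remain valid after the image is resized. The following is only a sketch with made-up numbers; normalize_boxes is a hypothetical helper that is not part of the notebook, and it assumes the xmins0/ymins0-style columns hold pixel coordinates in the original image.

```python
def normalize_boxes(xs, ys, width0, height0):
    """Convert pixel coordinates in the original image to fractions in [0, 1]."""
    return [x / width0 for x in xs], [y / height0 for y in ys]


# Made-up example: a 1000x500 original image with two box corners.
xmins, ymins = normalize_boxes([250, 600], [125, 400], width0=1000, height0=500)

# The same fractions locate the corners in the 256x256 resized image.
resized_xmins = [x * 256 for x in xmins]
print(xmins, ymins, resized_xmins)
```

Because the stored values are fractions, the 256x256 resize does not require recomputing the boxes.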
I downloaded the model with this:
if not os.path.isdir("automl"):
    !git clone --depth 1 https://github.com/google/automl
    %cd automl
    !git checkout f2b4480703278250fb05abe38a2f4ecbb16ba463 # Recent commit
    %cd efficientdet
    %pip install -r requirements.txt
    %pip install -U "git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI"
MODEL = "efficientdet-d0"
if not os.path.exists(f"{MODEL}.tar.gz"):
    !curl -O https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco2/{MODEL}.tar.gz
    !tar xvzf {MODEL}.tar.gz
The configuration is as follows:
PROJ_DIR = "/content/MODEL"
CONFIG_DIR = os.path.join(PROJ_DIR, "configs")
CONFIG_FILE = os.path.join(CONFIG_DIR, "default.yaml")
# makedirs also creates PROJ_DIR if it does not exist yet
os.makedirs(CONFIG_DIR, exist_ok=True)
config_text = \
"""image_size: 256x256 # this is the size of my images
num_classes: 1
label_map: {1: opacity}
input_rand_hflip: true
jitter_min: 0.8
jitter_max: 1.2
"""
with open(CONFIG_FILE, "w") as fwrite:
    fwrite.write(config_text)
TFRECORD_DIR = "/content/drive/MyDrive/siim-covid19-detection/TFRecords"
CKPT = MODEL
TRAIN_SET = os.path.join(TFRECORD_DIR, "train/train.tfrecord")
VAL_SET = os.path.join(TFRECORD_DIR, "val/val.tfrecord")
MODEL_DIR_TMP = os.path.join(PROJ_DIR, "tmp", f"{MODEL}-finetune")
TRAIN_NUM_EXAMPLES = len(train_filepaths)
EVAL_NUM_EXAMPLES = len(val_filepaths)
EPOCHS = 2
BATCH_SIZE = 16
Here is how I start training:
!python -m main \
--mode=train_and_eval \
--train_file_pattern={TRAIN_SET} \
--val_file_pattern={VAL_SET} \
--model_name={MODEL} \
--model_dir={MODEL_DIR_TMP} \
--ckpt={CKPT} \
--train_batch_size={BATCH_SIZE} \
--eval_batch_size={BATCH_SIZE} \
--num_epochs={EPOCHS} \
--num_examples_per_epoch={TRAIN_NUM_EXAMPLES} \
--eval_samples={EVAL_NUM_EXAMPLES} \
--hparams={CONFIG_FILE}
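Thanks in advance for your help!
Solution
OK, I found my mistake: the number of class labels and class texts did not match the number of bboxes for each sample in the tfrecord file. Every record contained exactly one label and one class text, no matter how many boxes the image had. I changed the code like this, and it worked: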
def create_tf_example(filepath, df_label):
    encoded_image_data = open(filepath, "rb").read()
    key = hashlib.sha256(encoded_image_data).hexdigest()
    filename = os.path.basename(filepath)
    image_name = filename.replace(".png", "")
    height0 = df_label["height0"].loc[df_label["id"]==image_name].iloc[0]
    width0 = df_label["width0"].loc[df_label["id"]==image_name].iloc[0]
    image_format = b'png'
    width = 256
    height = 256
    xmins = [x / width0 for x in df_label["xmins0"].loc[df_label["id"]==image_name].iloc[0]]
    xmaxs = [x / width0 for x in df_label["xmaxs0"].loc[df_label["id"]==image_name].iloc[0]]
    ymins = [x / height0 for x in df_label["ymins0"].loc[df_label["id"]==image_name].iloc[0]]
    ymaxs = [x / height0 for x in df_label["ymaxs0"].loc[df_label["id"]==image_name].iloc[0]]
    classes_text = ["opacity".encode("utf-8")]*len(xmins)  # now a list of byte strings with one entry per bbox
    classes = [0]*len(xmins)  # now a list of ints with one entry per bbox
    tf_example = tf.train.Example(features=tf.train.Features(feature={
        'image/height': tf.train.Feature(int64_list=tf.train.Int64List(value=[height])),
        'image/width': tf.train.Feature(int64_list=tf.train.Int64List(value=[width])),
        "image/filename": tf.train.Feature(bytes_list=tf.train.BytesList(value=[filename.encode("utf-8")])),
        "image/source_id": tf.train.Feature(bytes_list=tf.train.BytesList(value=['0'.encode("utf-8")])),  # problem with image names worked around with this hack
        "image/key/sha256": tf.train.Feature(bytes_list=tf.train.BytesList(value=[key.encode("utf-8")])),
        "image/encoded": tf.train.Feature(bytes_list=tf.train.BytesList(value=[encoded_image_data])),
        "image/format": tf.train.Feature(bytes_list=tf.train.BytesList(value=["png".encode("utf-8")])),
        "image/object/bbox/xmin": tf.train.Feature(float_list=tf.train.FloatList(value=xmins)),
        "image/object/bbox/xmax": tf.train.Feature(float_list=tf.train.FloatList(value=xmaxs)),
        "image/object/bbox/ymin": tf.train.Feature(float_list=tf.train.FloatList(value=ymins)),
        "image/object/bbox/ymax": tf.train.Feature(float_list=tf.train.FloatList(value=ymaxs)),
        "image/object/class/text": tf.train.Feature(bytes_list=tf.train.BytesList(value=classes_text)),
        "image/object/class/label": tf.train.Feature(int64_list=tf.train.Int64List(value=classes)),
    }))
    return tf_example
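To catch this kind of mismatch before training, one option is to check the per-object lists right before building the tf.train.Example. The sketch below is only an illustration: check_object_lists is a hypothetical helper, not part of TensorFlow or the automl repository, and it assumes that every per-object feature must carry exactly one entry per bounding box.

```python
def check_object_lists(xmins, xmaxs, ymins, ymaxs, classes_text, classes):
    """Return the number of boxes, or raise ValueError if the lists disagree."""
    lengths = {
        "xmins": len(xmins),
        "xmaxs": len(xmaxs),
        "ymins": len(ymins),
        "ymaxs": len(ymaxs),
        "classes_text": len(classes_text),
        "classes": len(classes),
    }
    # All per-object lists must describe the same number of boxes.
    if len(set(lengths.values())) > 1:
        raise ValueError(f"per-object feature lengths differ: {lengths}")
    return lengths["xmins"]


# Three boxes but a single label, as in the failing records (made-up values).
try:
    check_object_lists([0.1, 0.2, 0.3], [0.4, 0.5, 0.6],
                       [0.1, 0.2, 0.3], [0.4, 0.5, 0.6],
                       [b"opacity"], [1])
except ValueError as err:
    print(err)
```

Calling such a check inside create_tf_example, just before the tf.train.Example is built, would make the failure show up while writing the records instead of as a GatherNd error during training.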