TensorBoard does not show evaluation results for the last checkpoint

Problem description

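I trained several object detection models on custom data for 4K steps with the TensorFlow Object Detection API and evaluated them during training. Every checkpoint was evaluated, and I checked the results on the console.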

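However, for some reason I cannot see the evaluation results for the last two checkpoints in TensorBoard. It shows evaluation results up to 3K steps and nothing after that. Both the console and the output folder show that the evaluations completed.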

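There are no error messages on the console when I start TensorBoard. The training results are fully loaded into TensorBoard; the only thing missing is the last evaluation results.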

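I tried evaluating the latest checkpoint again, but nothing changed. At the end of the evaluation I get a message saying that the metrics were written to the summary...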

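Training checkpoints are saved every 10 minutes, and an evaluation takes 12 minutes. Even so, I would expect to see evaluation results for the latest checkpoint.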
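To make the timing argument concrete, here is a minimal simulation. It assumes (this is not confirmed anywhere in the question) that the evaluator always picks the newest checkpoint once the previous evaluation finishes, and waits if nothing new has been saved. The function name and the save times are hypothetical.

```python
def simulate_evaluations(checkpoint_times, eval_minutes):
    """Return the save times (in minutes) of the checkpoints that get evaluated.

    Assumed policy, for illustration only: after each evaluation the evaluator
    takes the newest checkpoint saved so far; if that one was already
    evaluated, it waits for the next save.
    """
    times = sorted(checkpoint_times)
    evaluated = []
    now = times[0]
    while True:
        # Newest checkpoint that exists at the current time.
        latest = [t for t in times if t <= now][-1]
        if evaluated and latest == evaluated[-1]:
            # Nothing new yet: jump to the next save, or stop if there is none.
            pending = [t for t in times if t > latest]
            if not pending:
                break
            now = pending[0]
            continue
        evaluated.append(latest)
        now += eval_minutes
    return evaluated


if __name__ == "__main__":
    saves = list(range(0, 70, 10))  # hypothetical: saves at 0, 10, ..., 60 minutes
    done = simulate_evaluations(saves, eval_minutes=12)
    print("evaluated:", done)
    print("skipped:", [t for t in saves if t not in done])
```

With these made-up numbers, an intermediate checkpoint can be skipped because evaluation is slower than saving, but the final checkpoint is still evaluated. That matches the expectation above, as long as the assumed policy holds.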

When I download the CSV file from TensorBoard, the evaluations for the last two checkpoints are missing there as well.
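One way to check the downloaded CSV against the steps that should be there is sketched below. It assumes the export has a `Step` column (the header used in the sample is an assumption about the export format). The sample rows and the list of expected steps are hypothetical, apart from 0 and 4112, which come from the inspect output.

```python
import csv
import io


def steps_in_csv(csv_text):
    """Return the sorted step numbers found in a scalar CSV export."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Steps may be written as floats ("3000.0"), so convert via float first.
    return sorted(int(float(row["Step"])) for row in reader)


def missing_steps(expected_steps, csv_text):
    """Return the expected steps that do not appear in the CSV export."""
    present = set(steps_in_csv(csv_text))
    return [s for s in expected_steps if s not in present]


if __name__ == "__main__":
    # Hypothetical export content; the wall times and values are made up.
    sample = (
        "Wall time,Step,Value\n"
        "1583940000.0,0,0.10\n"
        "1583941000.0,1500,0.42\n"
        "1583942000.0,3000,0.55\n"
    )
    expected = [0, 1500, 3000, 3600, 4112]  # hypothetical evaluation steps
    print("missing from CSV:", missing_steps(expected, sample))
```

Since the CSV is produced from the same data the UI plots, steps missing here point at how the data is loaded, not at the chart rendering.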

What could be the reason?

The output of tensorboard --inspect is:

I0311 16:57:21.281645 MainThread program.py:165] Not bringing up TensorBoard, but inspecting event files.
I0311 16:57:21.281645 140028330256128 program.py:165] Not bringing up TensorBoard, but inspecting event files.
======================================================================
Processing event files... (this can take a few minutes)
======================================================================

Found event files in:
./CN_flow1_95/eval
./CN_flow1_95/train

These tags are in ./CN_flow1_95/eval:
audio -
histograms -
images
   image-0
   image-1
   image-2
   image-3
   image-4
   image-5
   image-6
   image-7
   image-8
   image-9
scalars
   Losses/Loss/BoxClassifierLoss/classification_loss
   Losses/Loss/BoxClassifierLoss/localization_loss
   Losses/Loss/RPNLoss/localization_loss
   Losses/Loss/RPNLoss/objectness_loss
   PascalBoxes_PerformanceByCategory/AP@0.5IOU/b'cyclist'
   PascalBoxes_PerformanceByCategory/AP@0.5IOU/b'motorcyclist'
   PascalBoxes_PerformanceByCategory/AP@0.5IOU/b'pedestrian'
   PascalBoxes_Precision/mAP@0.5IOU
tensor -
======================================================================

Event statistics for ./CN_flow1_95/eval:
audio -
graph
   first_step           0
   last_step            0
   max_step             0
   min_step             0
   num_steps            1
   outoforder_steps     []
histograms -
images
   first_step           0
   last_step            4112
   max_step             4112
   min_step             0
   num_steps            7
   outoforder_steps     []
scalars
   first_step           0
   last_step            4112
   max_step             4112
   min_step             0
   num_steps            7
   outoforder_steps     []
sessionlog:checkpoint -
sessionlog:start -
sessionlog:stop -
tensor -
======================================================================

These tags are in ./CN_flow1_95/train:
audio -
histograms
   ModelVars/...
images -
scalars
   Losses/TotalLoss
   Losses/clone_0/Loss/BoxClassifierLoss/classification_loss
   Losses/clone_0/Loss/BoxClassifierLoss/localization_loss
   Losses/clone_0/Loss/RPNLoss/localization_loss
   Losses/clone_0/Loss/RPNLoss/objectness_loss
   Losses/clone_1/Loss/BoxClassifierLoss/classification_loss
   Losses/clone_1/Loss/BoxClassifierLoss/localization_loss
   Losses/clone_1/Loss/RPNLoss/localization_loss
   Losses/clone_1/Loss/RPNLoss/objectness_loss
   Losses/clone_2/Loss/BoxClassifierLoss/classification_loss
   Losses/clone_2/Loss/BoxClassifierLoss/localization_loss
   Losses/clone_2/Loss/RPNLoss/localization_loss
   Losses/clone_2/Loss/RPNLoss/objectness_loss
   batch/fraction_of_150_full
   clone_0/Losses/clone_0//clone_loss
   global_step/sec
   queue/prefetch_queue/fraction_of_5_full
tensor -
======================================================================

Event statistics for ./CN_flow1_95/train:
audio -
graph
   first_step           0
   last_step            0
   max_step             0
   min_step             0
   num_steps            1
   outoforder_steps     []
histograms
   first_step           0
   last_step            4110
   max_step             4110
   min_step             0
   num_steps            28
   outoforder_steps     []
images -
scalars
   first_step           0
   last_step            4110
   max_step             4110
   min_step             0
   num_steps            54
   outoforder_steps     []
sessionlog:checkpoint
   first_step           1
   last_step            4111
   max_step             4111
   min_step             1
   num_steps            7
   outoforder_steps     []
sessionlog:start
   outoforder_steps     []
   steps                [0, 4110]
sessionlog:stop
   outoforder_steps     []
   steps                [0, 0]
tensor -
======================================================================

Tags: tensorflow, tensorboard, object-detection-api

Solution


I also asked this question on the TensorBoard repo. They said there is no reason the event files should not load correctly, and told me to ask here...

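Sometimes the correct results are shown (when there are 10-15 event files from extensive testing), but most of the time they are not. I changed how often checkpoints are saved so that none would be missed during evaluation (it should not matter, but I tried it anyway).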

I saved a checkpoint every 12 minutes, since an evaluation also takes 12 minutes. That did not work either.

All tensorboard --inspect results look fine.
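To cross-check the inspect output, the scalar events can also be read straight from the event files, bypassing the web UI. This is a rough sketch using the `EventAccumulator` class that ships with TensorBoard. The helper names are hypothetical, and passing `size_guidance={"scalars": 0}` is meant to keep every scalar event instead of a downsampled subset.

```python
def read_scalar_steps(event_dir, tag):
    """Read all (step, value) pairs for one scalar tag from the event files."""
    # Imported lazily so steps_after() below works without TensorBoard installed.
    from tensorboard.backend.event_processing.event_accumulator import (
        EventAccumulator,
    )

    # A size of 0 asks the accumulator to keep all scalar events.
    acc = EventAccumulator(event_dir, size_guidance={"scalars": 0})
    acc.Reload()
    return [(event.step, event.value) for event in acc.Scalars(tag)]


def steps_after(step_value_pairs, last_shown_step):
    """Return the steps present in the files but beyond what the UI displays."""
    return [step for step, _ in step_value_pairs if step > last_shown_step]
```

For the run above, one would call `read_scalar_steps("./CN_flow1_95/eval", "PascalBoxes_Precision/mAP@0.5IOU")` and then `steps_after(pairs, 3000)`, since the UI stops at about 3K steps. If the later steps show up here, the data is in the event files and the problem lies in how TensorBoard loads or displays it.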

I tried different models on different computers and also cleared the browser cache. None of it really helped.

I believe there is a bug in TensorBoard.

