Hyperparameter tuning error on Google AI Platform: "replica master 0 exited with a non-zero status of 1"

Problem description

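While training a deep learning model on Google Cloud AI Platform with hyperparameter tuning (my hyperparameter configuration is in a YAML file), I get this error: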

Hyperparameter Tuning Trial #2 Failed before any other successful trials were completed. 
The failed trial had parameters: batch_size=11, learning_rate=3.527059074944887e-05, .
The trial's error message was: The replica master 0 exited with a non-zero status of 1

Because the error message is so generic, I'm having a hard time figuring out where the problem might be.
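One detail narrows the message down a little: for a Python trainer, exit status 1 is what CPython returns when a script ends with an uncaught exception, and the traceback is written to stderr, so the trial's own log entries should name the underlying error. A minimal sketch that reproduces this locally; `exit_status_of` and the failing one-liner are hypothetical stand-ins for a crashing trainer, not code from the original job:

```python
import subprocess
import sys


def exit_status_of(code: str) -> int:
    """Run a Python snippet in a fresh interpreter and return its exit status."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
    )
    # An uncaught exception's traceback goes to stderr; for a real trainer,
    # this is the part that ends up in the job logs.
    if result.stderr:
        print(result.stderr.strip().splitlines()[-1])
    return result.returncode


if __name__ == "__main__":
    # Hypothetical stand-in for a trainer that crashes on startup.
    status = exit_status_of("raise ValueError('bad hyperparameter value')")
    print("exit status:", status)  # an uncaught exception gives status 1
```

Here the last stderr line is the exception itself, which corresponds to the line to look for in the logs of the failed trial.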

My YAML configuration file:

trainingInput:
  hyperparameters:
    goal: MINIMIZE
    maxTrials: 2
    maxParallelTrials: 2
    hyperparameterMetricTag: loss
    enableTrialEarlyStopping: FALSE
    params:
      - parameterName: batch_size
        type: INTEGER
        minValue: 8
        maxValue: 16
        scaleType: UNIT_LINEAR_SCALE
      - parameterName: learning_rate
        type: DOUBLE
        minValue: 0.00001
        maxValue: 0.0001
        scaleType: UNIT_LINEAR_SCALE
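With this configuration, AI Platform passes each trial's values to the training application as command-line arguments named after each `parameterName` (for the failed trial, `batch_size=11` and `learning_rate=3.527059074944887e-05`). The trainer therefore has to accept flags with exactly those names and convert them to the declared types. For example, a `--batch_size` left as the string `"11"` can raise an exception once it is used, which ends the process with status 1. A minimal sketch of the argument parsing, assuming the trainer uses `argparse`; the `parse_args` helper, the defaults, and the `--job-dir` flag (present only if the job was submitted with one) are illustrative assumptions, not taken from the original code:

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse per-trial hyperparameters (hypothetical helper).

    Flag names must match the parameterName values in the YAML config.
    """
    parser = argparse.ArgumentParser(description="Trainer (illustrative sketch)")
    # INTEGER in the YAML config -> int here, so batch_size is not the string "11".
    parser.add_argument("--batch_size", type=int, default=8)  # illustrative default
    # DOUBLE in the YAML config -> float here.
    parser.add_argument("--learning_rate", type=float, default=1e-4)  # illustrative default
    # Assumption: the job was submitted with --job-dir, which is forwarded to the trainer.
    parser.add_argument("--job-dir", dest="job_dir", default=None)
    # parse_known_args tolerates undeclared flags instead of exiting with an error.
    args, unknown = parser.parse_known_args(argv)
    if unknown:
        print("Ignoring unrecognized arguments:", unknown)
    return args


if __name__ == "__main__":
    args = parse_args()
    print(f"batch_size={args.batch_size} learning_rate={args.learning_rate}")
```

For example, `parse_args(["--batch_size=11", "--learning_rate=3.527059074944887e-05"])` returns a namespace where `batch_size` is the `int` 11 and `learning_rate` is a `float`. Using `parse_known_args` rather than `parse_args` is a design choice: it keeps an extra service-supplied flag from terminating the trial, at the cost of silently accepting typos in flag names.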
