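amazon-web-services - Amazon SageMaker: Customer Error: Training did not complete successfully for binary text classification
Problem description
This is my first time trying Amazon SageMaker. Essentially, I am trying to build a spam detection filter in SageMaker using a binary classifier with BlazingText. I tried to train the model with the following code: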
import sagemaker

# container, role, s3_output_location and sess are defined earlier in the notebook
bt_model = sagemaker.estimator.Estimator(container,
                                         role,
                                         train_instance_count=1,
                                         train_instance_type='ml.c4.4xlarge',
                                         train_volume_size=100,
                                         train_max_run=360000,
                                         input_mode='File',
                                         output_path=s3_output_location,
                                         sagemaker_session=sess)
and
bt_model.set_hyperparameters(mode="supervised",
                             epochs=500,
                             min_count=2,
                             learning_rate=0.05,
                             vector_dim=15,
                             early_stopping=True,
                             patience=10,
                             min_epochs=200,
                             word_ngrams=2)
But when I try to run it, I get the following log:
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| timestamp | message |
|---------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 1592870667409 | Arguments: train |
| 1592870667409 | [06/23/2020 00:04:24 INFO 139842440988480] nvidia-smi took: 0.0252470970154 secs to identify 0 gpus |
| 1592870667409 | [06/23/2020 00:04:24 INFO 139842440988480] Running single machine CPU BlazingText training using supervised mode. |
| 1592870667409 | [06/23/2020 00:04:24 INFO 139842440988480] Processing /opt/ml/input/data/train/clause.train . File size: 2 MB |
| 1592870667409 | [06/23/2020 00:04:24 INFO 139842440988480] Processing /opt/ml/input/data/validation/clause.validation . File size: 2 MB |
| 1592870667409 | Read 0M words |
| 1592870667409 | Number of words: 12005 |
| 1592870667409 | Loading validation data from /opt/ml/input/data/validation/clause.validation |
| 1592870667409 | Loaded validation data. |
| 1592870667409 | -------------- End of epoch: 43 |
| 1592870667409 | ##### Alpha: 0.0433 Progress: 13.37% Million Words/sec: 86.39 ##### |
| 1592870667409 | -------------- End of epoch: 66 |
| 1592870667409 | -------------- End of epoch: 90 |
| 1592870667409 | ##### Alpha: 0.0387 Progress: 22.54% Million Words/sec: 86.66 ##### |
| 1592870667409 | -------------- End of epoch: 112 |
| 1592870667409 | -------------- End of epoch: 135 |
| 1592870667409 | ##### Alpha: 0.0341 Progress: 31.81% Million Words/sec: 87.08 ##### |
| 1592870667409 | -------------- End of epoch: 159 |
| 1592870667409 | -------------- End of epoch: 182 |
| 1592870667409 | ##### Alpha: 0.0295 Progress: 41.07% Million Words/sec: 87.30 ##### |
| 1592870667409 | -------------- End of epoch: 205 |
| 1592870667409 | Using 16 threads for prediction! |
| 1592870667409 | Validation accuracy: -nan |
| 1592870667409 | Validation accuracy has not improved for last 1 epochs. |
| 1592870667409 | -------------- End of epoch: 207 |
| 1592870667409 | Using 16 threads for prediction! |
| 1592870667409 | Validation accuracy: -nan |
| 1592870667409 | Validation accuracy has not improved for last 2 epochs. |
| 1592870667410 | -------------- End of epoch: 208 |
| 1592870667410 | Using 16 threads for prediction! |
| 1592870667410 | Validation accuracy: -nan |
| 1592870667410 | Validation accuracy has not improved for last 3 epochs. |
| 1592870667410 | -------------- End of epoch: 209 |
| 1592870667410 | Using 16 threads for prediction! |
| 1592870667410 | Validation accuracy: -nan |
| 1592870667410 | Validation accuracy has not improved for last 4 epochs. |
| 1592870667410 | -------------- End of epoch: 211 |
| 1592870667410 | Using 16 threads for prediction! |
| 1592870667410 | Validation accuracy: -nan |
| 1592870667410 | Validation accuracy has not improved for last 5 epochs. |
| 1592870667410 | -------------- End of epoch: 213 |
| ... |
| 1592870667410 | Using 16 threads for prediction! |
| 1592870667410 | Validation accuracy: -nan |
| 1592870667410 | Validation accuracy has not improved for last 9 epochs. |
| 1592870667410 | -------------- End of epoch: 219 |
| 1592870667410 | Using 16 threads for prediction! |
| 1592870667410 | Validation accuracy: -nan |
| 1592870667410 | Validation accuracy has not improved for last 10 epochs. |
| 1592870667410 | Reached patience. Terminating training. |
| 1592870667410 | Best epoch: 0 |
| 1592870667410 | Best validation accuracy: 0 |
| 1592870667410 | ##### Alpha: 0.0000 Progress: 100.00% Million Words/sec: 99.57 ##### |
| 1592870669411 | [06/23/2020 00:04:29 ERROR 139842440988480] Customer Error: Training did not complete successfully! Please check the logs for errors. |
| 1592870669411 | Traceback (most recent call last): File "/opt/amazon/lib/python2.7/site-packages/blazingtext/train.py", line 75, in main train_blazing_single(resource_config, train_config, data_config) File "/opt/amazon/lib/python2.7/site-packages/blazingtext/train_methods.py", line 245, in train_blazing_single raise exceptions.CustomerError("Training did not complete successfully! Please check the logs for errors.") |
| 1592870669411 | CustomerError: Training did not complete successfully! Please check the logs for errors. |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
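The pattern in the log (every validation pass reports `-nan`, then "Reached patience" with a best epoch of 0 and a best accuracy of 0) is what a patience counter would produce if the accuracy were never a real number. The sketch below illustrates that reading. It is a hypothetical patience loop, not BlazingText's actual implementation, and the function name `simulate_patience` is made up for this example.

```python
def simulate_patience(accuracies, patience):
    """Hypothetical early-stopping loop (an illustration, not BlazingText's code).

    Returns (epoch_stopped_at, best_epoch, best_accuracy); epoch_stopped_at is
    None if the patience limit was never reached.
    """
    best_acc = 0.0
    best_epoch = 0
    stale = 0  # consecutive validation passes without improvement
    for epoch, acc in enumerate(accuracies, start=1):
        # Any comparison with NaN is False, so a NaN accuracy can never
        # register as an improvement over the current best.
        if acc > best_acc:
            best_acc, best_epoch, stale = acc, epoch, 0
        else:
            stale += 1
            if stale >= patience:
                return epoch, best_epoch, best_acc
    return None, best_epoch, best_acc


# Twenty validation passes that all yield NaN, with patience=10 as in the question.
print(simulate_patience([float("nan")] * 20, patience=10))
```

Under this reading, the early-stopping settings are behaving as configured, and the thing to investigate is why the validation accuracy itself comes out as NaN.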
I believe my training and test datasets are preprocessed correctly, so any help would be greatly appreciated. Thank you so much!
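One way to double-check the preprocessing is to scan both files for the format that BlazingText's supervised mode expects: one example per line, with each label written as a token prefixed by `__label__`, followed by the space-separated text. The sketch below is a minimal checker under that assumption. The helper names and the sample lines are hypothetical, and a clean result would not by itself explain the `-nan` accuracy.

```python
from collections import Counter

LABEL_PREFIX = "__label__"


def summarize_labels(lines):
    """Count label tokens and collect 1-based numbers of suspicious lines.

    A line is flagged if it has no `__label__` token or no text tokens
    (blank lines are flagged too).
    """
    label_counts = Counter()
    bad_lines = []
    for line_no, line in enumerate(lines, start=1):
        tokens = line.split()
        labels = [t for t in tokens if t.startswith(LABEL_PREFIX)]
        words = [t for t in tokens if not t.startswith(LABEL_PREFIX)]
        if not labels or not words:
            bad_lines.append(line_no)
        label_counts.update(labels)
    return label_counts, bad_lines


def labels_unseen_in_training(train_lines, validation_lines):
    """Return validation labels that never occur in the training data."""
    train_counts, _ = summarize_labels(train_lines)
    validation_counts, _ = summarize_labels(validation_lines)
    return sorted(set(validation_counts) - set(train_counts))


# Hypothetical sample lines, only to show the expected shape of the data.
sample_train = ["__label__spam win a free prize now", "__label__ham see you at lunch"]
sample_validation = ["__label__spam claim your reward", "missing label on this line"]

print(summarize_labels(sample_validation))
print(labels_unseen_in_training(sample_train, sample_validation))
```

To run it on the real data, pass open file objects for the local copies of `clause.train` and `clause.validation` instead of the sample lists.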