Amazon SageMaker: Customer Error: Training did not complete successfully for binary text classification

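Problem description

This is my first time trying Amazon SageMaker. Essentially, I am trying to build a spam detection filter in SageMaker using a binary classifier with BlazingText. I tried to train the model with the following code: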

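import sagemaker

# container, role, s3_output_location and sess are defined earlier in the notebook (not shown).
# The train_* argument names are those of version 1 of the SageMaker Python SDK.
bt_model = sagemaker.estimator.Estimator(container,
                                         role,
                                         train_instance_count=1,
                                         train_instance_type='ml.c4.4xlarge',
                                         train_volume_size=100,
                                         train_max_run=360000,
                                         input_mode='File',
                                         output_path=s3_output_location,
                                         sagemaker_session=sess)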

bt_model.set_hyperparameters(mode="supervised",
                            epochs=500,
                            min_count=2,
                            learning_rate=0.05,
                            vector_dim=15,
                            early_stopping=True,
                            patience=10,
                            min_epochs=200,
                            word_ngrams=2)

But when I try to run it, I get the following logs:

| timestamp     | message |
|---------------|---------|
| 1592870667409 | Arguments: train |
| 1592870667409 | [06/23/2020 00:04:24 INFO 139842440988480] nvidia-smi took: 0.0252470970154 secs to identify 0 gpus |
| 1592870667409 | [06/23/2020 00:04:24 INFO 139842440988480] Running single machine CPU BlazingText training using supervised mode. |
| 1592870667409 | [06/23/2020 00:04:24 INFO 139842440988480] Processing /opt/ml/input/data/train/clause.train . File size: 2 MB |
| 1592870667409 | [06/23/2020 00:04:24 INFO 139842440988480] Processing /opt/ml/input/data/validation/clause.validation . File size: 2 MB |
| 1592870667409 | Read 0M words |
| 1592870667409 | Number of words:  12005 |
| 1592870667409 | Loading validation data from /opt/ml/input/data/validation/clause.validation |
| 1592870667409 | Loaded validation data. |
| 1592870667409 | -------------- End of epoch: 43 |
| 1592870667409 | ##### Alpha: 0.0433  Progress: 13.37%  Million Words/sec: 86.39 ##### |
| 1592870667409 | -------------- End of epoch: 66 |
| 1592870667409 | -------------- End of epoch: 90 |
| 1592870667409 | ##### Alpha: 0.0387  Progress: 22.54%  Million Words/sec: 86.66 ##### |
| 1592870667409 | -------------- End of epoch: 112 |
| 1592870667409 | -------------- End of epoch: 135 |
| 1592870667409 | ##### Alpha: 0.0341  Progress: 31.81%  Million Words/sec: 87.08 ##### |
| 1592870667409 | -------------- End of epoch: 159 |
| 1592870667409 | -------------- End of epoch: 182 |
| 1592870667409 | ##### Alpha: 0.0295  Progress: 41.07%  Million Words/sec: 87.30 ##### |
| 1592870667409 | -------------- End of epoch: 205 |
| 1592870667409 | Using 16 threads for prediction! |
| 1592870667409 | Validation accuracy: -nan |
| 1592870667409 | Validation accuracy has not improved for last 1 epochs. |
| 1592870667409 | -------------- End of epoch: 207 |
| 1592870667409 | Using 16 threads for prediction! |
| 1592870667409 | Validation accuracy: -nan |
| 1592870667409 | Validation accuracy has not improved for last 2 epochs. |
| 1592870667410 | -------------- End of epoch: 208 |
| 1592870667410 | Using 16 threads for prediction! |
| 1592870667410 | Validation accuracy: -nan |
| 1592870667410 | Validation accuracy has not improved for last 3 epochs. |
| 1592870667410 | -------------- End of epoch: 209 |
| 1592870667410 | Using 16 threads for prediction! |
| 1592870667410 | Validation accuracy: -nan |
| 1592870667410 | Validation accuracy has not improved for last 4 epochs. |
| 1592870667410 | -------------- End of epoch: 211 |
| 1592870667410 | Using 16 threads for prediction! |
| 1592870667410 | Validation accuracy: -nan |
| 1592870667410 | Validation accuracy has not improved for last 5 epochs. |
| 1592870667410 | -------------- End of epoch: 213 |
| ...           | ... |
| 1592870667410 | Using 16 threads for prediction! |
| 1592870667410 | Validation accuracy: -nan |
| 1592870667410 | Validation accuracy has not improved for last 9 epochs. |
| 1592870667410 | -------------- End of epoch: 219 |
| 1592870667410 | Using 16 threads for prediction! |
| 1592870667410 | Validation accuracy: -nan |
| 1592870667410 | Validation accuracy has not improved for last 10 epochs. |
| 1592870667410 | Reached patience. Terminating training. |
| 1592870667410 | Best epoch: 0 |
| 1592870667410 | Best validation accuracy: 0 |
| 1592870667410 | ##### Alpha: 0.0000  Progress: 100.00%  Million Words/sec: 99.57 ##### |
| 1592870669411 | [06/23/2020 00:04:29 ERROR 139842440988480] Customer Error: Training did not complete successfully! Please check the logs for errors. |
| 1592870669411 | Traceback (most recent call last):   File "/opt/amazon/lib/python2.7/site-packages/blazingtext/train.py", line 75, in main     train_blazing_single(resource_config, train_config, data_config)   File "/opt/amazon/lib/python2.7/site-packages/blazingtext/train_methods.py", line 245, in train_blazing_single     raise exceptions.CustomerError("Training did not complete successfully! Please check the logs for errors.") |
| 1592870669411 | CustomerError: Training did not complete successfully! Please check the logs for errors. |

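I believe my training and test datasets have been preprocessed correctly, so any help would be greatly appreciated. Thank you very much!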

Tags: amazon-web-services, machine-learning, amazon-s3, text-classification, amazon-sagemaker

Solution
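One thing worth ruling out, offered as an assumption rather than a confirmed diagnosis: in supervised mode with File input, BlazingText expects one example per line, with each label written as a token prefixed by `__label__` and followed by the space-separated text. A validation accuracy of `-nan` on every epoch would be consistent with no usable labeled examples being read from the validation file, although the logs above do not prove that this is the cause. The sketch below is a minimal local check of that format; the helper name `check_blazingtext_lines` and the sample lines are hypothetical and are not part of SageMaker.

```python
from collections import Counter

LABEL_PREFIX = "__label__"


def check_blazingtext_lines(lines):
    """Check lines against the BlazingText supervised text format.

    Returns (label_counts, bad_line_numbers). A line is reported as bad when
    it has no `__label__<name>` token or has no words besides its labels.
    Blank lines are skipped.
    """
    label_counts = Counter()
    bad_line_numbers = []
    for lineno, raw in enumerate(lines, start=1):
        tokens = raw.split()
        if not tokens:
            continue  # ignore blank lines
        labels = [t for t in tokens
                  if t.startswith(LABEL_PREFIX) and len(t) > len(LABEL_PREFIX)]
        words = [t for t in tokens if not t.startswith(LABEL_PREFIX)]
        if not labels or not words:
            bad_line_numbers.append(lineno)
            continue
        label_counts.update(labels)
    return label_counts, bad_line_numbers


# Hypothetical sample: the third line uses a CSV-style label and is flagged.
sample = [
    "__label__spam win a free prize now",
    "__label__ham see you at lunch",
    "spam,win a free prize now",
]
counts, bad = check_blazingtext_lines(sample)
print(dict(counts))
print(bad)
```

To run the same check on real data, pass an open file instead of the sample list, for example `check_blazingtext_lines(open("clause.validation", encoding="utf-8"))` on a local copy of the validation file named in the logs. If many lines are reported as bad, or one of the two classes never appears in the counts, the preprocessing step is the first place to look.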

