SageMaker sklearn custom code: "ValueError: could not convert string to float"

Problem description

I am using a custom sklearn script to train and deploy a model in SageMaker. When I try to invoke the endpoint, I get the following error:

ERROR - model_featurizer_training - Exception on /invocations [POST]
Traceback (most recent call last):
  File "/miniconda3/lib/python3.7/site-packages/sagemaker_containers/_functions.py", line 93, in wrapper
    return fn(*args, **kwargs)
  File "/miniconda3/lib/python3.7/site-packages/sagemaker_sklearn_container/serving.py", line 60, in default_input_fn
    return np_array.astype(np.float32) if content_type in content_types.UTF8_TYPES else np_array
ValueError: could not convert string to float: 'female'
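The last frame of the traceback shows the container's default_input_fn casting the decoded array to np.float32. The cast can be reproduced outside SageMaker with plain NumPy. This is only a minimal sketch: the row below is made up for illustration and is not taken from the actual request.

```python
import numpy as np

# Hypothetical CSV row mixing numeric and categorical fields (illustration only).
row = "34,female,72.5"

# Casting the whole array to float32, as the traceback shows default_input_fn
# doing, fails on the non-numeric field.
as_strings = np.array(row.split(","))
try:
    as_strings.astype(np.float32)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Keeping dtype=object leaves every field as a Python string, so nothing is cast.
as_objects = np.array(row.split(","), dtype=object).reshape(1, -1)

print(error_message)
print(as_objects.shape)
```

The object array keeps 'female' intact, which is what a OneHotEncoder step downstream would need to receive.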

My custom training script is as follows:

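# Imports the script needs (not shown in the original post).
import argparse
import os

import joblib
import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

if __name__ == '__main__':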
    parser = argparse.ArgumentParser()
    parser.add_argument('--n_estimators', type=int, default=10)

    parser.add_argument('--output-data-dir', type=str, default=os.environ['SM_OUTPUT_DATA_DIR'])
    parser.add_argument('--model-dir', type=str, default=os.environ['SM_MODEL_DIR'])
    parser.add_argument('--train', type=str, default=os.environ['SM_CHANNEL_TRAIN'])
    
    args = parser.parse_args()

    input_file = [os.path.join(args.train, file) for file in os.listdir(args.train)]

    raw_data = [pd.read_csv(file, engine='python') for file in input_file]

    train_data = pd.concat(raw_data)
    

    # Column 0 is the label; every remaining column is a feature.
    X = train_data.iloc[:, 1:].values

    numerical_processing = make_pipeline(SimpleImputer(strategy="median"))

    categorical_processing = make_pipeline(
        SimpleImputer(strategy="constant", fill_value="missing", add_indicator=True),
        OneHotEncoder(handle_unknown="ignore"),
    )

    # Feature columns 0-14 are treated as numeric, columns 15-16 as categorical.
    preprocessing = make_column_transformer(
        (numerical_processing, list(range(0, 15))),
        (categorical_processing, list(range(15, 17))),
    )

    n_estimators = args.n_estimators

    clf = RandomForestClassifier(n_estimators=n_estimators, random_state=42)

    full_pipeline = make_pipeline(preprocessing, clf)

    y = train_data.iloc[:, 0]

    full_pipeline.fit(X, y)

    joblib.dump(full_pipeline, os.path.join(args.model_dir, 'sklearn_full_pipeline_model.joblib'))



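def input_fn(input_data, content_type):
    # The SageMaker scikit-learn container calls input_fn with the request body
    # and its content type, so the function needs both parameters.
    # dtype="object" keeps categorical strings such as 'female' from being cast to float.
    return np.array(input_data.split(","), dtype="object").reshape(1, -1)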

def predict_fn(input_data, model):

    return model.predict(input_data)

def model_fn(model_dir):
    
    clf = joblib.load(os.path.join(model_dir, 'sklearn_full_pipeline_model.joblib'))

    return clf

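I understand that I need to provide a custom input_fn in my script so that my input data is read correctly, but apparently default_input_fn is being called instead.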
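One thing that can be checked locally is the handler's signature: the SageMaker scikit-learn container documents input_fn as receiving the request body and the request content type, whereas the version in the script takes a single argument. The post does not confirm that this is why default_input_fn ran, so the following is only a sketch of a two-argument handler. The bytes handling, the "text/csv" check, and the sample row are assumptions added for illustration.

```python
import numpy as np


def input_fn(request_body, request_content_type):
    """Parse one CSV row into a (1, n_features) object array."""
    # Assumption: the body may arrive as bytes, so decode it defensively.
    if isinstance(request_body, bytes):
        request_body = request_body.decode("utf-8")
    # Assumption: the client sends the row as text/csv.
    if request_content_type != "text/csv":
        raise ValueError(f"Unsupported content type: {request_content_type}")
    fields = request_body.strip().split(",")
    # dtype=object keeps categorical values as strings instead of casting to float.
    return np.array(fields, dtype=object).reshape(1, -1)


# Local check with a made-up row, no endpoint required.
sample = input_fn(b"34,female,72.5\n", "text/csv")
print(sample)
```

Calling the handler directly like this, before deploying, makes it easier to tell a parsing problem apart from the container not picking up the custom function at all.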

Tags: python, scikit-learn, amazon-sagemaker
