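python - SageMaker sklearn custom code: "ValueError: could not convert string to float"
Problem description
I am using a custom sklearn script to train and deploy a model in SageMaker. When I try to invoke the endpoint, I get the following error: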
ERROR - model_featurizer_training - Exception on /invocations [POST]
Traceback (most recent call last):
  File "/miniconda3/lib/python3.7/site-packages/sagemaker_containers/_functions.py", line 93, in wrapper
    return fn(*args, **kwargs)
  File "/miniconda3/lib/python3.7/site-packages/sagemaker_sklearn_container/serving.py", line 60, in default_input_fn
    return np_array.astype(np.float32) if content_type in content_types.UTF8_TYPES else np_array
ValueError: could not convert string to float: 'female'
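The last frame shows `default_input_fn` casting the decoded request to `float32`. A minimal sketch of that cast failing, assuming a hypothetical three-field row (the values are made up for illustration and are not taken from the actual request):

```python
import numpy as np

# Hypothetical request row: two numeric fields and one categorical field.
row = np.array([["22.0", "7.25", "female"]], dtype=object)

try:
    # Same kind of cast as in the traceback's last frame.
    row.astype(np.float32)
    cast_error = None
except ValueError as exc:
    cast_error = str(exc)

print(cast_error)
```

Any request that contains a non-numeric field will hit this cast as long as the default handler is the one that runs.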
My custom training script is as follows:
# Imports are assumed; the original snippet did not include them.
import argparse
import os

import joblib
import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--n_estimators', type=int, default=10)
    parser.add_argument('--output-data-dir', type=str, default=os.environ['SM_OUTPUT_DATA_DIR'])
    parser.add_argument('--model-dir', type=str, default=os.environ['SM_MODEL_DIR'])
    parser.add_argument('--train', type=str, default=os.environ['SM_CHANNEL_TRAIN'])
    args = parser.parse_args()

    # Read every file in the training channel and concatenate them.
    input_file = [os.path.join(args.train, file) for file in os.listdir(args.train)]
    raw_data = [pd.read_csv(file, engine='python') for file in input_file]
    train_data = pd.concat(raw_data)

    # Column 0 is the label; the remaining columns are features.
    X = train_data.iloc[:, 1:].values
    y = train_data.iloc[:, 0]

    # Feature columns 0-14 are numeric, 15-16 are categorical.
    numerical_processing = make_pipeline(SimpleImputer(strategy="median"))
    categorical_processing = make_pipeline(
        SimpleImputer(strategy="constant", fill_value="missing", add_indicator=True),
        OneHotEncoder(handle_unknown="ignore"),
    )
    preprocessing = make_column_transformer(
        (numerical_processing, list(np.arange(0, 15))),
        (categorical_processing, list(np.arange(15, 17))),
    )

    n_estimators = args.n_estimators
    clf = RandomForestClassifier(n_estimators=n_estimators, random_state=42)
    full_pipeline = make_pipeline(preprocessing, clf)
    full_pipeline.fit(X, y)
    joblib.dump(full_pipeline, os.path.join(args.model_dir, 'sklearn_full_pipeline_model.joblib'))


def input_fn(input_data):
    return np.array([i for i in input_data.split(",")], dtype="object").reshape(1, -1)


def predict_fn(input_data, model):
    return model.predict(input_data)


def model_fn(model_dir):
    clf = joblib.load(os.path.join(model_dir, 'sklearn_full_pipeline_model.joblib'))
    return clf
I know I should provide a custom input_fn in my script so that my input data is read correctly, but apparently default_input_fn is being called instead.
Solution
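One possible cause, going by the traceback: as far as I know, the SageMaker scikit-learn container falls back to `default_input_fn` when the script it imports at serving time exposes no `input_fn`. That can happen if the handlers are indented under `if __name__ == '__main__':`, or if the deployed entry point is not the script that defines them. The documented handler signature also takes two arguments (the request body and its content type), not one. A minimal sketch of a handler that keeps string fields intact, assuming the request is a single comma-separated row sent as `text/csv` (the content-type check and the bytes handling are illustrative additions, not part of the original script):

```python
import numpy as np


def input_fn(request_body, request_content_type):
    """Parse one CSV row into a (1, n_features) object array.

    Defined at module top level so it is visible when the script is imported.
    """
    # Assumption: the client sends a single row as text/csv.
    if request_content_type != "text/csv":
        raise ValueError(f"Unsupported content type: {request_content_type}")

    # The body may arrive as bytes or str depending on the caller.
    if isinstance(request_body, bytes):
        request_body = request_body.decode("utf-8")

    # Keep every field as a Python object so that categorical values such as
    # 'female' are not cast to float before the pipeline sees them.
    fields = request_body.strip().split(",")
    return np.array(fields, dtype=object).reshape(1, -1)
```

With this in place, `model_fn` and `predict_fn` from the question can stay as they are, provided they are also defined at module top level of the script the endpoint loads. Note that the numeric fields still arrive as strings here, so the pipeline has to be able to cast them.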