python - How do I convert a custom pipeline (categorical get_dummies) with convert_coreml?
Problem description
I am trying to save a custom sklearn pipeline as an ONNX model, but the conversion raises an error.
Example code:
import copy

import numpy as np
import pandas as pd
from IPython.display import display
from sklearn import svm
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType

# https://github.com/pandas-dev/pandas/issues/8918
class MyEncoder(BaseEstimator, TransformerMixin):
    def __init__(self, columns=None):
        self.columns = columns

    def transform(self, X, y=None, **kwargs):
        # np.float was removed in NumPy 1.24; the builtin float is equivalent
        return pd.get_dummies(X, dtype=float, columns=['ID'])

    def fit(self, X, y=None, **kwargs):
        return self

# data
X = pd.DataFrame([[100, 1.1, 3.1], [200, 4.1, 5.1], [100, 4.1, 2.1]], columns=['ID', 'X1', 'X2'])
Y = pd.Series([3, 2, 4])

# check transform
df = MyEncoder().transform(X)
display(df)

# create pipeline
pipe = Pipeline(steps=[('categorical', MyEncoder()), ('classifier', svm.SVR())])
print(type(pipe), MyEncoder().transform(X).dtypes, '\n')

# prepare models
svm_toy = svm.SVR()
svm_toy.fit(X, Y)
pipe_toy = copy.deepcopy(pipe).fit(X, Y)
Converting the plain model works without problems:
# no problem here
initial_type = [('X', FloatTensorType([None, X.shape[1]]))]
onx = convert_sklearn(svm_toy, initial_types=initial_type)
But converting the pipeline fails:
# something goes wrong...
initial_type = [('X', FloatTensorType([None, X.shape[1]]))]
onx = convert_sklearn(pipe_toy, initial_types=initial_type)
The following error is raised:
MissingShapeCalculator: Unable to find a shape calculator for type ''.
It usually means the pipeline being converted contains a
transformer or a predictor with no corresponding converter
implemented in sklearn-onnx. If the converted is implemented
in another library, you need to register
the converted so that it can be used by sklearn-onnx (function
update_registered_converter). If the model is not yet covered
by sklearn-onnx, you may raise an issue to
https://github.com/onnx/sklearn-onnx/issues
to get the converter implemented or even contribute to the
project. If the model is a custom model, a new converter must
be implemented. Examples can be found in the gallery.
Am I missing something about custom pipelines and get_dummies?
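Solution
A custom transformer, i.e. one that scikit-learn does not provide and that sklearn-onnx therefore has no converter for, needs additional information before the ONNX conversion can recognize it. You have to write a shape calculator and a converter function for your transformer, and then register the transformer together with these two functions (the update_registered_converter function mentioned in the error message). See the sklearn-onnx documentation for more information.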