How to convert a custom pipeline (categorical get_dummies) with convert_coreml?

Problem description

I'm trying to save a custom sklearn pipeline as an ONNX model, but the conversion fails with an error.

Example code:

import copy

import numpy as np
import pandas as pd
from IPython.display import display
from sklearn import svm
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType

# https://github.com/pandas-dev/pandas/issues/8918

class MyEncoder(TransformerMixin, BaseEstimator):
    """One-hot encodes the given columns with pd.get_dummies."""

    def __init__(self, columns=('ID',)):
        self.columns = columns

    def transform(self, X, y=None, **kwargs):
        # np.float was removed in NumPy 1.24; the builtin float is the same type
        return pd.get_dummies(X, dtype=float, columns=list(self.columns))

    def fit(self, X, y=None, **kwargs):
        return self

# data
X = pd.DataFrame([[100, 1.1, 3.1], [200, 4.1, 5.1], [100, 4.1, 2.1]], columns=['ID', 'X1', 'X2'])
Y = pd.Series([3, 2, 4])

# check transform
df = MyEncoder().transform(X)
display(df)

# create pipeline
pipe = Pipeline(steps=[('categorical', MyEncoder()), ('classifier', svm.SVR())])
print(type(pipe), MyEncoder().transform(X).dtypes, '\n')

# prepare models
svm_toy = svm.SVR()
svm_toy.fit(X, Y)
pipe_toy = copy.deepcopy(pipe).fit(X, Y)

# save onnx
initial_type = [('X', FloatTensorType([None, X.shape[1]]))]

# no problem here
onx = convert_sklearn(svm_toy, initial_types=initial_type)

# something goes wrong: raises MissingShapeCalculator
onx = convert_sklearn(pipe_toy, initial_types=initial_type)

Converting the plain SVR (svm_toy) works fine, but converting the pipeline (pipe_toy) fails with the following error:

MissingShapeCalculator: Unable to find a shape calculator for type ''.
It usually means the pipeline being converted contains a
transformer or a predictor with no corresponding converter
implemented in sklearn-onnx. If the converted is implemented
in another library, you need to register
the converted so that it can be used by sklearn-onnx (function
update_registered_converter). If the model is not yet covered
by sklearn-onnx, you may raise an issue to
https://github.com/onnx/sklearn-onnx/issues
to get the converter implemented or even contribute to the
project. If the model is a custom model, a new converter must
be implemented. Examples can be found in the gallery.

Am I missing something about custom pipelines and get_dummies?

Tags: python, scikit-learn, onnx-coreml

Solution


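Custom transformers, i.e. transformers for which sklearn-onnx has no built-in converter, need extra information before they can be converted to ONNX. You need to write a shape calculator and a converter function for your transformer, then register the transformer together with those two functions using update_registered_converter (the function named in the error message). See the sklearn-onnx documentation for more details.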

