Unable to use a machine learning model saved as a pickle file in a Flask app

Problem description

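I am trying to build an API with Flask that uses my saved ML model. The model was built with sklearn, a Pipeline and a helper function (lemmatizer_preprocessing), and was stored in pickle format with joblib. Now, when I try to use the model in my Flask app, it raises an AttributeError: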

AttributeError: module '__main__' has no attribute 'lemmatizer_preprocessing'

Code used to build the model and save it:

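import re
import string

import joblib
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

lemmatizer = WordNetLemmatizer()

def lemmatizer_preprocessing(mess):
    nopunc = [char for char in mess if char not in string.punctuation]
    nopunc = ''.join(nopunc)
    nopunc = [lemmatizer.lemmatize(word) for word in nopunc.split()]
    nopunc = [word for word in nopunc if word.lower() not in stopwords.words('english')]
    temp = ' '.join(nopunc).strip()
    return re.sub(r'[^\w]', ' ', temp)
# ...
pipeline1 = Pipeline([
    ('bow', CountVectorizer(analyzer=lemmatizer_preprocessing)),
    ('classifier', MultinomialNB()),
    # ...
])
# ...
# This code runs as the main script, so the pickle refers to __main__.lemmatizer_preprocessing
joblib.dump(pipeline1, 'filename.pkl')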

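Now, whenever I try to load this model, I get the error above. I understand that joblib needs the lemmatizer_preprocessing function to deserialize the model correctly, but for some reason the function is not being registered. I have split the code of my Flask app into two files, app.py and predictor.py.

app.py: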

from flask import Flask, jsonify, request, make_response
from predictor import predict_jihad

app = Flask(__name__, instance_relative_config=True)
classifier = predict_jihad()  # do not shadow the class name with the instance

@app.errorhandler(404)
def not_found(error):
    return make_response(jsonify({'error': 'Not found'}), 404)

@app.errorhandler(500)
def internal_error(error):
    return make_response(jsonify({'error': 'Internal server error'}), 500)

@app.route('/')
def index():
    text = request.args.get('text')
    if isinstance(text, str) and len(text) != 0:
        return jsonify({"probability": classifier.get_prediction(text)})
    else:
        return jsonify({"error": "check passed value"})

if __name__ == '__main__':
    app.run(debug=False)

predictor.py:

from nltk.corpus import stopwords
import string
from sklearn.feature_extraction.text import TfidfTransformer, CountVectorizer
from sklearn.pipeline import Pipeline
from sklearn.naive_bayes import MultinomialNB
import re
from nltk.stem import WordNetLemmatizer
import joblib

class predict_jihad:
    def __init__(self):
        # Store these on the instance; plain locals vanish when __init__ returns
        self.lemmatizer = WordNetLemmatizer()
        self.file = './filename.pkl'

    def deserialize(self):
        # This helper is local to deserialize(), so it is never visible as
        # __main__.lemmatizer_preprocessing and joblib still cannot find it
        def lemmatizer_preprocessing(mess):
            nopunc = [char for char in mess if char not in string.punctuation]
            nopunc = ''.join(nopunc)
            nopunc = [self.lemmatizer.lemmatize(word) for word in nopunc.split()]
            nopunc = [word for word in nopunc if word.lower() not in stopwords.words('english')]
            temp = ' '.join(nopunc).strip()
            return re.sub(r'[^\w]', ' ', temp)
        # joblib.load accepts a path, so the file is opened and closed for us
        return joblib.load(self.file)

    def get_prediction(self, text):
        model = self.deserialize()
        # Probability of the positive class (column 1) for a single document
        return model.predict_proba([text])[0][1]

All other files are in place and no other errors are reported. Please suggest a solution.

Tags: python machine-learning flask scikit-learn

Solution


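The problem is that lemmatizer_preprocessing was defined in the script that trained the model, and that script ran as __main__. Pickle (and therefore joblib) stores a function by reference, as its module name plus its qualified name, not by value, so the saved pipeline records __main__.lemmatizer_preprocessing. When the Flask app loads the model, __main__ is app.py, which has no attribute with that name, so the pipeline cannot be rebuilt. Defining a function with the same name inside predict_jihad.deserialize does not help either, because that function is local to the method.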
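A minimal, self-contained sketch of that behavior using plain pickle. It assumes a throwaway module named training_script (a hypothetical name) standing in for the script that trained the model, and a simplified helper body; it only illustrates the by-reference lookup, not the real pipeline:

```python
import pickle
import sys
import types


def lemmatizer_preprocessing(mess):
    # Simplified stand-in for the real helper (assumption for this demo)
    return mess.lower().split()


# Register a throwaway module and pretend the helper was defined in it.
# "training_script" is a hypothetical name used only for this demonstration.
training = types.ModuleType("training_script")
lemmatizer_preprocessing.__module__ = "training_script"
training.lemmatizer_preprocessing = lemmatizer_preprocessing
sys.modules["training_script"] = training

# Pickle stores only the module name and the function name, not the code.
payload = pickle.dumps(lemmatizer_preprocessing)

# Simulate loading where that module lacks the helper, just as app.py
# lacks __main__.lemmatizer_preprocessing in the question.
del training.lemmatizer_preprocessing
try:
    pickle.loads(payload)
    load_error = None
except AttributeError as exc:
    load_error = exc

print(type(load_error).__name__, load_error)
```

The pickled bytes contain the strings training_script and lemmatizer_preprocessing, and loading fails with an AttributeError as soon as that name cannot be resolved.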

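You can move the helper into its own module, so the pickle refers to model_maker.lemmatizer_preprocessing instead of __main__.lemmatizer_preprocessing, then retrain and re-save the model:

# model_maker.py: define the helper at module level (same body as in the question)
def lemmatizer_preprocessing(mess):
    ...

# training script and predictor.py: import the helper instead of redefining it
from model_maker import lemmatizer_preprocessing
pipeline1 = Pipeline([('bow', CountVectorizer(analyzer=lemmatizer_preprocessing)), ...])
joblib.dump(pipeline1, 'filename.pkl')  # now records model_maker.lemmatizer_preprocessing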
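If retraining is not convenient, one workaround for the file that was already saved is to bind the helper onto whatever module is __main__ at load time, before calling joblib.load. This is a sketch under assumptions: expose_in_main is a hypothetical helper name, and the usage comment presumes lemmatizer_preprocessing is importable from a module-level definition such as model_maker.py. Because it depends on which module happens to be __main__ (app.py, or a server's launcher script), moving the helper into its own module and re-saving remains the more robust fix.

```python
import sys


def expose_in_main(**helpers):
    """Bind helper functions as attributes of the running __main__ module.

    A model pickled from a script or notebook refers to its helpers as
    "__main__.<name>". Binding them on whatever module is __main__ now
    lets the unpickler resolve those names.
    """
    main_module = sys.modules["__main__"]
    for name, func in helpers.items():
        setattr(main_module, name, func)


# Hypothetical usage in predictor.py, before loading the model:
#
#     from model_maker import lemmatizer_preprocessing
#     expose_in_main(lemmatizer_preprocessing=lemmatizer_preprocessing)
#     model = joblib.load('filename.pkl')
```

The unpickler only needs the name to resolve to a callable; the function it finds does not have to be the same object that was pickled.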

See also:
module '__main__' has no attribute
python-pickling-and-dealing-with-attributeerror-module-object-has-no-attribute-thing

