How to fix the errors raised by sklearn .predict

Problem description

I need to build a project that scrapes data from a used-car dealership, stores it in a database, and uses machine learning to predict car prices.

After scraping and collecting information on 10,000 cars (rows like ['Ford F-150',135972,5,'black',0,2,37500]), I split the data into x and y datasets:

x =['Ford F-150',135972,5,'black',0,2]
y = [37500]

I have tried several ways to train my model, but none of them worked, and each one gives me a different error:

for model, mileage, age, color, accident, owners,price in data:
    x.append([model,mileage,age,color,accident,owners])
    y.append(price)


def get_text(x):
    t_data =[]
    for i in x:
        t_data.append([i[0],i[3]])
    return t_data
def get_numberic(x):
    n_data = []
    for i in x:
        n_data.append([i[1],i[2],i[4],i[5]])
    return n_data

t = ['Ford F-150',135972,5,'black',0,2]
le = Lab
le = OneHotEncoder()
tree = tree.DecisionTreeRegressor()
le.fit(get_text(x))
X_data = list(zip(le.fit_transform(get_text(x)),get_numberic(x)))

tree.fit(X_data, y)

ans = tree.predict([le.transform(get_text(t)), get_numberic(t)])
# print(ans)

With OneHotEncoder:

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 2 dimensions. The detected shape was (9808, 2) + inhomogeneous part.

With LabelEncoder:

ValueError: y should be a 1d array, got an array of shape (9808, 2) instead.
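
Both errors appear to come from how the features are assembled rather than from .predict itself. zip(...) pairs each sparse one-hot row with a Python list, so NumPy cannot build a rectangular array from X_data, and LabelEncoder is designed for a single 1-D label column, so it rejects the two text columns. The sketch below shows one possible way around this: convert the one-hot output to a dense array and join it to the numeric columns with np.hstack. The sample rows are made up, and the names get_numeric, encoder and regressor are illustrative assumptions, not part of the original code. The single row t is wrapped in a list so the helpers receive a list of rows, and the regressor gets its own name so it does not shadow the tree module.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeRegressor

# Made-up rows in the question's layout (assumption, for illustration only):
# [model, mileage, age, color, accident, owners, price]
data = [
    ["Ford F-150", 135972, 5, "black", 0, 2, 37500],
    ["Ford F-150", 60000, 2, "white", 0, 1, 52000],
    ["Toyota Camry", 90000, 6, "black", 1, 2, 15000],
    ["Toyota Camry", 30000, 1, "red", 0, 1, 27000],
]


def get_text(rows):
    # Categorical columns: model and color.
    return [[r[0], r[3]] for r in rows]


def get_numeric(rows):
    # Numeric columns: mileage, age, accident, owners.
    return [[r[1], r[2], r[4], r[5]] for r in rows]


x = [row[:6] for row in data]
y = [row[6] for row in data]

# One-hot encode the text columns; unseen categories at predict time are
# encoded as all zeros instead of raising an error.
encoder = OneHotEncoder(handle_unknown="ignore")
encoded = encoder.fit_transform(get_text(x)).toarray()

# Join the encoded and numeric columns side by side into one 2-D float array.
X_data = np.hstack([encoded, np.array(get_numeric(x), dtype=float)])

regressor = DecisionTreeRegressor(random_state=0)
regressor.fit(X_data, y)

# A single car must be passed as a list containing one row.
t = ["Ford F-150", 135972, 5, "black", 0, 2]
t_row = np.hstack([
    encoder.transform(get_text([t])).toarray(),
    np.array(get_numeric([t]), dtype=float),
])
ans = regressor.predict(t_row)
print(ans)
```

Note that the encoder is fitted once on the training rows and only transform is called for the row being predicted, so the columns line up between training and prediction.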

And with a more ML-style pipeline:

transfomer_numeric = FunctionTransformer(get_numberic)
transformer_text = FunctionTransformer(get_text)


pipeline = Pipeline([
    ('features', FeatureUnion([
            ('numeric_features', Pipeline([
                ('selector', transfomer_numeric)
            ])),
             ('text_features', Pipeline([
                ('selector', transformer_text),
                ('vec', TfidfVectorizer(analyzer='word'))
            ]))
         ])),
    ('clf', tree.DecisionTreeRegressor())
])


kfold = StratifiedKFold(n_splits=7)

rf_model = GridSearchCV(estimator=pipeline, cv=kfold, n_jobs=-1,
                         return_train_score=True, verbose=1)
rf_model.fit(x, y)

Every change I make produces a different error. How can I fix this?

Tags: python, machine-learning, scikit-learn

Solution
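
The page records no answer here, so what follows is only a hedged sketch of how the pipeline attempt might be restructured, not an authoritative fix. It assumes three changes: a ColumnTransformer one-hot encodes the two text columns and passes the numeric ones through (TfidfVectorizer expects one string per sample, not a list of columns); plain KFold replaces StratifiedKFold, which is intended for classification targets; and GridSearchCV is given a param_grid, which it requires. The sample rows, the step names and the max_depth grid are made up for illustration.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeRegressor

# Made-up rows (assumption): [model, mileage, age, color, accident, owners, price]
rows = [
    ["Ford F-150", 135972, 5, "black", 0, 2, 37500],
    ["Ford F-150", 60000, 2, "white", 0, 1, 52000],
    ["Ford F-150", 98000, 4, "red", 1, 2, 41000],
    ["Toyota Camry", 90000, 6, "black", 1, 2, 15000],
    ["Toyota Camry", 30000, 1, "red", 0, 1, 27000],
    ["Toyota Camry", 55000, 3, "white", 0, 1, 22000],
    ["Honda Civic", 120000, 8, "black", 0, 3, 9000],
    ["Honda Civic", 20000, 1, "white", 0, 1, 24000],
]

# Keep mixed text/number features in an object array; the target is 1-D.
X = np.array([r[:6] for r in rows], dtype=object)
y = np.array([r[6] for r in rows], dtype=float)

# One-hot encode columns 0 (model) and 3 (color); pass the rest through.
preprocess = ColumnTransformer(
    transformers=[
        ("categorical", OneHotEncoder(handle_unknown="ignore"), [0, 3]),
    ],
    remainder="passthrough",
)

pipeline = Pipeline([
    ("features", preprocess),
    ("reg", DecisionTreeRegressor(random_state=0)),
])

# Hypothetical search space; GridSearchCV needs at least one parameter grid.
param_grid = {"reg__max_depth": [2, 4, None]}

# Plain K-fold splitting for a regression target (2 folds only because the
# toy data set is tiny).
cv = KFold(n_splits=2, shuffle=True, random_state=0)

search = GridSearchCV(
    estimator=pipeline,
    param_grid=param_grid,
    cv=cv,
    scoring="neg_mean_absolute_error",
)
search.fit(X, y)

new_car = np.array([["Ford F-150", 135972, 5, "black", 0, 2]], dtype=object)
prediction = search.predict(new_car)
print(search.best_params_, prediction)
```

Because the encoder lives inside the pipeline, each cross-validation fold fits it on that fold's training rows only, and the same preprocessing is applied automatically when predict is called on a new car.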

