python-3.x - How to take user input and pass it to a prediction model
Question
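I have a data frame on which I built a prediction model. The data is split into training and test sets, and I used a random forest classifier.
Now a user passes in new data, which needs to go through this model and produce a result.
The data is text. Here is the data frame: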
Description           Category
Rejoin this domain    Network
Laptop crashed        Hardware
Installation Error    Software
Code:
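import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

############### Feature extraction ##############
# read_data is the data frame shown above, with 'Description' and 'Category' columns
countvec = CountVectorizer()
counts = countvec.fit_transform(read_data['Description'])
df = pd.DataFrame(counts.toarray())
df.columns = countvec.get_feature_names_out()  # get_feature_names() was removed in scikit-learn 1.2
print(df)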
########## Join with original data ##############
df = read_data.join(df)
a = list(df.columns.values)
########## Creating the dependent variable class for "Category" variable ###########
factor = pd.factorize(df['Category'])
df.Category = factor[0]
definitions = factor[1]
print(df.Category.head())
print(definitions)
########## Creating the dependent variable class for "Description" variable ###########
factor = pd.factorize(df['Description'])
df.Description = factor[0]
definitions_1 = factor[1]
print(df.Description.head())
print(definitions_1)
######### Split into Train and Test data #######################
# X (features) and y (target) are built from df; their definition is not shown in this post
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.80, random_state = 21)
############# Random forest classification model #########################
classifier = RandomForestClassifier(n_estimators = 10, criterion = 'entropy', random_state = 42)
classifier.fit(X_train, y_train)
######### Predicting the Test set results ##############
y_pred = classifier.predict(X_test)
##### Reverse factorize (converting y_test and y_pred from 0s, 1s and 2s back to the original "Category" classes) #####
reversefactor = dict(enumerate(definitions))
y_test = np.vectorize(reversefactor.get)(y_test)
y_pred = np.vectorize(reversefactor.get)(y_pred)
##### Reverse factorize (converting the X_test codes back to the original "Description" values) #####
reversefactor = dict(enumerate(definitions_1))
X_test = np.vectorize(reversefactor.get)(X_test)
Answer
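If you only want to predict on user data, I would simply load a new CSV (or other format) containing the user data, making sure its columns match those of the original training dataset (minus the dependent variable, of course). Then you can get the predictions for your task: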
user_df = pd.read_csv("user_data.csv")
# Insert a preprocessing step here if needed so that user_df has the same feature columns as the training data
new_predictions = classifier.predict(user_df)
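
For text input, the preprocessing step mentioned above would most likely mean running the new description through the same fitted CountVectorizer that was used for training. Below is a minimal sketch of that idea, not necessarily how the original model was set up. It assumes the features are only the word counts of Description (the factorized Description column from the question is left out), and predict_category is a hypothetical helper name introduced here for illustration. The toy data just mirrors the three rows shown in the question.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer

# Toy training data mirroring the table in the question.
read_data = pd.DataFrame({
    "Description": ["Rejoin this domain", "Laptop crashed", "Installation Error"],
    "Category": ["Network", "Hardware", "Software"],
})

# Fit the vectorizer once, on the training text only.
countvec = CountVectorizer()
X = countvec.fit_transform(read_data["Description"])

# Encode the target; `definitions` maps the integer codes back to labels.
y, definitions = pd.factorize(read_data["Category"])

classifier = RandomForestClassifier(n_estimators=10, criterion="entropy", random_state=42)
classifier.fit(X, y)


def predict_category(text):
    """Hypothetical helper: classify a single new description string."""
    # transform (not fit_transform) reuses the training vocabulary, so the
    # feature columns line up with what the model was trained on.
    features = countvec.transform([text])
    code = classifier.predict(features)[0]
    return definitions[code]


# In an interactive session the text could come from input(), e.g.
# user_text = input("Describe the issue: ")
user_text = "Laptop crashed"
predicted = predict_category(user_text)
print(predicted)
```

Words in the user's text that were never seen during training are simply ignored by transform, so the model always receives the same number of columns it was fitted on. Indexing definitions with the predicted code also replaces the manual reverse-factorize dictionary.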