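python - Classification accuracy based on a single feature set
Question
I am trying to classify data according to pre-specified labels.
There are two columns, as shown below: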
room_class                        room_cluster
Standard single sea view          Standard
Deluxe twin Single                Deluxe
Suite Superior room ocean view    Suite
Superior Double twin              Superior
Deluxe Double room                Deluxe
The labels are the room_cluster values shown above.
The code snippet is as follows:
from sklearn import neighbors, preprocessing
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

le = preprocessing.LabelEncoder()
datar = df  # df holds the two columns shown above
# Separate the data into features and labels
x = datar.room_class
y = datar.room_cluster
# Use the label encoder to convert the strings to ints
le.fit(x)
addv = le.transform(x)
asb = addv.reshape(-1, 1)
# Split into training and test sets, then fit KNN
x_train, x_test, y_train, y_test = train_test_split(asb, y, test_size=0.40)
classifier = neighbors.KNeighborsClassifier(n_neighbors=3)
classifier.fit(x_train, y_train)
predictions = classifier.predict(x_test)
# Check the accuracy
print(accuracy_score(y_test, predictions))
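I only get 78% accuracy on the test data. Is there something wrong in the code that is holding the accuracy back?
How can I use this model to predict custom inputs, for example:
Input: 'Suite Single sea view'
Output: 'Suite'
Input: 'Superior Suite twin'
Output: 'Superior'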
Answer
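One possible reason for the limited accuracy (an assumption, since the full dataset is not shown) is that LabelEncoder maps each whole room_class string to a single arbitrary integer. Words shared between room names are lost, and a name that was not seen during fitting cannot be transformed at all. A minimal sketch using only the five example rows from the question:

```python
from sklearn.preprocessing import LabelEncoder

# The five example rows from the question
room_class = [
    "Standard single sea view",
    "Deluxe twin Single",
    "Suite Superior room ocean view",
    "Superior Double twin",
    "Deluxe Double room",
]

le = LabelEncoder()
# Each distinct string gets one integer: its rank in sorted order.
# The integers say nothing about which words the names share.
codes = le.fit_transform(room_class)
print(list(codes))

# A room name that was never seen during fit cannot be encoded.
try:
    le.transform(["Suite Single sea view"])
    unseen_ok = True
except ValueError:
    unseen_ok = False
print("unseen name could be encoded:", unseen_ok)
```

This is why a word-level encoding, rather than one code per full string, is used in what follows.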
import random

import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier

random.seed(0)  # fixed seed so the sampled data is reproducible

# Room names taken from your data
initial_room = ["Standard single sea view", "Deluxe twin Single", "Suite Superior room ocean view", "Superior Double twin", "Deluxe Double room"]
# Build 100 data points by sampling the five names above, so rows repeat
room_class = [random.choice(initial_room) for _ in range(100)]
# Possible values of room_cluster
initial_cluster = ["Standard", "Deluxe", "Suite", "Superior"]
# The label is the first word of the room name that is also a cluster name.
# (A set intersection picks arbitrarily when a name contains two cluster
# words, e.g. "Suite Superior room ocean view".)
room_cluster = [next(word for word in each_room.split() if word in initial_cluster) for each_room in room_class]

# Assign a unique integer to every distinct word in room_class
embedding = {}
index = 0
for each_room in room_class:
    for each_word in each_room.split():
        if each_word not in embedding:
            embedding[each_word] = index
            index += 1

# Length, in words, of the longest room name
max_len = max(len(i.split()) for i in room_class)

# Encode each room name as a fixed-length row of word ids
embedded_rooms = []
for each_room in room_class:
    embedded_room = [embedding[each_word] for each_word in each_room.split()]
    # Pad rows shorter than max_len with -1.
    # 0 is already used as a word id, so it cannot be the padding value.
    while len(embedded_room) < max_len:
        embedded_room.append(-1)
    embedded_rooms.append(embedded_room)

# Encode the labels with the same word ids
Y = [embedding[each_cluster] for each_cluster in room_cluster]
X = np.array(embedded_rooms)

# Apply KNN
classifier = KNeighborsClassifier(n_neighbors=3)
classifier.fit(X, Y)

# Data for testing goes in these lists
test = ["Single Standard"]
test_label = ["Standard"]

# Encode and pad the test data with the same embedding.
# Every test word must have appeared in the training data,
# otherwise embedding[...] raises KeyError.
embed_tests = []
for each_test in test:
    embed_test = [embedding[each_word] for each_word in each_test.split()]
    while len(embed_test) < max_len:
        embed_test.append(-1)
    embed_tests.append(embed_test)

# Predict on the test rows
X_test = np.array(embed_tests)
predictions = classifier.predict(X_test)

# Encode the expected labels the same way
embed_test_label = [embedding[each_class] for each_class in test_label]

# Print the accuracy
print(accuracy_score(embed_test_label, predictions))
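To try the custom inputs from the question, the encoding and padding steps can be wrapped in small helpers. The following is only a sketch under stated assumptions: build_vocab, encode_room, and the UNKNOWN id for unseen words are hypothetical names introduced here, and the five rooms are repeated deterministically instead of sampled at random. Because the integer word ids carry no notion of similarity, the predicted label is not guaranteed to match the expected one.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

PAD = -1      # padding value, same convention as the answer's code
UNKNOWN = -2  # hypothetical id for words never seen in training


def build_vocab(rooms):
    """Map every distinct word to a unique integer id."""
    vocab = {}
    for room in rooms:
        for word in room.split():
            vocab.setdefault(word, len(vocab))
    return vocab


def encode_room(room, vocab, max_len):
    """Encode a room name as exactly max_len word ids (truncate or pad)."""
    ids = [vocab.get(word, UNKNOWN) for word in room.split()][:max_len]
    return ids + [PAD] * (max_len - len(ids))


initial_room = [
    "Standard single sea view",
    "Deluxe twin Single",
    "Suite Superior room ocean view",
    "Superior Double twin",
    "Deluxe Double room",
]
initial_cluster = ["Standard", "Deluxe", "Suite", "Superior"]

# Assumption: repeat the five rows 20 times instead of random sampling
room_class = initial_room * 20
room_cluster = [
    next(word for word in room.split() if word in initial_cluster)
    for room in room_class
]

vocab = build_vocab(room_class)
max_len = max(len(room.split()) for room in room_class)
X = np.array([encode_room(room, vocab, max_len) for room in room_class])

classifier = KNeighborsClassifier(n_neighbors=3)
classifier.fit(X, room_cluster)  # string labels are accepted directly

# Custom inputs from the question
queries = ["Suite Single sea view", "Superior Suite twin"]
X_query = np.array([encode_room(q, vocab, max_len) for q in queries])
predicted = classifier.predict(X_query)
for query, label in zip(queries, predicted):
    print(query, "->", label)
```

Passing the string labels straight to fit avoids the separate label-encoding step, and the UNKNOWN id keeps an unseen word from raising KeyError.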
I have coded this roughly, so please bear with it.
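If the label is always the first cluster keyword that appears in the room name, which the examples in the question suggest but do not confirm, a plain lookup may be enough and no classifier is needed. A minimal sketch under that assumption (first_cluster_word is a hypothetical helper, not part of any library):

```python
from typing import Optional, Sequence


def first_cluster_word(room: str, clusters: Sequence[str]) -> Optional[str]:
    """Return the first word of `room` that is a cluster name, or None."""
    # Case-insensitive lookup that returns the canonical cluster spelling
    lookup = {cluster.lower(): cluster for cluster in clusters}
    for word in room.split():
        match = lookup.get(word.lower())
        if match is not None:
            return match
    return None


clusters = ["Standard", "Deluxe", "Suite", "Superior"]
for room in ["Suite Single sea view", "Superior Suite twin"]:
    print(room, "->", first_cluster_word(room, clusters))
```

For the two inputs in the question this returns 'Suite' and 'Superior'. A name with no cluster keyword returns None and would need a fallback.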