python - Python/ML: Which methods to use for Multiclass Classification for Product Categorization?
问题描述
In a pickle...
I have a dataset with >100,000 observations; datasets' columns include CustomerID, VendorID, ProductID and CatNMap. Here is what it looks like:
As you can see values represented in first 3 columns (CustomerID, VendorID, ProductID) represent unique numerical mapped values and would make no sense if represented on x,y plane (which eliminates use of a lot of Classification methods); last column has strings with categories assigned by customers. Now, here is the part that I do not understand and not sure how to approach...
Goal: is to predict CatNMap values in the future for customers, however as I see it the features I have here are not useful, is that true? Now if they are, what method can I use as CatNMap column has >7,000 unique values; also, how would any method deal with categorizing future items if let's say for the same product there are 2 or more different categories assigned by different customers? Do I need to Implement NN for this one?
All answers are appreciated!
解决方案
据我了解,您的目标是CatNMap
根据前 3 列(您的输入数据作为特征)预测(您的输出数据)。
正如您之前所说, ( CustomerID, VendorID, ProductID
) 是 3 个分类变量,这意味着它们可能具有的值与数量无关,而是与类别有关。所以两个连续的值可能与它们的实际含义无关。正如我所看到的,您的 output 也会发生同样的情况CatNMap
。
话虽如此,有几种方法可以处理分类变量。根据我的经验,对于您的问题,我会为您的所有数据尝试一个热编码(CustomerID, VendorID, ProductID, CatNMap
)。更重要的是,如果您发现可能的话,也许值得尝试使用嵌入而ProductID, CatNMap
不是 OneHotEncoding。
至于使用哪种算法,绝对值得尝试训练随机森林和多层感知器模型,并在调整后进行比较。
推荐阅读
- javascript - 不推荐使用 mozImageSmoothingEnabled
- android - 从 ExpandableListView 中的复选框获取结果时出错
- qt - Qml SwipView当前索引在改变滑动方向时发生变化
- python - 如何以非递减顺序 O(N) 时间复杂度对已排序的数组进行排序?
- c - 用户输入超过 1 个项目并计算总价
- ruby-on-rails - 如何获取每个元素的嵌套属性到 Ruby on Rails 中的数组?
- c# - 带有时区 C# 的时间戳格式
- node.js - 无服务器框架:未找到无服务器错误功能:
- javascript - 在 FlatList(native) 上呈现数据的错误 - React Native
- java - 出现异常如何自动重启springboot应用