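python - Decision tree - Handling string values takes a long time, but works fine for float values. How to understand this?
Problem description
I am trying to build a decision tree classifier using the code below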
from sklearn.tree import DecisionTreeClassifier
clf = DecisionTreeClassifier()
My data is
age type_income loan_purpose loan_amount offer
18 Student study 500 yes
18 Student study 600 yes
18 Student study 700 yes
18 Student study 800 yes
. . .
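Fitting the decision tree on this data raises an error saying that it cannot convert Student to a float.
What can I do to fix this? I don't want to convert the data to floats manually through preprocessing; I want the algorithm itself to handle it. Is there any parameter that solves this automatically?
Solution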
sklearn expects all inputs to be numeric (continuous), which is why there is no option that automatically converts categorical variables to floats. You will have to do some kind of preprocessing yourself.
However, there is a rather convenient option: one-hot encode your categorical data (assuming there are not too many distinct levels for those factors; in your example, type_income and loan_purpose). Just converting the strings to numbers (e.g. Student -> 0, Employee -> 1) is not advisable, because sklearn will then assume that there is an ordering Student < Employee.
I suggest you take a look at section 4.3.5 of this documentation page
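As a rough illustration of the one-hot approach described above, here is a minimal sketch that wraps the encoding and the tree in a single scikit-learn Pipeline, so the conversion happens inside fit and predict rather than as a separate manual step. The first four rows are taken from the question; the last two rows (Employee, car, house, offer = no) are made up purely so that the example has more than one class and more than one category. The step names "prep", "onehot" and "tree" are arbitrary labels chosen for this sketch.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Rows 0-3 come from the question; rows 4-5 are hypothetical and exist
# only so the tree has a second class to learn.
data = pd.DataFrame({
    "age": [18, 18, 18, 18, 35, 42],
    "type_income": ["Student", "Student", "Student", "Student",
                    "Employee", "Employee"],
    "loan_purpose": ["study", "study", "study", "study", "car", "house"],
    "loan_amount": [500, 600, 700, 800, 5000, 9000],
    "offer": ["yes", "yes", "yes", "yes", "no", "no"],
})

categorical = ["type_income", "loan_purpose"]
X = data.drop(columns="offer")
y = data["offer"]

# One-hot encode the string columns and pass the numeric ones through.
# handle_unknown="ignore" encodes categories unseen during fit as all zeros.
preprocess = ColumnTransformer(
    transformers=[
        ("onehot", OneHotEncoder(handle_unknown="ignore"), categorical),
    ],
    remainder="passthrough",
)

model = Pipeline([
    ("prep", preprocess),
    ("tree", DecisionTreeClassifier(random_state=0)),
])

# The pipeline accepts the raw string columns directly.
model.fit(X, y)
predictions = model.predict(X)

# 2 type_income levels + 3 loan_purpose levels + 2 numeric columns.
encoded_shape = model.named_steps["prep"].transform(X).shape
```

With this setup the strings are still converted to numbers, but the conversion lives inside the model object, which is about as close to "the algorithm handles it" as scikit-learn gets for this estimator. For a quick one-off experiment, pandas.get_dummies on the two categorical columns would produce a similar encoding without the pipeline.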