python - How to prevent LabelEncoder from sorting label values?
Problem description
Scikit LabelEncoder is showing some puzzling behavior in my Jupyter Notebook, as in:
from sklearn.preprocessing import LabelEncoder
le2 = LabelEncoder()
le2.fit(['zero', 'one'])
print(le2.inverse_transform([0, 0, 0, 1, 1, 1]))
This prints ['one' 'one' 'one' 'zero' 'zero' 'zero'].
This is odd; shouldn't it print ['zero' 'zero' 'zero' 'one' 'one' 'one']? Then I tried
le3 = LabelEncoder()
le3.fit(['one', 'zero'])
print(le3.inverse_transform([0, 0, 0, 1, 1, 1]))
which also prints ['one' 'one' 'one' 'zero' 'zero' 'zero']. Perhaps there was an alphabetization thing happening? Next, I tried
le4 = LabelEncoder()
le4.fit(['nil', 'one'])
print(le4.inverse_transform([0, 0, 0, 1, 1, 1]))
which prints ['nil' 'nil' 'nil' 'one' 'one' 'one']!
I've spent several hours on this. FWIW, the example in the documentation works as expected, so I suspect there is a flaw in how I expect inverse_transform to work. Part of my research included this and this.
In case it is relevant, I'm using IPython 7.7.0, numpy 1.17.3 and scikit-learn version 0.21.3.
Solution
The thing is that LabelEncoder.fit() always stores the classes in sorted order, because it uses np.unique to compute them.
Here's the source code
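To see the sorting in isolation, here is a minimal sketch that calls np.unique directly on the labels from the question and then indexes the result the way an inverse lookup would. It is only meant to illustrate the ordering, not to reproduce scikit-learn's internals; the variable names are made up for this example.

```python
import numpy as np

# The same labels that were passed to fit() in the question.
labels = ['zero', 'one']

# np.unique returns the distinct values in sorted order,
# so 'one' comes before 'zero' regardless of input order.
classes = np.unique(labels)
print(classes)  # ['one' 'zero']

# Decoding integer codes by indexing into the sorted classes
# reproduces the output that looked surprising in the question.
codes = [0, 0, 0, 1, 1, 1]
decoded = classes[codes]
print(decoded)  # ['one' 'one' 'one' 'zero' 'zero' 'zero']
```

This also explains the 'nil'/'one' case: 'nil' sorts before 'one', so the sorted order happens to match the input order.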
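I think the only way to do what you want is to create your own fit method that overrides the original one from LabelEncoder.
You just need to reuse the existing code given in the link. Here is an example: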
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.utils import column_or_1d

class MyLabelEncoder(LabelEncoder):
    def fit(self, y):
        # Validate the input as a 1-D array, as the original fit does.
        y = column_or_1d(y, warn=True)
        # pd.Series.unique() keeps values in order of first appearance,
        # unlike np.unique, which sorts them.
        self.classes_ = pd.Series(y).unique()
        return self

le2 = MyLabelEncoder()
le2.fit(['zero', 'one'])
print(le2.inverse_transform([0, 0, 0, 1, 1, 1]))
This gives you:
['zero' 'zero' 'zero' 'one' 'one' 'one']
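If you would rather not depend on pandas for this, one possible alternative (not part of the answer above) is to recover first-appearance order from np.unique itself via its return_index argument. The helper name unique_in_order is hypothetical, and this is only a sketch of how classes_ could be built, not a drop-in replacement for the encoder.

```python
import numpy as np

def unique_in_order(values):
    """Return distinct values in order of first appearance (hypothetical helper)."""
    values = np.asarray(values)
    # return_index gives the index of the first occurrence of each unique value.
    _, first_idx = np.unique(values, return_index=True)
    # Sorting those indices restores the original order of appearance.
    return values[np.sort(first_idx)]

classes = unique_in_order(['zero', 'one', 'zero'])
print(classes)  # ['zero' 'one']

# Decoding codes against the unsorted classes gives the expected labels.
codes = np.array([0, 0, 0, 1, 1, 1])
print(classes[codes])  # ['zero' 'zero' 'zero' 'one' 'one' 'one']
```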