python - How to get dummy variables when test_set and train_set have different unique values?

Problem description

train_set is:

  type
0    a
1    b
2    c
3    d
4    e

If I use pd.get_dummies on it, I get 5 columns:

   type_a  type_b  type_c  type_d  type_e
0       1       0       0       0       0
1       0       1       0       0       0
2       0       0       1       0       0
3       0       0       0       1       0
4       0       0       0       0       1

test_set is:

  type
0    a
1    b
2    c
3    d

If I use pd.get_dummies on it, I get only 4 columns, because the value e never appears in test_set:

   type_a  type_b  type_c  type_d
0       1       0       0       0
1       0       1       0       0
2       0       0       1       0
3       0       0       0       1

I want the result to keep the type_e column, filled with zeros:

   type_a  type_b  type_c  type_d  type_e
0       1       0       0       0       0
1       0       1       0       0       0
2       0       0       1       0       0
3       0       0       0       1       0

Tags: python, dataframe, dummy-variable

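You can reindex the result to the full list of required columns and pass fill_value=0, so that any column missing from test_set is created and filled with zeros: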

# Columns absent from test_set (here type_e) are added and filled with 0.
pd.get_dummies(test_set).reindex(
    ["type_a", "type_b", "type_c", "type_d", "type_e"], axis=1, fill_value=0)

Output

#    type_a  type_b  type_c  type_d  type_e
# 0       1       0       0       0       0
# 1       0       1       0       0       0
# 2       0       0       1       0       0
# 3       0       0       0       1       0
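
Instead of typing the column list by hand, you can take it from the encoded training set. The sketch below is a self-contained variant of the same reindex idea. The names train_dummies and test_dummies are only illustrative, and dtype=int is passed on the assumption that you want 0/1 integers as in the tables above (recent pandas versions return booleans by default).

```python
import pandas as pd

# Rebuild the two frames from the question.
train_set = pd.DataFrame({"type": ["a", "b", "c", "d", "e"]})
test_set = pd.DataFrame({"type": ["a", "b", "c", "d"]})

# Encode the training set first and keep its column layout as the reference.
train_dummies = pd.get_dummies(train_set, dtype=int)

# Encode the test set, then align it to the training columns.
# Columns missing from the test set (type_e here) are created and filled with 0.
test_dummies = pd.get_dummies(test_set, dtype=int).reindex(
    columns=train_dummies.columns, fill_value=0
)

print(test_dummies)
```

Note that reindexing to the training columns also drops any dummy column for a value that appears only in test_set, which is usually what a model fitted on train_set expects.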
