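python - Pandas stack() operation strips the categorical dtype from a MultiIndex
Problem description
I have a pandas DataFrame that has a categorical MultiIndex for both rows and columns. However, the stack() operation, which moves index levels from the columns to the rows, strips the Categorical type: the new index levels are created from plain strings (object dtype).
Why does it do this, and is there a way to prevent it? I'm looking for a solution that doesn't require manually resetting the categorical type after every operation.
Note that others have reported similar problems with melt() stripping categorical types [https://stackoverflow.com/questions/64900604/categorical-column-after-melt-in-pandas, https://stackoverflow.com/questions/63138258/why-is-pandas-melt-messing-with-my-dtypes]
Code
The code below illustrates the problem. I print the dtype of every index level before and after the stack() operation. stack() appears to keep the Categorical type for the first stacked level, but strips it from any further stacked levels and from the levels that remain in the columns.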
import pandas as pd
import numpy as np
# -----------------------------------
# dtype for each level of row index or column index
def get_Index_level_dtypes(df, axis):
    I = df.index if axis==0 else df.columns
    return [I.get_level_values(i).dtype for i in range(I.nlevels)]
# Print:
# (i) name of DataFrame (or Series)
# (ii) dtype of levels for axis = 0
# (iii) dtype of levels for axis = 1 (if not Series)
def print_Index_level_dtypes(df, S):
    print('-'*100,"\n", S, " = \n", df, "\n") # print df name
    for i in range(2 if isinstance(df, pd.DataFrame) else 1):
        print("Level data type, axis = ",i,":")
        for q in get_Index_level_dtypes(df, axis=i):
            print(q)
# -----------------------------------
midx = pd.MultiIndex.from_arrays(
    [
        pd.Categorical(['A1','A1','A2','A2']),
        pd.Categorical(['B1','B2','B1','B2']),
        pd.Categorical(['C1','C1','C1','C1']),
    ])
np.random.seed(0)
df = pd.DataFrame(
    np.random.randn(2, 4),
    columns=midx,
    index = pd.Categorical(['Row1','Row2'])
)
# --------------------------------------
print_Index_level_dtypes(df,"Orig")
#
# • Stack one level:
# row index: Keeps categorical type
# col index: strips categorical type (types are "object", ie string)
print_Index_level_dtypes(df.stack(level = [0]), "Stack_level_0") # same behavior for all level=[i]
#
# • Stack multiple levels:
# row index: Keeps 1st categorical type, strips rest
# col index: strips categorical type (types are "object", ie string)
print_Index_level_dtypes(df.stack(level = [0, 1]), "Stack_level_01")
print_Index_level_dtypes(df.stack(level = [0, 1, 2]), "Stack_level_012")
Output
----------------------------------------------------------------------------------------------------
Orig =
            A1                  A2
            B1        B2        B1        B2
            C1        C1        C1        C1
Row1  1.764052  0.400157  0.978738  2.240893
Row2  1.867558 -0.977278  0.950088 -0.151357
Level data type, axis = 0 :
category
Level data type, axis = 1 :
category
category
category
----------------------------------------------------------------------------------------------------
Stack_level_0 =
               B1        B2
               C1        C1
Row1 A1  1.764052  0.400157
     A2  0.978738  2.240893
Row2 A1  1.867558 -0.977278
     A2  0.950088 -0.151357
Level data type, axis = 0 :
category
category
Level data type, axis = 1 :
object
object
----------------------------------------------------------------------------------------------------
Stack_level_01 =
                   C1
Row1 A1 B1   1.764052
        B2   0.400157
     A2 B1   0.978738
        B2   2.240893
Row2 A1 B1   1.867558
        B2  -0.977278
     A2 B1   0.950088
        B2  -0.151357
Level data type, axis = 0 :
category
category
object
Level data type, axis = 1 :
object
----------------------------------------------------------------------------------------------------
Stack_level_012 =
Row1  A1  B1  C1    1.764052
          B2  C1    0.400157
      A2  B1  C1    0.978738
          B2  C1    2.240893
Row2  A1  B1  C1    1.867558
          B2  C1   -0.977278
      A2  B1  C1    0.950088
          B2  C1   -0.151357
dtype: float64
Level data type, axis = 0 :
category
category
object
object
Solution
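One possible workaround, offered as a sketch rather than a way to stop stack() from dropping the dtype: rebuild the index right after stacking so that every level is categorical again. `restore_categorical_levels` is a hypothetical helper name, not a pandas function. It assumes that inferring the categories from the labels actually present is acceptable, which means unobserved categories and any custom category order from the original levels are not recovered.

```python
import numpy as np
import pandas as pd


def restore_categorical_levels(obj, axis=0):
    """Return a copy of obj with every index level on `axis` made categorical.

    Hypothetical helper, not part of pandas. Categories are inferred from the
    labels present, so unobserved categories and custom ordering are not
    restored. Use axis=1 only with a DataFrame.
    """
    idx = obj.index if axis == 0 else obj.columns
    if isinstance(idx, pd.MultiIndex):
        # Rebuild the MultiIndex level by level from categorical arrays.
        new_idx = pd.MultiIndex.from_arrays(
            [pd.Categorical(idx.get_level_values(i)) for i in range(idx.nlevels)],
            names=idx.names,
        )
    else:
        new_idx = pd.CategoricalIndex(idx, name=idx.name)
    out = obj.copy()
    if axis == 0:
        out.index = new_idx
    else:
        out.columns = new_idx
    return out


# Small frame in the spirit of the question: a categorical row index and a
# categorical two-level column MultiIndex.
cols = pd.MultiIndex.from_arrays(
    [
        pd.Categorical(["A1", "A1", "A2", "A2"]),
        pd.Categorical(["B1", "B2", "B1", "B2"]),
    ]
)
df = pd.DataFrame(
    np.arange(8.0).reshape(2, 4),
    columns=cols,
    index=pd.Categorical(["Row1", "Row2"]),
)

# Chain the helper with .pipe() so the reset sits next to the stack() call.
stacked = df.stack(level=[0, 1]).pipe(restore_categorical_levels)
print([stacked.index.get_level_values(i).dtype for i in range(stacked.index.nlevels)])
```

This still resets the dtype after each operation, but chaining it through .pipe() keeps the fix to a single call per stack(). If the original categories matter, a variant could take the desired CategoricalDtype for each level as an extra argument instead of inferring it.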