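python - Pandas long to wide for a categorical DataFrame
Question
When we want to reshape a DataFrame from long to wide in Pandas, we usually reach for pivot, pivot_table, unstack, or groupby. These work well when the values can be aggregated. How can we reshape a categorical DataFrame in the same way?
Example: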
import numpy as np
import pandas as pd
d = {'Fruit':['Apple', 'Apple', 'Apple', 'Kiwi'],
'Color1':['Red', 'Yellow', 'Red', 'Green'],
'Color2':['Red', 'Red', 'Green', 'Brown'],'Color3':[np.nan,np.nan,'Red',np.nan]}
df = pd.DataFrame(d)
df
   Fruit  Color1  Color2  Color3
0  Apple     Red     Red     NaN
1  Apple  Yellow     Red     NaN
2  Apple     Red   Green     Red
3   Kiwi   Green   Brown     NaN
It should become this:
d = {'Fruit':['Apple','Kiwi'],
'Color1':['Red','Green'],
'Color1_1':['Yellow',np.nan],
'Color1_2':['Red',np.nan],
'Color2':['Red', 'Brown'],
'Color2_1':['Red',np.nan],
'Color2_2':['Green',np.nan],
'Color3':[np.nan,np.nan],
'Color3_1':[np.nan,np.nan],
'Color3_2':['Red',np.nan]
}
pd.DataFrame(d)
   Fruit  Color1  Color1_1  Color1_2  Color2  Color2_1  Color2_2  Color3  Color3_1  Color3_2
0  Apple     Red    Yellow       Red     Red       Red     Green     NaN       NaN       Red
1   Kiwi   Green       NaN       NaN   Brown       NaN       NaN     NaN       NaN       NaN
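To make the limitation concrete, here is a minimal sketch of what an aggregation-based reshape does to this data. It uses `groupby(...).first()` as a stand-in for "aggregating" categorical values; the variable name `collapsed` is only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Fruit': ['Apple', 'Apple', 'Apple', 'Kiwi'],
    'Color1': ['Red', 'Yellow', 'Red', 'Green'],
    'Color2': ['Red', 'Red', 'Green', 'Brown'],
    'Color3': [np.nan, np.nan, 'Red', np.nan],
})

# first() keeps the first non-null value per Fruit and column,
# so the remaining rows of each group are discarded.
collapsed = df.groupby('Fruit').first()
print(collapsed)
```

The result has one row per fruit, but values such as the 'Yellow' in Color1 are gone, which is why a plain aggregation is not enough here.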
Answer
Try groupby with cumcount to number the rows within each Fruit, then pivot with that counter as the columns, and finally set the column names:
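# Number the rows within each Fruit (0, 1, 2, ...) and pivot on that counter
df = df.assign(idx=df.groupby('Fruit').cumcount()).pivot(index='Fruit',columns='idx')
# Flatten the (column, idx) pairs: idx 0 keeps the plain name, the rest get a suffix
print(df.set_axis([f'{x}_{y}' if y != 0 else x for x, y in df.columns], axis=1).reset_index())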
Output:
   Fruit  Color1  Color1_1  Color1_2  Color2  Color2_1  Color2_2  Color3  Color3_1  Color3_2
0  Apple     Red    Yellow       Red     Red       Red     Green     NaN       NaN       Red
1   Kiwi   Green       NaN       NaN   Brown       NaN       NaN     NaN       NaN       NaN
This matches your expected output exactly.
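If you need this for more than one DataFrame, the same steps can be wrapped in a small function. This is only a sketch: `long_to_wide` and its parameter names are hypothetical, and it assumes the input has no existing column called `idx`.

```python
import numpy as np
import pandas as pd

def long_to_wide(frame: pd.DataFrame, key: str) -> pd.DataFrame:
    """Hypothetical helper: spread repeated rows per `key` into suffixed columns."""
    # Number each row within its group: 0, 1, 2, ...
    # (assumes `frame` has no column already named 'idx')
    counter = frame.groupby(key).cumcount()
    # Each (original column, counter) pair becomes its own column.
    wide = frame.assign(idx=counter).pivot(index=key, columns='idx')
    # Flatten the MultiIndex: ('Color1', 0) -> 'Color1', ('Color1', 1) -> 'Color1_1'.
    wide.columns = [f'{col}_{i}' if i != 0 else col for col, i in wide.columns]
    return wide.reset_index()

df = pd.DataFrame({
    'Fruit': ['Apple', 'Apple', 'Apple', 'Kiwi'],
    'Color1': ['Red', 'Yellow', 'Red', 'Green'],
    'Color2': ['Red', 'Red', 'Green', 'Brown'],
    'Color3': [np.nan, np.nan, 'Red', np.nan],
})

result = long_to_wide(df, 'Fruit')
print(result)
```

Because cumcount makes every (Fruit, idx) pair unique, pivot has nothing to aggregate and no values are dropped; groups with fewer rows are simply padded with NaN.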