Pivoting a Pandas DataFrame of counts by date and group

Problem description

I have the following DataFrame:


                            id_x  id_y
department         date               
0                  09/2017     1   NaN
1                  01/2018   149   NaN
                   01/2019   112   4.0
                   02/2018   103   1.0
                   02/2019    78   NaN
...                          ...   ...
799                09/2017    57   2.0
                   10/2017    64   3.0
                   11/2017    80   NaN
                   12/2017    79   2.0

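This DataFrame was built from database data by running a series of counts grouped by department and date.

I need the data summarized by department and date, but I want the dates to run across the top, with the id counts beneath each date.

The output I want looks like this: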

                              9/2017      10/2017
                            id_x   id_y  id_x   id_y
department 
0                              1   NaN    NaN   NaN
1                            NaN   NaN    NaN   NaN
...                          ...   ...    ...   ...
799                           57   2.0     64   3.0

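I have tried dropping the index, reindexing, melting the DataFrame, and pivoting it. I can get the DataFrame ordered by "id_x" and "id_y" followed by the dates, but that is not an elegant solution, because it could repeat 36 dates for each id.

I have been referring to the following documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.pivot.html

and have tested variations of the following (among others):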

new_df.melt(new_df, col_level=0, id_vars=['department'], value_vars=['id_x','id_y'])
new_df.reset_index().pivot_table(index="department", columns="date") #I've also tried "date" as values and in brackets outside the parenthesis

Tags: python, python-3.x, pandas, dataframe

Solution


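I recreated your data; I think this does what you need. Note that the dates here are strings, so the sort is lexicographic (01/2018, 01/2019, 02/2018, ...). If the date field in your df is actually a datetime, the sort will put the columns in ascending date order.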

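import numpy as np
import pandas as pd

df = pd.DataFrame({
    'department': [0, 1, 1, 1, 1, 799, 799, 799, 799],
    'date': ['09/2017', '01/2018', '01/2019', '02/2018', '02/2019', '09/2017', '10/2017', '11/2017', '12/2017'],
    'id_x': [1, 149, 112, 103, 78, 57, 64, 80, 79],
    'id_y': [np.nan, np.nan, 4.0, 1.0, np.nan, 2.0, 3.0, np.nan, 2.0],
})
# Use department as the row index, then spread the dates across the columns
df = df.set_index('department')
df2 = df.pivot(columns='date', values=['id_x', 'id_y'])

# Put date on the top column level, then sort the columns by it
df3 = df2.swaplevel(axis=1)
df3.sort_index(axis=1, level=0, inplace=True)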

Output:

date       01/2018      01/2019      02/2018  ... 10/2017 11/2017      12/2017     
              id_x id_y    id_x id_y    id_x  ...    id_y    id_x id_y    id_x id_y
department                                    ...                                  
0              NaN  NaN     NaN  NaN     NaN  ...     NaN     NaN  NaN     NaN  NaN
1            149.0  NaN   112.0  4.0   103.0  ...     NaN     NaN  NaN     NaN  NaN
799            NaN  NaN     NaN  NaN     NaN  ...     3.0    80.0  NaN    79.0  2.0
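
If the columns should appear in calendar order (09/2017, 10/2017, ... as in the desired output) rather than in string order, one possible variation is to convert the dates before reshaping. The following is only a sketch under two assumptions: every date string follows the MM/YYYY pattern, and the frame is indexed by (department, date) as in the question. The names grouped and wide are illustrative and do not come from the original post; unstack is used here in place of pivot because the date already sits in the row index.

```python
import numpy as np
import pandas as pd

# Sample data copied from the answer above
df = pd.DataFrame({
    'department': [0, 1, 1, 1, 1, 799, 799, 799, 799],
    'date': ['09/2017', '01/2018', '01/2019', '02/2018', '02/2019',
             '09/2017', '10/2017', '11/2017', '12/2017'],
    'id_x': [1, 149, 112, 103, 78, 57, 64, 80, 79],
    'id_y': [np.nan, np.nan, 4.0, 1.0, np.nan, 2.0, 3.0, np.nan, 2.0],
})

# Assumption: every date string follows the MM/YYYY pattern
df['date'] = pd.to_datetime(df['date'], format='%m/%Y').dt.to_period('M')

# Same shape as the question's frame: rows indexed by (department, date)
grouped = df.set_index(['department', 'date'])

# Move date from the row index into the columns, put it on the top
# column level, then sort the columns chronologically
wide = grouped.unstack('date').swaplevel(axis=1).sort_index(axis=1)

# Optional: restore the MM/YYYY labels without changing the column order
wide.columns = wide.columns.set_levels(
    wide.columns.levels[0].strftime('%m/%Y'), level=0
)

print(wide)
```

Sorting on monthly periods instead of strings is what gives the chronological order; the final relabeling step only changes how the dates are displayed and can be dropped if period labels such as 2017-09 are acceptable.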
