Pivoting time series data with multiple values in pandas

Problem description

I have the URLs of several SNS posts, and every day I track each post's readCount, likeCount, and commentCount.

This is the DataFrame I got using pandas:

     post_url       nickname    date_key  readCount  likeCount  commentCount
5       a_url          user1  2020-06-12     2874.0        194           NaN
4       a_url          user1  2020-06-13     4030.0        208          48.0
6       a_url          user1  2020-06-14        NaN        220          48.0
7       a_url          user1  2020-06-15        NaN        223          48.0
0       b_url          user2  2020-06-13    16882.0        295          88.0
2       b_url          user2  2020-06-14        NaN        296          88.0
3       b_url          user2  2020-06-15        NaN        299          88.0
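
For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the DataFrame above. It assumes date_key is stored as plain strings (the question does not say which dtype it is) and keeps the row labels exactly as printed.

```python
import numpy as np
import pandas as pd

# Values copied from the table above.
# Assumption: date_key is a string column, not a datetime column.
df = pd.DataFrame(
    {
        "post_url": ["a_url"] * 4 + ["b_url"] * 3,
        "nickname": ["user1"] * 4 + ["user2"] * 3,
        "date_key": ["2020-06-12", "2020-06-13", "2020-06-14", "2020-06-15",
                     "2020-06-13", "2020-06-14", "2020-06-15"],
        "readCount": [2874.0, 4030.0, np.nan, np.nan, 16882.0, np.nan, np.nan],
        "likeCount": [194, 208, 220, 223, 295, 296, 299],
        "commentCount": [np.nan, 48.0, 48.0, 48.0, 88.0, 88.0, 88.0],
    },
    # Row labels as shown in the printed frame
    index=[5, 4, 6, 7, 0, 2, 3],
)
print(df)
```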

The result I want to achieve is (please ignore the actual values, they are made up):

      post_url      nickname          type  2020-06-12  2020-06-13    2020-06-14  2020-06-15
5        a_url         user1     readCount      2874.0      3074.0           NaN         NaN
4        a_url         user1     likeCount      4030.0      4334.0        4888.0      7463.0
6        a_url         user1  commentCount         NaN         220          48.0        59.0
1        b_url         user2     readCount         NaN         194           NaN         NaN 
3        b_url         user2     likeCount         NaN         208          88.0       493.0
2        b_url         user2  commentCount         NaN         220          53.0       292.0

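Note that each post has a different subset of date_key values; my goal is to turn all existing date_key values into columns.

I searched SO for this topic, but I did not find exactly the same use case.

Can you suggest a way to do this? Thank you.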

Tags: python, pandas, dataframe

Solution


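Use DataFrame.melt to unpivot, then DataFrame.pivot_table to reshape. This is the general solution: if there are duplicate rows for the same post_url, nickname, type, date_key combination, their values are aggregated with mean.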

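# Unpivot the count columns into long format: one row per post, date and type
df = (df.melt(['post_url','nickname','date_key'], var_name='type')
        # Drop rows whose value is missing
        .dropna(subset=['value'])
        # Spread date_key into columns, averaging any duplicates
        .pivot_table(index=['post_url','nickname','type'],
                     columns='date_key',
                     values='value',
                     aggfunc='mean')
        # Remove the 'date_key' label from the columns axis
        .rename_axis(None, axis=1)
        .reset_index())
print(df)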
  post_url nickname          type  2020-06-12  2020-06-13  2020-06-14  \
0    a_url    user1  commentCount         NaN        48.0        48.0   
1    a_url    user1     likeCount       194.0       208.0       220.0   
2    a_url    user1     readCount      2874.0      4030.0         NaN   
3    b_url    user2  commentCount         NaN        88.0        88.0   
4    b_url    user2     likeCount         NaN       295.0       296.0   
5    b_url    user2     readCount         NaN     16882.0         NaN   

   2020-06-15  
0        48.0  
1       223.0  
2         NaN  
3        88.0  
4       299.0  
5         NaN  
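
To illustrate what the aggregation does, here is a small sketch with made-up data: the frame `dup` and its values are hypothetical and not taken from the question. The same post has two rows for the same date, and pivot_table collapses them into their mean.

```python
import pandas as pd

# Hypothetical data: the same (post_url, nickname, date_key) appears twice
dup = pd.DataFrame({
    "post_url": ["a_url", "a_url"],
    "nickname": ["user1", "user1"],
    "date_key": ["2020-06-12", "2020-06-12"],
    "likeCount": [100, 200],
})

# Same melt + pivot_table chain as above, applied to the duplicated rows
wide = (dup.melt(["post_url", "nickname", "date_key"], var_name="type")
           .pivot_table(index=["post_url", "nickname", "type"],
                        columns="date_key",
                        values="value",
                        aggfunc="mean")
           .rename_axis(None, axis=1)
           .reset_index())
print(wide)  # one row; the two likeCount values are averaged
```

If averaging is not what you want for duplicates, aggfunc accepts other reducers such as 'max' or 'last'.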

Another idea, if no aggregation is needed (starting again from the original df), is to use DataFrame.stack and Series.unstack:

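df = (df.set_index(['post_url','nickname','date_key'])
        # Move the count column names into a new innermost index level
        .stack()
        # Pivot index level 2 (date_key) into columns
        .unstack(2)
        # Name the new index level 'type' and clear the columns axis label
        .rename_axis(index=['post_url','nickname','type'], columns=None)
        .reset_index()
        )
print(df)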
  post_url nickname          type  2020-06-12  2020-06-13  2020-06-14  \
0    a_url    user1     readCount      2874.0      4030.0         NaN   
1    a_url    user1     likeCount       194.0       208.0       220.0   
2    a_url    user1  commentCount         NaN        48.0        48.0   
3    b_url    user2     readCount         NaN     16882.0         NaN   
4    b_url    user2     likeCount         NaN       295.0       296.0   
5    b_url    user2  commentCount         NaN        88.0        88.0   

   2020-06-15  
0         NaN  
1       223.0  
2        48.0  
3         NaN  
4       299.0  
5        88.0  
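
The "no aggregation" condition matters: unstack cannot reshape when the index contains duplicate entries. A short sketch, again with a hypothetical frame `dup` that is not from the question, shows the error you would get in that case.

```python
import pandas as pd

# Hypothetical data: the same (post_url, nickname, date_key) appears twice
dup = pd.DataFrame({
    "post_url": ["a_url", "a_url"],
    "nickname": ["user1", "user1"],
    "date_key": ["2020-06-12", "2020-06-12"],
    "likeCount": [100, 200],
})

raised = False
try:
    # Same stack/unstack chain as above, applied to the duplicated rows
    (dup.set_index(["post_url", "nickname", "date_key"])
        .stack()
        .unstack(2))
except ValueError as exc:
    # unstack refuses to reshape an index with duplicate entries
    raised = True
    print("unstack failed:", exc)
```

When this happens, the melt plus pivot_table approach is the one to use, since it decides how duplicates are combined.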
