python - Pandas: how to get the rows with the max timestamp for a groupby

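Problem description

Given a DataFrame, I want to create a new DataFrame that contains, for each combination of certain columns, the row with the maximum timestamp.

Combination: category, revision (the YYYYMM column in the input below), type, sub_type

sub_type may or may not have a value (but None counts as part of the uniqueness)

Given the above, I will not have duplicates (there are no ties on timestamp)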

    action  category   YYYYMM      timestamp      sub_type       type 
0     buy      A       202002           4            None        apple 
1     sell     A       202002           5            None        apple 
2     buy      A       202002           4            green       apple 
3     buy      A       202002           4            red         apple 
4     sell     A       202002           3            red         apple 
5     sell     A       202002           1            None        orange
6     sell     B       202002           6            None        apple

The result for the DataFrame above should look like this:

    action  category  revision      timestamp      sub_type      type 
0     sell     A       202002           5            None        apple 
1     buy      A       202002           4            green       apple 
2     buy      A       202002           4            red         apple 
3     sell     A       202002           1            None        orange
4     sell     B       202002           6            None        apple

Basically, I want the last action for each combination of attributes.

Tags: python, pandas

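Solution

So we need to use fillna here, because pandas treats None as a missing value; filling it with the string 'None' turns a missing sub_type into an ordinary value that takes part in the key. After that, sort_values followed by drop_duplicates with keep='last' leaves the row with the largest timestamp for each combination.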

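out = (df.sort_values('timestamp')      # ascending: the latest row per key ends up last
         .fillna('None')                # make a missing sub_type an ordinary key value
         .drop_duplicates(['category', 'sub_type', 'YYYYMM', 'type'], keep='last')
         .sort_index())                 # restore the original row order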

out
Out[128]: 
  action category  YYYYMM  timestamp sub_type    type
1   sell        A  202002          5     None   apple
2    buy        A  202002          4    green   apple
3    buy        A  202002          4      red   apple
5   sell        A  202002          1     None  orange
6   sell        B  202002          6     None   apple
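
Since the question is phrased in terms of groupby, here is a groupby-based sketch of the same selection. It assumes pandas 1.1 or later, where groupby accepts dropna=False, and it assumes the missing sub_type values are stored as None; the names keys, latest_idx and latest are made up for illustration.

```python
import pandas as pd

# Rebuild the example table; None marks a missing sub_type (assumed storage).
df = pd.DataFrame({
    "action":    ["buy", "sell", "buy", "buy", "sell", "sell", "sell"],
    "category":  ["A", "A", "A", "A", "A", "A", "B"],
    "YYYYMM":    [202002] * 7,
    "timestamp": [4, 5, 4, 4, 3, 1, 6],
    "sub_type":  [None, None, "green", "red", "red", None, None],
    "type":      ["apple", "apple", "apple", "apple", "apple", "orange", "apple"],
})

keys = ["category", "sub_type", "YYYYMM", "type"]

# dropna=False keeps the groups whose key contains a missing value;
# idxmax returns, per group, the index label of the row with the largest timestamp.
latest_idx = df.groupby(keys, dropna=False)["timestamp"].idxmax()

# Select those rows by label and restore the original row order.
latest = df.loc[latest_idx.to_numpy()].sort_index()
print(latest)
```

On the sample data this selects the rows with index 1, 2, 3, 5 and 6, the same rows as the drop_duplicates approach. Unlike that approach, sub_type is left as a missing value rather than replaced by the string 'None'.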
