Getting the first value per group in a sorted DataFrame grouped by ID

Problem description

I have a DataFrame like this:

ID       geoloc         starttime
1       jjbjbjn         2020-01-01 02:20:12
1       123eh2ue        2020-01-01 02:10:10
1       6tyfgxghsvc     2020-01-02 03:06:12
1       6tyfgxghsv1     2020-01-02 05:06:12
1       6tyfgxghsv5     2020-01-05 05:06:12
1       6tyfgxghsv2     2020-01-05 06:06:12
2       86ghgx          2021-01-12 03:12:35
2       87ghguygg       2021-01-12 03:09:35
2       87ghguygg       2021-01-13 05:17:35
2       87ghguygg       2021-01-13 03:17:35
2       87ghguyg1       2021-01-19 03:17:35
2       87ghguyg6       2021-01-19 05:17:35

The resulting DataFrame I want is:

ID    geoloc        starttime
1    123eh2ue        2020-01-01 02:10:10
1    6tyfgxghsvc     2020-01-02 03:06:12
1    6tyfgxghsv5     2020-01-05 05:06:12
2    87ghguygg       2021-01-12 03:09:35
2    87ghguygg       2021-01-13 03:17:35
2    87ghguyg6       2021-01-19 05:17:35

How can I achieve this efficiently?

I tried:

output_df = df.groupby(['ID','starttime']).agg('first')

Tags: python-3.x, pandas

Solution


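The expected output has one row per ID and calendar date, so extract the date with Series.dt.date, sort with DataFrame.sort_values, and keep the first row of each (ID, date) pair with DataFrame.drop_duplicates: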

import pandas as pd

# Parse the strings into datetimes so they sort chronologically
df['starttime'] = pd.to_datetime(df['starttime'])

df['new'] = df['starttime'].dt.date

df = df.sort_values(by=['ID','new','starttime']).drop_duplicates(subset=['ID','new'])
print(df)
    ID       geoloc           starttime         new
1    1     123eh2ue 2020-01-01 02:10:10  2020-01-01
2    1  6tyfgxghsvc 2020-01-02 03:06:12  2020-01-02
4    1  6tyfgxghsv5 2020-01-05 05:06:12  2020-01-05
7    2    87ghguygg 2021-01-12 03:09:35  2021-01-12
9    2    87ghguygg 2021-01-13 03:17:35  2021-01-13
10   2    87ghguyg1 2021-01-19 03:17:35  2021-01-19

Finally, remove the helper column new:

df = df.drop('new', axis=1)
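As an alternative that avoids the temporary column, the same rows can be selected with groupby and idxmin. This is only a sketch, not the method from the answer above: it rebuilds the sample data from the question, and the names day, first_idx and result are chosen here for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": [1] * 6 + [2] * 6,
    "geoloc": [
        "jjbjbjn", "123eh2ue", "6tyfgxghsvc", "6tyfgxghsv1",
        "6tyfgxghsv5", "6tyfgxghsv2", "86ghgx", "87ghguygg",
        "87ghguygg", "87ghguygg", "87ghguyg1", "87ghguyg6",
    ],
    "starttime": [
        "2020-01-01 02:20:12", "2020-01-01 02:10:10",
        "2020-01-02 03:06:12", "2020-01-02 05:06:12",
        "2020-01-05 05:06:12", "2020-01-05 06:06:12",
        "2021-01-12 03:12:35", "2021-01-12 03:09:35",
        "2021-01-13 05:17:35", "2021-01-13 03:17:35",
        "2021-01-19 03:17:35", "2021-01-19 05:17:35",
    ],
})
df["starttime"] = pd.to_datetime(df["starttime"])

# Calendar day of each timestamp (time set to midnight), used only as a group key.
day = df["starttime"].dt.normalize().rename("day")

# For every (ID, day) group, find the index label of the earliest starttime.
first_idx = df.groupby(["ID", day])["starttime"].idxmin()

# Select those rows from the original frame; no helper column is left behind.
result = df.loc[first_idx.to_numpy()]
print(result)
```

Because the rows are picked by index label, the original df is left unmodified and unsorted.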

If you need one row per month instead, use Series.dt.to_period; note that the output differs from the expected one:

df['starttime'] = pd.to_datetime(df['starttime'])

# 'M' is the monthly period alias
df['new'] = df['starttime'].dt.to_period('M')

df = df.sort_values(by=['ID','new','starttime']).drop_duplicates(subset=['ID','new'])
print (df)
   ID     geoloc           starttime      new
1   1   123eh2ue 2020-01-01 02:10:10  2020-01
7   2  87ghguygg 2021-01-12 03:09:35  2021-01
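The daily and monthly variants differ only in how the grouping key is built, so they can be wrapped in one helper. The following is a minimal sketch under that assumption; the function name first_per_period and the temporary column _period are hypothetical and not part of pandas or of the original answer.

```python
import pandas as pd


def first_per_period(df, freq):
    """Return the earliest row per (ID, period).

    freq is a pandas period alias such as 'D' (day) or 'M' (month).
    The input frame is not modified.
    """
    # assign() returns a copy, so the caller's frame keeps its columns.
    out = df.assign(_period=df["starttime"].dt.to_period(freq))
    # After sorting, the first row of each (ID, _period) pair is the earliest.
    out = out.sort_values(["ID", "_period", "starttime"])
    out = out.drop_duplicates(subset=["ID", "_period"])
    return out.drop(columns="_period")


# Sample data copied from the question.
df = pd.DataFrame({
    "ID": [1] * 6 + [2] * 6,
    "geoloc": [
        "jjbjbjn", "123eh2ue", "6tyfgxghsvc", "6tyfgxghsv1",
        "6tyfgxghsv5", "6tyfgxghsv2", "86ghgx", "87ghguygg",
        "87ghguygg", "87ghguygg", "87ghguyg1", "87ghguyg6",
    ],
    "starttime": pd.to_datetime([
        "2020-01-01 02:20:12", "2020-01-01 02:10:10",
        "2020-01-02 03:06:12", "2020-01-02 05:06:12",
        "2020-01-05 05:06:12", "2020-01-05 06:06:12",
        "2021-01-12 03:12:35", "2021-01-12 03:09:35",
        "2021-01-13 05:17:35", "2021-01-13 03:17:35",
        "2021-01-19 03:17:35", "2021-01-19 05:17:35",
    ]),
})

daily = first_per_period(df, "D")
monthly = first_per_period(df, "M")
print(daily)
print(monthly)
```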
