Creating groups based on multiple DateTime comparisons

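Problem description

I am trying to create a new column and populate it with values based on comparing one date column against three other date columns.

A sample of the DataFrame df is shown below. All of the dates shown have been converted with pd.to_datetime, which produced many NaT values because some individuals did not progress.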

    1st_date     2nd_date        3rd_date     action_date
    2015-10-05   NaT             NaT          2015-12-03 
    2015-02-27   2015-03-14      2015-03-15   2015-04-08 
    2015-03-07   2015-03-27      2015-03-28   2015-03-27 
    2015-01-05   2015-01-20      2015-01-21   2015-05-20 
    2015-01-05   2015-01-20      2015-01-21   2015-09-16 
    2015-05-23   2015-06-18      2015-06-19   2015-07-01 
    2015-03-03   NaT             NaT          2015-07-23 
    2015-03-03   NaT             NaT          2015-11-14 
    2015-06-05   2015-06-19      2015-06-20   2015-10-24 
    2015-10-08   2015-10-21      2015-10-22   2015-12-22 
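
To reproduce the problem, the sample frame above can be rebuilt roughly as follows. This is only a sketch: the column names and values are copied from the table, while the assumption that the raw data arrives as strings, with None where a stage was never reached, is mine.

```python
import pandas as pd

# Raw values as strings (an assumption); None marks a stage that was never reached.
raw = {
    "1st_date": ["2015-10-05", "2015-02-27", "2015-03-07", "2015-01-05", "2015-01-05",
                 "2015-05-23", "2015-03-03", "2015-03-03", "2015-06-05", "2015-10-08"],
    "2nd_date": [None, "2015-03-14", "2015-03-27", "2015-01-20", "2015-01-20",
                 "2015-06-18", None, None, "2015-06-19", "2015-10-21"],
    "3rd_date": [None, "2015-03-15", "2015-03-28", "2015-01-21", "2015-01-21",
                 "2015-06-19", None, None, "2015-06-20", "2015-10-22"],
    "action_date": ["2015-12-03", "2015-04-08", "2015-03-27", "2015-05-20", "2015-09-16",
                    "2015-07-01", "2015-07-23", "2015-11-14", "2015-10-24", "2015-12-22"],
}
df = pd.DataFrame(raw)

# Convert every column to datetime; the missing entries become NaT.
for col in df.columns:
    df[col] = pd.to_datetime(df[col])

print(df)
```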

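I am trying to create a fifth column holding the result (or group) of comparing the action_date column against the first three date columns: 1st_date, 2nd_date, and 3rd_date.

I want to populate this fifth column, action_group, with a string that assigns each date to a group.

Pseudocode for the intended logic (and expected output) is: if action_date > 1st_date and action_date < 2nd_date then action_group = '1st_action_group'

The same comparison of action_date is needed against 2nd_date and 3rd_date, which would produce 2nd_action_group in the action_group column.

Finally, if action_date is greater than 3rd_date, action_group would be assigned 3rd_action_group.

An example of the expected output is shown below.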

1st_date     2nd_date        3rd_date     action_date  action_group
2015-10-05   NaT             NaT          2015-12-03   1st_action_group
2015-02-27   2015-03-14      2015-03-15   2015-04-08   3rd_action_group
2015-03-07   2015-03-27      2015-03-28   2015-03-27   2nd_action_group
2015-01-05   2015-01-20      2015-01-21   2015-05-20   3rd_action_group
2015-01-05   2015-01-20      2015-01-21   2015-09-16   3rd_action_group
2015-05-23   2015-06-18      2015-06-19   2015-07-01   3rd_action_group
2015-03-03   NaT             NaT          2015-07-23   1st_action_group
2015-03-03   NaT             NaT          2015-11-14   1st_action_group
2015-06-05   2015-06-19      2015-06-20   2015-10-24   3rd_action_group
2015-10-08   2015-10-21      2015-10-22   2015-12-22   3rd_action_group

Any help would be greatly appreciated.

Tags: python, pandas, python-datetime

Solution


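import numpy as np

# Check the 3rd-stage condition first, then the 2nd; every other row is labeled 1st.
df['action_group'] = np.where(df['action_date'] > df['3rd_date'],
                              '3rd_action_group',
                              np.where((df['action_date'] >= df['2nd_date']) & (df['action_date'] < df['3rd_date']),
                                       '2nd_action_group',
                                       '1st_action_group'))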

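You only need to nest two np.where calls to get the desired result. Comparisons against NaT evaluate to False, so rows with a missing 2nd_date and 3rd_date fall through to 1st_action_group.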

    1st_date    2nd_date    3rd_date    action_date action_group
0   2015-10-05     NaT          NaT     2015-12-03  1st_action_group
1   2015-02-27  2015-03-14  2015-03-15  2015-04-08  3rd_action_group
2   2015-03-07  2015-03-27  2015-03-28  2015-03-27  2nd_action_group
3   2015-01-05  2015-01-20  2015-01-21  2015-05-20  3rd_action_group
4   2015-01-05  2015-01-20  2015-01-21  2015-09-16  3rd_action_group
5   2015-05-23  2015-06-18  2015-06-19  2015-07-01  3rd_action_group
6   2015-03-03     NaT          NaT     2015-07-23  1st_action_group
7   2015-03-03     NaT          NaT     2015-11-14  1st_action_group
8   2015-06-05  2015-06-19  2015-06-20  2015-10-24  3rd_action_group
9   2015-10-08  2015-10-21  2015-10-22  2015-12-22  3rd_action_group
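
As the number of groups grows, nested np.where calls can become hard to read. One possible alternative, not part of the original answer, is np.select, which takes a list of conditions and a matching list of labels, plus a default for rows that match none. The sketch below rebuilds only the first three sample rows (one per group) so that it runs on its own, and it keeps the same comparisons as the answer above. Under those comparisons, an action_date exactly equal to 3rd_date would also receive the default label, so the boundaries may need adjusting to taste.

```python
import numpy as np
import pandas as pd

# First three rows of the sample data, one for each expected group.
df = pd.DataFrame({
    "1st_date": pd.to_datetime(["2015-10-05", "2015-02-27", "2015-03-07"]),
    "2nd_date": pd.to_datetime([None, "2015-03-14", "2015-03-27"]),
    "3rd_date": pd.to_datetime([None, "2015-03-15", "2015-03-28"]),
    "action_date": pd.to_datetime(["2015-12-03", "2015-04-08", "2015-03-27"]),
})

# Conditions are evaluated in order; the first one that is True wins.
conditions = [
    df["action_date"] > df["3rd_date"],
    (df["action_date"] >= df["2nd_date"]) & (df["action_date"] < df["3rd_date"]),
]
choices = ["3rd_action_group", "2nd_action_group"]

# Rows matching no condition (including those with NaT) get the default label.
df["action_group"] = np.select(conditions, choices, default="1st_action_group")

print(df["action_group"].tolist())
# ['1st_action_group', '3rd_action_group', '2nd_action_group']
```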
