Create a column based on the date order of another column

Problem description

I have the following DataFrame:

   account_id contract_id date_activated
0           1         AAA     2021-01-05
1           1         ADS     2020-12-12
2           1        ADGD     2021-02-03
3           2         HHA     2021-03-05
4           2        HAKD     2021-03-06
5           3       HADSA     2021-05-01

I want the following result:

   account_id contract_id date_activated Renewal Order
0           1         ADS     2020-12-12      Original
1           1         AAA     2021-01-05           1st
2           1        ADGD     2021-02-03           2nd
3           2         HHA     2021-03-05      Original
4           2        HAKD     2021-03-06           1st
5           3       HADSA     2021-05-01      Original

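The column I want to create is "Renewal Order". Each account can have multiple contracts. The label depends on the account (account_id) and on the order in which its contracts were activated (date_activated): the first contract of each account is labelled "Original", and the following ones "1st", "2nd", and so on.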
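As a minimal sketch of the rule itself, assuming the rows are plain (account_id, contract_id, date_activated) tuples rather than a DataFrame, the labelling is a sort by (account, date) followed by numbering within each account. The helpers ordinal_label and label_renewals are hypothetical names used only for illustration.

```python
from datetime import date
from itertools import groupby


def ordinal_label(n: int) -> str:
    """Hypothetical helper: 0 -> 'Original', 1 -> '1st', 2 -> '2nd', 11 -> '11th'."""
    if n == 0:
        return "Original"
    # 11, 12, 13 (and 111, 112, ...) always take 'th'
    if 11 <= n % 100 <= 13:
        return f"{n}th"
    return f"{n}" + {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")


def label_renewals(rows):
    """Hypothetical helper: sort rows by (account_id, date_activated) and
    append a renewal label numbered from 0 within each account."""
    ordered = sorted(rows, key=lambda r: (r[0], r[2]))
    result = []
    # groupby only merges consecutive rows, which is why we sort first
    for _, group in groupby(ordered, key=lambda r: r[0]):
        for i, row in enumerate(group):
            result.append(row + (ordinal_label(i),))
    return result


rows = [
    (1, "AAA", date(2021, 1, 5)),
    (1, "ADS", date(2020, 12, 12)),
    (1, "ADGD", date(2021, 2, 3)),
    (2, "HHA", date(2021, 3, 5)),
    (2, "HAKD", date(2021, 3, 6)),
    (3, "HADSA", date(2021, 5, 1)),
]
for row in label_renewals(rows):
    print(row)
```

The pandas answer below follows the same two steps, with sort_values playing the role of sorted and groupby(...).cumcount() playing the role of enumerate.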

Here is the original DataFrame as a dictionary:

{'account_id': {0: 1, 1: 1, 2: 1, 3: 2, 4: 2, 5: 3},
 'contract_id': {0: 'AAA',
  1: 'ADS',
  2: 'ADGD',
  3: 'HHA',
  4: 'HAKD',
  5: 'HADSA'},
 'date_activated': {0: Timestamp('2021-01-05 00:00:00'),
  1: Timestamp('2020-12-12 00:00:00'),
  2: Timestamp('2021-02-03 00:00:00'),
  3: Timestamp('2021-03-05 00:00:00'),
  4: Timestamp('2021-03-06 00:00:00'),
  5: Timestamp('2021-05-01 00:00:00')}}
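To reproduce the input, the dictionary above can be passed directly to pd.DataFrame. Evaluating the literal requires the bare name Timestamp, so it is imported from pandas here; the variable name data is an assumption for this sketch.

```python
import pandas as pd
from pandas import Timestamp  # the dict literal uses the bare name Timestamp

data = {'account_id': {0: 1, 1: 1, 2: 1, 3: 2, 4: 2, 5: 3},
        'contract_id': {0: 'AAA', 1: 'ADS', 2: 'ADGD',
                        3: 'HHA', 4: 'HAKD', 5: 'HADSA'},
        'date_activated': {0: Timestamp('2021-01-05 00:00:00'),
                           1: Timestamp('2020-12-12 00:00:00'),
                           2: Timestamp('2021-02-03 00:00:00'),
                           3: Timestamp('2021-03-05 00:00:00'),
                           4: Timestamp('2021-03-06 00:00:00'),
                           5: Timestamp('2021-05-01 00:00:00')}}

# Outer keys become columns, inner keys become the row index
df = pd.DataFrame(data)
print(df)
```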

Here is the expected result as a dictionary:

{'account_id': {0: 1, 1: 1, 2: 1, 3: 2, 4: 2, 5: 3},
 'contract_id': {0: 'ADS',
  1: 'AAA',
  2: 'ADGD',
  3: 'HHA',
  4: 'HAKD',
  5: 'HADSA'},
 'date_activated': {0: Timestamp('2020-12-12 00:00:00'),
  1: Timestamp('2021-01-05 00:00:00'),
  2: Timestamp('2021-02-03 00:00:00'),
  3: Timestamp('2021-03-05 00:00:00'),
  4: Timestamp('2021-03-06 00:00:00'),
  5: Timestamp('2021-05-01 00:00:00')},
 'Renewal Order': {0: 'Original',
  1: '1st',
  2: '2nd',
  3: 'Original',
  4: '1st',
  5: 'Original'}}

Tags: python, pandas, dataframe, numpy

Solution


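Try sort_values to make sure the contracts are in the correct order, plus groupby cumcount to number each account's contracts, then use apply or map with a function that converts the numbers into the desired string values: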

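def format_order(n):
    # 0 is the first contract of an account
    if n == 0:
        return 'Original'
    # 1 -> 'st', 2 -> 'nd', 3 -> 'rd'; 0 and 4-9 -> 'th'
    suffix = ['th', 'st', 'nd', 'rd', 'th'][min(n % 10, 4)]
    # 11, 12, 13 (and 111, 112, ...) take 'th'
    if 11 <= (n % 100) <= 13:
        suffix = 'th'
    return str(n) + suffix


# Sort so that each account's contracts are in activation order
df = df.sort_values(['account_id', 'date_activated']).reset_index(drop=True)
# apply: cumcount numbers the rows 0, 1, 2, ... within each account
df['Renewal Order'] = df.groupby('account_id').cumcount().apply(format_order)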

Or

df = df.sort_values(['account_id', 'date_activated']).reset_index(drop=True)
# map
df['Renewal Order'] = df.groupby('account_id').cumcount().map(format_order)

Output:

   account_id contract_id date_activated Renewal Order
0           1         ADS     2020-12-12      Original
1           1         AAA     2021-01-05           1st
2           1        ADGD     2021-02-03           2nd
3           2         HHA     2021-03-05      Original
4           2        HAKD     2021-03-06           1st
5           3       HADSA     2021-05-01      Original
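For an end-to-end check, the sketch below builds the sample input with pd.to_datetime instead of the Timestamp dictionary (an assumption made for brevity), runs the answer's sort_values, groupby cumcount and map steps, and prints the result, which should match the result dictionary in the question.

```python
import pandas as pd


def format_order(n):
    # Same helper as in the answer: 0 -> 'Original', 1 -> '1st', 2 -> '2nd', ...
    if n == 0:
        return 'Original'
    suffix = ['th', 'st', 'nd', 'rd', 'th'][min(n % 10, 4)]
    if 11 <= (n % 100) <= 13:
        suffix = 'th'
    return str(n) + suffix


df = pd.DataFrame({
    'account_id': [1, 1, 1, 2, 2, 3],
    'contract_id': ['AAA', 'ADS', 'ADGD', 'HHA', 'HAKD', 'HADSA'],
    'date_activated': pd.to_datetime(['2021-01-05', '2020-12-12', '2021-02-03',
                                      '2021-03-05', '2021-03-06', '2021-05-01']),
})

# Sort so each account's contracts appear in activation order,
# then renumber the index 0..n-1 so it lines up with the new order
df = df.sort_values(['account_id', 'date_activated']).reset_index(drop=True)
# cumcount gives 0, 1, 2, ... within each account; map turns those into labels
df['Renewal Order'] = df.groupby('account_id').cumcount().map(format_order)
print(df)
```

If an account has two contracts activated on the same date, their relative order comes from the sort rather than from any business rule; adding a tie-breaking column (contract_id, for example) to sort_values is one option, though whether that is the right tie-break depends on the data.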
