首页 > 解决方案 > Python - 识别一年内连续购买次数的最有效方法

问题描述

我会有以下问题:

我有大约 3000 万行数据。数据有六列:

DISTINCT_IRECIPIENTID | ORDERNUMBER | ORDERDATE | ORDERDATE_OF_NEXT_ORDER | RETAINED_OR_NOT

“RETAINED_OR_NOT”分为三类:

我现在想计算消费者连续保留的年数。

因此,如果一个客户下了四个订单,并且连续三个订单的时间差均 <= 1 年,而第三个和第四个之间的时间差 > 1 年,那么前三个订单的值应计为 3最后一个为0。

数据已按 ORDERDATE DESC 的 DISTINCT_IRECIPIENTID 排序

我基本上写了以下代码,但是执行速度极慢。有人对如何使代码更高效有一些想法吗?

def find_consecutive_purchases_in_a_year(input):
    count = 0
    exit_loop = 0
    sub_data = prepared_main_data_backup[prepared_main_data_backup['DISTINCT_IRECIPIENTID']
                                         == input]

    for index, row in sub_data.iterrows():

        if exit_loop == 1:
            return count
        if exit_loop == 0:
            if row['RETAINED_OR_NOT'] == 'retained_for_one_year':
                count += 1
            else:
                exit_loop = 1
    return count


data_test = prepared_main_data_backup

data_test['retain_counter'] = data_test['DISTINCT_IRECIPIENTID'].apply(
    find_consecutive_purchases_in_a_year)

请在下面找到一些示例数据。


    DISTINCT_IRECIPIENTID   TSORDERDATETIME FIRST_TRANS_DATE    ORDER_DATE_AFTER    DIFFERENCE_BETWEEN_ORDERS   RETAINED_OR_NOT Output
    1   2017-04-24-09.33.21.000000  2017-04-24-09.33.21.000000          only one lifetime purchase  0
    2   2017-04-24-09.35.16.000000  2017-04-24-09.35.16.000000          only one lifetime purchase  0
    3   2017-04-27-14.45.48.000000  2017-04-27-14.45.48.000000  2017-04-29-14.53.46.000000  2   retained_for_one_year   2
    3   2017-04-29-14.53.46.000000  2017-04-27-14.45.48.000000  2017-05-10-09.06.25.000000  11  retained_for_one_year   2
    3   2017-05-10-09.06.25.000000  2017-04-27-14.45.48.000000  2018-09-22-05.54.07.000000  500 next_purchase_but_not_retained  0
    3   2018-09-22-05.54.07.000000  2017-04-27-14.45.48.000000  2020-09-12-19.12.59.000000  721 next_purchase_but_not_retained  0
    3   2020-09-12-19.12.59.000000  2017-04-27-14.45.48.000000  2020-09-14-11.49.33.000000  2   retained_for_one_year   2
    3   2020-09-14-11.49.33.000000  2017-04-27-14.45.48.000000  2021-06-08-07.18.42.000000  267 retained_for_one_year   2
    4   2017-04-24-09.35.27.000000  2017-04-24-09.35.27.000000  2017-04-30-12.00.14.000000  6   retained_for_one_year   1
    4   2017-04-30-12.00.14.000000  2017-04-24-09.35.27.000000  2018-06-18-09.15.23.000000  414 next_purchase_but_not_retained  0




非常感谢任何输入!

标签: python-3.xpandasdataframe

解决方案


推荐阅读