首页 > 解决方案 > 在时间序列中汇总 $ 数量的首次客户

问题描述

我有一个数据框,它汇总了客户在特定月份在其帐户中的金额。如果客户没有钱,金额就是 0。数据框如下所示:

            A       B       C       D       E        F   
11/30/2015  0       1000    0       0       5000     0   
12/31/2015  2000    1000    0       3000    5000     2000
1/31/2016   2000    0       0       3000    5000     2000
2/29/2016   2000    2000    4000    3000    5000     2000
3/31/2016   2000    2000    4000    0       10000    2000
4/30/2016   0       2000    4000    0       10000    0   
5/31/2016   0       2000    4000    0       10000    0   

当客户首次上线时,他们会从 0 变为特定月份的名义金额(或者他们从 11 月的名义金额开始)。因此,当特定客户拥有他们的第一个名义金额时,这就是他们“新”的月份。

我想在数据框的末尾添加一列,汇总“新”客户的金额。

我已经能够计算出“新”客户的数量(参见下面的代码),但我无法更改代码来对这些值求和。

def new_customer(column):
    return column[-1] and not any(column[:-1])
table['new_loans'] = table.iloc[:, len(table.columns)].expanding().apply(new_customer).sum(axis=1).astype(int)

生成的数据框应如下所示:

            A       B       C       D       E        F       New_Customers 
11/30/2015  0       1000    0       0       5000     0       6000 
12/31/2015  2000    1000    0       3000    5000     2000    7000
1/31/2016   2000    0       0       3000    5000     2000    0
2/29/2016   2000    2000    4000    3000    5000     2000    4000
3/31/2016   2000    2000    4000    0       10000    2000    0
4/30/2016   0       2000    4000    0       10000    0       0
5/31/2016   0       2000    4000    0       10000    0       0

标签: pythonpandas

解决方案


利用:

df['New_Customers'] = df.where(df.ne(0).cumsum().eq(1)).sum(axis=1)
print (df)
               A     B     C     D      E     F  New_Customers
11/30/2015     0  1000     0     0   5000     0         6000.0
12/31/2015  2000  1000     0  3000   5000  2000         7000.0
1/31/2016   2000     0     0  3000   5000  2000            0.0
2/29/2016   2000  2000  4000  3000   5000  2000         4000.0
3/31/2016   2000  2000  4000     0  10000  2000            0.0
4/30/2016      0  2000  4000     0  10000     0            0.0
5/31/2016      0  2000  4000     0  10000     0            0.0

说明

首先将DataFrame.ne( !=) 与进行比较0

print (df.ne(0))
                A      B      C      D     E      F
11/30/2015  False   True  False  False  True  False
12/31/2015   True   True  False   True  True   True
1/31/2016    True  False  False   True  True   True
2/29/2016    True   True   True   True  True   True
3/31/2016    True   True   True  False  True   True
4/30/2016   False   True   True  False  True  False
5/31/2016   False   True   True  False  True  False

布尔掩码的累积总和DataFrame.cumsum

print (df.ne(0).cumsum())
            A  B  C  D  E  F
11/30/2015  0  1  0  0  1  0
12/31/2015  1  2  0  1  2  1
1/31/2016   2  2  0  2  3  2
2/29/2016   3  3  1  3  4  3
3/31/2016   4  4  2  3  5  4
4/30/2016   4  5  3  3  6  4
5/31/2016   4  6  4  3  7  4

1与- first比较:DataFrame.eq (==)1

print (df.ne(0).cumsum().eq(1))
                A      B      C      D      E      F
11/30/2015  False   True  False  False   True  False
12/31/2015   True  False  False   True  False   True
1/31/2016   False  False  False  False  False  False
2/29/2016   False  False   True  False  False  False
3/31/2016   False  False  False  False  False  False
4/30/2016   False  False  False  False  False  False
5/31/2016   False  False  False  False  False  False

将值替换为NaNs DataFrame.where

print (df.where(df.ne(0).cumsum().eq(1)))
                 A       B       C       D       E       F
11/30/2015     NaN  1000.0     NaN     NaN  5000.0     NaN
12/31/2015  2000.0     NaN     NaN  3000.0     NaN  2000.0
1/31/2016      NaN     NaN     NaN     NaN     NaN     NaN
2/29/2016      NaN     NaN  4000.0     NaN     NaN     NaN
3/31/2016      NaN     NaN     NaN     NaN     NaN     NaN
4/30/2016      NaN     NaN     NaN     NaN     NaN     NaN
5/31/2016      NaN     NaN     NaN     NaN     NaN     NaN

最后sum每列:

print (df.where(df.ne(0).cumsum().eq(1)).sum(axis=1))
11/30/2015    6000.0
12/31/2015    7000.0
1/31/2016        0.0
2/29/2016     4000.0
3/31/2016        0.0
4/30/2016        0.0
5/31/2016        0.0
dtype: float64

推荐阅读