首页 > 解决方案 > 在不同月份选择相同的交易

问题描述

我想选择或保留使用 Pandas 数据框至少发生 3 个月的相同交易描述(TRNDESCR)。我尝试了一些代码,但它没有按预期工作。

下面给出了示例数据集:

    ACNO TIME                   TRNCD   TRNDESCR                                        TRNAMT
0   85   2018-12-19 20:40:00    109     Ib Transfer To Phoutthalom Syh Account No:123   -20000
1   85   2018-12-19 21:15:00    109     Ib Transfer To Phoutthalom Syh Account No:123   -25000
2   85   2018-12-20 15:30:00    109     Ib Transfer To Thongsavath Pra Account No:124   -10000
3   85   2018-12-22 12:30:00    209     Bil Payment                                     -500
4   85   2018-12-25 15:34:00    109     Ib Transfer To Phoutthalom Syh Account No:123   -60000
5   85   2019-01-22 12:30:00    209     Bil Payment                                     -501
6   85   2019-01-23 12:50:00    109     Ib Transfer To Sarah Account No:199             -3000
7   85   2019-01-31 08:59:00    109     Ib Transfer To Thongsavath Pra Account No:124   -650000
8   85   2019-02-02 12:30:00    109     Ib Transfer To Sarah Account No:199             -600
9   85   2019-02-03 15:02:00    109     Ib Transfer To Phoutthalom Syh Account No:123   -60000
10  85   2019-02-04 15:21:00    109     Ib Transfer To Thongsavath Pra Account No:124   -863000
11  85   2019-02-05 15:30:00    209     Bil Payment                                     -600

以下是预期的结果:

    ACNO TIME                   TRNCD   TRNDESCR                                        TRNAMT
0   85   2018-12-20 15:30:00    109     Ib Transfer To Thongsavath Pra Account No:124   -10000
1   85   2018-12-22 12:30:00    209     Bil Payment                                     -500
2   85   2019-01-22 12:30:00    209     Bil Payment                                     -501
3   85   2019-01-31 08:59:00    109     Ib Transfer To Thongsavath Pra Account No:124   -650000
4   85   2019-02-04 15:21:00    109     Ib Transfer To Thongsavath Pra Account No:124   -863000
5   85   2019-02-05 15:30:00    209     Bil Payment                                     -600

标签: pythonpandasdataframe

解决方案


选择被视为指标的列,就像您给出的示例一样,它是 TRNDESCR 和 TIME,因为您希望将“月份”作为过滤器。然后您删除重复项并按 TRNDESCR 分组,然后根据月份计算事务发生的次数。

例子:

import pandas as pd

df = pd.DataFrame()
df['TIME'] = ["2018-12-19", "2018-12-20", "2019-01-20", "2019-02-06",
             "2018-12-18", "2018-12-02", "2019-01-03", "2019-02-06"]
df['TRNDESCR'] = ["ib1", "ib2", "ib2", "ib2",
                 "ib2", "ib3", "ib3", "ib3"]
df['ACNO'] = 85


df['TIME'] = pd.to_datetime(df['TIME'])
df['MONTH'] = df['TIME'].dt.month

count_month = df[['MONTH', 'TRNDESCR']].drop_duplicates(['MONTH', 'TRNDESCR'], keep="last").groupby('TRNDESCR')['MONTH'].count()

df[df['TRNDESCR'].isin(count_month[count_month >= 3].index)]

TIME    TRNDESCR    ACNO    MONTH
1   2018-12-20  ib2     85  12
2   2019-01-20  ib2     85  1
3   2019-02-06  ib2     85  2
4   2018-12-18  ib2     85  12
5   2018-12-02  ib3     85  12
6   2019-01-03  ib3     85  1
7   2019-02-06  ib3     85  2


推荐阅读