首页 > 解决方案 > 与不为零的更近的邻居重复值,直到值改变

问题描述

我有一个包含大约 3000 万条记录的数据框,按 ID 分类并按日期打开。

我需要在以前的日期数中分配一列最近行的值,我已经尝试使用 windows 参数使用滚动函数,但不幸的是我必须分配的数字是可变的(我的意思是,它可以是12, 6, 7, ...),并且此参数不接受来自同一 df 的变量。

工作正常,但 12 数字可以是可变的:

indexer = pd.api.indexers.FixedForwardWindowIndexer(window_size=12)
df['REPEAT_AMOUNT'] = df['AMOUNT'].rolling(window, min_periods=1).sum()

我尝试通过使用 df 中的变量来更改 window_size 的数量:

df['REPEAT_AMOUNT'] = df['AMOUNT'].rolling(window=df['VARIABLE_DISTRIBUTION'].astype(int), min_periods=1).sum()

但我收到以下错误:

ValueError: window must be an integer

此外,第一个记录的月份差异可能与后续记录的日期数字不同。

这是我目前拥有的:

+----+-----------------------+------------+---------+
| ID | VARIABLE_DISTRIBUTION |    DATE    |  AMOUNT |
+----+-----------------------+------------+---------+
|  1 |                    12 | 30-04-2021 |       - |
|  1 |                    12 | 31-05-2021 |       - |
|  1 |                    12 | 30-06-2021 |       - |
|  1 |                    12 | 31-07-2021 |     100 |
|  1 |                    12 | 31-08-2021 |       - |
|  1 |                    12 | 30-09-2021 |       - |
|  1 |                    12 | 31-10-2021 |       - |
|  1 |                    12 | 30-11-2021 |       - |
|  1 |                    12 | 31-12-2021 |       - |
|  1 |                    12 | 31-01-2022 |       - |
|  1 |                    12 | 28-02-2022 |       - |
|  1 |                    12 | 31-03-2022 |       - |
|  1 |                    12 | 30-04-2022 |       - |
|  1 |                    12 | 31-05-2022 |       - |
|  1 |                    12 | 30-06-2022 |       - |
|  1 |                    12 | 31-07-2022 |     150 |
|  2 |                     8 | 30-04-2021 |       - |
|  2 |                     8 | 31-05-2021 |     200 |
|  2 |                     8 | 30-06-2021 |       - |
|  2 |                     8 | 31-07-2021 |       - |
|  2 |                     8 | 31-08-2021 |       - |
|  2 |                     8 | 30-09-2021 |       - |
|  2 |                     8 | 31-10-2021 |       - |
|  2 |                     8 | 30-11-2021 |       - |
|  2 |                     8 | 31-12-2021 |       - |
|  2 |                     8 | 31-01-2022 |     300 |
|  2 |                     8 | 28-02-2022 |       - |
|  2 |                     8 | 31-03-2022 |       - |
|  2 |                     8 | 30-04-2022 |       - |
|  2 |                     8 | 31-05-2022 |       - |
|  2 |                     8 | 30-06-2022 |       - |
|  2 |                     8 | 31-07-2022 |       - |
+----+-----------------------+------------+---------+

这就是我想要的:

+----+-----------------------+------------+----------+----------------+
| ID | VARIABLE_DISTRIBUTION |    DATE    |  AMOUNT  |  REPEAT_AMOUNT |
+----+-----------------------+------------+----------+----------------+
|  1 |                    12 | 30-04-2021 |       -  |            100 |
|  1 |                    12 | 31-05-2021 |       -  |            100 |
|  1 |                    12 | 30-06-2021 |       -  |            100 |
|  1 |                    12 | 31-07-2021 |     100  |            100 |
|  1 |                    12 | 31-08-2021 |       -  |            150 |
|  1 |                    12 | 30-09-2021 |       -  |            150 |
|  1 |                    12 | 31-10-2021 |       -  |            150 |
|  1 |                    12 | 30-11-2021 |       -  |            150 |
|  1 |                    12 | 31-12-2021 |       -  |            150 |
|  1 |                    12 | 31-01-2022 |       -  |            150 |
|  1 |                    12 | 28-02-2022 |       -  |            150 |
|  1 |                    12 | 31-03-2022 |       -  |            150 |
|  1 |                    12 | 30-04-2022 |       -  |            150 |
|  1 |                    12 | 31-05-2022 |       -  |            150 |
|  1 |                    12 | 30-06-2022 |       -  |            150 |
|  1 |                    12 | 31-07-2022 |     150  |            150 |
|  2 |                     8 | 30-04-2021 |       -  |            200 |
|  2 |                     8 | 31-05-2021 |     200  |            200 |
|  2 |                     8 | 30-06-2021 |       -  |            300 |
|  2 |                     8 | 31-07-2021 |       -  |            300 |
|  2 |                     8 | 31-08-2021 |       -  |            300 |
|  2 |                     8 | 30-09-2021 |       -  |            300 |
|  2 |                     8 | 31-10-2021 |       -  |            300 |
|  2 |                     8 | 30-11-2021 |       -  |            300 |
|  2 |                     8 | 31-12-2021 |       -  |            300 |
|  2 |                     8 | 31-01-2022 |     300  |            300 |
|  2 |                     8 | 28-02-2022 |       -  |              - |
|  2 |                     8 | 31-03-2022 |       -  |              - |
|  2 |                     8 | 30-04-2022 |       -  |              - |
|  2 |                     8 | 31-05-2022 |       -  |              - |
|  2 |                     8 | 30-06-2022 |       -  |              - |
|  2 |                     8 | 31-07-2022 |       -  |              - |
+----+-----------------------+------------+----------+----------------+

我感谢有关窗口参数或任何其他类型解决方案的任何帮助。

对不起非母语英语

非常感谢。

标签: pythonpandastime-seriesrolling-computation

解决方案


IIUC 用途groupby bfill

df['REPEAT_AMOUNT'] = df.groupby('ID')['AMOUNT'].bfill()

df

    ID  VARIABLE_DISTRIBUTION        DATE AMOUNT REPEAT_AMOUNT
0    1                     12  30-04-2021    NaN           100
1    1                     12  31-05-2021    NaN           100
2    1                     12  30-06-2021    NaN           100
3    1                     12  31-07-2021    100           100
4    1                     12  31-08-2021    NaN           150
5    1                     12  30-09-2021    NaN           150
6    1                     12  31-10-2021    NaN           150
7    1                     12  30-11-2021    NaN           150
8    1                     12  31-12-2021    NaN           150
9    1                     12  31-01-2022    NaN           150
10   1                     12  28-02-2022    NaN           150
11   1                     12  31-03-2022    NaN           150
12   1                     12  30-04-2022    NaN           150
13   1                     12  31-05-2022    NaN           150
14   1                     12  30-06-2022    NaN           150
15   1                     12  31-07-2022    150           150
16   2                      8  30-04-2021    NaN           200
17   2                      8  31-05-2021    200           200
18   2                      8  30-06-2021    NaN           300
19   2                      8  31-07-2021    NaN           300
20   2                      8  31-08-2021    NaN           300
21   2                      8  30-09-2021    NaN           300
22   2                      8  31-10-2021    NaN           300
23   2                      8  30-11-2021    NaN           300
24   2                      8  31-12-2021    NaN           300
25   2                      8  31-01-2022    300           300
26   2                      8  28-02-2022    NaN           NaN
27   2                      8  31-03-2022    NaN           NaN
28   2                      8  30-04-2022    NaN           NaN
29   2                      8  31-05-2022    NaN           NaN
30   2                      8  30-06-2022    NaN           NaN
31   2                      8  31-07-2022    NaN           NaN

推荐阅读