Pandas column backfill with a decreasing/increasing sequence

Problem description

I have a DataFrame:

| ind  |   A  |    B   |
------------------------
| 1.01 |  10  | -1.734 |
| 1.04 |  10  | -1.244 |
| 1.05 |  10  |  0.016 |
| 1.11 |  NaN | -2.737 | <-
| 1.13 |  NaN | -4.232 | <-
| 1.19 |  11  | -3.241 | <=
| 1.20 |  12  | -2.832 |
| 1.21 |  10  | -4.277 |

and I would like to backfill the NaN values in column A with a decreasing sequence that ends at the next valid value:

| ind  |   A  |    B   |
------------------------
| 1.01 |  10  | -1.734 |
| 1.04 |  10  | -1.244 |
| 1.05 |  10  |  0.016 |
| 1.11 |  13  | -2.737 | <-
| 1.13 |  12  | -4.232 | <-
| 1.19 |  11  | -3.241 | <=
| 1.20 |  12  | -2.832 |
| 1.21 |  10  | -4.277 |

Is there a way to do this?

Tags: pandas

Solution


First, mark the positions where column A is NaN:

positions = df['A'].isna().astype(int)

|  positions |
--------------
|      0     |
|      0     |
|      0     |
|      1     |
|      1     |
|      0     |
|      0     |
|      0     |

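Then take a reverse cumulative sum that restarts after each run of NaN values, so every NaN row gets its distance (in rows) to the next valid value:

# Keep the mask boolean so that ~mask is a logical NOT; reverse the row order
mask = df['A'].isna().loc[::-1]
cumSum = mask.astype(int).cumsum()
# Subtract the running total at the last non-NaN row so the count restarts per NaN run
posCumSum = (cumSum - cumSum.where(~mask).ffill().fillna(0).astype(int)).loc[::-1]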

|  posCumSum |
--------------
|      0     |
|      0     |
|      0     |
|      2     |
|      1     |
|      0     |
|      0     |
|      0     |
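As a cross-check on the offsets above, here is a minimal sketch of an alternative formulation that uses `groupby` instead of the subtract-and-`ffill` trick. It is not the method from the answer; the names `a`, `rev`, `run_id`, and `offset` are made up for illustration, and the sample values are copied from column A of the question.

```python
import numpy as np
import pandas as pd

# Column A from the question (illustrative copy)
a = pd.Series([10, 10, 10, np.nan, np.nan, 11, 12, 10])

# Boolean NaN mask, read from the bottom of the column upwards
rev = a.isna().iloc[::-1]

# Every valid row starts a new group; the NaN rows that follow it
# (going upwards) fall into the same group
run_id = (~rev).cumsum()

# Count NaN rows within each group, then restore the original row order
offset = rev.astype(int).groupby(run_id).cumsum().iloc[::-1]

print(offset.tolist())  # [0, 0, 0, 2, 1, 0, 0, 0]
```

Both approaches should give the same per-row offsets; the `groupby` version makes the "restart the count for each NaN run" step explicit.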

Finally, add it to the backfilled original column:

df['A'] = df['A'].bfill() + posCumSum

| ind  |   A  |    B   |
------------------------
| 1.01 |  10  | -1.734 |
| 1.04 |  10  | -1.244 |
| 1.05 |  10  |  0.016 |
| 1.11 |  13  | -2.737 | <-
| 1.13 |  12  | -4.232 | <-
| 1.19 |  11  | -3.241 | <=
| 1.20 |  12  | -2.832 |
| 1.21 |  10  | -4.277 |
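The three steps can be put together as a small self-contained sketch. The helper name `backfill_decreasing` is hypothetical, the DataFrame is rebuilt from the table in the question, and `.iloc[::-1]` is used for the reversal (an assumption made here so the reversal is purely positional, whatever the index looks like).

```python
import numpy as np
import pandas as pd


def backfill_decreasing(col: pd.Series) -> pd.Series:
    """Fill each NaN run with values that count down to the next valid value."""
    mask = col.isna().iloc[::-1]        # boolean NaN mask, bottom to top
    cum = mask.astype(int).cumsum()     # running NaN count from the bottom
    # Running count as of the most recent valid row (0 before any valid row)
    base = cum.where(~mask).ffill().fillna(0).astype(int)
    offset = (cum - base).iloc[::-1]    # rows until the next valid value
    return col.bfill() + offset         # aligned on the (unique) index


df = pd.DataFrame(
    {
        "A": [10, 10, 10, np.nan, np.nan, 11, 12, 10],
        "B": [-1.734, -1.244, 0.016, -2.737, -4.232, -3.241, -2.832, -4.277],
    },
    index=pd.Index([1.01, 1.04, 1.05, 1.11, 1.13, 1.19, 1.20, 1.21], name="ind"),
)

df["A"] = backfill_decreasing(df["A"])
print(df["A"].tolist())  # [10.0, 10.0, 10.0, 13.0, 12.0, 11.0, 12.0, 10.0]
```

Two caveats for this sketch: NaN values at the very end of the column have no following valid value, so `bfill()` leaves them as NaN; and the addition relies on index alignment, so it assumes the index has no duplicate labels. For a sequence that increases towards the next valid value instead, subtracting the offset (`col.bfill() - offset`) should work the same way.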
