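python - How can I fill missing values by interpolating between the means on the left and right?
Problem description
I have a data table in which missing values can occur within each group, either singly or in consecutive runs. I want to fill them as follows: compute the mean of the 3 values to the left of the first NaN in a run, compute the mean of the 3 values to the right of the last NaN in that run, and then interpolate the NaNs between those two means.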
+-------+-------+
| group | value |
+-------+-------+
|     1 |     1 |
|     1 |     1 |
|     1 |     2 |
|     1 |     3 |
|     1 |     4 |
|     1 |   NaN |
|     1 |   NaN |
|     1 |     3 |
|     1 |     6 |
|     1 |     4 |
|     1 |     3 |
|     1 |   NaN |
|     2 |   NaN |
|     2 |   NaN |
|     2 |     1 |
|     2 |     2 |
|     2 |     3 |
|     2 |     4 |
|     2 |   NaN |
|     2 |   NaN |
|     2 |   NaN |
|     2 |     6 |
|     2 |     8 |
|     2 |     9 |
+-------+-------+
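To make the rule concrete, here is a small sketch of the arithmetic for the first gap in group 1 (the two NaNs that follow the values 2, 3, 4). It assumes the two means sit on the rows immediately next to the gap and that the NaNs are spaced evenly between them; the variable names are illustrative only.

```python
import numpy as np

# Values around the first gap in group 1.
left = np.array([2.0, 3.0, 4.0])    # the three values before the first NaN
right = np.array([3.0, 6.0, 4.0])   # the three values after the last NaN
n_missing = 2                       # length of the NaN run

left_mean = left.mean()             # 3.0
right_mean = right.mean()           # 13 / 3

# Put the two means at the ends of the gap and keep only the interior points.
filled = np.linspace(left_mean, right_mean, n_missing + 2)[1:-1]
print(filled)                       # roughly 3.444 and 3.889
```

These two numbers agree with rows 5 and 6 of the expected output.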
Code to reproduce the DataFrame above:
import numpy as np
import pandas as pd

nan = np.nan
d = {'group': {0: 1,
1: 1,
2: 1,
3: 1,
4: 1,
5: 1,
6: 1,
7: 1,
8: 1,
9: 1,
10: 1,
11: 1,
12: 2,
13: 2,
14: 2,
15: 2,
16: 2,
17: 2,
18: 2,
19: 2,
20: 2,
21: 2,
22: 2,
23: 2},
'value': {0: 1.0,
1: 1.0,
2: 2.0,
3: 3.0,
4: 4.0,
5: nan,
6: nan,
7: 3.0,
8: 6.0,
9: 4.0,
10: 3.0,
11: nan,
12: nan,
13: nan,
14: 1.0,
15: 2.0,
16: 3.0,
17: 4.0,
18: nan,
19: nan,
20: nan,
21: 6.0,
22: 8.0,
23: 9.0}}
df = pd.DataFrame(d)
Expected output:
d = {'group': {0: 1,
1: 1,
2: 1,
3: 1,
4: 1,
5: 1,
6: 1,
7: 1,
8: 1,
9: 1,
10: 1,
11: 1,
12: 2,
13: 2,
14: 2,
15: 2,
16: 2,
17: 2,
18: 2,
19: 2,
20: 2,
21: 2,
22: 2,
23: 2},
'value': {0: 1.0,
1: 1.0,
2: 2.0,
3: 3.0,
4: 4.0,
5: 3.44444444,
6: 3.88888889,
7: 3.0,
8: 6.0,
9: 4.0,
10: 3.0,
11: 4.333333,
12: 2.0,
13: 2.0,
14: 1.0,
15: 2.0,
16: 3.0,
17: 4.0,
18: 4.166667,
19: 5.333333,
20: 6.500000,
21: 6.0,
22: 8.0,
23: 9.0}}
Is it possible to do this in pandas without using loops?
Solution
IIUC, here is one way to do it:
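df['updated_values'] = (
    df.groupby('group')
    .apply(
        lambda x: x['value'].fillna(
            x['value']
            .rolling(3)                  # window = current row and the 2 rows before it
            .mean()                      # NaN whenever the window contains a NaN
            .bfill()                     # row after a gap gets the mean of the 3 values to its right
            .where(~x['value'].isna())   # blank out the gap rows again
            .interpolate()               # linear interpolation between the two means
            .bfill()                     # leading gap: take the first available mean
            .ffill()                     # any NaN still left at the end takes the last mean
        )
    ).values                             # positional assignment: assumes rows are sorted by group
)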
Output:
group value updated_values
0 1 1.0 1.000000
1 1 1.0 1.000000
2 1 2.0 2.000000
3 1 3.0 3.000000
4 1 4.0 4.000000
5 1 NaN 3.444444
6 1 NaN 3.888889
7 1 3.0 3.000000
8 1 6.0 6.000000
9 1 4.0 4.000000
10 1 3.0 3.000000
11 1 NaN 4.333333
12 2 NaN 2.000000
13 2 NaN 2.000000
14 2 1.0 1.000000
15 2 2.0 2.000000
16 2 3.0 3.000000
17 2 4.0 4.000000
18 2 NaN 4.166667
19 2 NaN 5.333333
20 2 NaN 6.500000
21 2 6.0 6.000000
22 2 8.0 8.000000
23 2 9.0 9.000000
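The same chain can also be written with `groupby(...).transform`, which returns a result aligned to the original index, so the positional `.values` assignment is not needed. This is a minimal sketch rather than part of the original answer: `fill_between_means` and its `window` parameter are hypothetical names, and the frame is rebuilt here in a compact form so the block runs on its own.

```python
import numpy as np
import pandas as pd


def fill_between_means(s: pd.Series, window: int = 3) -> pd.Series:
    """Fill NaN runs by interpolating between the means on either side."""
    means = (
        s.rolling(window).mean()   # mean of each row and the window-1 rows before it
        .bfill()                   # rows after a gap pick up the mean of the values to their right
        .where(s.notna())          # keep gap rows empty so they get interpolated
        .interpolate()             # linear fill between the two neighbouring means
        .bfill()                   # leading gap: first available mean
        .ffill()                   # any NaN still left at the end: last available mean
    )
    return s.fillna(means)


nan = np.nan
df = pd.DataFrame({
    'group': [1] * 12 + [2] * 12,
    'value': [1, 1, 2, 3, 4, nan, nan, 3, 6, 4, 3, nan,
              nan, nan, 1, 2, 3, 4, nan, nan, nan, 6, 8, 9],
})

# transform applies the function to each group's 'value' column separately.
df['updated_values'] = df.groupby('group')['value'].transform(fill_between_means)
print(df)
```

Note that a gap at the start or end of a group has a mean on only one side, so it is filled with that single mean (rows 11, 12 and 13 in the output above). Rows still need to be in the intended order within each group, because the rolling window is positional.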