python - Conditional interpolation in pandas
Problem description
I have a DataFrame like this:
import pandas as pd
import numpy as np

# ISO date string avoids day-first/month-first ambiguity when parsing
df = pd.DataFrame({
    'date': pd.date_range(start='2021-11-13', periods=11),
    'a': [1, np.nan, np.nan, np.nan, np.nan, 2, np.nan, np.nan, np.nan, np.nan, 3],
    'b': [4, np.nan, np.nan, np.nan, np.nan, 6, np.nan, np.nan, np.nan, np.nan, 7],
}).set_index('date')
              a    b
date
2021-11-13  1.0  4.0
2021-11-14  NaN  NaN
2021-11-15  NaN  NaN
2021-11-16  NaN  NaN
2021-11-17  NaN  NaN
2021-11-18  2.0  6.0
2021-11-19  NaN  NaN
2021-11-20  NaN  NaN
2021-11-21  NaN  NaN
2021-11-22  NaN  NaN
2021-11-23  3.0  7.0
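How can I linearly interpolate it over the first n% of the gap between two non-NaN values, and then fill the rest of the gap with the upper bound?
The gap between two non-NaN values is the same length throughout the DataFrame.
For example, with n = 0.5: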
                   a         b
date
2021-11-13  1.000000  4.000000  << original value -----------------
2021-11-14  1.333333  4.666667                                     |
2021-11-15  1.666667  5.333333                                     | 50% linearly interpolated,
2021-11-16  2.000000  6.000000  <- linear interpolation up to here | the rest is filled
2021-11-17  2.000000  6.000000                                     |
2021-11-18  2.000000  6.000000  << original value (upper bound) ---
2021-11-19  2.333333  6.333333
2021-11-20  2.666667  6.666667
2021-11-21  3.000000  7.000000  <- linear interpolation up to here
2021-11-22  3.000000  7.000000
2021-11-23  3.000000  7.000000  << original value (upper bound)
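The arithmetic behind one gap in the table above can be checked by hand. The sketch below is only one reading of the expected output: it assumes that int(n * gap_len) rows receive evenly spaced values strictly between the two bounds and that every remaining row takes the upper bound. The helper name fill_gap and its parameters are hypothetical and do not come from pandas.

```python
import numpy as np

def fill_gap(lower, upper, gap_len, n):
    """Values for one run of gap_len missing rows between lower and upper."""
    k = int(n * gap_len)  # number of rows that get interpolated values (assumption)
    # k evenly spaced points strictly between the two bounds
    ramp = np.linspace(lower, upper, k + 2)[1:-1]
    # the remaining rows are filled with the upper bound
    return np.concatenate([ramp, np.full(gap_len - k, upper)])

first_gap_a = fill_gap(1.0, 2.0, gap_len=4, n=0.5)
first_gap_b = fill_gap(4.0, 6.0, gap_len=4, n=0.5)
print(first_gap_a)
print(first_gap_b)
```

The two arrays correspond to rows 2021-11-14 through 2021-11-17 of columns a and b in the expected output.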
Solution
I don't think pandas has a built-in function for this. You have to write your own:
def interpol_segment(df, r):
    nan_idx = df[df.isna().any(axis=1)].index  # rows containing NaN
    if nan_idx.empty:
        return df.copy()  # nothing to fill
    consecutive_nan = []  # list of runs of consecutive NaN dates
    date_inter = [nan_idx[0]]  # current run of consecutive dates
    for i, date in enumerate(nan_idx[1:], start=1):
        if date - nan_idx[i-1] == pd.Timedelta(1, unit="day"):
            date_inter.append(date)
        else:
            consecutive_nan.append(date_inter)
            date_inter = [date]
    consecutive_nan.append(date_inter)  # append the last run
    # dates to be filled with the upper bound: the tail of each run
    capped_date = [d for run in consecutive_nan for i, d in enumerate(run) if i >= int(r * len(run))]
    df_capped = df.loc[capped_date]
    # interpolate without the capped rows
    df_inter = df.drop(capped_date).interpolate()
    # reassemble, then back-fill the capped rows from the next original value
    return pd.concat([df_inter, df_capped]).sort_index().bfill()

print(interpol_segment(df, 0.5))
Output:
                   a         b
date
2021-11-13  1.000000  4.000000
2021-11-14  1.333333  4.666667
2021-11-15  1.666667  5.333333
2021-11-16  2.000000  6.000000
2021-11-17  2.000000  6.000000
2021-11-18  2.000000  6.000000
2021-11-19  2.333333  6.333333
2021-11-20  2.666667  6.666667
2021-11-21  3.000000  7.000000
2021-11-22  3.000000  7.000000
2021-11-23  3.000000  7.000000
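The same idea can probably be written without the explicit Python loop. The following is a rough sketch rather than a tested replacement: it assumes the NaNs sit in the same rows in every column and that the index is unique, and the function name interpolate_then_cap is made up for illustration. Runs of missing rows are labelled with a cumulative sum over the valid rows, the tail of each run is set aside, the remaining rows are interpolated, and the tail is back-filled.

```python
import numpy as np
import pandas as pd

def interpolate_then_cap(df, r):
    # 1 where any column is missing (assumes NaNs are aligned across columns)
    is_nan = df.isna().any(axis=1).astype(int)
    # every valid row starts a new group, so each NaN run shares one id
    run_id = (1 - is_nan).cumsum()
    pos = is_nan.groupby(run_id).cumsum() - 1          # 0-based position inside the run
    run_len = is_nan.groupby(run_id).transform("sum")  # number of NaN rows in the run
    # rows past the first int(r * run_len) NaN rows take the upper bound
    capped = (is_nan == 1) & (pos >= (r * run_len).astype(int))
    # interpolate as if the capped rows were absent, then restore and back-fill them
    kept = df.loc[~capped].interpolate()
    return kept.reindex(df.index).bfill()

df = pd.DataFrame({
    'date': pd.date_range(start='2021-11-13', periods=11),
    'a': [1, np.nan, np.nan, np.nan, np.nan, 2, np.nan, np.nan, np.nan, np.nan, 3],
    'b': [4, np.nan, np.nan, np.nan, np.nan, 6, np.nan, np.nan, np.nan, np.nan, 7],
}).set_index('date')

result = interpolate_then_cap(df, 0.5)
print(result)
```

Unlike the loop version, this sketch does not check that consecutive NaN rows are exactly one day apart; it only relies on row order, which may or may not be what you want for irregular indexes.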