python-3.x - Group by year-month and drop columns that are all NaN in Python
Problem description
import pandas as pd
import numpy as np
np.random.seed(2021)
dates = pd.date_range('20130226', periods=90)
df = pd.DataFrame(np.random.uniform(0, 10, size=(90, 6)), index=dates, columns=['A_values', 'B_values', 'C_values', 'D_values', 'E_values', 'target'])
models = df.columns[df.columns.str.endswith('_values')]
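# Mean absolute percentage error (MAPE), computed column-wise, in percent
def mape(y_true, y_pred):
    y_pred = np.array(y_pred)
    # Clip the denominator to at least 1 to avoid dividing by values near zero
    return np.mean(np.abs(y_true - y_pred) / np.clip(np.abs(y_true), 1, np.inf),
                   axis=0) * 100

# Monthly MAPE of each model column against the target
errors = (df.groupby(pd.Grouper(freq='M'))
            .apply(lambda x: mape(x[models], x[['target']]))
          )
k = 2
n = len(models)
sorted_args = np.argsort(errors, axis=1) < k
# Attach each month's boolean mask to every daily row of that month
res = pd.merge_asof(df[['target']], sorted_args,
                    left_index=True,
                    right_index=True,
                    direction='forward'
                    )
# Keep model values only where the mask is True; everything else becomes NaN
topk = df[models].where(res[models])
df = df.join(topk.add_suffix('_mape'))
df = df[['target', 'A_values_mape', 'B_values_mape', 'C_values_mape', 'D_values_mape',
         'E_values_mape']]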
df
Out:
target A_values_mape ... D_values_mape E_values_mape
2013-02-26 1.281624 6.059783 ... 3.126731 NaN
2013-02-27 0.585713 1.789931 ... 7.843101 NaN
2013-02-28 9.638430 9.623960 ... 5.612724 NaN
2013-03-01 1.950960 NaN ... NaN 5.693051
2013-03-02 0.690563 NaN ... NaN 7.322250
... ... ... ... ...
2013-05-22 5.554824 NaN ... NaN 6.803052
2013-05-23 8.440801 NaN ... NaN 2.756443
2013-05-24 0.968086 NaN ... NaN 0.430184
2013-05-25 0.672555 NaN ... NaN 5.461017
2013-05-26 5.273122 NaN ... NaN 6.312104
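How can I group by year-month and drop the columns that are all NaN, then rename the remaining columns to, e.g., top_1, top_2, ..., top_k?
The final expected result could look like this if k=2:
Pseudocode: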
df2 = df.filter(regex='_mape$').groupby(pd.Grouper(freq='M')).dropna(axis=1, how='all')
df2.columns = ['top_1', 'top_2', ..., 'top_k']
df.join(df2)
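One way to turn the pseudocode above into something runnable is to loop over the monthly groups, drop the all-NaN columns in each, and rename what is left. This is only a sketch: the helper name `topk_by_month` and the toy frame `masked` are hypothetical, and it assumes every month keeps exactly k non-empty columns. The resulting `top_i` columns follow the original column order, not the error ranking.

```python
import numpy as np
import pandas as pd

def topk_by_month(masked, k):
    """Per calendar month, drop all-NaN columns and rename the rest top_1..top_k."""
    out_cols = [f'top_{i + 1}' for i in range(k)]
    pieces = []
    # Grouping by monthly period plays the role of pd.Grouper(freq='M')
    for _, grp in masked.groupby(masked.index.to_period('M')):
        kept = grp.dropna(axis=1, how='all')
        # Assumption: each month has exactly k columns with data
        if kept.shape[1] != k:
            raise ValueError(f'expected {k} non-empty columns, got {kept.shape[1]}')
        kept.columns = out_cols
        pieces.append(kept)
    return pd.concat(pieces)

# Toy stand-in for df.filter(regex='_mape$'): two days in each of two months
dates = pd.to_datetime(['2013-02-27', '2013-02-28', '2013-03-01', '2013-03-02'])
masked = pd.DataFrame(
    {
        'A_values_mape': [1.0, 2.0, np.nan, np.nan],
        'B_values_mape': [np.nan, np.nan, 3.0, 4.0],
        'C_values_mape': [5.0, 6.0, 7.0, 8.0],
    },
    index=dates,
)
df2 = topk_by_month(masked, k=2)
print(df2)
```

With the question's data, `df[['target']].join(topk_by_month(df.filter(regex='_mape$'), k))` would then play the role of the last pseudocode line.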
As @Quang Hoang commented on my previous post, we could use justify_nd to achieve this, but I don't know how. Thanks in advance for your help.
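The idea behind a "justify" step can be sketched without the helper itself: push the non-NaN values of every row to the left, then keep the first k columns. The function `left_justify` below is a hand-rolled stand-in written for illustration, not the `justify_nd` function mentioned in the comment (its code is not shown here), and the toy frame `masked` is hypothetical. It works row by row, so it matches a per-month result only when the NaN pattern is the same on every row of a month, as in the output above.

```python
import numpy as np
import pandas as pd

def left_justify(values):
    """Move the non-NaN entries of each row to the left, keeping their order."""
    nan_mask = np.isnan(values)
    # A stable sort on the mask puts valid entries (False) before NaNs (True)
    order = np.argsort(nan_mask, axis=1, kind='stable')
    return np.take_along_axis(values, order, axis=1)

# Toy stand-in for df.filter(regex='_mape$')
dates = pd.to_datetime(['2013-02-27', '2013-02-28', '2013-03-01', '2013-03-02'])
masked = pd.DataFrame(
    {
        'A_values_mape': [1.0, 2.0, np.nan, np.nan],
        'B_values_mape': [np.nan, np.nan, 3.0, 4.0],
        'C_values_mape': [5.0, 6.0, 7.0, 8.0],
    },
    index=dates,
)

k = 2
justified = left_justify(masked.to_numpy(dtype=float))[:, :k]
df2 = pd.DataFrame(justified, index=masked.index,
                   columns=[f'top_{i + 1}' for i in range(k)])
print(df2)
```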
Edit:
dates = pd.date_range('20130226', periods=90)
df = pd.DataFrame(np.random.uniform(0, 10, size=(90, 6)), index=dates, columns=['A_values', 'B_values', 'C_values', 'D_values', 'E_values', 'target'])
models = df.columns[df.columns.str.endswith('_values')]
k = 2
n = len(models)
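def grpProc(grp):
    # MAPE of each model column against the target within this month
    err = mape(grp[models], grp[['target']])
    # sort_args = np.argsort(err) < k
    # cols = models[sort_args]
    # Names of the k model columns with the smallest error
    cols = err.nsmallest(k).index
    out_cols = [f'top_{i+1}' for i in range(k)]
    rv = grp.loc[:, cols]
    rv.columns = out_cols
    return rv

# group_keys=False keeps the original date index so the join below lines up
# (on pandas 2.x, apply otherwise prepends the group key as an extra index level)
wrk = df.groupby(pd.Grouper(freq='M'), group_keys=False).apply(grpProc)
res = df[['target']].join(wrk)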
print(res)
Out:
target top_1 top_2
2013-02-26 1.281624 6.059783 9.972433
2013-02-27 0.585713 1.789931 0.968944
2013-02-28 9.638430 9.623960 6.165247
2013-03-01 1.950960 4.521452 5.693051
2013-03-02 0.690563 5.178144 7.322250
... ... ...
2013-05-22 5.554824 3.864723 6.803052
2013-05-23 8.440801 5.140268 2.756443
2013-05-24 0.968086 5.890717 0.430184
2013-05-25 0.672555 1.610210 5.461017
2013-05-26 5.273122 6.893207 6.312104
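The commented-out `np.argsort(err) < k` line and the `err.nsmallest(k)` call in `grpProc` do not generally pick the same columns. `np.argsort` returns the positions that would sort the array, not the rank of each element, so comparing its output with k is not a top-k test. A small check on made-up error values (the numbers below are hypothetical, chosen only to show the difference):

```python
import numpy as np
import pandas as pd

# Hypothetical monthly errors for three models
err = pd.Series([30.0, 10.0, 20.0], index=['A_values', 'B_values', 'C_values'])
k = 2

# argsort gives sorting positions [1, 2, 0]; "< k" then flags A and C
argsort_mask = np.argsort(err.to_numpy()) < k
picked_by_argsort = list(err.index[argsort_mask])

# rank gives [3, 1, 2]; "<= k" flags the two smallest errors, B and C
rank_mask = err.rank(method='first').to_numpy() <= k
picked_by_rank = list(err.index[rank_mask])

picked_by_nsmallest = list(err.nsmallest(k).index)

print(picked_by_argsort)    # ['A_values', 'C_values']
print(picked_by_rank)       # ['B_values', 'C_values']
print(picked_by_nsmallest)  # ['B_values', 'C_values']
```

If a boolean mask is needed rather than column names, a rank-based comparison such as the one above is one option.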
Solution