How do I properly write a time series split with Dask?

Problem description

I am trying to implement scikit-learn's TimeSeriesSplit (https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.TimeSeriesSplit.html) for Dask. The simplest version I have so far is shown below:

import pandas as pd

def train_test_split_time_series(dd_feature_009a013a, dd_price_solely_y, time_series_gap=pd.Timedelta("1d") + pd.Timedelta("5m"), split_ratio=0.8):
    # Both inputs are assumed to share a sorted DatetimeIndex.
    begin_time = dd_feature_009a013a.index.min().compute()
    end_time = dd_feature_009a013a.index.max().compute()
    # End the training window split_ratio of the way through the time range.
    split_train_end = begin_time + (end_time - begin_time) * split_ratio
    # Start the test window after a gap so the two sets do not overlap.
    split_test_start = split_train_end + time_series_gap
    return dd_feature_009a013a.loc[:split_train_end], dd_feature_009a013a.loc[split_test_start:], dd_price_solely_y.loc[:split_train_end], dd_price_solely_y.loc[split_test_start:]

I'm just wondering whether there is a better way to write a time series split in Dask. If so, how should I write it?

Tags: dask

Solution
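
One possible direction, offered as a minimal sketch rather than an established Dask recipe: the function in the question produces a single train/test pair, whereas TimeSeriesSplit yields several expanding-window folds. The helper below only computes the time boundaries of such folds, using plain datetime arithmetic so it can be checked without a cluster. The name time_series_fold_bounds and its parameters are hypothetical, and it assumes folds are cut by equal time spans (not equal row counts, as scikit-learn does) with the gap placed at the start of each test window, as in the question.

```python
from datetime import datetime, timedelta


def time_series_fold_bounds(begin_time, end_time, n_splits=5, gap=timedelta(0)):
    """Return a list of (train_end, test_start, test_end) tuples, one per fold.

    Hypothetical helper: the time range is cut into n_splits + 1 blocks of
    equal duration. Fold k trains on the first k blocks and tests on block
    k + 1, skipping `gap` at the start of the test block.
    """
    if n_splits < 1:
        raise ValueError("n_splits must be at least 1")
    block = (end_time - begin_time) / (n_splits + 1)
    if gap >= block:
        raise ValueError("gap must be shorter than one block")

    bounds = []
    for k in range(1, n_splits + 1):
        train_end = begin_time + k * block
        test_start = train_end + gap
        # Pin the last fold to end_time so rounding never drops trailing rows.
        test_end = end_time if k == n_splits else begin_time + (k + 1) * block
        bounds.append((train_end, test_start, test_end))
    return bounds
```

Since pandas Timestamp and Timedelta objects subclass datetime and timedelta, the begin_time and end_time computed in the question can be passed in directly. Each fold could then be selected with something like dd_feature_009a013a.loc[:train_end] for training and dd_feature_009a013a.loc[test_start:test_end] for testing, and likewise for dd_price_solely_y. Note that .loc slicing on labels includes both endpoints, so with a zero gap a row stamped exactly at train_end would land in both sets.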

