Including the previous trading day in a data subset selected by an input date range

Problem description

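I am trying to put together a function that creates a subset of data, where the input parameters are a DataFrame, a start date, and an end date. The first row I want to see is the trading day before the start date. For example, when pulling from df with start date = 2018-01-02 and end date = 2018-09-28, the first row of data I want to see is 2017-12-29 (the last day the market was open). I have this working, but I am wondering if there is a better way to do it.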

start_dt = "2018-01-02"
end_dt = "2018-09-28"
train_data_price = None

def getRange(df, start_dt, end_dt):

    from datetime import datetime, timedelta
    from pandas.tseries.holiday import USFederalHolidayCalendar
    from pandas.tseries.offsets import CustomBusinessDay

    date_obj = datetime.strptime(start_dt, '%Y-%m-%d')
    US_BUSINESS_DAY = CustomBusinessDay(calendar=USFederalHolidayCalendar()) 
    newdate = date_obj - US_BUSINESS_DAY 
    newdate_str = newdate.strftime('%Y-%m-%d') 
    # Slice the frame passed in as df, from the previous business day through end_dt
    sub_data = df.loc[newdate_str:end_dt]

    return sub_data

Tags: python, dataframe, finance

Solution


Here is another way to do it, without having to know in advance what the previous available candle is.

Build a DataFrame for the example:

import pandas as pd
import numpy as np
dates = pd.date_range("1983-09-01 00:00:00","1985-12-31 23:59:59",freq="1m")
df = pd.DataFrame(index =dates,columns=["Close"])
df['Close'] = [0.183673,0.193673,0.173673,0.163673,0.193673,0.183673,0.16555,0.1993673,0.1282758,0.132758,1.1482758,0.482758,0.482758,0.482758,0.482758,0.482758,1.700000,1.700000,1.700000,1.700000,1.700000,1.700000,1.700000,1.700000,1.700000,1.700000,1.700000,1.700000]

And the code that solves it:

# you want to start one period before start
start = '1984-01-31'
# end there
end = '1984-08-31'
# Using a mask with np.where is a very fast method
# First flag to 1 the period
df['Mask'] = np.where(((df.index>=start)&(df.index<=end)),1,0)
# then flag to 1 a candle before
df['Mask']  = np.where(df.Mask<df.Mask.shift(-1),1,df.Mask)
# filter where flag == 1
sub_df = df[df.Mask==1].drop(['Mask'],axis=1)
# drop everything useless
df = df.drop(['Mask'],axis=1)
sub_df

Output: sub_df contains the month-end rows from 1983-12-31 (one period before start) through 1984-08-31.
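If the index is sorted, a positional lookup is another way to pick up the previous available row without a holiday calendar or a helper column. The following is only a minimal sketch under that assumption. The function name subset_with_previous_row and the sample prices frame are hypothetical and do not come from the question or the answer above.

```python
import pandas as pd


def subset_with_previous_row(df, start, end):
    """Return the rows in [start, end] plus the single row just before start.

    Hypothetical helper for illustration; assumes df has a sorted DatetimeIndex.
    """
    idx = df.index
    # Position of the first row whose timestamp is >= start
    first = idx.searchsorted(pd.Timestamp(start), side="left")
    # Position just past the last row whose timestamp is <= end
    last = idx.searchsorted(pd.Timestamp(end), side="right")
    # Step back one row, but never before the beginning of the frame
    first = max(first - 1, 0)
    return df.iloc[first:last]


# Made-up sample data: only days that actually have a row matter,
# so the weekend and the 2018-01-01 holiday are simply absent.
prices = pd.DataFrame(
    {"Close": [10.0, 10.5, 10.2, 10.8, 11.0, 10.9]},
    index=pd.to_datetime(
        ["2017-12-27", "2017-12-28", "2017-12-29",
         "2018-01-02", "2018-01-03", "2018-01-04"]
    ),
)

result = subset_with_previous_row(prices, "2018-01-02", "2018-01-03")
print(result)
```

Because the lookup works on the rows that are present in the data, the previous trading day is whatever row precedes the start date, and the original frame is left unmodified.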

