python - Trying to include the previous trading day in a data subset using an input date range
Problem description
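I'm trying to put together a function that creates a subset of the data, taking a DataFrame, a start date and an end date as input parameters. I want the first row of the result to be the trading day before the start date. For example, extracting from df with start date = 2018-01-02 and end date = 2018-09-28, the first row I want to see is 2017-12-29 (the last day the market was open). I have this working, but I'm wondering whether there is a better way to do it.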
from datetime import datetime
from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import CustomBusinessDay

start_dt = "2018-01-02"
end_dt = "2018-09-28"

def getRange(df, start_dt, end_dt):
    # Business-day offset that skips weekends and US federal holidays
    US_BUSINESS_DAY = CustomBusinessDay(calendar=USFederalHolidayCalendar())
    date_obj = datetime.strptime(start_dt, '%Y-%m-%d')
    # Step back one business day from the requested start date
    newdate = date_obj - US_BUSINESS_DAY
    newdate_str = newdate.strftime('%Y-%m-%d')
    # Label-based slice of the frame passed in, inclusive of both endpoints
    return df.loc[newdate_str:end_dt]
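The calendar-based approach above is only as good as the holiday list it uses. `USFederalHolidayCalendar` contains US federal holidays, not exchange holidays, so it can disagree with the NYSE schedule: Good Friday is an exchange holiday but not a federal one, and Columbus Day is the reverse. A minimal sketch that checks a few dates (`previous_business_day` is a hypothetical helper name, not part of pandas):

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import CustomBusinessDay

# Offset that skips weekends and US *federal* holidays (not exchange holidays)
US_BUSINESS_DAY = CustomBusinessDay(calendar=USFederalHolidayCalendar())

def previous_business_day(date_str):
    """Hypothetical helper: the business day before date_str under the federal calendar."""
    return pd.Timestamp(date_str) - US_BUSINESS_DAY

# 2018-01-01 (New Year's Day, a Monday) is skipped, giving Friday 2017-12-29
print(previous_business_day("2018-01-02"))
# Good Friday 2018-03-30 is not a federal holiday, but the NYSE was closed that day
print(previous_business_day("2018-04-02"))
# Columbus Day 2018-10-08 is a federal holiday, but the NYSE was open that day
print(previous_business_day("2018-10-09"))
```

The first date matches the question's example; the other two show where a federal calendar picks a non-trading day or skips a trading day.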
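Solution
Here is another way to do this, without needing to know in advance which candle (row) comes before the start date.
Build a DataFrame for the example: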
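The same idea, taking the previous trading day from the rows actually present in the frame rather than from a holiday calendar, can also be written with `Index.searchsorted`. This is a sketch of an alternative, not the answer's mask method; it assumes the DataFrame has a sorted `DatetimeIndex`, and `get_range` and the `prices` data are hypothetical:

```python
import pandas as pd

def get_range(df, start, end):
    """Hypothetical helper: rows from the last row before `start` through `end`.

    Assumes df.index is a sorted DatetimeIndex, so the previous trading day is
    simply the row that precedes `start` in the data.
    """
    idx = df.index
    # Position of the first row on or after start
    first = idx.searchsorted(pd.Timestamp(start), side="left")
    # Position just past the last row on or before end (end is inclusive)
    stop = idx.searchsorted(pd.Timestamp(end), side="right")
    # Step back one row to include the previous trading day, if there is one
    return df.iloc[max(first - 1, 0):stop]

# Hypothetical daily closes; 2018-01-01 is absent because the market was closed
prices = pd.DataFrame(
    {"Close": [1.0, 2.0, 3.0, 4.0]},
    index=pd.to_datetime(["2017-12-28", "2017-12-29", "2018-01-02", "2018-01-03"]),
)
print(get_range(prices, "2018-01-02", "2018-01-03"))
```

If `start` is itself a non-trading day (for example "2018-01-01"), `searchsorted` lands on the next available row, so stepping back one still returns 2017-12-29 as the first row.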
import pandas as pd
import numpy as np
# 28 month-end dates, 1983-09-30 through 1985-12-31
dates = pd.date_range("1983-09-01 00:00:00", "1985-12-31 23:59:59", freq=pd.offsets.MonthEnd())
df = pd.DataFrame(index=dates, columns=["Close"])
df['Close'] = [0.183673,0.193673,0.173673,0.163673,0.193673,0.183673,0.16555,0.1993673,0.1282758,0.132758,1.1482758,0.482758,0.482758,0.482758,0.482758,0.482758,1.700000,1.700000,1.700000,1.700000,1.700000,1.700000,1.700000,1.700000,1.700000,1.700000,1.700000,1.700000]
And the code that solves it:
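# start of the requested range (the row before it will also be kept)
start = '1984-01-31'
# end of the requested range (inclusive)
end = '1984-08-31'
# np.where builds the mask in a single vectorized step
# first, flag every row inside [start, end] with 1
df['Mask'] = np.where((df.index >= start) & (df.index <= end), 1, 0)
# then also flag the row just before the first flagged one:
# a 0 followed by a 1 marks the previous candle
df['Mask'] = np.where(df.Mask < df.Mask.shift(-1), 1, df.Mask)
# keep the flagged rows and drop the helper column from the subset
sub_df = df[df.Mask == 1].drop(['Mask'], axis=1)
# remove the helper column from the original frame as well
df = df.drop(['Mask'], axis=1)
sub_df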
Output: