Vectorized subsetting of a time series in Python

Problem description

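I am looking for a vectorized way to build a 2D numpy array in which each row holds 64 days of data, extracted with a sliding window from a pandas Series that contains more than 6000 days of data.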

The window size is 64 and the stride is 1.

Below are two solutions: a direct loop, and a list-concatenation approach based on Ingrid's answer:

import time

import numpy as np
import pandas as pd

# Set up a dataframe with 6000 random samples
df = pd.DataFrame(np.random.rand(6000), columns=['d_ret'])
days_of_data = df['d_ret'].count()

n_D = 64   # Window size

# The dataset will have m = (days_of_data - n_D + 1) rows
m = days_of_data - n_D + 1

# Build the dataset with a loop
t = time.time()                                 # Start timing
X = np.zeros((m,n_D))                           # Initialize np array
for day in range(m):                            # Loop over window start positions 0 .. m - 1
    X[day, :] = df['d_ret'][day:day+n_D].values # Copy content of sliding window into array
elapsed = time.time() - t                       # Stop timing

print("X.shape\t: {}".format(X.shape))
print("Elapsed time\t: {}".format(elapsed))

t = time.time()                                 # Start timing
X1 = [df.loc[ind: ind+n_D-1, 'd_ret'].values for ind, _ in df.iterrows()]
X2 = [lst for lst in X1 if len(lst) == n_D]
X_np = np.array(X2)                             # Get np array as output
elapsed = time.time() - t                       # Stop timing

print("X_np.shape\t: {}".format(X_np.shape))
print("Elapsed time\t: {}".format(elapsed))

Output

X.shape : (5937, 64)
Elapsed time    : 0.37702155113220215
X_np.shape  : (5937, 64)
Elapsed time    : 0.7020401954650879

How can I vectorize this?

Example input/output

# Input
Input = pd.Series(range(128))

# Output
array([[  0.,   1.,   2., ...,  61.,  62.,  63.],
       [  1.,   2.,   3., ...,  62.,  63.,  64.],
       [  2.,   3.,   4., ...,  63.,  64.,  65.],
       ...,
       [ 62.,  63.,  64., ..., 123., 124., 125.],
       [ 63.,  64.,  65., ..., 124., 125., 126.],
       [ 64.,  65.,  66., ..., 125., 126., 127.]])

Tags: python, pandas, numpy

Solution


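You can use reshape. Note that it produces non-overlapping blocks of 64 values (a stride of 64, not 1), and it requires the length to be a multiple of 64, so the 6000-sample series must be trimmed first:

vals = df.d_ret.values
vals[:len(vals) // 64 * 64].reshape(-1, 64)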
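
For the overlapping stride-1 windows described in the question, one possible approach is a sketch built on `numpy.lib.stride_tricks.sliding_window_view`. This is not part of the original answer; it assumes NumPy 1.20 or later, and the helper name `sliding_windows` is hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view


def sliding_windows(series, window):
    """Return a 2D array whose rows are stride-1 windows of `series`.

    `sliding_windows` is a hypothetical helper name; the result is a
    read-only view on the underlying data, so no values are copied.
    """
    values = series.to_numpy()
    return sliding_window_view(values, window_shape=window)


# Same input as the example in the question
Input = pd.Series(range(128))
X = sliding_windows(Input, 64)

print(X.shape)  # (65, 64), i.e. 128 - 64 + 1 rows
print(X[0])     # first window: values 0 .. 63
print(X[-1])    # last window: values 64 .. 127
```

Because the result is a read-only view, call `.copy()` on it if the windows need to be modified afterwards. For the 6000-sample `d_ret` column, the same call should give an array of shape (5937, 64), matching the loop version.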
