How to build an RNN using numpy

Problem description

I am trying to implement a recurrent neural network in Python using NumPy, specifically a many-to-one RNN for a classification problem. I am a little fuzzy on the pseudocode, especially the BPTT concept. I am comfortable with the forward pass (though not entirely sure my implementation is correct), but quite confused by the backward pass, and I need some advice from an expert in this field.

I did look at the related posts:

1) Implementing an RNN in numpy

2) Output of an RNN

3) How to build an RNN

But I feel my problem is first understanding the pseudocode/concept; the code in those posts is complete and already further along than mine.

My implementation is inspired by this tutorial:

WildML RNN from scratch

I did implement a feed-forward neural network following part of the same author's tutorial, but I am really confused by this implementation of his. Andrew Ng's RNN videos suggest three different weight matrices (weights for the activations, the input layer and the output layer), but the tutorial above uses only two sets of weights (correct me if I'm wrong).
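For reference, here is a minimal sketch of the forward equations in Andrew Ng's three-weight notation. The names (Waa, Wax, Wya, ba, by) and the choice of tanh are illustrative assumptions, not the tutorial's code; the inputs and hidden states are column vectors:

import numpy as np

def rnn_step(a_prev, x_t, Waa, Wax, ba):
    # a<t> = g(Waa . a<t-1> + Wax . x<t> + ba), with g = tanh here
    return np.tanh(Waa @ a_prev + Wax @ x_t + ba)

def rnn_output(a_T, Wya, by):
    # y-hat = softmax(Wya . a<T> + by), computed once at the final step
    z = Wya @ a_T + by
    e = np.exp(z - z.max())          # shift for numerical stability
    return e / e.sum()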

The naming in my code follows Andrew Ng's RNN pseudocode...

I am reshaping my input samples to 3D (batch_size, n_timesteps, n_dimensions)... Once I have reshaped my samples, I do the forward pass on each sample separately...
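To make the reshape concrete, here is a tiny standalone illustration; the shapes (10, 5, 2) are assumptions for the example, not from the post:

import numpy as np

X = np.random.rand(10, 10)         # (batch_size, n_timesteps * n_dimensions)
X3d = X.reshape(10, 5, 2)          # (batch_size, n_timesteps, n_dimensions)
for sample in X3d:                 # one forward pass per sample
    assert sample.shape == (5, 2)  # (n_timesteps, n_dimensions)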

Here is my code:

import numpy as np

def ReLu(z):
    # Rectified linear unit activation
    return np.maximum(0, z)

def softmax(z):
    # Numerically stable softmax
    e = np.exp(z - np.max(z))
    return e / np.sum(e)

def RNNCell(X, lr, y=None, n_timesteps=None, n_dimensions=None, return_sequence=None, bias=None):
    '''Simple function to compute forward and backward passes for a Many-to-One
    Recurrent Neural Network model. Reshapes X into a 3D array of shape
    (batch_size, n_timesteps, n_dimensions) and then performs recurrent
    operations on each sample of the data for n_timesteps.'''

    # If the user has specified a target variable
    if y is not None:
        # The number of unique values in the target variable is the dimension
        # of the output layer (note: np.unique's counts are not what we want here)
        n_unique = len(np.unique(y))
    else:
        # If no target variable is given, the output dimension defaults to 2
        n_unique = 2

    # Weights to multiply with the input samples
    Wx = np.random.uniform(low=0.0, high=0.3,
                           size=(n_dimensions, n_dimensions))

    # Weights to multiply with the resulting activations
    Wy = np.random.uniform(low=0.0, high=0.3,
                           size=(n_dimensions, n_timesteps))

    # Weights to multiply with the activations of the previous time step
    Wa = np.random.randn(n_dimensions, n_dimensions)

    # Dict to hold the activations of each time step
    activations = {'a-0': np.zeros(shape=(n_timesteps - 1, n_dimensions),
                                   dtype=float)}

    # List to hold Yhat for each sample
    Yhat = []

    try:
        # Reshape X to align with the shape of the RNN architecture
        X = np.reshape(X, newshape=(len(X), n_timesteps, n_dimensions))
    except ValueError:
        return "Sorry, can't reshape the array into that shape"

    def Forward_Prop(sample):
        # Output at the last time step
        Ot = 0

        # For each time step
        for time_step in range(n_timesteps + 1):
            if time_step < n_timesteps:
                # activation g( a<t>.Wa + X<t>.Wx )
                activations['a-' + str(time_step + 1)] = ReLu(
                    np.dot(activations['a-' + str(time_step)], Wa)
                    + np.dot(sample[time_step, :].reshape(1, n_dimensions), Wx))
            # If it's the last time step, use the softmax activation function
            elif time_step == n_timesteps:
                # a<t>.Wy; the caller appends the result to the Yhat list
                Ot = softmax(np.dot(activations['a-' + str(time_step)], Wy))

        # Return the output probabilities
        return Ot

    def Backward_Prop(Yhat):
        # Wy is defined in the enclosing scope, so assigning to it
        # requires a nonlocal declaration
        nonlocal Wy

        # List to hold the errors of the last layer
        error = []
        for ind in range(len(Yhat)):
            error.append(y[ind] - Yhat[ind])
        error = np.array(error)

        # Calculating the delta for the output layer
        delta_out = error * lr
        # * relu_derivative(activations['a-' + str(n_timesteps)])

        # Calculating the gradient for the output layer
        grad_out = np.dot(delta_out.reshape(len(X), n_timesteps),
                          activations['a-' + str(n_timesteps)])

        # I'm basically stuck at this point

        # Adjusting the weights of the output layer
        Wy = Wy - (lr * grad_out.reshape((n_dimensions, n_timesteps)))

    for sample in X:
        Yhat.append(Forward_Prop(sample))

    Backward_Prop(Yhat)

    return Yhat




# DUMMY INPUT DATA
# (np.random.random_integers is deprecated; randint's high is exclusive,
# so high=6 keeps the original 0..5 range)
X = np.random.randint(low=0, high=6, size=(10, 10))

# DUMMY LABELS
y = np.array([[0],
              [1],
              [1],
              [1],
              [0],
              [0],
              [1],
              [1],
              [0],
              [1]])
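A hypothetical call for this dummy data (the post itself does not show one); the 10 flat features per sample could be viewed as, say, 5 time steps of 2 dimensions each, and the learning rate is an arbitrary choice:

Yhat = RNNCell(X, lr=0.01, y=y, n_timesteps=5, n_dimensions=2)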

I know my BPTT implementation is wrong, but I am not thinking about it clearly, and I need an expert's perspective on where exactly I am missing the trick. I am not asking for a detailed debugging of my code; I just need a high-level overview of the backpropagation pseudocode (assuming my forward prop is correct). I think my fundamental problem may also lie in the way I am doing the forward pass on each sample individually.

I've been stuck on this problem for 3 days now, and not being able to think clearly about it is really frustrating. I would be very grateful if someone could point me in the right direction and clear up my confusion. Thank you in advance for your time, I really appreciate it!

Tags: python-3.x, numpy, recurrent-neural-network, backpropagation

Solution
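As the high-level overview the question asks for, here is a minimal BPTT sketch for a many-to-one RNN. It is illustrative only: it assumes tanh hidden activations, a softmax output trained with cross-entropy loss, the three-weight parameterization (Wax, Waa, Wya) from Andrew Ng's notation, and column-vector shapes; none of these names come from the code above. Because there is a single output at the last step, the output gradient enters once and is then propagated backwards through every time step, accumulating into the shared weights:

import numpy as np

def bptt(xs, a, y_hat, y_true, Waa, Wax, Wya):
    '''xs: list of inputs x<1>..x<T> (column vectors);
    a: list of hidden activations a<0>..a<T> saved during the forward pass;
    y_hat, y_true: softmax output and one-hot label (column vectors).'''
    T = len(xs)

    # 1) Output layer: for softmax + cross-entropy, the gradient of the loss
    #    w.r.t. the pre-softmax logits is simply (y_hat - y_true)
    dz_out = y_hat - y_true
    dWya = dz_out @ a[T].T

    # 2) Gradient flowing into the last hidden state
    da = Wya.T @ dz_out

    dWaa = np.zeros_like(Waa)
    dWax = np.zeros_like(Wax)

    # 3) Walk backwards through time; the SAME Waa and Wax are reused at
    #    every step, so their gradients accumulate over all time steps
    for t in range(T, 0, -1):
        dz = da * (1.0 - a[t] ** 2)   # through tanh: g'(z) = 1 - a<t>^2
        dWaa += dz @ a[t - 1].T
        dWax += dz @ xs[t - 1].T
        da = Waa.T @ dz               # pass the gradient back to a<t-1>

    return dWya, dWaa, dWax

The weight update is then an ordinary gradient step, e.g. Waa -= lr * dWaa. The point the question's Backward_Prop is missing is step 3: because Wa and Wx are shared across time steps, their gradients must be accumulated over all t by walking backwards through the stored activations, not computed from the last step alone.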

