python-3.x - How to build an RNN using NumPy
Problem description
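I am trying to implement a recurrent neural network in Python using NumPy, specifically a many-to-one RNN for a classification problem. I am a bit fuzzy on the pseudocode, especially the BPTT (backpropagation through time) concept. I am comfortable with the forward pass (though not entirely sure my implementation is correct), but I am very confused about the backward pass and would appreciate advice from experts in this area.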
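I did look at related posts:
1) Implementing an RNN in numpy
2) Output of an RNN
3) How to build an RNN
But I feel my problem is understanding the pseudocode/concept first; the code in those posts is complete and already at a more advanced stage than mine.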
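My implementation is inspired by this tutorial:
I did implement a feed-forward neural network by following part of the same author's tutorial, but I am really confused by his implementation here. Andrew Ng's RNN videos suggest using three different sets of weights (for the activations, the input layer, and the output layer), but the tutorial above has only two sets of weights (please correct me if I am wrong).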
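A note on the weight count: in Andrew Ng's notation the recurrent weights (Waa) and the input weights (Wax) are commonly stacked side by side into a single matrix Wa, which leaves Wa and Wya and can look like "only two sets". Whether the tutorial referenced above does this is an assumption. The sketch below, with illustrative sizes and variable names that are not taken from the post, checks that the separate and stacked forms give the same pre-activation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_a, n_x = 4, 3  # hidden size and input size (illustrative values)

# Separate recurrent and input weight matrices
Waa = rng.standard_normal((n_a, n_a))
Wax = rng.standard_normal((n_a, n_x))

a_prev = rng.standard_normal((n_a, 1))  # previous hidden state
x_t = rng.standard_normal((n_x, 1))     # input at the current time step

# Form 1: two separate products
z_separate = Waa @ a_prev + Wax @ x_t

# Form 2: one stacked matrix Wa = [Waa | Wax] applied to [a_prev; x_t]
Wa = np.hstack([Waa, Wax])
z_stacked = Wa @ np.vstack([a_prev, x_t])

same = np.allclose(z_separate, z_stacked)
print(same)  # True
```

Either way the model has the same parameters; the stacked form only changes the bookkeeping.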
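The naming in my code follows Andrew Ng's RNN pseudocode...
I reshape my input samples into a 3D array of shape (batch_size, n_timesteps, n_dimensions). Once the samples are reshaped, I run the forward pass on each sample separately...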
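For illustration, the reshape described above can be sketched as follows. The sizes (10 samples, 5 time steps, 2 features per step) are assumptions chosen so that a (10, 10) input factors cleanly; they are not values given in the post.

```python
import numpy as np

X_flat = np.arange(100).reshape(10, 10)  # 10 samples, 10 flat values each
n_timesteps, n_dimensions = 5, 2         # assumed: 5 * 2 must equal 10

# Each row is split into n_timesteps consecutive chunks of n_dimensions values
X_3d = X_flat.reshape(len(X_flat), n_timesteps, n_dimensions)

print(X_3d.shape)   # (10, 5, 2)
print(X_3d[0, 0])   # first time step of the first sample: [0 1]
print(X_3d[0, 1])   # second time step of the first sample: [2 3]
```

The reshape only succeeds when n_timesteps * n_dimensions equals the number of values per sample; otherwise NumPy raises a ValueError.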
Here is my code:
import numpy as np

# ReLu and softmax are assumed to be defined elsewhere; they are not shown in the post.

def RNNCell(X, lr, y=None, n_timesteps=None, n_dimensions=None, return_sequence=None, bias=None):
    '''Simple function to compute forward and backward passes for a Many-to-One Recurrent Neural Network Model.
    This function reshapes X, Y into a 3D array of shape (batch_size, n_timesteps, n_dimensions) and then performs
    recurrent operations on each sample of the data for n_timesteps'''
    # If user has specified some target variable
    if y is not None and len(y) != 0:
        # No. of unique values in the target variable will be the dimensions for the output layer
        n_unique = len(np.unique(y))
    else:
        # If there's no target variable given, then dimensions of target variable by default is 2
        n_unique = 2
    # Weights of Vectors to multiply with input samples
    Wx = np.random.uniform(low=0.0,
                           high=0.3,
                           size=(n_dimensions, n_dimensions))
    # Weights of Vectors to multiply with resulting activations
    Wy = np.random.uniform(low=0.0,
                           high=0.3,
                           size=(n_dimensions, n_timesteps))
    # Weights of Vectors to multiply with activations of previous time steps
    Wa = np.random.randn(n_dimensions, n_dimensions)
    # Dict to hold activations of each time step
    activations = {'a-0': np.zeros(shape=(n_timesteps-1, n_dimensions),
                                   dtype=float)}
    # List to hold Yhat at each time step
    Yhat = []
    try:
        # Reshape X to align with the shape of RNN architecture
        X = np.reshape(X, newshape=(len(X), n_timesteps, n_dimensions))
    except ValueError:
        return "Sorry, can't reshape an array into your shape"

    def Forward_Prop(sample):
        # Outputs at the last time step
        Ot = 0
        # In each time step
        for time_step in range(n_timesteps+1):
            if time_step < n_timesteps:
                # activation G ( Wa.a<t> + X<t>.Wx )
                activations['a-' + str(time_step+1)] = ReLu(np.dot(activations['a-' + str(time_step)], Wa)
                                                            + np.dot(sample[time_step, :].reshape(1, n_dimensions), Wx))
            # If it's the last time step then use softmax activation function
            elif time_step == n_timesteps:
                # Wy.a<t> and appending that to Yhat list
                Ot = softmax(np.dot(activations['a-' + str(time_step)], Wy))
        # Return output probabilities
        return Ot

    def Backward_Prop(Yhat):
        # List to hold errors for the last layer
        error = []
        for ind in range(len(Yhat)):
            error.append(y[ind] - Yhat[ind])
        error = np.array(error)
        # Calculating Delta for the output layer
        delta_out = error * lr
        #* relu_derivative(activations['a-' + str(n_timesteps)])
        # Calculating gradient for the output layer
        grad_out = np.dot(delta_out.reshape(len(X), n_timesteps),
                          activations['a-' + str(n_timesteps)])
        # I'm basically stuck at this point
        # Adjusting weights for the output layer
        Wy = Wy - (lr * grad_out.reshape((n_dimensions, n_timesteps)))

    for sample in X:
        Yhat.append(Forward_Prop(sample))
    Backward_Prop(Yhat)
    return Yhat
# DUMMY INPUT DATA
# np.random.random_integers is deprecated; randint excludes `high`, so 6 keeps the 0..5 range
X = np.random.randint(low=0, high=6, size=(10, 10))
# DUMMY LABELS
y = np.array([[0], [1], [1], [1], [0], [0], [1], [1], [0], [1]])
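I know my BPTT implementation is wrong, but I am not thinking clearly and I need an expert's perspective on where exactly I am missing the trick. I do not expect detailed debugging of my code; I just need a high-level overview of the pseudocode for backpropagation (assuming my forward prop is correct). I think my fundamental problem may also be related to the way I run the forward pass on each sample individually.
I have been stuck on this for 3 days, and it is really frustrating not to be able to think clearly. I would be grateful if someone could point me in the right direction and clear up my confusion. Thank you in advance for your time! I really appreciate it!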
Solution
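A minimal sketch of backpropagation through time for a many-to-one RNN, not a definitive implementation. It rests on several assumptions that go beyond the question: tanh is used in place of ReLu (its derivative is smooth, which makes the gradient easy to check), the loss is cross-entropy on a softmax output, the hidden size is independent of the input size, Wy maps the hidden state to the number of classes rather than to the number of time steps, and bias terms ba and by are added. The function names (init_params, forward, backward, loss) are hypothetical. The high-level idea: the loss only sees the last hidden state, so the gradient starts at the output, then walks backwards one time step at a time, accumulating into the shared Wx and Wa at every step.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def init_params(n_features, n_hidden, n_classes, seed=0):
    rng = np.random.default_rng(seed)
    return {
        "Wx": rng.standard_normal((n_features, n_hidden)) * 0.1,  # input -> hidden
        "Wa": rng.standard_normal((n_hidden, n_hidden)) * 0.1,    # hidden -> hidden
        "Wy": rng.standard_normal((n_hidden, n_classes)) * 0.1,   # hidden -> output
        "ba": np.zeros((1, n_hidden)),
        "by": np.zeros((1, n_classes)),
    }

def forward(sample, p):
    """sample has shape (n_timesteps, n_features); returns all hidden states and y_hat."""
    a = [np.zeros((1, p["Wa"].shape[0]))]  # a[0] is the initial hidden state
    for x_t in sample:
        a.append(np.tanh(a[-1] @ p["Wa"] + x_t.reshape(1, -1) @ p["Wx"] + p["ba"]))
    y_hat = softmax(a[-1] @ p["Wy"] + p["by"])  # output only at the last step
    return a, y_hat

def backward(sample, label, a, y_hat, p):
    """Gradients of the cross-entropy loss for one sample with integer class label."""
    grads = {k: np.zeros_like(v) for k, v in p.items()}
    d_logits = y_hat.copy()
    d_logits[0, label] -= 1.0              # softmax + cross-entropy: y_hat - one_hot
    grads["Wy"] = a[-1].T @ d_logits
    grads["by"] = d_logits
    da = d_logits @ p["Wy"].T              # gradient reaching the last hidden state
    for t in range(len(sample), 0, -1):    # walk back through time
        dz = da * (1.0 - a[t] ** 2)        # tanh derivative
        grads["Wx"] += sample[t - 1].reshape(-1, 1) @ dz
        grads["Wa"] += a[t - 1].T @ dz
        grads["ba"] += dz
        da = dz @ p["Wa"].T                # pass gradient to the previous hidden state
    return grads

def loss(sample, label, p):
    _, y_hat = forward(sample, p)
    return -np.log(y_hat[0, label])

# Toy data shaped like the question's: 10 samples, 5 time steps, 2 features (assumed split)
rng = np.random.default_rng(1)
X = rng.integers(0, 6, size=(10, 10)).reshape(10, 5, 2) / 5.0
y = np.array([0, 1, 1, 1, 0, 0, 1, 1, 0, 1])

params = init_params(n_features=2, n_hidden=8, n_classes=2)
lr = 0.05
loss_before = np.mean([loss(s, t, params) for s, t in zip(X, y)])
for epoch in range(200):
    for s, t in zip(X, y):                 # plain per-sample SGD
        a, y_hat = forward(s, params)
        g = backward(s, t, a, y_hat, params)
        for k in params:
            params[k] -= lr * g[k]
loss_after = np.mean([loss(s, t, params) for s, t in zip(X, y)])
print(loss_before, loss_after)
```

Running the forward pass one sample at a time, as in the question, is fine for this approach; batching only changes the leading dimension of the arrays. A finite-difference comparison against backward is a quick way to confirm the gradients before trusting any training result.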