python - Is there a way to configure the output shape of an RNN?
Problem description
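I am trying to build an RNN that guesses which notes are being played on a piano, given an audio file (WAV format) of piano notes. I currently cut each WAV clip into 10-second chunks (2D) and zero-pad the shorter chunks to 10 seconds, so every input has the same shape. However, when I pass a clip to the RNN, the output has one dimension fewer (1D). That is what I get when taking the last state; should I be taking the series of states instead?
I built a simpler RNN that analyzes a single-note file (2D) and produces one output (1D), and that worked. When I try to apply the same technique to full clips with multiple notes and note starts/stops, it breaks down, because I cannot seem to change the output shape.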
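The chunking and zero-padding step described above can be sketched as follows. This is only an illustration in NumPy, not the asker's actual preprocessing: the helper name `chunk_and_pad`, the `steps_per_chunk` parameter, and the array sizes are hypothetical, and the input is assumed to be a `[time, frequency]` feature array already extracted from the WAV file.

```python
import numpy as np


def chunk_and_pad(features, steps_per_chunk):
    """Split a [time, frequency] array into fixed-length chunks.

    The final chunk is zero-padded so every chunk has the same shape.
    """
    n_steps, n_freq = features.shape
    n_chunks = max(1, -(-n_steps // steps_per_chunk))  # ceiling division
    padded = np.zeros((n_chunks * steps_per_chunk, n_freq), dtype=features.dtype)
    padded[:n_steps] = features
    return padded.reshape(n_chunks, steps_per_chunk, n_freq)


# Hypothetical clip: 25 time steps, 4 frequency bins
clip = np.ones((25, 4), dtype=np.float32)
chunks = chunk_and_pad(clip, steps_per_chunk=10)
print(chunks.shape)  # (3, 10, 4)
```

If the model is meant to predict notes at every time step, the target array (`[time, notes]`) would presumably need the same chunking and padding so that inputs and targets stay aligned.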
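# Assumed imports (omitted in the original post); TensorFlow 1.x API
import tensorflow as tf
from tensorflow.contrib import rnn

def weight_variable(shape):
    initer = tf.truncated_normal_initializer(stddev=0.01)
    return tf.get_variable('W', dtype=tf.float32, shape=shape, initializer=initer)

def bias_variable(shape):
    initial = tf.constant(0., shape=shape, dtype=tf.float32)
    return tf.get_variable('b', dtype=tf.float32, initializer=initial)

def RNN(x, weights, biases, timesteps, num_hidden):
    # Split [batch, time, features] into a list of `timesteps` tensors of shape [batch, features]
    x = tf.unstack(x, timesteps, 1)
    # Define a rnn cell with tensorflow
    lstm_cell = rnn.LSTMCell(num_hidden)
    states_series, current_state = rnn.static_rnn(lstm_cell, x, dtype=tf.float32)
    # current_state is an LSTMStateTuple (c, h); index 1 is the final hidden state,
    # so only one prediction per clip is returned
    return tf.matmul(current_state[1], weights) + biases
    # return [tf.matmul(temp,weights) + biases for temp in states_series]
    # does this even make sense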
# x is for data, y is for targets, shapes are [index, time, frequency], [index, time, output note (s)] respectively
x_train, x_valid, y_train, y_valid = load_data() # removed test
print("Size of:")
print("- Training-set:\t\t{}".format(y_train.shape[0]))
print("- Validation-set:\t{}".format(y_valid.shape[0]))
# print("- Test-set\t{}".format(len(y_test)))
learning_rate = 0.001 # The optimization initial learning rate
epochs = 1000 # Total number of training epochs
batch_size = 100 # Training batch size
display_freq = 100 # Frequency of displaying the training results
threshold = 0.7 # Threshold for determining a "note"
num_hidden_units = 15 # Number of hidden units of the RNN
# Placeholders for inputs (x) and outputs(y)
x = tf.placeholder(tf.float32, shape=(None, stepCount, num_input))
y = tf.placeholder(tf.float32, shape=(None, stepCount, n_classes))
# create weight matrix initialized randomly from N~(0, 0.01)
W = weight_variable(shape=[num_hidden_units, n_classes])
# create bias vector initialized as zero
b = bias_variable(shape=[n_classes])
output_logits = RNN(x, W, b, stepCount, num_hidden_units)
y_pred = tf.nn.softmax(output_logits)
# Define the loss function, optimizer, and accuracy, etc.
# (code removed, irrelevant)
# Creating the op for initializing all variables
init = tf.global_variables_initializer()
sess = tf.InteractiveSession()
sess.run(init)
global_step = 0
# Number of training iterations in each epoch
num_tr_iter = int(y_train.shape[0] / batch_size)
for epoch in range(epochs):
    print('Training epoch: {}'.format(epoch + 1))
    x_train, y_train = randomize(x_train, y_train)
    for iteration in range(num_tr_iter):
        global_step += 1
        start = iteration * batch_size
        end = (iteration + 1) * batch_size
        x_batch, y_batch = get_next_batch(x_train, y_train, start, end)
        # Run optimization op (backprop)
        feed_dict_batch = {x: x_batch, y: y_batch}
        sess.run(optimizer, feed_dict=feed_dict_batch)
        if iteration % display_freq == 0:
            # Calculate and display the batch loss and accuracy
            loss_batch, acc_batch = sess.run([loss, accuracy],
                                             feed_dict=feed_dict_batch)
            print("iter {0:3d}:\t Loss={1:.2f},\tTraining Accuracy={2:.01%}".
                  format(iteration, loss_batch, acc_batch))
            testLoss.append(loss_batch)
            testAcc.append(acc_batch)
    # Run validation after every epoch
    feed_dict_valid = {x: x_valid[:1000].reshape((-1, stepCount, num_input)), y: y_valid[:1000]}
    loss_valid, acc_valid = sess.run([loss, accuracy], feed_dict=feed_dict_valid)
    print('---------------------------------------------------------')
    print("Epoch: {0}, validation loss: {1:.2f}, validation accuracy: {2:.01%}".
          format(epoch + 1, loss_valid, acc_valid))
    print('---------------------------------------------------------')
    validLoss.append(loss_valid)
    validAcc.append(acc_valid)
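Currently, this outputs a 1D array of predictions, which makes no sense for my scenario, but I am not sure how to change it. It should output a prediction for every timestep, i.e. predict which notes are being played at each moment.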
Solution
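One possible approach, offered as a sketch rather than a confirmed answer: instead of projecting only the final hidden state, apply the same output layer to the hidden state at every timestep and stack the results along the time axis. The NumPy code below only illustrates the shape arithmetic; it is not TensorFlow code, random arrays stand in for the LSTM outputs, and all sizes (including `n_classes = 88`) are hypothetical.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration
batch, timesteps, num_hidden, n_classes = 2, 5, 15, 88

rng = np.random.default_rng(0)
# Stand-in for states_series: one [batch, num_hidden] array per timestep
states_series = [rng.standard_normal((batch, num_hidden)) for _ in range(timesteps)]
W = rng.standard_normal((num_hidden, n_classes)) * 0.01
b = np.zeros(n_classes)

# Apply the same output layer at every timestep, then stack along the time axis
logits = np.stack([h @ W + b for h in states_series], axis=1)
print(logits.shape)  # (2, 5, 88)

# Several notes may sound at once, so an independent sigmoid per note
# followed by a threshold is one option (an assumption, not the asker's setup)
probs = 1.0 / (1.0 + np.exp(-logits))
notes_on = probs > 0.7
print(notes_on.shape)  # (2, 5, 88)
```

In the question's TensorFlow 1.x code, the analogous change would presumably be to wrap the commented-out list comprehension over `states_series` in `tf.stack(..., axis=1)`, so that the result has shape `[batch, stepCount, n_classes]` and matches the `y` placeholder. The loss and accuracy, which were removed from the post, would then need to be computed over all timesteps.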