Is there a way to configure the output shape of an RNN?

Problem description

I'm trying to build an RNN that guesses which notes are being played on a piano, given a sound file of the piano (WAV format). I currently cut the WAV clips into ten-second chunks (2D) and zero-pad the shorter ones out to ten seconds, so the inputs are all uniform. However, when I pass a clip to the RNN, it produces an output with one dimension fewer (1D) (when taking the last state - should I be taking the series of states instead?).
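As an aside, the zero-padding step can be done in a single call; a minimal sketch, assuming each clip is a [time, frequency] NumPy array and max_timesteps is the fixed ten-second frame count (both names are illustrative):

import numpy as np

def pad_clip(clip, max_timesteps):
    # Zero-pad (or trim) a [time, frequency] array along the time axis
    pad_len = max_timesteps - clip.shape[0]
    if pad_len <= 0:
        return clip[:max_timesteps]
    return np.pad(clip, ((0, pad_len), (0, 0)), mode='constant')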

I built a simpler RNN that analyzes a single-note file (2D) and produces a single output (1D), and that worked. But when I try to apply the same technique to a full clip with multiple notes and note starts/stops, it falls apart, because I can't seem to change the output shape.
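To make the mismatch concrete, a small shape-check sketch (TF 1.x, toy sizes chosen purely for illustration): static_rnn already returns one output per timestep, while the final state is a single vector per example.

import tensorflow as tf
from tensorflow.contrib import rnn

batch, timesteps, num_input, num_hidden = 4, 20, 64, 15
inp = tf.placeholder(tf.float32, [batch, timesteps, num_input])
steps = tf.unstack(inp, timesteps, 1)   # list of 20 tensors, each [batch, num_input]
outputs, state = rnn.static_rnn(rnn.LSTMCell(num_hidden), steps, dtype=tf.float32)
print(len(outputs), outputs[0].shape)   # 20 (4, 15) - one output per timestep
print(state.h.shape)                    # (4, 15)    - only the final hidden state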

import tensorflow as tf
from tensorflow.contrib import rnn  # TF 1.x contrib RNN ops

def weight_variable(shape):
    initer = tf.truncated_normal_initializer(stddev=0.01)
    return tf.get_variable('W', dtype=tf.float32, shape=shape, initializer=initer)

def bias_variable(shape):
    initial = tf.constant(0., shape=shape, dtype=tf.float32)
    return tf.get_variable('b', dtype=tf.float32, initializer=initial)

def RNN(x, weights, biases, timesteps, num_hidden):
    # Split [batch, timesteps, num_input] into a list of `timesteps`
    # tensors of shape [batch, num_input], as static_rnn expects
    x = tf.unstack(x, timesteps, 1)

    # Define an RNN cell with tensorflow
    lstm_cell = rnn.LSTMCell(num_hidden)
    states_series, current_state = rnn.static_rnn(lstm_cell, x, dtype=tf.float32)
    # current_state is an LSTMStateTuple (c, h), so current_state[1] is the
    # final hidden state only - this is what collapses the time dimension
    return tf.matmul(current_state[1], weights) + biases
    # return [tf.matmul(temp, weights) + biases for temp in states_series]
    # does this even make sense

# x is for data, y is for targets; shapes are [index, time, frequency] and [index, time, output note(s)] respectively
x_train, x_valid, y_train, y_valid = load_data() # removed test
print("Size of:")
print("- Training-set:\t\t{}".format(y_train.shape[0]))
print("- Validation-set:\t{}".format(y_valid.shape[0]))
# print("- Test-set\t{}".format(len(y_test)))

learning_rate = 0.001 # The optimization initial learning rate
epochs = 1000         # Total number of training epochs
batch_size = 100      # Training batch size
display_freq = 100    # Frequency of displaying the training results
threshold = 0.7       # Threshold for determining a "note"
num_hidden_units = 15 # Number of hidden units of the RNN

# Placeholders for inputs (x) and outputs (y); stepCount, num_input and
# n_classes come from the omitted data-loading code
x = tf.placeholder(tf.float32, shape=(None, stepCount, num_input))
y = tf.placeholder(tf.float32, shape=(None, stepCount, n_classes)) 

# create weight matrix initialized randomly from N~(0, 0.01)
W = weight_variable(shape=[num_hidden_units, n_classes])

# create bias vector initialized as zero
b = bias_variable(shape=[n_classes])

output_logits = RNN(x, W, b, stepCount, num_hidden_units)
y_pred = tf.nn.softmax(output_logits)

# Define the loss function, optimizer, and accuracy, etc.
# (code removed, irrelevant)

# Creating the op for initializing all variables
init = tf.global_variables_initializer()

sess = tf.InteractiveSession()
sess.run(init)
global_step = 0
# Number of training iterations in each epoch
num_tr_iter = int(y_train.shape[0] / batch_size)
for epoch in range(epochs):
    print('Training epoch: {}'.format(epoch + 1))
    x_train, y_train = randomize(x_train, y_train)
    for iteration in range(num_tr_iter):
        global_step += 1
        start = iteration * batch_size
        end = (iteration + 1) * batch_size
        x_batch, y_batch = get_next_batch(x_train, y_train, start, end)
        # Run optimization op (backprop)
        feed_dict_batch = {x: x_batch, y: y_batch}
        sess.run(optimizer, feed_dict=feed_dict_batch)

        if iteration % display_freq == 0:
            # Calculate and display the batch loss and accuracy
            loss_batch, acc_batch = sess.run([loss, accuracy],
                                             feed_dict=feed_dict_batch)

            print("iter {0:3d}:\t Loss={1:.2f},\tTraining Accuracy={2:.01%}".
                  format(iteration, loss_batch, acc_batch))
            testLoss.append(loss_batch)
            testAcc.append(acc_batch)

    # Run validation after every epoch

    feed_dict_valid = {x: x_valid[:1000].reshape((-1, stepCount, num_input)), y: y_valid[:1000]}
    loss_valid, acc_valid = sess.run([loss, accuracy], feed_dict=feed_dict_valid)
    print('---------------------------------------------------------')
    print("Epoch: {0}, validation loss: {1:.2f}, validation accuracy: {2:.01%}".
          format(epoch + 1, loss_valid, acc_valid))
    print('---------------------------------------------------------')
    validLoss.append(loss_valid)
    validAcc.append(acc_valid)

Right now this outputs a 1D array of predictions, which really doesn't make sense in my scenario, but I'm not sure how to change it (it should output a prediction for each timestep - i.e. predict which notes are being played at each moment).
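For reference, a minimal sketch of the per-timestep variant hinted at by the commented-out line in RNN() (the function name is illustrative and this is untested): project every entry of states_series with the same output layer, then stack the results back into the [batch, timesteps, n_classes] layout that the y placeholder already declares.

def RNN_all_steps(x, weights, biases, timesteps, num_hidden):
    x = tf.unstack(x, timesteps, 1)
    lstm_cell = rnn.LSTMCell(num_hidden)
    states_series, current_state = rnn.static_rnn(lstm_cell, x, dtype=tf.float32)
    # One [batch, n_classes] projection per timestep...
    logits_per_step = [tf.matmul(h, weights) + biases for h in states_series]
    # ...stacked along a new time axis: [batch, timesteps, n_classes]
    return tf.stack(logits_per_step, axis=1)

Since several piano notes can sound at once, a per-class sigmoid (e.g. tf.nn.sigmoid_cross_entropy_with_logits) used together with the 0.7 threshold is arguably a better fit than the softmax in y_pred, but that is separate from the shape question.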

Tags: python, tensorflow, recurrent-neural-network
