How to choose parameters for and visualize a spectrogram in TensorFlow

Problem description

I have a 3-minute audio clip whose waveform has shape (2880000,). What values should frame_length and frame_step have?

spectrogram = tf.signal.stft(waveform, frame_length=?, frame_step=?)

spectrogram = tf.abs(spectrogram)

With frame_length=255 and frame_step=128 I get a spectrogram of shape (22499, 129), and when I try to visualize it with

def plot_spectrogram(spectrogram, ax):
  # Convert the frequencies to log scale and transpose so that the time is
  # represented in the x-axis (columns).
  log_spec = np.log(spectrogram.T)
  height = log_spec.shape[0]
  X = np.arange(2880000, step=height + 1)
  Y = range(height)
  ax.pcolormesh(X, Y, log_spec)


fig, axes = plt.subplots(2, figsize=(12, 8))
timescale = np.arange(audio.shape[0])
axes[0].plot(timescale, audio.numpy())
axes[0].set_title('Waveform')
axes[0].set_xlim([0, 2880000])
plot_spectrogram(spectrogram.numpy(), axes[1])
axes[1].set_title('Spectrogram')
plt.show()

I get this error: Dimensions of C (129, 22499) are incompatible with X (22154) and/or Y (129); see help(pcolormesh)
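
The three numbers in this error can be reproduced with integer arithmetic alone, which helps show where the mismatch comes from. The sketch below assumes the documented defaults of tf.signal.stft (pad_end=False, and fft_length rounded up to the next power of two when it is not passed). The variable names are illustrative and not part of the original code.

```python
# Reproduce the sizes reported in the error with plain integer arithmetic.
num_samples = 2880000   # waveform length from the question
frame_length = 255
frame_step = 128

# Assumption: pad_end=False (the default), so only full frames are kept.
num_frames = 1 + (num_samples - frame_length) // frame_step

# Assumption: fft_length is not passed, so it is the smallest power of two
# that is >= frame_length; the output then has fft_length // 2 + 1 bins.
fft_length = 1 << (frame_length - 1).bit_length()
num_bins = fft_length // 2 + 1

# The question builds X as np.arange(2880000, step=height + 1), where
# height equals the number of frequency bins, not the number of frames.
x_len = len(range(0, num_samples, num_bins + 1))

print(num_frames, num_bins, x_len)  # 22499 129 22154
```

X ends up with 22154 entries because its step is derived from the number of frequency bins (129 + 1), while pcolormesh needs one X value per column of C, that is, one per frame (22499).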

So the question is how to choose the parameters and then visualize the result.

Tags: python, tensorflow, machine-learning, data-science, tensorflow2.0

Solution
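
There is no single correct pair of values: a longer frame_length gives finer frequency resolution but coarser time resolution, and frame_step sets how far consecutive frames are apart. A minimal sketch follows, assuming the audio is sampled at 16 kHz (2880000 samples / 180 s) and using the 25 ms window and 10 ms hop that are a common starting point for speech. The helper stft_params and the extra num_samples argument are hypothetical names introduced here for illustration. The plotting function builds the x-axis from the number of frames, so X always has as many entries as the spectrogram has columns.

```python
import numpy as np

SAMPLE_RATE = 16000  # assumption: 2880000 samples / 180 seconds


def stft_params(sample_rate, window_s=0.025, hop_s=0.010):
    """Turn a window and hop duration in seconds into sample counts.

    The 25 ms / 10 ms defaults are a common convention, not a requirement.
    """
    frame_length = int(round(sample_rate * window_s))
    frame_step = int(round(sample_rate * hop_s))
    return frame_length, frame_step


def plot_spectrogram(spectrogram, ax, num_samples):
    """Draw a (frames, bins) magnitude spectrogram with time on the x-axis."""
    # A small epsilon avoids log(0) for bins with zero magnitude.
    log_spec = np.log(spectrogram.T + np.finfo(float).eps)
    height, width = log_spec.shape
    # One x coordinate per frame, spread over the waveform's sample range,
    # so len(X) matches the number of columns of log_spec.
    X = np.linspace(0, num_samples, num=width, dtype=int)
    Y = np.arange(height)
    ax.pcolormesh(X, Y, log_spec)
```

With these values, tf.signal.stft(waveform, frame_length=400, frame_step=160) should return 1 + (2880000 - 400) // 160 = 17998 frames of 257 bins each under the default settings, and the plot call from the question becomes plot_spectrogram(spectrogram.numpy(), axes[1], audio.shape[0]).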

