python-2.7 - 计算张量流音频识别中的 stft(Short-time Fourier transform)
问题描述
我很难在这里运行这段示例代码来将音频信号转换为 stfts。我正在使用label_wave.py并编辑“运行图”功能。
def run_graph(wav_data, labels, input_layer_name, output_layer_name,
num_top_predictions):
"""Runs the audio data through the graph and prints predictions."""
data = np.fromstring(wav_data, dtype='int16')[-16000:]
# Normalize the data
data = data * 1.0 / (1 << 15) # Range [-1 1]
data = tf.reshape(data, (1, 16000))
# tf.cast(data, tf.float32)
stfts = tf.contrib.signal.stft(data, frame_length=1024, frame_step=256,fft_length=1024)
错误
Traceback (most recent call last):
File "mfcc_label_wav.py", line 149, in <module>
tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 126, in run
_sys.exit(main(argv))
File "mfcc_label_wav.py", line 121, in main
FLAGS.output_name, FLAGS.how_many_labels)
File "mfcc_label_wav.py", line 115, in label_wav
run_graph(wav_data, labels_list, input_name, output_name, how_many_labels)
File "mfcc_label_wav.py", line 71, in run_graph
fft_length=1024)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/signal/python/ops/spectral_ops.py", line 91, in stft
return spectral_ops.rfft(framed_signals, [fft_length])
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/spectral_ops.py", line 130, in _rfft
input_tensor = _ops.convert_to_tensor(input_tensor, _dtypes.float32)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 950, in convert_to_tensor
as_ref=False)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1040, in internal_convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 883, in _TensorTensorConversionFunction
(dtype.name, t.dtype.name, str(t)))
ValueError: Tensor conversion requested dtype float32 for Tensor with dtype float64: 'Tensor("stft/mul:0", shape=(1, 59, 1024), dtype=float64)'
解决方案
执行 tf.cast(data, tf.float32) 不起作用。所以我首先将numpy数组从float64转换为float32,然后重新整形数据。
data = np.fromstring(wav_data, dtype='int16')[-16000:]
# Normalize the data
data = data * 1.0 / (1 << 15) # Range [-1 1]
data = np.float32(data)
data = tf.reshape(data, (1, 16000))
# tf.cast(data, tf.float32)
stfts = tf.contrib.signal.stft(data, frame_length=1024,
frame_step=256,fft_length=1024)
推荐阅读
- firebase - FormatException:意外字符(在字符 1 处) - firebase 身份验证
- python - 两个数字(0.85,0.18)之间的 T 检验将 p 值设为 Nan
- c++ - C++:模板类型的提取
- angular - 模板引用变量给出_r0不是函数错误
- firebase - 过滤 Firebase 数据库中的 2 个项目
- docker - 按照 Realms 中的一般概述启用 Docker Bearer Token Realm
- r - 将具有多个维度的数据框转换为栅格图层
- java - 从 BLE 读取多个特征并读取字符串
- forms - 无法使用 pyrevit 表单
- android - 如何修复 RTL 对称警告