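tensorflow - Context expansion of speech frames in TensorFlow or Keras
Problem description
Suppose I have a tensor of shape [batch_size, T, d], where T is the number of frames in a speech file and d is the MFCC dimension. I want to extend each frame with its left and right context frames, the way this numpy function does: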
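import numpy

def make_context(feature, left, right):
    '''
    Takes a 2-D numpy feature array, and pads each frame with a specified
    number of frames on either side.
    '''
    feature = [feature]
    # Build the left context: each step shifts the frames down by one,
    # repeating the first frame at the top.
    for i in range(left):
        feature.append(numpy.vstack((feature[-1][0], feature[-1][:-1])))
    feature.reverse()
    # Build the right context: each step shifts the frames up by one,
    # repeating the last frame at the bottom.
    for i in range(right):
        feature.append(numpy.vstack((feature[-1][1:], feature[-1][-1])))
    return numpy.hstack(feature)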
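How can I implement this in TensorFlow or Keras?
Solution
You can implement this in TensorFlow with tf.map_fn and tf.py_func. tf.map_fn applies a function to each element of the batch, and tf.py_func wraps the numpy function so that it can be applied to each of those elements. For example: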
# Note: tf.placeholder, tf.py_func and tf.Session are TensorFlow 1.x APIs.
import tensorflow as tf
import numpy as np

def make_context(feature, left, right):
    feature = [feature]
    for i in range(left):
        feature.append(np.vstack((feature[-1][0], feature[-1][:-1])))
    feature.reverse()
    for i in range(right):
        feature.append(np.vstack((feature[-1][1:], feature[-1][-1])))
    return np.hstack(feature)

# numpy usage
feature = np.array([[1,2],[3,4],[5,6]])
print(make_context(feature, 2, 3))

# tensorflow usage
feature_tf = tf.placeholder(shape=(None,None,None), dtype=tf.float32)
# tf.map_fn iterates over the batch; tf.py_func runs the numpy function
# on each [T, d] element with left=2 and right=3.
result = tf.map_fn(
    lambda element: tf.py_func(
        lambda feature, left, right: make_context(feature, left, right),
        [element, 2, 3],
        tf.float32),
    feature_tf,
    tf.float32)
with tf.Session() as sess:
    print(sess.run(result, feed_dict={feature_tf: np.array([feature, feature])}))
# Output
[[1 2 1 2 1 2 3 4 5 6 5 6]
 [1 2 1 2 3 4 5 6 5 6 5 6]
 [1 2 3 4 5 6 5 6 5 6 5 6]]
[[[1. 2. 1. 2. 1. 2. 3. 4. 5. 6. 5. 6.]
  [1. 2. 1. 2. 3. 4. 5. 6. 5. 6. 5. 6.]
  [1. 2. 3. 4. 5. 6. 5. 6. 5. 6. 5. 6.]]
 [[1. 2. 1. 2. 1. 2. 3. 4. 5. 6. 5. 6.]
  [1. 2. 1. 2. 3. 4. 5. 6. 5. 6. 5. 6.]
  [1. 2. 3. 4. 5. 6. 5. 6. 5. 6. 5. 6.]]]
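As a possible alternative, the same context expansion can probably be written with built-in tensor ops only, so the whole batch is handled at once without a Python callback. The sketch below is not from the original answer: make_context_tf is a hypothetical helper name, and the usage lines assume TensorFlow 2.x with eager execution enabled. It repeats the first and last frame to pad the time axis, then concatenates shifted windows along the feature axis.

```python
import numpy as np
import tensorflow as tf


def make_context_tf(features, left, right):
    """Splice each frame with `left` past and `right` future frames.

    features: float tensor of shape [batch_size, T, d].
    Returns a tensor of shape [batch_size, T, (left + 1 + right) * d].
    Hypothetical helper, not part of TensorFlow or Keras.
    """
    num_frames = tf.shape(features)[1]
    # Repeat the first and last frame so that edge frames see copies of
    # the boundary frame, matching the numpy version's behaviour.
    head = tf.tile(features[:, :1, :], [1, left, 1])
    tail = tf.tile(features[:, -1:, :], [1, right, 1])
    padded = tf.concat([head, features, tail], axis=1)
    # Window k holds frame (t + k - left) for every time step t.
    windows = [padded[:, k:k + num_frames, :]
               for k in range(left + right + 1)]
    return tf.concat(windows, axis=-1)


# Assumption: TensorFlow 2.x, where eager execution is on by default.
feature = np.array([[1, 2], [3, 4], [5, 6]], dtype=np.float32)
batch = tf.constant(np.stack([feature, feature]))
spliced = make_context_tf(batch, left=2, right=3).numpy()
print(spliced.shape)  # (2, 3, 12)
print(spliced[0])
```

For the sample input, each batch element should match the numpy output shown above. Because the function is built from ordinary ops, it could in principle also be wrapped in a Keras Lambda layer, although that is untested here.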