Context expansion of speech frames in tensorflow or keras

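Problem description

Suppose I have a tensor of shape [batch_size, T, d], where T is the number of frames in a speech file and d is the MFCC dimension. I want to extend each frame with its left and right context frames, the way this NumPy function does: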

import numpy

def make_context(feature, left, right):
    '''
    Takes a 2-D numpy feature array, and pads each frame with a specified
    number of frames on either side.
    '''
    feature = [feature]
    # Left context: shift down by one frame, repeating the first frame.
    for i in range(left):
        feature.append(numpy.vstack((feature[-1][0], feature[-1][:-1])))
    feature.reverse()
    # Right context: shift up by one frame, repeating the last frame.
    for i in range(right):
        feature.append(numpy.vstack((feature[-1][1:], feature[-1][-1])))
    return numpy.hstack(feature)

How can I implement this in tensorflow or keras?

Tags: tensorflow, keras

Solution


You can implement this in tensorflow with tf.map_fn and tf.py_func. tf.map_fn iterates over the elements of the batch, and tf.py_func applies the NumPy function to each element. For example:

import tensorflow as tf
import numpy as np

def make_context(feature, left, right):
    feature = [feature]
    for i in range(left):
        feature.append(np.vstack((feature[-1][0], feature[-1][:-1])))
    feature.reverse()
    for i in range(right):
        feature.append(np.vstack((feature[-1][1:], feature[-1][-1])))
    return np.hstack(feature)

# numpy usage
feature = np.array([[1,2],[3,4],[5,6]])
print(make_context(feature, 2, 3))

# tensorflow usage (TensorFlow 1.x graph mode)
feature_tf = tf.placeholder(shape=(None, None, None), dtype=tf.float32)

# Apply make_context to every [T, d] element of the batch.
result = tf.map_fn(
    lambda element: tf.py_func(make_context, [element, 2, 3], tf.float32),
    feature_tf,
    dtype=tf.float32)

with tf.Session() as sess:
    print(sess.run(result,feed_dict={feature_tf:np.array([feature,feature])}))

Output (the NumPy result first, then the TensorFlow result for the batch of two):
[[1 2 1 2 1 2 3 4 5 6 5 6]
 [1 2 1 2 3 4 5 6 5 6 5 6]
 [1 2 3 4 5 6 5 6 5 6 5 6]]

[[[1. 2. 1. 2. 1. 2. 3. 4. 5. 6. 5. 6.]
  [1. 2. 1. 2. 3. 4. 5. 6. 5. 6. 5. 6.]
  [1. 2. 3. 4. 5. 6. 5. 6. 5. 6. 5. 6.]]

 [[1. 2. 1. 2. 1. 2. 3. 4. 5. 6. 5. 6.]
  [1. 2. 1. 2. 3. 4. 5. 6. 5. 6. 5. 6.]
  [1. 2. 3. 4. 5. 6. 5. 6. 5. 6. 5. 6.]]]
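
The per-element loop above can also be written as a single vectorized indexing step. The following is a minimal NumPy sketch, not part of the original answer: for each frame t and each offset from -left to +right it picks frame t + offset, clipped to the range [0, T - 1], which repeats the first and last frames at the edges just as make_context does. The name make_context_batch is hypothetical. The same indexing could presumably be expressed in TensorFlow with tf.clip_by_value and tf.gather along axis 1, but that variant is an assumption and is not tested here.

```python
import numpy as np

def make_context_batch(features, left, right):
    """Expand [batch_size, T, d] to [batch_size, T, (left + 1 + right) * d].

    Hypothetical helper: edge frames are repeated, matching make_context.
    """
    batch_size, num_frames, _ = features.shape
    # Offsets of the context frames relative to the current frame.
    offsets = np.arange(-left, right + 1)
    # idx[t, c] is the source frame for output frame t at offset c,
    # clipped so that out-of-range positions reuse the first/last frame.
    idx = np.clip(np.arange(num_frames)[:, None] + offsets[None, :],
                  0, num_frames - 1)
    # Advanced indexing on the time axis gives [batch_size, T, C, d].
    gathered = features[:, idx, :]
    # Flatten the context and feature axes into one feature axis.
    return gathered.reshape(batch_size, num_frames, -1)

feature = np.array([[1, 2], [3, 4], [5, 6]], dtype=np.float32)
batch = np.stack([feature, feature])
print(make_context_batch(batch, 2, 3))
```

For the example batch this should reproduce the TensorFlow output shown above, without calling back into Python once per batch element.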
