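python - Trying to detect speech using a VAD (voice activity detector)
Problem description
I can read the audio, but I get an error when I pass it to the VAD (voice activity detector). I think the error occurs because the frame is in bytes. When a frame is passed to vad.is_speech(frame, sample_rate), should it be in bytes? Here is the code: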
frame_duration_ms = 10
duration_in_ms = (frame_duration_ms / 1000)  # duration in 10ms
frame_size = int(sample_rate * duration_in_ms)  # frame size of 160
frame_bytes = frame_size * 2

def frame_generator(buffer, frame_bytes):
    # repeatedly store 320 length array to the frame_stored when the frame_bytes is less than the size of the buffer
    while offset+frame_bytes < len(buffer):
        frame_stored = buffer[offset : offset+frame_bytes]
        offset = offset + frame_bytes
    return frame_stored

num_padding_frames = int(padding_duration_ms / frame_duration_ms)
# use deque for the sliding window
ring_buffer = deque(maxlen=num_padding_frames)
# we have two states TRIGGERED and NOTTRIGGERED state
triggered = True  # NOTTRIGGERED state
frames = frame_generator(buffer, frame_bytes)
speech_frame = []
for frame in frames:
    is_speech = vad.is_speech(frame, sample_rate)
This is the error message:
TypeError                                 Traceback (most recent call last)
     16 speech_frame = []
     17 for frame in frames:
---> 18     is_speech = vad.is_speech(frame, sample_rate)
     19     #print(frames)

C:\Program Files\Python38\lib\site-packages\webrtcvad.py in is_speech(self, buf, sample_rate, length)
     20
     21     def is_speech(self, buf, sample_rate, length=None):
---> 22         length = length or int(len(buf) / 2)
     23         if length * 2 > len(buf):
     24             raise IndexError(

TypeError: object of type 'int' has no len()
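Solution
I solved it. vad.is_speech(frame, sample_rate) takes its first argument, buf, and computes its length with len(buf), but an integer does not support len() in Python. In the original code, frame_generator returned a single bytes object, and iterating over a bytes object yields one int per byte, so every frame in the for loop was an int. To answer the question directly: yes, each frame passed to is_speech should be a bytes object. Calling len() on an int raises the same error, for example:
num = 1
print(len(num))  # TypeError: object of type 'int' has no len()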
By contrast, len() works on a sequence:
data = [1,2,3,4]
print(len(data))  # 4
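The same effect can be reproduced without any audio. The sketch below is illustrative only: frame_stored holds four placeholder bytes rather than real PCM data. It shows that looping over a bytes object yields ints, while looping over a list of bytes objects yields the frames themselves.

```python
# A single frame, as a bytes object (placeholder data, not real audio).
frame_stored = bytes([0, 1, 2, 3])

# Iterating over a bytes object yields one int per byte, not sub-frames.
items = [item for item in frame_stored]
item_types = {type(item).__name__ for item in items}
print(items)       # [0, 1, 2, 3]
print(item_types)  # {'int'}

# Wrapping the frame in a list makes the loop yield the bytes object itself,
# so len() works on each element.
frames = [frame_stored]
lengths = [len(frame) for frame in frames]
print(lengths)     # [4]
```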
So here is the corrected code:
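from collections import deque

import webrtcvad

sample_rate = 16000        # Hz; assumed, consistent with the 160-sample frame below
padding_duration_ms = 300  # assumed value; not given in the original post
vad = webrtcvad.Vad()
# Placeholder: one second of silence. Replace with your own 16-bit mono PCM bytes.
buffer = bytes(sample_rate * 2)

frame_duration_ms = 10
duration_in_ms = (frame_duration_ms / 1000)  # 10 ms expressed in seconds
frame_size = int(sample_rate * duration_in_ms)  # 160 samples per frame
frame_bytes = frame_size * 2  # 2 bytes per 16-bit sample, so 320 bytes

def frame_generator(buffer, frame_bytes):
    # Slice the buffer into consecutive 320-byte frames and return them as a list,
    # so that iterating over the result yields bytes objects rather than ints.
    values = []
    offset = 0
    # "<=" keeps the last complete frame; a trailing partial frame is dropped.
    while offset + frame_bytes <= len(buffer):
        frame_stored = buffer[offset : offset + frame_bytes]
        offset = offset + frame_bytes
        values.append(frame_stored)
    return values

num_padding_frames = int(padding_duration_ms / frame_duration_ms)
# use deque for the sliding window
ring_buffer = deque(maxlen=num_padding_frames)
# we have two states: TRIGGERED and NOTTRIGGERED
triggered = False  # start in the NOTTRIGGERED state
frames = frame_generator(buffer, frame_bytes)
speech_frame = []
for frame in frames:
    is_speech = vad.is_speech(frame, sample_rate)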
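As an alternative to building a list, the framing step can be written as a generator with yield. This is a minimal sketch, not the answer's original method: it assumes a 16000 Hz sample rate (consistent with the 160-sample frame size mentioned in the comments) and uses a buffer of zero bytes as placeholder audio. It does not call webrtcvad, so it can be run on its own to check the frame sizes.

```python
def frame_generator(buffer, frame_bytes):
    """Yield consecutive, non-overlapping frames of exactly frame_bytes bytes."""
    offset = 0
    # "<=" keeps the last complete frame; a trailing partial frame is dropped.
    while offset + frame_bytes <= len(buffer):
        yield buffer[offset:offset + frame_bytes]
        offset += frame_bytes


sample_rate = 16000       # Hz (assumed)
frame_duration_ms = 10
# 160 samples per frame, 2 bytes per 16-bit sample
frame_bytes = int(sample_rate * frame_duration_ms / 1000) * 2

# Placeholder audio: 1000 zero bytes, i.e. 3 full frames plus 40 spare bytes.
buffer = bytes(1000)
frames = list(frame_generator(buffer, frame_bytes))
print(frame_bytes, len(frames), {len(f) for f in frames})  # 320 3 {320}
```

Each element produced this way is a bytes object of the expected length, which is the form vad.is_speech(frame, sample_rate) expects for its first argument.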