python - Does Google Speech-To-Text randomly skip parts of the audio?

Problem description

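I have recordings of Dutch phone calls that I am transcribing with Google STT (long_running_recognize). Everything runs, but many words are not recognized. The transcription seems to stop at random for a few seconds every now and then. The unrecognized parts show up as words with very long timestamps. For example, a word that actually starts at about 17 s and lasts about 0.5 s is reported with a timestamp of 11.5 s to 17.5 s, so about 5.5 s of clear speech goes unrecognized.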
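One way to locate such gaps systematically is to flag every word whose reported duration is implausibly long. The following is only a minimal sketch: the `Word` tuple, the `find_suspicious_words` helper, the 2-second threshold, and the sample timings are all assumptions made for illustration, and converting the API's word offsets (`start_time` / `end_time`) into seconds is left out because it depends on the client library version.

```python
from typing import List, NamedTuple


class Word(NamedTuple):
    """Hypothetical container for one recognized word and its offsets."""
    text: str
    start: float  # seconds from the start of the recording
    end: float    # seconds from the start of the recording


def find_suspicious_words(words: List[Word], max_duration: float = 2.0) -> List[Word]:
    """Return the words whose reported duration exceeds max_duration seconds."""
    return [w for w in words if (w.end - w.start) > max_duration]


# Made-up timings modelled on the example above: the middle word is
# reported as lasting 6 s although it is really about 0.5 s long.
words = [
    Word("hallo", 10.9, 11.5),
    Word("goed", 11.5, 17.5),
    Word("dank", 17.5, 18.0),
]

for w in find_suspicious_words(words):
    print(f"{w.text!r}: {w.start:.1f}s to {w.end:.1f}s ({w.end - w.start:.1f}s)")
```

Listing the flagged spans per recording makes it easier to check whether the skipped stretches share something audible, such as overlapping speakers or background noise.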

This is the configuration I use:

from google.cloud import speech  # google-cloud-speech 1.x (speech.types / speech.enums)

CONFIG = speech.types.RecognitionConfig(
    encoding=speech.enums.RecognitionConfig.AudioEncoding.LINEAR16,  # optional for WAV
    # model="phone_call",  # this doesn't exist for Dutch
    sample_rate_hertz=8000,  # matches the 8 kHz recordings
    language_code="nl-NL",  # language code
    enable_word_time_offsets=True,  # return word timestamps
)

This is the information for one of the recordings (obtained with mediainfo):

General
Complete name                            : 20161130_215643_31651118731.wav
Format                                   : Wave
File size                                : 2.30 MiB
Duration                                 : 2mn 30s
Overall bit rate mode                    : Constant
Overall bit rate                         : 128 Kbps
Writing application                      : Lavf57.25.100

Audio
Format                                   : PCM
Format settings, Endianness              : Little
Format settings, Sign                    : Signed
Codec ID                                 : 1
Duration                                 : 2mn 30s
Bit rate mode                            : Constant
Bit rate                                 : 128 Kbps
Channel(s)                               : 1 channel
Sampling rate                            : 8 000 Hz
Bit depth                                : 16 bits
Stream size                              : 2.30 MiB (100%)
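
The figures above are internally consistent: 8000 Hz × 16 bits × 1 channel = 128 kbps, which matches the LINEAR16 encoding and the 8000 Hz sample rate in the configuration. To run the same check over many recordings, a sketch along these lines using Python's standard `wave` module might help. The helper names `describe_wav` and `make_test_wav` are assumptions for illustration, and the in-memory silent file only stands in for a real recording.

```python
import io
import wave


def describe_wav(fileobj):
    """Read the header fields of a WAV file without decoding the audio."""
    with wave.open(fileobj, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hertz": wav.getframerate(),
            "bits_per_sample": wav.getsampwidth() * 8,
            "duration_seconds": wav.getnframes() / wav.getframerate(),
        }


def make_test_wav(seconds=1, rate=8000):
    """Build a silent mono 16-bit WAV in memory (stand-in for a real file)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16 bits
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * rate * seconds)
    buf.seek(0)
    return buf


info = describe_wav(make_test_wav())
print(info)

# Bit rate implied by the header fields, in kbps.
kbps = info["sample_rate_hertz"] * info["bits_per_sample"] * info["channels"] / 1000
print(kbps, "kbps")
```

`describe_wav` also accepts a file path, so it can be pointed at the actual recordings to confirm that each one really is mono, 16-bit, 8 kHz PCM before it is sent for transcription.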

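For privacy reasons I cannot share the audio or the transcripts, but the words that are recognized are mostly correct, and so are their timestamps.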

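Why does this happen? Is Google simply unable to understand the untranscribed parts, particularly given this language model? Is there anything I can do to increase the number of recognized words?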

Tags: python, python-3.x, audio, speech-recognition, google-cloud-speech
