python - Problem reading the whole audio in SpeechRecognition because of moments of silence

Problem description

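I'm having trouble transcribing a whole audio file with SpeechRecognition using the Google recognizer API. Even though my audio is read correctly, only its first sentence is detected and transcribed. I believe this is because my audio file contains many "seconds of silence", and my guess is that the algorithm treats the first of them as the end of the audio and stops transcribing.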
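The guess above relies on energy-based silence detection. As a rough illustration (a simplified sketch, not SpeechRecognition's own code), the snippet below measures the RMS energy of fixed-size frames of 16-bit samples and lists every run of quiet frames lasting at least `min_gap_s` seconds. The frame size, the threshold of 300 and the synthetic tone/silence/tone signal are assumptions chosen only for the demo.

```python
import math
from array import array


def frame_rms(samples, frame_len):
    """RMS energy of each consecutive frame of integer PCM samples."""
    energies = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        energies.append(math.sqrt(sum(s * s for s in frame) / len(frame)))
    return energies


def silent_gaps(samples, sample_rate, frame_len, energy_threshold, min_gap_s):
    """Return (start_s, end_s) for each run of quiet frames lasting >= min_gap_s.

    A frame counts as silence when its RMS energy is below energy_threshold.
    Times are measured in whole frames.
    """
    gaps = []
    run_start = None
    # An infinitely loud sentinel frame closes a quiet run that reaches the end.
    for i, energy in enumerate(frame_rms(samples, frame_len) + [math.inf]):
        if energy < energy_threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if (i - run_start) * frame_len >= min_gap_s * sample_rate:
                gaps.append((run_start * frame_len / sample_rate, i * frame_len / sample_rate))
            run_start = None
    return gaps


# Synthetic signal at 8 kHz: 1 s of a 440 Hz tone, 2 s of silence, 1 s of tone.
rate = 8000
tone = [int(8000 * math.sin(2 * math.pi * 440 * n / rate)) for n in range(rate)]
signal = array("h", tone + [0] * (2 * rate) + tone)
print(silent_gaps(signal, rate, frame_len=400, energy_threshold=300, min_gap_s=0.5))
# → [(1.0, 3.0)]
```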

To fix this, I tried tuning the energy_threshold and pause_threshold parameters, but they don't seem to make any difference (I have tried many different values).
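In SpeechRecognition, energy_threshold, dynamic_energy_threshold and pause_threshold are consulted by `Recognizer.listen()` when it decides where a phrase ends; `Recognizer.record()`, which the code below uses, just reads the file (limited only by its optional `offset` and `duration` arguments) and does not look at them, which is consistent with changing them having no visible effect. The function below is a simplified model of that end-of-phrase rule, not the library's implementation: pause_threshold seconds are converted into a number of buffers, and the phrase ends once more consecutive buffers than that are at or below energy_threshold. The energy values and the 0.5 s buffer length are made up for illustration.

```python
import math


def phrase_end(energies, seconds_per_buffer, energy_threshold, pause_threshold):
    """Index just past the buffer where the first phrase is considered finished.

    Simplified model: skip leading quiet buffers, then stop once more than
    pause_threshold seconds' worth of consecutive buffers are at or below
    energy_threshold. Returns len(energies) if the stream ends first.
    """
    pause_buffers = math.ceil(pause_threshold / seconds_per_buffer)
    i = 0
    # Wait for the phrase to start (first buffer louder than the threshold).
    while i < len(energies) and energies[i] <= energy_threshold:
        i += 1
    quiet = 0
    for j in range(i, len(energies)):
        quiet = quiet + 1 if energies[j] <= energy_threshold else 0
        if quiet > pause_buffers:
            return j + 1
    return len(energies)


# Hypothetical per-buffer energies (0.5 s buffers): speech, a 2 s gap, more speech.
energies = [900, 900, 50, 50, 50, 50, 900, 900]
print(phrase_end(energies, 0.5, energy_threshold=300, pause_threshold=0.8))  # → 5 (first phrase only)
print(phrase_end(energies, 0.5, energy_threshold=300, pause_threshold=10))   # → 8 (whole stream)
```

In this model a pause_threshold longer than the longest gap keeps everything in one phrase; with `record()`, however, no such cut is made in the first place.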

Does anyone know how to correctly adjust how long SpeechRecognition waits during silence before treating it as the end of the audio?

import speech_recognition as sr

r = sr.Recognizer()
gravacao = sr.AudioFile('my_audio.wav')

with gravacao as source:
    r.pause_threshold = 10  # Minimum length of silence (in seconds) that will register as the end of a phrase.
    r.energy_threshold = 40  # Energy level threshold for sounds; values below it are considered silence.
    r.dynamic_energy_threshold = True

    audio = r.record(source)  # Reads the whole file into a single AudioData object.

lang = "pt-BR"

try:
    pre_frase = r.recognize_google(audio, language=lang)
    print(pre_frase)
except Exception as exp:
    print("Error: {}".format(exp))
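One common workaround (swapped in here as an assumption, not a confirmed fix for this question) is to split the recording at long silences and transcribe each piece separately. The sketch below does the splitting with only the standard library: it assumes a 16-bit mono PCM WAV, and its parameter values (`energy_threshold=300`, `min_gap_s=0.5`, `frame_s=0.05`) are placeholders to tune against the real recording. The function name `split_wav_on_silence` is hypothetical.

```python
import io
import math
import sys
import wave
from array import array


def split_wav_on_silence(wav_bytes, energy_threshold=300, min_gap_s=0.5, frame_s=0.05):
    """Split 16-bit mono PCM WAV data into WAV chunks separated by long silences.

    A frame (frame_s seconds) is silent when its RMS energy is below
    energy_threshold; at least min_gap_s seconds of consecutive silent frames
    close the current chunk. Returns one WAV file (as bytes) per chunk.
    """
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        if wf.getsampwidth() != 2 or wf.getnchannels() != 1:
            raise ValueError("this sketch only handles 16-bit mono WAV")
        rate = wf.getframerate()
        samples = array("h", wf.readframes(wf.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV samples are stored little-endian

    frame_len = max(1, round(rate * frame_s))
    min_gap_frames = max(1, round(min_gap_s / frame_s))
    spans, start, last_loud_end, quiet_run = [], None, 0, 0
    for f in range(0, len(samples), frame_len):
        frame = samples[f:f + frame_len]
        if math.sqrt(sum(s * s for s in frame) / len(frame)) >= energy_threshold:
            if start is None:
                start = f  # the first loud frame opens a new chunk
            last_loud_end, quiet_run = f + len(frame), 0
        elif start is not None:
            quiet_run += 1
            if quiet_run >= min_gap_frames:  # pause long enough: close the chunk
                spans.append((start, last_loud_end))
                start, quiet_run = None, 0
    if start is not None:
        spans.append((start, last_loud_end))

    chunks = []
    for begin, end in spans:
        chunk = samples[begin:end]  # slicing copies, so byteswap won't touch samples
        if sys.byteorder == "big":
            chunk.byteswap()
        buf = io.BytesIO()
        with wave.open(buf, "wb") as out:
            out.setnchannels(1)
            out.setsampwidth(2)
            out.setframerate(rate)
            out.writeframes(chunk.tobytes())
        chunks.append(buf.getvalue())
    return chunks
```

Since SpeechRecognition's `AudioFile` also accepts a file-like object, each returned chunk could then go through the same steps as the code above (`sr.AudioFile(io.BytesIO(chunk))`, `r.record(source)`, `r.recognize_google(audio, language=lang)`) and the resulting texts joined. Whether this avoids the truncation depends on the pauses really being the cause, which the question suspects but has not confirmed.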

Tags: python, audio, speech-recognition, wav, transcription
