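google-speech-api - 400 Specify MP3 encoding to match audio file
Problem description
I am trying to use the google-speech2text API. However, even though I have set my code to loop through every available encoder, I keep getting "Specify MP3 encoding to match audio file".
This is the file I want to use.
I should add that if I upload the file through their UI, I do get output, so I don't think there is anything wrong with the source file.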
import io

from google.cloud import speech
from google.cloud.speech import enums
from google.cloud.speech import types

client = speech.SpeechClient.from_service_account_json('gcp_credentials.json')

speech_file = 'chunk7.mp3'

with io.open(speech_file, 'rb') as audio_file:
    content = audio_file.read()
    audio = types.RecognitionAudio(content=content)

ENCODING = [enums.RecognitionConfig.AudioEncoding.LINEAR16,
            enums.RecognitionConfig.AudioEncoding.FLAC,
            enums.RecognitionConfig.AudioEncoding.MULAW,
            enums.RecognitionConfig.AudioEncoding.AMR,
            enums.RecognitionConfig.AudioEncoding.AMR_WB,
            enums.RecognitionConfig.AudioEncoding.OGG_OPUS,
            enums.RecognitionConfig.AudioEncoding.SPEEX_WITH_HEADER_BYTE]

SAMPLE_RATE_HERTZ = [8000, 12000, 16000, 24000, 48000]

# Try every combination of encoding and sample rate
for enco in ENCODING:
    for rate in SAMPLE_RATE_HERTZ:
        config = types.RecognitionConfig(
            encoding=enco,
            sample_rate_hertz=rate,
            language_code='en-US')

        # Detects speech in the audio file
        response = []
        try:
            response = client.recognize(config, audio)
        except Exception as exc:
            # Show the API error instead of silently discarding it
            print("error: ", exc)
        print("-----------------------------------------------------")
        print(str(rate) + " " + str(enco))
        print("response: ", str(response))
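Alternatively, there is another file, in Persian ('fa-IR'), with which I face a similar problem. I originally posted the Obama file because it is easier to understand. I would appreciate it if you also tested your answer with the second file.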
Solution
It seems you are setting the encoding attribute to each of the possible values the API offers. I found that
encoding = enums.RecognitionConfig.AudioEncoding.ENCODING_UNSPECIFIED
works for MP3 files. So try this:
from google.cloud import speech_v1
from google.cloud.speech_v1 import enums
import io

speech_file = 'chunk7.mp3'


def sample_recognize(local_file_path):
    """
    Transcribe a short audio file using synchronous speech recognition

    Args:
      local_file_path Path to local audio file, e.g. /path/audio.wav
    """

    client = speech_v1.SpeechClient()

    # local_file_path = 'resources/brooklyn_bridge.raw'

    # The language of the supplied audio
    language_code = "en-US"

    # Sample rate in Hertz of the audio data sent
    sample_rate_hertz = 16000
    # If this fails, try the other rates one at a time: 8000, 12000, 24000, 48000

    # Encoding of audio data sent. This sample sets this explicitly.
    # This field is optional for FLAC and WAV audio formats.
    encoding = enums.RecognitionConfig.AudioEncoding.ENCODING_UNSPECIFIED
    config = {
        "language_code": language_code,
        "sample_rate_hertz": sample_rate_hertz,
        "encoding": encoding,
    }

    # Read the whole file and send its bytes inline with the request
    with io.open(local_file_path, "rb") as f:
        content = f.read()
    audio = {"content": content}

    response = client.recognize(config, audio)
    for result in response.results:
        # First alternative is the most probable result
        alternative = result.alternatives[0]
        print(u"Transcript: {}".format(alternative.transcript))


sample_recognize(speech_file)
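Rather than guessing sample_rate_hertz, the rate can usually be read from the file itself. The following is only a rough sketch, not part of the Speech-to-Text docs: mp3_sample_rate is a hypothetical helper that skips an optional ID3v2 tag and decodes the first MPEG audio frame header it finds. It assumes a well-formed file; a byte pattern that merely looks like a frame sync could fool it.

```python
# Sample rates in Hz, keyed by the 2-bit MPEG version field and indexed
# by the 2-bit sample-rate field of the frame header.
_SAMPLE_RATES = {
    0b11: (44100, 48000, 32000),  # MPEG-1
    0b10: (22050, 24000, 16000),  # MPEG-2
    0b00: (11025, 12000, 8000),   # MPEG-2.5
}


def mp3_sample_rate(data):
    """Return the sample rate (Hz) of the first MPEG audio frame, or None."""
    pos = 0
    # Skip a leading ID3v2 tag: a 10-byte header whose last four bytes
    # hold a "syncsafe" size (7 useful bits per byte).
    if data[:3] == b"ID3" and len(data) >= 10:
        size = 0
        for byte in data[6:10]:
            size = (size << 7) | (byte & 0x7F)
        pos = 10 + size
    # Scan for the 11-bit frame sync: 0xFF followed by three set bits.
    while pos + 4 <= len(data):
        b1, b2 = data[pos + 1], data[pos + 2]
        if data[pos] == 0xFF and (b1 & 0xE0) == 0xE0:
            version = (b1 >> 3) & 0b11
            layer = (b1 >> 1) & 0b11
            bitrate_index = b2 >> 4
            rate_index = (b2 >> 2) & 0b11
            # Reject reserved values so random bytes are less likely to match.
            if (version in _SAMPLE_RATES and layer != 0
                    and rate_index != 3 and bitrate_index != 0xF):
                return _SAMPLE_RATES[version][rate_index]
        pos += 1
    return None
```

For example, mp3_sample_rate(open('chunk7.mp3', 'rb').read()) would give a value to pass as sample_rate_hertz. MP3 files are often 44100 Hz, a rate that is not among the ones tried in the question.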
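The sample_recognize code above is a slightly modified sample from the Speech-to-Text docs. If this doesn't work, try digging deeper into the encoding documentation and best practices. Good luck.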
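When working through the encoding docs, it can help to first confirm what the file actually contains, since the error message is about the encoding not matching the audio. The sketch below is an illustration only: sniff_audio_format is a hypothetical helper that guesses the container from well-known leading bytes. It does not inspect the codec inside the container (a WAV file may hold something other than 16-bit PCM, and an Ogg file is not necessarily Opus), so treat the result as a hint.

```python
def sniff_audio_format(data):
    """Guess the audio container from its first bytes; returns a short label."""
    if data[:4] == b"fLaC":
        return "flac"
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"
    if data[:4] == b"OggS":
        return "ogg"
    if data[:6] == b"#!AMR\n":
        return "amr"
    if data[:9] == b"#!AMR-WB\n":
        return "amr-wb"
    # An ID3v2 tag at the start usually (not always) means MP3.
    if data[:3] == b"ID3":
        return "mp3"
    # Otherwise look for an MPEG audio frame sync. The layer field is
    # required to be non-zero, which excludes ADTS AAC streams.
    if len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0:
        if (data[1] >> 1) & 0b11 != 0:
            return "mp3"
    return "unknown"
```

If this reports "mp3" for chunk7.mp3, then none of LINEAR16, FLAC, MULAW, AMR, AMR_WB, OGG_OPUS or SPEEX_WITH_HEADER_BYTE describes the file, which would be consistent with every combination in the question's loop being rejected.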