400 Specify MP3 encoding to match audio file

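Problem description

I am trying to use the google-speech2text API, but even though I have set up my code to go through every available encoder, I keep getting "Specify MP3 encoding to match audio file".

This is the file I am trying to use.

I should add that if I upload the file through their UI, I do get output, so I don't think there is anything wrong with the source file.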

import io

from google.cloud import speech
from google.cloud.speech import enums
from google.cloud.speech import types

client = speech.SpeechClient.from_service_account_json('gcp_credentials.json')

speech_file = 'chunk7.mp3'

# Read the whole audio file into memory
with io.open(speech_file, 'rb') as audio_file:
    content = audio_file.read()
    audio = types.RecognitionAudio(content=content)

ENCODING = [enums.RecognitionConfig.AudioEncoding.LINEAR16, 
            enums.RecognitionConfig.AudioEncoding.FLAC,
            enums.RecognitionConfig.AudioEncoding.MULAW,
            enums.RecognitionConfig.AudioEncoding.AMR,
            enums.RecognitionConfig.AudioEncoding.AMR_WB,
            enums.RecognitionConfig.AudioEncoding.OGG_OPUS, 
            enums.RecognitionConfig.AudioEncoding.SPEEX_WITH_HEADER_BYTE]

SAMPLE_RATE_HERTZ = [8000, 12000, 16000, 24000, 48000]
for enco in ENCODING:
    for rate in SAMPLE_RATE_HERTZ:
        config = types.RecognitionConfig(
            encoding=enco,
            sample_rate_hertz=rate,
            language_code='en-US')

        # Detects speech in the audio file
        response = []
        try:
            response = client.recognize(config, audio)
        except Exception as exc:
            # Report the API error instead of silently discarding it
            print("error: ", exc)
        print("-----------------------------------------------------")
        print(str(rate) + "   " + str(enco))
        print("response: ", str(response))

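Alternatively, there is another file in Persian ('fa-IR') with which I face a similar problem. I posted the Obama file first because it is easier to understand. I would appreciate it if you could also test your answer with the second file.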

Tags: google-speech-api, google-cloud-speech

Solution


It looks like you are setting the encoding field to each of the values the API offers in turn. I found that:

encoding = enums.RecognitionConfig.AudioEncoding.ENCODING_UNSPECIFIED

works for MP3 files. So try this:

from google.cloud import speech_v1
from google.cloud.speech_v1 import enums
import io
speech_file = 'chunk7.mp3'


def sample_recognize(local_file_path):
    """
    Transcribe a short audio file using synchronous speech recognition

    Args:
      local_file_path Path to local audio file, e.g. /path/audio.wav
    """

    client = speech_v1.SpeechClient()

    # local_file_path = 'resources/brooklyn_bridge.raw'

    # The language of the supplied audio
    language_code = "en-US"

    # Sample rate in Hertz of the audio data sent
    sample_rate_hertz = 16000
    # If this fails, try one of the other rates: 8000, 12000, 24000 or 48000


    # Encoding of audio data sent. Left as ENCODING_UNSPECIFIED here, which
    # is what worked for the MP3 file.
    # This field is optional for FLAC and WAV audio formats.
    encoding = enums.RecognitionConfig.AudioEncoding.ENCODING_UNSPECIFIED
    config = {
        "language_code": language_code,
        "sample_rate_hertz": sample_rate_hertz,
        "encoding": encoding,
    }
    with io.open(local_file_path, "rb") as f:
        content = f.read()
    audio = {"content": content}

    response = client.recognize(config, audio)
    for result in response.results:
        # First alternative is the most probable result
        alternative = result.alternatives[0]
        print(u"Transcript: {}".format(alternative.transcript))

sample_recognize(speech_file)
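
The comment in the sample suggests trying other sample rates if 16000 fails. A minimal sketch of that retry loop is shown here. The helper name `try_sample_rates` and the `attempt` callable are hypothetical and not part of the Speech-to-Text client; `attempt` stands for whatever function performs one recognition request at a given rate. Unlike a bare `except: pass`, it keeps every error so the API message stays visible.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def try_sample_rates(
    attempt: Callable[[int], object],
    rates: Iterable[int] = (8000, 12000, 16000, 24000, 48000),
) -> Tuple[Optional[int], object, List[Tuple[int, Exception]]]:
    """Call attempt(rate) for each rate until one succeeds.

    Returns (rate_that_worked, result, errors_so_far). If every rate
    fails, the first two items are None and errors holds all failures.
    """
    errors: List[Tuple[int, Exception]] = []
    for rate in rates:
        try:
            return rate, attempt(rate), errors
        except Exception as exc:
            # Keep the error so it can be inspected afterwards
            errors.append((rate, exc))
    return None, None, errors


# Hypothetical usage with the client from the sample (not executed here):
# attempt = lambda rate: client.recognize(
#     {"language_code": "en-US", "sample_rate_hertz": rate,
#      "encoding": encoding}, audio)
# rate, response, errors = try_sample_rates(attempt)
```

Printing `errors` after a failed run shows which rates were rejected and why.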

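The sample_recognize code above is a lightly modified example from the Speech-to-Text docs. If it does not work, try digging deeper into the encoding documentation and best practices. Good luck.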
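
Before trying encodings at random, it can help to check what the file actually contains. The sketch below guesses the container from well-known leading byte signatures, using only the standard library. The function names are hypothetical, the check is a heuristic rather than a full parser, and choosing the matching AudioEncoding value from the result is left to the reader.

```python
def sniff_audio_container(header: bytes) -> str:
    """Guess the audio container from the first bytes of a file."""
    if header.startswith(b"ID3"):
        return "mp3"  # MP3 preceded by an ID3v2 tag
    if header.startswith(b"fLaC"):
        return "flac"
    if header.startswith(b"RIFF") and header[8:12] == b"WAVE":
        return "wav"
    if header.startswith(b"OggS"):
        return "ogg"  # may hold Opus, Vorbis or other codecs
    if header.startswith(b"#!AMR-WB\n"):
        return "amr-wb"
    if header.startswith(b"#!AMR\n"):
        return "amr"
    # Bare MPEG audio frame: 11 sync bits set, layer bits 01 = Layer III
    if (len(header) >= 2 and header[0] == 0xFF
            and (header[1] & 0xE0) == 0xE0
            and (header[1] & 0x06) == 0x02):
        return "mp3"
    return "unknown"


def sniff_audio_file(path: str) -> str:
    """Read the first 12 bytes of a file and guess its container."""
    with open(path, "rb") as f:
        return sniff_audio_container(f.read(12))
```

For example, `sniff_audio_file('chunk7.mp3')` would confirm whether the file is really MP3 data or another format saved with an .mp3 extension.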

