How to pass a live audio URL to the Google Speech to Text API

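Question

I have the URL of a live audio recording, and I am trying to transcribe it with the Google Speech to Text API. I am using the sample code from the Cloud Speech to Text API. The problem is that when I pass in the live URL, I get no output. The relevant part of my code is below. Any help would be greatly appreciated!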

import io
import os
import time
from datetime import datetime
from datetime import timedelta
from urllib.request import urlopen

import numpy as np
import requests
# Note: the enums and types submodules exist in google-cloud-speech
# releases before 2.0; they were removed in 2.0.
from google.cloud import speech
from google.cloud.speech import enums
from google.cloud.speech import types

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "app_creds.json"

def get_stream():
    """Yields chunks of audio read from the live URL for up to 60 seconds."""
    stream = urlopen('streamurl')

    duration = timedelta(seconds=60)
    begin = datetime.now()

    while datetime.now() - begin < duration:
        data = stream.read(8000)
        if not data:
            # The server closed the connection; nothing more to read.
            break
        # Yield rather than return, so the loop keeps reading after the
        # first chunk instead of exiting the function.
        yield data

def transcribe_streaming():
    """Streams transcription of the live audio stream."""
    client = speech.SpeechClient()

    # A generator yielding chunks of audio data as they arrive.
    stream = get_stream()
    requests = (types.StreamingRecognizeRequest(audio_content=chunk)
                for chunk in stream)

    config = types.RecognitionConfig(
        encoding=enums.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code='en-US')

    streaming_config = types.StreamingRecognitionConfig(config=config)

    # streaming_recognize returns a generator.
    responses = client.streaming_recognize(streaming_config, requests)

    for response in responses:
        # Once the transcription has settled, the first result will contain the
        # is_final result. The other results will be for subsequent portions of
        # the audio.
        for result in response.results:
            print('Finished: {}'.format(result.is_final))
            print('Stability: {}'.format(result.stability))
            alternatives = result.alternatives
            # The alternatives are ordered from most likely to least.
            for alternative in alternatives:
                print('Confidence: {}'.format(alternative.confidence))
                print(u'Transcript: {}'.format(alternative.transcript))


Tags: python, speech-recognition, speech-to-text, google-speech-api

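Answer


When sending audio to the Google Speech service, make sure the settings on the service object match the encoding of the audio. In your particular case,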

config = types.RecognitionConfig(
    encoding=enums.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code='en-US')

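corresponds to single-channel, 16 kHz, linear 16-bit PCM encoding. If you need to transcribe audio in a different format, see the list of other supported encodings.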
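
One way to check whether a stream really matches that config is to look at its first bytes before sending anything to the API. The following is only a minimal sketch using the Python standard library: `sniff_audio_format` and `wav_matches_config` are hypothetical helper names, not part of google-cloud-speech, and the sniffing relies on nothing more than well-known magic numbers, so treat its result as a hint rather than a definitive answer.

```python
import io
import wave


def sniff_audio_format(head):
    """Guess the container from the first bytes of a stream.

    Hypothetical helper: it only compares well-known magic numbers.
    """
    if head[:4] == b'RIFF' and head[8:12] == b'WAVE':
        return 'wav'
    if head[:4] == b'fLaC':
        return 'flac'
    if head[:4] == b'OggS':
        return 'ogg'
    if head[:3] == b'ID3':
        # An ID3v2 tag usually precedes MP3 data.
        return 'mp3'
    # MPEG audio frames (MP3, and AAC in ADTS framing) begin with a sync
    # word: eight set bits followed by at least three more set bits.
    if len(head) >= 2 and head[0] == 0xFF and (head[1] & 0xE0) == 0xE0:
        return 'mpeg-audio'
    return 'unknown'


def wav_matches_config(wav_bytes, sample_rate_hertz=16000):
    """Return True if the WAV data is mono, 16-bit, at the given rate."""
    with wave.open(io.BytesIO(wav_bytes), 'rb') as wav:
        return (wav.getnchannels() == 1
                and wav.getsampwidth() == 2
                and wav.getframerate() == sample_rate_hertz)
```

For example, calling `sniff_audio_format(urlopen('streamurl').read(12))` on the live URL would show whether the server is sending something other than uncompressed PCM. If it is, either convert the audio to 16 kHz mono LINEAR16 before sending it, or choose the matching encoding from the supported list.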

