Nexmo Voice API with WebSockets and Azure Cognitive Services for speech translation

Problem description

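I am using Azure Cognitive Services for speech translation. When a caller dials the Nexmo number, I receive the call audio over a WebSocket. I then use Azure to convert text to speech and write the result back to the socket as the response, but the Nexmo call just disconnects. This is the code I am using on the WebSocket: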

            var configWait = SpeechConfig.FromSubscription(_appSettings.azurecognitiveservicespeech_subscriptionkey, "centralus");
            using (var audioOutputStream = AudioOutputStream.CreatePullStream())
            using (var output = AudioConfig.FromStreamOutput(audioOutputStream))
            using (var synthesizer1 = new SpeechSynthesizer(configWait, output))
            using (var resultWait = await synthesizer1.SpeakTextAsync("Please Wait while next representative is available."))
            {
                if (resultWait.Reason == ResultReason.SynthesizingAudioCompleted)
                {
                    var ttsAudio = resultWait.AudioData;
                    const int chunkSize = 320;
                    var chunkCount = 1;
                    var offset = 0;

                    var lastFullChunck = ttsAudio.Length < (offset + chunkSize);
                    try
                    {
                        while (!lastFullChunck)
                        {
                            await socket.SendAsync(new ArraySegment<byte>(ttsAudio, offset, chunkSize), result.MessageType, false, CancellationToken.None);
                            offset = chunkSize * chunkCount;
                            lastFullChunck = ttsAudio.Length < (offset + chunkSize);
                            chunkCount++;
                        }

                        var lastMessageSize = ttsAudio.Length - offset;
                        await socket.SendAsync(new ArraySegment<byte>(ttsAudio, offset, lastMessageSize), result.MessageType, true, CancellationToken.None);
                    }
                    catch (Exception ex)
                    {
                    }
                }
            }

Tags: websocket, azure-cognitive-services, nexmo

Solution


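It looks like you may be sending an oddly sized audio chunk at the end of the stream. I'm not sure how this fits into the overall context of your WebSocket, since that wasn't shared in the question. Here is some code that has worked for me for receiving audio and writing it back: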

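// Runs until the remote side closes the WebSocket.
while (!result.CloseStatus.HasValue)
{
    byte[] audio;
    // Drain any audio queued for playback into the call.
    while (_audioToWrite.TryDequeue(out audio))
    {
        // 640 bytes = 20 ms of 16-bit mono PCM at 16 kHz.
        const int bufferSize = 640;
        // Sends full frames only. Because of the strict '<', any trailing bytes
        // (including a final, exactly full frame) are not sent.
        for (var i = 0; i + bufferSize < audio.Length; i += bufferSize)
        {
            var audioToSend = audio[i..(i + bufferSize)];
            // Always true inside this loop, so every frame is its own binary message.
            var endOfMessage = audio.Length > (bufferSize + i);
            await webSocket.SendAsync(new ArraySegment<byte>(audioToSend, 0, bufferSize), WebSocketMessageType.Binary, endOfMessage, CancellationToken.None);
        }
    }

    // Read the next incoming message from the call and forward it to _inputStream.
    result = await webSocket.ReceiveAsync(new ArraySegment<byte>(buffer), CancellationToken.None);

    _inputStream.Write(buffer);
}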

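This is a slightly modified version of the code from a blog post I wrote on this topic. You can find the source code on GitHub. It doesn't use the translation service the way you do; it just converts speech directly to text, but it should work in much the same way. You can also take a look at the blog post.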
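
To illustrate the point about the oddly sized final chunk, here is a minimal sketch of one way to make every outgoing message the same size. It is not taken from the blog post or the GitHub source. `AudioFraming`, `FrameSizeBytes`, and `SplitIntoFrames` are hypothetical names introduced here. The sketch assumes the audio is raw 16-bit mono linear PCM with no WAV header, sent as 20 ms frames (320 bytes at 8 kHz, 640 bytes at 16 kHz), and it pads the last frame with silence, which is just one possible way to handle the leftover bytes.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of any SDK.
public static class AudioFraming
{
    // Bytes in one frame of 16-bit mono linear PCM.
    // 16000 Hz -> 640 bytes, 8000 Hz -> 320 bytes (for 20 ms frames).
    public static int FrameSizeBytes(int sampleRateHz, int frameMillis = 20)
    {
        const int bytesPerSample = 2; // 16-bit samples
        return sampleRateHz * frameMillis / 1000 * bytesPerSample;
    }

    // Splits audio into frames of exactly frameSize bytes. The final frame is
    // zero-padded (silence in signed 16-bit PCM) rather than sent short or dropped.
    public static List<byte[]> SplitIntoFrames(byte[] audio, int frameSize)
    {
        if (audio == null) throw new ArgumentNullException(nameof(audio));
        if (frameSize <= 0) throw new ArgumentOutOfRangeException(nameof(frameSize));

        var frames = new List<byte[]>();
        for (var offset = 0; offset < audio.Length; offset += frameSize)
        {
            var frame = new byte[frameSize]; // new arrays are zero-initialized
            var count = Math.Min(frameSize, audio.Length - offset);
            Buffer.BlockCopy(audio, offset, frame, 0, count);
            frames.Add(frame);
        }
        return frames;
    }
}
```

Each resulting frame could then be written as its own complete binary message, for example with `webSocket.SendAsync(new ArraySegment<byte>(frame), WebSocketMessageType.Binary, true, CancellationToken.None)`. Note that the code in the question uses 320-byte chunks while the code in this answer uses 640, so the frame size should match the sample rate that was negotiated for the WebSocket.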

