Azure Speech to Text Translations with multiple languages

Problem description

I'm fairly new to Azure's Speech SDK, so it's quite possible I'm missing something obvious; apologies if that's the case.

I've been working on a project where I want to translate an audio file/stream from one language to another. It works decently when the entire conversation is in one language (all Spanish), but it falls apart when I feed it a real conversation that mixes English and Spanish. It tries to recognize the English words as Spanish words (so it will transcribe something like 'I'm sorry' as mangled Spanish).

From what I can tell, you can set multiple target languages (languages to translate into) but only one speechRecognitionLanguage. That seems to imply it can't handle conversations in multiple languages (like a phone call with a translator) or speakers who flip between languages. Is there a way to make it work with multiple languages, or is that just something Microsoft hasn't gotten around to yet?

Here's the code I have right now (it's a lightly modified version of the example on their GitHub):

// pull in the required packages.
var sdk = require("microsoft-cognitiveservices-speech-sdk");

(function() {
    "use strict";

    module.exports = {
        main: function(settings, audioStream) {

            // create the audio config pointing to our stream and
            // the translation config specifying the subscription.
            var audioConfig = sdk.AudioConfig.fromStreamInput(audioStream);
            var translationConfig = sdk.SpeechTranslationConfig.fromSubscription(settings.subscriptionKey, settings.serviceRegion);

            // set the recognition (source) language.
            translationConfig.speechRecognitionLanguage = settings.language;

            // target language (to be translated to).
            translationConfig.addTargetLanguage("en");

            // create the translation recognizer.
            var recognizer = new sdk.TranslationRecognizer(translationConfig, audioConfig);

            recognizer.recognized = function (s, e) {
                if (e.result.reason === sdk.ResultReason.NoMatch) {
                    var noMatchDetail = sdk.NoMatchDetails.fromResult(e.result);
                    console.log("\r\nDidn't find a match: " + sdk.NoMatchReason[noMatchDetail.reason]);
                } else {
                    var str = "\r\nNext Line: " + e.result.text + "\nTranslations:";

                    var language = "en";
                    str += " [" + language + "] " + e.result.translations.get(language);
                    str += "\r\n";

                    console.log(str);
                }
            };

            // two possible states: Error or EndOfStream.
            recognizer.canceled = function (s, e) {
                var str = "(cancel) Reason: " + sdk.CancellationReason[e.reason];
                // if it was because of an error
                if (e.reason === sdk.CancellationReason.Error) {
                    str += ": " + e.errorDetails;
                    console.log(str);
                } else {
                    // we've reached the end of the file; stop the recognizer.
                    recognizer.stopContinuousRecognitionAsync(
                        function() {
                            console.log("End of file.");
                            recognizer.close();
                            recognizer = undefined;
                        },
                        function(err) {
                            console.trace("err - " + err);
                            recognizer.close();
                            recognizer = undefined;
                        });
                }
            };

            // start the recognizer and wait for a result.
            recognizer.startContinuousRecognitionAsync(
                function () {
                    console.log("Starting speech recognition");
                },
                function (err) {
                    console.trace("err - " + err);

                    recognizer.close();
                    recognizer = undefined;
                }
            );
        }
    };
}());

Tags: speech-to-text, microsoft-cognitive, azure-cognitive-services

Solution


According to the Speech translation section of the official document Language and region support for the Speech Services, quoted below, I think you can use Speech Translation instead of Speech-to-Text to realize your needs.

Speech translation

The Speech Translation API supports different languages for speech-to-speech and speech-to-text translation. The source language must always be from the Speech-to-Text language table. The available target languages depend on whether the translation target is speech or text. You may translate incoming speech into more than 60 languages. A subset of these languages are available for speech synthesis.
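The key point for the question is that, while a recognizer still takes a single speechRecognitionLanguage, addTargetLanguage can be called more than once, so each recognized segment carries one translation per target. A minimal sketch along those lines (the language codes, the "<key>"/"<region>" placeholders, and the helper names are my own illustration, not from the official sample):

```javascript
// Hedged sketch: Speech Translation keeps a single SOURCE language but
// accepts several TARGET languages at once. The SDK is only required inside
// the factory function, so the pure helper below works without it.
function createRecognizer(key, region, audioConfig) {
    var sdk = require("microsoft-cognitiveservices-speech-sdk");
    var config = sdk.SpeechTranslationConfig.fromSubscription(key, region);
    config.speechRecognitionLanguage = "es-ES"; // one source language only...
    config.addTargetLanguage("en");             // ...but multiple targets
    config.addTargetLanguage("fr");
    return new sdk.TranslationRecognizer(config, audioConfig);
}

// Pure helper: format every target translation of a recognized segment.
// `translations` is any Map-like object with a .get(lang) method, such as
// e.result.translations inside the recognized callback.
function formatTranslations(translations, targets) {
    return targets.map(function (lang) {
        return "[" + lang + "] " + translations.get(lang);
    }).join(" ");
}
```

In the recognized handler you could then log `formatTranslations(e.result.translations, ["en", "fr"])` instead of hard-coding a single language, though this does not change the fact that the source language is fixed per recognizer.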

[Screenshot: table of supported speech translation languages]

Meanwhile, there is the official sample code Azure-Samples/cognitive-services-speech-sdk/samples/js/node/translation.js for Speech Translation.

I do not speak Spanish, so I cannot test an audio file in English and Spanish for you.

Hope it helps.

