speech-to-text - Azure Speech to Text Translations with multiple languages
Problem description
I'm fairly new to Azure's Speech SDK, so it's quite possible I'm missing something obvious; apologies if that's the case.
I've been working on a project where I want to translate an audio file/stream from one language to another. It works decently when the entire conversation is in one language (all Spanish), but it falls apart when I feed it a real conversation that mixes English and Spanish. It tries to recognize the English words as Spanish words (so it'll transcribe something like 'I'm sorry' as mangled Spanish).
From what I can tell, you can set multiple target languages (the languages to translate into) but only one speechRecognitionLanguage. That seems to imply that it can't handle conversations where there are multiple languages (like a phone call with a translator) or where speakers flip between languages. Is there a way to make it work with multiple languages, or is that just something Microsoft hasn't quite gotten around to yet?
Here's the code I have right now (it's just a lightly modified version of the example on their GitHub):
// pull in the required packages.
var sdk = require("microsoft-cognitiveservices-speech-sdk");

(function() {
    "use strict";

    module.exports = {
        main: function(settings, audioStream) {
            // now create the audio-config pointing to our stream and
            // the speech config specifying the language.
            var audioConfig = sdk.AudioConfig.fromStreamInput(audioStream);
            var translationConfig = sdk.SpeechTranslationConfig.fromSubscription(settings.subscriptionKey, settings.serviceRegion);

            // setting the recognition language.
            translationConfig.speechRecognitionLanguage = settings.language;

            // target language (to be translated to).
            translationConfig.addTargetLanguage("en");

            // create the translation recognizer.
            var recognizer = new sdk.TranslationRecognizer(translationConfig, audioConfig);

            recognizer.recognized = function (s, e) {
                if (e.result.reason === sdk.ResultReason.NoMatch) {
                    var noMatchDetail = sdk.NoMatchDetails.fromResult(e.result);
                    console.log("\r\nDidn't find a match: " + sdk.NoMatchReason[noMatchDetail.reason]);
                } else {
                    var str = "\r\nNext Line: " + e.result.text + "\nTranslations:";
                    var language = "en";
                    str += " [" + language + "] " + e.result.translations.get(language);
                    str += "\r\n";
                    console.log(str);
                }
            };

            // two possible states, Error or EndOfStream
            recognizer.canceled = function (s, e) {
                var str = "(cancel) Reason: " + sdk.CancellationReason[e.reason];
                // if it was because of an error
                if (e.reason === sdk.CancellationReason.Error) {
                    str += ": " + e.errorDetails;
                    console.log(str);
                }
                // We've reached the end of the file, stop the recognizer
                else {
                    recognizer.stopContinuousRecognitionAsync(
                        function() {
                            console.log("End of file.");
                            recognizer.close();
                            recognizer = undefined;
                        },
                        function(err) {
                            console.trace("err - " + err);
                            recognizer.close();
                            recognizer = undefined;
                        }
                    );
                }
            };

            // start the recognizer and wait for a result.
            recognizer.startContinuousRecognitionAsync(
                function () {
                    console.log("Starting speech recognition");
                },
                function (err) {
                    console.trace("err - " + err);
                    recognizer.close();
                    recognizer = undefined;
                }
            );
        }
    };
}());
Solution
According to the Speech translation section of the official document Language and region support for the Speech Services, quoted below, I think you can use Speech translation instead of Speech-to-text to meet your needs.
Speech translation
The Speech Translation API supports different languages for speech-to-speech and speech-to-text translation. The source language must always be from the Speech-to-Text language table. The available target languages depend on whether the translation target is speech or text. You may translate incoming speech into more than 60 languages. A subset of these languages are available for speech synthesis.
Meanwhile, there is the official sample code Azure-Samples/cognitive-services-speech-sdk/samples/js/node/translation.js for Speech translation.
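On the multiple-target-languages point from the question, here is a minimal sketch of how the recognized handler could print every configured target instead of the hard-coded "en". formatTranslations is a hypothetical helper, not an SDK function. It assumes the result object has the shape used in the question's code (a text string plus a translations object with a get(language) method), and it assumes get returns nothing for a language that was not translated. A plain Map stands in for the SDK result so the snippet runs without a subscription. It only covers the target side and does not change how the source audio is recognized.

```javascript
"use strict";

// Hypothetical helper (not part of the Speech SDK): builds one log line with
// the recognized text followed by each requested translation.
// `result` is assumed to look like e.result in the question's `recognized`
// handler: a `text` string plus a `translations` object exposing get(language).
function formatTranslations(result, targetLanguages) {
    var str = "Next Line: " + result.text + "\nTranslations:";
    targetLanguages.forEach(function (language) {
        var translated = result.translations.get(language);
        // Assumption: a missing translation comes back as undefined or null.
        if (translated !== undefined && translated !== null) {
            str += " [" + language + "] " + translated;
        }
    });
    return str;
}

// Stand-in result for illustration only; a Map offers the same get(key) call.
var fakeResult = {
    text: "Lo siento",
    translations: new Map([["en", "I'm sorry"], ["fr", "Pardon"]])
};

console.log(formatTranslations(fakeResult, ["en", "fr", "de"]));
```

In the question's code this would be used by calling translationConfig.addTargetLanguage once per target and then passing e.result together with the same list of language codes to the helper inside recognizer.recognized.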
I do not speak Spanish, so I cannot test audio that mixes English and Spanish for you.
Hope it helps.