How to enable Neural Text-to-Speech (NTTS) in Java with Amazon Polly

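Question

I am trying to convert text to speech with Amazon Polly through its Java API. According to Amazon's documentation, several US English voices support the neural engine: https://docs.aws.amazon.com/polly/latest/dg/voicelist.html

The code I am running in my Java application is as follows: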

package com.amazonaws.demos.polly;

import java.io.IOException;
import java.io.InputStream;

import com.amazonaws.ClientConfiguration;
import com.amazonaws.auth.DefaultAWSCredentialsProviderChain;
import com.amazonaws.regions.Region;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.polly.AmazonPollyClient;
import com.amazonaws.services.polly.model.DescribeVoicesRequest;
import com.amazonaws.services.polly.model.DescribeVoicesResult;
import com.amazonaws.services.polly.model.OutputFormat;
import com.amazonaws.services.polly.model.SynthesizeSpeechRequest;
import com.amazonaws.services.polly.model.SynthesizeSpeechResult;
import com.amazonaws.services.polly.model.Voice;

import javazoom.jl.player.advanced.AdvancedPlayer;
import javazoom.jl.player.advanced.PlaybackEvent;
import javazoom.jl.player.advanced.PlaybackListener;

public class PollyDemo {

    private final AmazonPollyClient polly;
    private final Voice voice;
    private static final String JOANNA="Joanna"; 
    private static final String KENDRA="Kendra"; 
    private static final String MATTHEW="Matthew"; 
    private static final String SAMPLE = "Congratulations. You have successfully built this working demo of Amazon Polly in Java. Have fun building voice enabled apps with Amazon Polly (that's me!), and always look at the AWS website for tips and tricks on using Amazon Polly and other great services from AWS";

    public PollyDemo(Region region) {
        // create an Amazon Polly client in a specific region
        polly = new AmazonPollyClient(new DefaultAWSCredentialsProviderChain(), 
        new ClientConfiguration());
        polly.setRegion(region);

        // Create describe voices request.
        DescribeVoicesRequest describeVoicesRequest = new DescribeVoicesRequest();

        // Synchronously ask Amazon Polly to describe available TTS voices.
        DescribeVoicesResult describeVoicesResult = polly.describeVoices(describeVoicesRequest);
        //voice = describeVoicesResult.getVoices().get(0);
        voice = describeVoicesResult.getVoices().stream().filter(p -> p.getName().equals(MATTHEW)).findFirst().get();
    }

    public InputStream synthesize(String text, OutputFormat format) throws IOException {
        SynthesizeSpeechRequest synthReq = 
        new SynthesizeSpeechRequest().withText(text).withVoiceId(voice.getId())
                .withOutputFormat(format);
        SynthesizeSpeechResult synthRes = polly.synthesizeSpeech(synthReq);

        return synthRes.getAudioStream();
    }

    public static void main(String args[]) throws Exception {
        //create the test class
        PollyDemo helloWorld = new PollyDemo(Region.getRegion(Regions.US_WEST_1));
        //get the audio stream
        InputStream speechStream = helloWorld.synthesize(SAMPLE, OutputFormat.Mp3);

        //create an MP3 player
        AdvancedPlayer player = new AdvancedPlayer(speechStream,
                javazoom.jl.player.FactoryRegistry.systemRegistry().createAudioDevice());

        player.setPlayBackListener(new PlaybackListener() {
            @Override
            public void playbackStarted(PlaybackEvent evt) {
                System.out.println("Playback started");
                System.out.println(SAMPLE);
            }

            @Override
            public void playbackFinished(PlaybackEvent evt) {
                System.out.println("Playback finished");
            }
        });


        // play it!
        player.play();

    }
} 

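By default, this synthesizes Matthew's voice with the standard engine. Please suggest what needs to change so that Matthew's voice is synthesized with the neural engine.

Thanks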

Tags: java, amazon-web-services, text-to-speech, amazon-polly

Solution


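Thanks for the feedback, @ASR.

I was able to find the engine parameter you suggested.

Here is what I had to do to get it working:

  1. In pom.xml, update the aws-java-sdk-polly version from 1.11.77 (the version given in their documentation) to 1.11.762, the latest at the time, and rebuild the Maven project. This pulls in the current definition of the SynthesizeSpeechRequest class. In 1.11.77, its definition had no withEngine method.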
<dependency>
 <groupId>com.amazonaws</groupId>
 <artifactId>aws-java-sdk-polly</artifactId>
 <version>1.11.762</version>
</dependency>
  2. Add withEngine("neural") to the request, as follows:
SynthesizeSpeechRequest synthReq = 
        new SynthesizeSpeechRequest().withText(text).withVoiceId(voice.getId())
                .withOutputFormat(format).withEngine("neural");
  3. As described at https://docs.aws.amazon.com/polly/latest/dg/NTTS-main.html, neural voices are only available in certain regions, so I had to switch to one of them, as follows:
PollyDemo helloWorld = new PollyDemo(Region.getRegion(Regions.US_WEST_2));
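
Because a neural request depends on the region chosen in step 3, it may help to fail fast with a clear message before calling Polly. The following is only a sketch: NeuralRegionGuard and requireNeuralRegion are hypothetical names, not part of the AWS SDK, and the set of allowed regions is an assumption supplied by the caller. Here it contains only us-west-2, the one region this answer confirmed; the NTTS page linked above is the place to check the current list. In the demo, the region name could come from Regions.US_WEST_2.getName().

```java
import java.util.Collections;
import java.util.Set;

/** Hypothetical helper, not part of the AWS SDK. */
final class NeuralRegionGuard {

    /**
     * Throws IllegalArgumentException when regionName is not in the
     * caller-supplied set of regions assumed to offer neural voices.
     */
    static void requireNeuralRegion(String regionName, Set<String> neuralRegions) {
        if (!neuralRegions.contains(regionName)) {
            throw new IllegalArgumentException(
                    "Region " + regionName + " is not in the configured neural region set "
                            + neuralRegions);
        }
    }

    public static void main(String[] args) {
        // Assumption: only the region confirmed in this answer is listed.
        Set<String> neuralRegions = Collections.singleton("us-west-2");

        requireNeuralRegion("us-west-2", neuralRegions);
        System.out.println("us-west-2 accepted");

        try {
            requireNeuralRegion("us-west-1", neuralRegions);
        } catch (IllegalArgumentException e) {
            System.out.println("us-west-1 rejected");
        }
    }
}
```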

After these changes, the neural voice worked perfectly.
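
If the same code also has to handle voices without a neural variant, one option is to choose the engine from the list of engines a voice reports instead of hard-coding "neural". This is a minimal sketch, assuming the list comes from Voice.getSupportedEngines() in a recent 1.11.x SDK; EngineChooser and chooseEngine are hypothetical names, and the helper takes a plain list so it can be tried without AWS credentials.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper, not part of the AWS SDK. */
final class EngineChooser {

    static final String NEURAL = "neural";
    static final String STANDARD = "standard";

    /**
     * Returns "neural" when the voice lists it among its supported engines,
     * otherwise falls back to "standard".
     */
    static String chooseEngine(List<String> supportedEngines) {
        if (supportedEngines != null && supportedEngines.contains(NEURAL)) {
            return NEURAL;
        }
        return STANDARD;
    }

    public static void main(String[] args) {
        // A voice that reports both engines: prints "neural".
        System.out.println(chooseEngine(Arrays.asList("standard", "neural")));
        // A voice that reports only the standard engine: prints "standard".
        System.out.println(chooseEngine(Arrays.asList("standard")));
    }
}
```

In the demo above, the result could be passed straight to the request, for example .withEngine(EngineChooser.chooseEngine(voice.getSupportedEngines())). This covers only the voice side; the region requirement from step 3 still applies.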

