How do I convert a PCM byte array to little-endian and mono?

Question

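I'm trying to feed audio from an online communication app into the Vosk speech recognition API.

The audio arrives as a byte array in this format: PCM_SIGNED 48000.0 Hz, 16 bit, stereo, 4 bytes/frame, big-endian. To be able to process it with Vosk, it needs to be mono and little-endian.

This is my current attempt: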

        byte[] audioData = userAudio.getAudioData(1);
        short[] convertedAudio = new short[audioData.length / 2];
        ByteBuffer buffer = ByteBuffer.allocate(convertedAudio.length * Short.BYTES);
        
        // Convert to mono, I don't think I did it right though
        int j = 0;
        for (int i = 0; i < audioData.length; i += 2)
            convertedAudio[j++] = (short) (audioData[i] << 8 | audioData[i + 1] & 0xFF);

        // Convert to little endian
        buffer.order(ByteOrder.BIG_ENDIAN);
        for (short s : convertedAudio)
            buffer.putShort(s);
        buffer.order(ByteOrder.LITTLE_ENDIAN);
        buffer.rewind();

        for (int i = 0; i < convertedAudio.length; i++)
            convertedAudio[i] = buffer.getShort();

        queue.add(convertedAudio);

Tags: java, arrays, audio, byte, java-audio

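Answer


I ran into the same problem and found this Stack Overflow post on converting a raw PCM byte array into an audio input stream.

I'm assuming you are using the Java Discord API (JDA), so here is my initial code for the `handleUserAudio()` function using Vosk, combined with the code from the post linked above: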

            // Define the audio format that Vosk uses:
            // 16 kHz, 16-bit, mono, signed, little-endian
            AudioFormat target = new AudioFormat(
                    16000, 16, 1, true, false);

            try {
                byte[] data = userAudio.getAudioData(1.0f);
                // AudioInputStream takes its length in sample frames, not bytes
                AudioFormat source = AudioReceiveHandler.OUTPUT_FORMAT;
                long frames = data.length / source.getFrameSize();
                // Wrap the byte array from Discord in a stream and convert it to the target format
                AudioInputStream inputStream = AudioSystem.getAudioInputStream(target,
                        new AudioInputStream(
                                new ByteArrayInputStream(data), source, frames));

                // This is what was used before
//                InputStream inputStream = new ByteArrayInputStream(data);

                int nbytes;
                byte[] b = new byte[4096];
                // Feed the converted audio to the recognizer in 4 KB chunks
                while ((nbytes = inputStream.read(b)) >= 0) {
                    if (recognizer.acceptWaveForm(b, nbytes)) {
                        System.out.println(recognizer.getResult());
                    } else {
                        System.out.println(recognizer.getPartialResult());
                    }
                }
//                queue.add(data);
            } catch (Exception e) {
                e.printStackTrace();
            }

This works so far. However, everything ends up in the recognizer's `.getPartialResult()` output. At least Vosk is understanding the audio coming from the Discord bot.
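
For the literal conversion asked about in the question (stereo big-endian to mono little-endian), a manual version might look like the sketch below. The loop in the question decodes each 16-bit sample but still keeps both channels, so the result is not mono. The class name `PcmConverter` and the method `toMonoLittleEndian` are hypothetical names chosen for illustration. The sketch assumes 16-bit signed samples with left and right interleaved (4 bytes per frame) and averages the two channels. It does not resample, so the output is still 48000 Hz, and the recognizer would need to be set up for that rate or the audio resampled separately.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not part of JDA or Vosk.
final class PcmConverter {

    private PcmConverter() {}

    /**
     * Converts 16-bit signed, stereo, big-endian PCM into
     * 16-bit signed, mono, little-endian PCM.
     * The sample rate is left unchanged.
     */
    static byte[] toMonoLittleEndian(byte[] stereoBigEndian) {
        final int frameSize = 4; // 2 channels * 2 bytes per sample
        int frames = stereoBigEndian.length / frameSize; // a trailing partial frame is ignored

        // Read samples as big-endian, write them back as little-endian
        ByteBuffer in = ByteBuffer.wrap(stereoBigEndian).order(ByteOrder.BIG_ENDIAN);
        ByteBuffer out = ByteBuffer.allocate(frames * Short.BYTES).order(ByteOrder.LITTLE_ENDIAN);

        for (int i = 0; i < frames; i++) {
            int left = in.getShort();
            int right = in.getShort();
            // Average the two channels; the sum of two shorts fits in an int
            out.putShort((short) ((left + right) / 2));
        }
        return out.array();
    }
}
```

Calling `PcmConverter.toMonoLittleEndian(userAudio.getAudioData(1.0f))` would then return half as many bytes as it was given, one 2-byte sample per original 4-byte frame.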

