How do I save the speech recognized through the AudioRecord class to a file?

Problem description

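My application recognizes my voice and converts it to text. At the same time, I want to save the recognized speech in a playable format, because I want to post both the audio file and the text to another server.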

private static final int[] SAMPLE_RATE_CANDIDATES = new int[]{16000, 11025, 22050, 44100};
private static final int CHANNEL = AudioFormat.CHANNEL_IN_MONO;
private static final int ENCODING = AudioFormat.ENCODING_PCM_16BIT;

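I tried writing the stream to a file. I am not certain, but it does seem to write something, because I can see the file size growing. However, when I send the file to the server, it cannot be played.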

public void start() {
    // Stop recording if it is currently ongoing.
    stop();
    // Try to create a new recording session.
    mAudioRecord = createAudioRecord();
    if (mAudioRecord == null) {
        throw new RuntimeException("Cannot instantiate VoiceRecorder");
    }
    // Start recording.
    mAudioRecord.startRecording();
    // Start processing the captured audio.
    mThread = new Thread(new ProcessVoice());
    mThread.start();
}    

/**
 * Continuously processes the captured audio and notifies {@link #mCallback} of the
 * corresponding events.
 */
private class ProcessVoice implements Runnable {

    @Override
    public void run() {
        while (true) {
            synchronized (mLock) {
                if (Thread.currentThread().isInterrupted()) {
                    break;
                }
                final int size = mAudioRecord.read(mBuffer, 0, mBuffer.length);
                try {
                    os.write(mBuffer, 0, mBuffer.length);
                } catch (IOException e) {
                    Log.e(LOGTAG, "Error saving recording ", e);
                    return;
                }
                final long now = System.currentTimeMillis();
                if (isHearingVoice(mBuffer, size)) {
                    if (mLastVoiceHeardMillis == Long.MAX_VALUE) {
                        mVoiceStartedMillis = now;
                        mCallback.onVoiceStart();
                    }
                    mCallback.onVoice(mBuffer, size);
                    mLastVoiceHeardMillis = now;
                    if (now - mVoiceStartedMillis > MAX_SPEECH_LENGTH_MILLIS) {
                        end();
                    }
                } else if (mLastVoiceHeardMillis != Long.MAX_VALUE) {
                    mCallback.onVoice(mBuffer, size);
                    if (now - mLastVoiceHeardMillis > SPEECH_TIMEOUT_MILLIS) {
                        end();
                        mCallback.onVoiceStart();
                    }
                }
            }
        }
    }
}

Tags: android, text-to-speech, voice-recognition, voice-recording

Solution


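You are writing raw data (i.e. PCM) to the output stream. To make the file playable, you have to encode this data in a playable format. The simplest one is WAV, which is just a header followed by the raw PCM data.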
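A side note on the write itself, offered as a sketch rather than a confirmed diagnosis: the loop in the question writes mBuffer.length bytes on every pass, while AudioRecord.read can return fewer bytes than requested, or a negative error code. Writing only the bytes that were actually read keeps stale buffer contents out of the file. The names PcmWriter, reserveHeader and appendPcm below are hypothetical, and reserving 44 placeholder bytes up front is an assumption that lets a header be written at offset 0 later without overwriting audio.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical helper; not part of the code in the question.
public class PcmWriter {
    public static final int WAV_HEADER_SIZE = 44;

    // Writes 44 zero bytes so that a WAV header can overwrite them in place later.
    public static void reserveHeader(OutputStream os) throws IOException {
        os.write(new byte[WAV_HEADER_SIZE]);
    }

    // Writes only the bytes produced by the last read.
    // Returns the number of bytes written (0 for an empty or failed read).
    public static int appendPcm(OutputStream os, byte[] buffer, int size) throws IOException {
        if (size <= 0) {
            return 0; // nothing was read, or read() reported an error code
        }
        int count = Math.min(size, buffer.length);
        os.write(buffer, 0, count);
        return count;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream os = new ByteArrayOutputStream();
        reserveHeader(os);
        byte[] buffer = new byte[1024];
        appendPcm(os, buffer, 640); // a short read: only 640 bytes are valid
        appendPcm(os, buffer, -3);  // a negative result is skipped
        System.out.println(os.size()); // prints 684 (44 + 640)
    }
}
```

In the recording loop this would correspond to calling appendPcm(os, mBuffer, size) in place of os.write(mBuffer, 0, mBuffer.length).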

After writing the data to the output stream, you can add the WAV header like this:

// The method name and signature are illustrative; the original fragment does not show them.
// Needs java.io.File, java.io.FileNotFoundException, java.io.IOException
// and java.io.RandomAccessFile.
private boolean writeWavHeader(File outputWAVFile, int sample_rate) {
    RandomAccessFile raf = null;
    try {
        // outputWAVFile is the file the FileOutputStream wrote the raw data to.
        // The header is written over the first 44 bytes of that file, so those bytes
        // should have been reserved (for example as 44 zero bytes) before the first
        // PCM chunk; otherwise the header replaces the first 44 bytes of audio.
        long fileLength = outputWAVFile.length();
        long pcmLength = fileLength - 44;   // bytes of PCM data that follow the header
        long dataLength = fileLength - 8;   // RIFF chunk size, i.e. 36 + pcmLength

        int byteRate = sample_rate * 1 * 16 / 8;   // sample_rate * channels * bits_per_sample / 8
        int blockAlign = 1 * 16 / 8;               // channels * bits_per_sample / 8

        byte[] header = new byte[44];

        header[0] = 'R';  // RIFF/WAVE header
        header[1] = 'I';
        header[2] = 'F';
        header[3] = 'F';
        header[4] = (byte) (dataLength & 0xff);
        header[5] = (byte) ((dataLength >> 8) & 0xff);
        header[6] = (byte) ((dataLength >> 16) & 0xff);
        header[7] = (byte) ((dataLength >> 24) & 0xff);
        header[8] = 'W';
        header[9] = 'A';
        header[10] = 'V';
        header[11] = 'E';
        header[12] = 'f';  // 'fmt ' chunk
        header[13] = 'm';
        header[14] = 't';
        header[15] = ' ';
        header[16] = 16;  // 4 bytes: size of 'fmt ' chunk
        header[17] = 0;
        header[18] = 0;
        header[19] = 0;
        header[20] = 1;  // format = 1 (PCM)
        header[21] = 0;
        header[22] = (byte) 1;  // channels
        header[23] = 0;
        header[24] = (byte) (sample_rate & 0xff);
        header[25] = (byte) ((sample_rate >> 8) & 0xff);
        header[26] = (byte) ((sample_rate >> 16) & 0xff);
        header[27] = (byte) ((sample_rate >> 24) & 0xff);
        header[28] = (byte) (byteRate & 0xff);
        header[29] = (byte) ((byteRate >> 8) & 0xff);
        header[30] = (byte) ((byteRate >> 16) & 0xff);
        header[31] = (byte) ((byteRate >> 24) & 0xff);
        header[32] = (byte) blockAlign;  // block align
        header[33] = 0;
        header[34] = 16;  // bits per sample
        header[35] = 0;
        header[36] = 'd';  // 'data' chunk
        header[37] = 'a';
        header[38] = 't';
        header[39] = 'a';
        header[40] = (byte) (pcmLength & 0xff);
        header[41] = (byte) ((pcmLength >> 8) & 0xff);
        header[42] = (byte) ((pcmLength >> 16) & 0xff);
        header[43] = (byte) ((pcmLength >> 24) & 0xff);

        raf = new RandomAccessFile(outputWAVFile, "rw");
        raf.seek(0);
        raf.write(header);
        return true;
    } catch (FileNotFoundException e) {
        e.printStackTrace();
        return false;
    } catch (IOException e) {
        e.printStackTrace();
        return false;
    } finally {
        // raf is still null if the constructor threw, so guard the close.
        if (raf != null) {
            try {
                raf.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}

Remember that outputWAVFile should have the .wav extension.
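For comparison, the same 44-byte header can be assembled more compactly with a little-endian ByteBuffer, which also makes it easy to check the bytes without touching a file. This is only a sketch: the class name WavHeader and the method build are hypothetical, and it assumes plain uncompressed PCM with a single 'fmt ' chunk followed by a single 'data' chunk.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not part of the original answer.
public class WavHeader {
    // Builds a 44-byte PCM WAV header describing pcmBytes bytes of audio data.
    public static byte[] build(int sampleRate, int channels, int bitsPerSample, int pcmBytes) {
        int blockAlign = channels * bitsPerSample / 8; // bytes per sample frame
        int byteRate = sampleRate * blockAlign;        // bytes per second

        ByteBuffer b = ByteBuffer.allocate(44).order(ByteOrder.LITTLE_ENDIAN);
        b.put("RIFF".getBytes(StandardCharsets.US_ASCII));
        b.putInt(36 + pcmBytes);            // RIFF chunk size: total file size minus 8
        b.put("WAVE".getBytes(StandardCharsets.US_ASCII));
        b.put("fmt ".getBytes(StandardCharsets.US_ASCII));
        b.putInt(16);                       // size of the 'fmt ' chunk
        b.putShort((short) 1);              // format 1 = PCM
        b.putShort((short) channels);
        b.putInt(sampleRate);
        b.putInt(byteRate);
        b.putShort((short) blockAlign);
        b.putShort((short) bitsPerSample);
        b.put("data".getBytes(StandardCharsets.US_ASCII));
        b.putInt(pcmBytes);                 // size of the PCM data that follows
        return b.array();
    }

    public static void main(String[] args) {
        // One second of 16 kHz, mono, 16-bit audio is 32000 bytes of PCM.
        byte[] header = build(16000, 1, 16, 32000);
        System.out.println(header.length); // prints 44
    }
}
```

With the constants from the question, the call would be build(sampleRate, 1, 16, pcmBytes), where pcmBytes is the number of PCM bytes written after the header.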

