FFmpeg Opus choppy sound

Problem description

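I am using FFmpeg and trying to encode raw PCM audio to Opus and decode it back with FFmpeg's built-in "opus" codec. My input samples are raw PCM, 8000 Hz, 16-bit mono, in AV_SAMPLE_FMT_S16 format. Because this Opus codec accepts only the AV_SAMPLE_FMT_FLTP sample format and a 48000 Hz sample rate, I resample the samples before encoding them.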

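I have two instances of a ResamplerAudio class, which resamples audio and holds a SwrContext member. The first instance resamples the raw PCM input before encoding; the second resamples the decoded audio so that its format and sample rate match those of the original input.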

ResamplerAudio has a function that initializes its SwrContext member:

void ResamplerAudio::init(AVCodecContext *codecContext, int inSampleRate, int outSampleRate, AVSampleFormat inSampleFmt, AVSampleFormat outSampleFmt)
{
    swrContext = swr_alloc();
    if (!swrContext)
    {
        LOGE(TAG, "[init] Couldn't allocate swr context");
        return;
    }

    av_opt_set_int(swrContext, "in_channel_layout", (int64_t) codecContext->channel_layout, 0);
    av_opt_set_int(swrContext, "out_channel_layout", (int64_t) codecContext->channel_layout,  0);

    av_opt_set_int(swrContext, "in_channel_count", codecContext->channels, 0);
    av_opt_set_int(swrContext, "out_channel_count", codecContext->channels, 0);

    av_opt_set_int(swrContext, "in_sample_rate", inSampleRate, 0);
    av_opt_set_int(swrContext, "out_sample_rate", outSampleRate, 0);

    av_opt_set_sample_fmt(swrContext, "in_sample_fmt", inSampleFmt, 0);
    av_opt_set_sample_fmt(swrContext, "out_sample_fmt", outSampleFmt,  0);

    int ret = swr_init(swrContext);
    if (ret < 0)
    {
        LOGE(TAG, "[init] swr_init error: %s", av_err2str(ret));
        return;
    }

    LOGD(TAG, "[init] success codecContext->channel_layout: %d; inSampleRate: %d; outSampleRate: %d; inSampleFmt: %d; outSampleFmt: %d", (int) codecContext->channel_layout, inSampleRate, outSampleRate, inSampleFmt, outSampleFmt);
}

For the first instance (resamplerEncoder, which resamples the raw PCM input before encoding), I call ResamplerAudio::init with these parameters:

resamplerEncoder->init(contextEncoder, 8000, 48000, AV_SAMPLE_FMT_S16, AV_SAMPLE_FMT_FLTP);

The second instance (resamplerDecoder, which resamples the audio after it is decoded from Opus) is initialized with these parameters:

resamplerDecoder->init(contextDecoder, 48000, 8000, AV_SAMPLE_FMT_FLTP, AV_SAMPLE_FMT_S16);
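As a sanity check on the two rate conversions above, the expected sample counts can be worked out by hand. The sketch below is plain C++ with no FFmpeg dependency; `resampledCount` and `isOpusFrameSizeAt48k` are hypothetical helper names, and the calculation ignores any samples the resampler buffers internally, so real `swr_convert` output counts may differ slightly from call to call.

```cpp
#include <cassert>
#include <cstdint>

// Number of output samples expected when resampling inSamples from inRate to
// outRate, rounded up. Ignores samples buffered inside the resampler.
int64_t resampledCount(int64_t inSamples, int inRate, int outRate)
{
    assert(inSamples >= 0 && inRate > 0 && outRate > 0);
    return (inSamples * outRate + inRate - 1) / inRate;
}

// True if `samples` at 48000 Hz corresponds to one of the Opus frame
// durations (2.5, 5, 10, 20, 40 or 60 ms).
bool isOpusFrameSizeAt48k(int64_t samples)
{
    const int64_t valid[] = {120, 240, 480, 960, 1920, 2880};
    for (int64_t v : valid)
        if (samples == v) return true;
    return false;
}
```

For example, a 20 ms chunk of the 8000 Hz input is 160 samples, and `resampledCount(160, 8000, 48000)` gives 960, which is a valid Opus frame size; an arbitrary chunk of 100 input samples gives 600, which is not.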

The resampling function of ResamplerAudio looks like this:

std::vector<uint8_t> ResamplerAudio::convert(uint8_t **inData, int inSamplesCount, int outChannels, int outFormat)
{
    std::vector<uint8_t> result;
    uint8_t *dstData = NULL;
    const int dstNbSamples = swr_get_out_samples(swrContext, inSamplesCount);
    av_samples_alloc(&dstData, NULL, outChannels, dstNbSamples, AVSampleFormat(outFormat), 1);
    int resampledSize = swr_convert(swrContext, &dstData, dstNbSamples, (const uint8_t **)inData, inSamplesCount);
    int dstBufSize = av_samples_get_buffer_size(NULL, outChannels, resampledSize, AVSampleFormat(outFormat), 1);

    if (dstBufSize <= 0) return result;

    std::copy(&dstData[0], &dstData[dstBufSize], std::back_inserter(result));

    return result;
}

Before encoding, I call ResamplerAudio::convert with these parameters:

// data - an array of raw pcm audio
// dataLength - the length of data array
// getSamplesCount() - function that calculates samples count
// frameEncode - AVFrame used for encoding audio
std::vector<uint8_t> resampledData = resamplerEncoder->convert(&data, getSamplesCount(dataLength, frameEncode->channels, AV_SAMPLE_FMT_S16), frameEncode->channels, frameEncode->format);

The getSamplesCount() function looks like this:

int getSamplesCount(int bytesCount, int channels, AVSampleFormat format)
{
    return bytesCount / av_get_bytes_per_sample(format) / channels;
}
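The same byte-to-sample arithmetic can be checked in isolation. This is a minimal sketch, assuming only the two formats used in the question (2 bytes per S16 sample, 4 bytes per float sample); the `Fmt` enum and the helper names are hypothetical stand-ins for `AVSampleFormat` and `av_get_bytes_per_sample`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for AVSampleFormat, limited to the formats in use.
enum class Fmt { S16, FLTP };

// Bytes occupied by one sample of one channel.
std::size_t bytesPerSample(Fmt f) { return f == Fmt::S16 ? 2 : 4; }

// Samples per channel contained in a buffer of `bytes` bytes.
std::size_t samplesFromBytes(std::size_t bytes, int channels, Fmt f)
{
    assert(channels > 0);
    return bytes / bytesPerSample(f) / static_cast<std::size_t>(channels);
}

// Buffer size in bytes needed for `samples` samples per channel.
std::size_t bytesFromSamples(std::size_t samples, int channels, Fmt f)
{
    assert(channels > 0);
    return samples * bytesPerSample(f) * static_cast<std::size_t>(channels);
}
```

With these numbers, 320 bytes of mono S16 input hold 160 samples, and the 960 samples they become at 48000 Hz occupy 3840 bytes as mono float. That 3840 is the size of the uncompressed frame data, not of the encoded packet.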

After that, I fill frameEncode with the resampled samples:

memcpy(&frame->data[0][0], &resampledData[0], sizeof(uint8_t) * resampledDataLength);

and pass frameEncode to the encoder by calling encodeFrame(resampledDataLength):

void encodeFrame(int dataLength)
{
    /* send the frame for encoding */
    int ret = avcodec_send_frame(contextEncoder, frameEncode);
    if (ret < 0)
    {
        LOGE(TAG, "[encodeFrame] avcodec_send_frame error: %s", av_err2str(ret));
        return;
    }

    /* read all the available output packets (in general there may be any number of them) */
    while (ret >= 0)
    {
        ret = avcodec_receive_packet(contextEncoder, packetEncode);
        if (ret < 0 && ret != AVERROR(EAGAIN)) LOGE(TAG, "[encodeFrame] error in avcodec_receive_packet: %s", av_err2str(ret));
        if (ret < 0) break;

        // encodedData - std::vector<uint8_t> that stores encoded data
        std::copy(&packetEncode->data[0], &packetEncode->data[dataLength], std::back_inserter(encodedData));
        av_packet_unref(packetEncode);
    }
}
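One thing worth checking in the encoding path is how many samples each submitted frame holds. For codecs with a fixed frame size, libavcodec expects every frame except the last to contain exactly `AVCodecContext::frame_size` samples per channel, while the resampler may return a different count on each call. FFmpeg offers `AVAudioFifo` for this kind of buffering; the sketch below shows the same idea in plain C++ for mono float samples. `SampleFifo` is a hypothetical class, not part of the question's code, and the frame size passed to it is an assumption that would normally come from the opened encoder context.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Accumulates samples of arbitrary chunk sizes and hands them back in
// fixed-size frames, similar in spirit to FFmpeg's AVAudioFifo.
class SampleFifo {
public:
    explicit SampleFifo(std::size_t frameSize) : frameSize_(frameSize)
    {
        assert(frameSize > 0);
    }

    // Appends a chunk of resampled samples to the queue.
    void push(const std::vector<float>& samples)
    {
        buffer_.insert(buffer_.end(), samples.begin(), samples.end());
    }

    // Fills `frame` with exactly frameSize_ samples and returns true,
    // or returns false if not enough samples are buffered yet.
    bool pop(std::vector<float>& frame)
    {
        if (buffer_.size() < frameSize_) return false;
        const auto end = buffer_.begin() + static_cast<std::ptrdiff_t>(frameSize_);
        frame.assign(buffer_.begin(), end);
        buffer_.erase(buffer_.begin(), end);
        return true;
    }

    // Number of samples still waiting in the queue.
    std::size_t size() const { return buffer_.size(); }

private:
    std::size_t frameSize_;
    std::deque<float> buffer_;
};
```

In use, every resampler output would be pushed into the queue, and the encoder would be fed only while `pop` returns true, so leftover samples carry over to the next call instead of being dropped or padded.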

Then I decode the encoded samples and resample them back to the source sample format and sample rate, so I call ResamplerAudio::convert on resamplerDecoder with these parameters:

// frameDecode - AVFrame that holds decoded audio
std::vector<uint8_t> resampledData = resamplerDecoder->convert(frameDecode->data, frameDecode->nb_samples, frameDecode->channels, AV_SAMPLE_FMT_S16);

The resulting sound is choppy, and I also noticed that the decoded array is larger than the source array of raw PCM audio.

Any ideas what I am doing wrong?

Update (May 18, 2020)

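I tested my resampling logic on the raw PCM sound without any encoding or decoding. First, I converted only the sample rate of the input sound from 8000 Hz to 48000 Hz, then resampled the result back from 48000 Hz to 8000 Hz; the resulting sound was perfectly clean. I then repeated the same steps, converting the sample format instead of the sample rate, from AV_SAMPLE_FMT_S16 to AV_SAMPLE_FMT_FLTP and back, and the result was again perfectly clean. Converting the sample rate and the sample format together gave the same result. So I assume the distortion and choppiness come from my encoding or decoding routine, most likely the decoding routine, because after decoding I always get an AVFrame with nb_samples equal to 960, regardless of the size of the input sound.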
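The format half of that round-trip test can be reproduced without FFmpeg. The sketch below assumes the common convention of scaling by 32768 between 16-bit integers and floats in the range [-1, 1); libswresample's exact rounding and clipping may differ in detail, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Converts signed 16-bit samples to floats in [-1, 1).
std::vector<float> s16ToFloat(const std::vector<int16_t>& in)
{
    std::vector<float> out;
    out.reserve(in.size());
    for (int16_t s : in) out.push_back(static_cast<float>(s) / 32768.0f);
    return out;
}

// Converts floats back to signed 16-bit samples, rounding to nearest and
// clipping values outside the representable range.
std::vector<int16_t> floatToS16(const std::vector<float>& in)
{
    std::vector<int16_t> out;
    out.reserve(in.size());
    for (float f : in) {
        assert(std::isfinite(f));
        long v = std::lround(f * 32768.0f);
        if (v > 32767) v = 32767;
        if (v < -32768) v = -32768;
        out.push_back(static_cast<int16_t>(v));
    }
    return out;
}
```

Because every 16-bit value is exactly representable as a float, converting to float and back returns the original samples unchanged, which is consistent with the clean result of the format-only test.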

My decoding routine looks like this:

std::vector<uint8_t> decode(uint8_t *data, unsigned int dataLength)
{
    decodedData.clear();

    int dataSize = dataLength;

    while (dataSize > 0)
    {
        if (!frameDecode)
        {
            frameDecode = av_frame_alloc();
            if (!frameDecode)
            {
                LOGE(TAG, "[decode] Couldn't allocate the frame");
                return EMPTY_DATA;
            }
        }

        ret = av_parser_parse2(parser, contextDecoder, &packetDecode->data, &packetDecode->size, &data[0], dataSize, AV_NOPTS_VALUE, AV_NOPTS_VALUE, 0);
        if (ret < 0) {
            LOGE(TAG, "[decode] av_parser_parse2 error: %s", av_err2str(ret));
            return EMPTY_DATA;
        }

        data += ret;
        dataSize -= ret;

        doDecode();
    }
    return decodedData;
}

void doDecode()
{
    if (packetDecode->size) {
        /* send the packet with the compressed data to the decoder */
        int ret = avcodec_send_packet(contextDecoder, packetDecode);
        if (ret < 0) LOGE(TAG, "[decode] avcodec_send_packet error: %s", av_err2str(ret));

        /* read all the output frames (in general there may be any number of them) */
        while (ret >= 0)
        {
            ret = avcodec_receive_frame(contextDecoder, frameDecode);
            if (ret < 0 && ret != AVERROR(EAGAIN) && ret != AVERROR_EOF) LOGE(TAG, "[decode] avcodec_receive_frame error: %s", av_err2str(ret));
            if (ret < 0) break;

            std::vector<uint8_t> resampledData = resamplerDecoder->convert(frameDecode->data, frameDecode->nb_samples, frameDecode->channels, AV_SAMPLE_FMT_S16);
            if (!resampledData.size()) continue;
            std::copy(&resampledData.data()[0], &resampledData.data()[resampledData.size()], std::back_inserter(decodedData));
        }
    }
}
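A possible weak point, offered as a guess rather than a confirmed diagnosis, is packet framing. Raw Opus packets do not carry their own length, so once several packets are concatenated into one byte vector the decoder side has to be told where each one ends. In addition, encodeFrame copies `dataLength` bytes from each packet rather than `packetEncode->size`. The sketch below shows one simple way to keep boundaries: a 2-byte big-endian length prefix per packet. The prefix format and the function names are assumptions made for illustration, not an FFmpeg or Opus convention.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Appends one packet to `stream` as [2-byte big-endian length][payload].
void appendPacket(std::vector<uint8_t>& stream, const uint8_t* data, std::size_t size)
{
    assert(size <= 0xFFFF);
    stream.push_back(static_cast<uint8_t>(size >> 8));
    stream.push_back(static_cast<uint8_t>(size & 0xFF));
    stream.insert(stream.end(), data, data + size);
}

// Splits a stream written by appendPacket back into individual packets.
// Stops at the first truncated record.
std::vector<std::vector<uint8_t>> splitPackets(const std::vector<uint8_t>& stream)
{
    std::vector<std::vector<uint8_t>> packets;
    std::size_t pos = 0;
    while (pos + 2 <= stream.size()) {
        const std::size_t size =
            (static_cast<std::size_t>(stream[pos]) << 8) | stream[pos + 1];
        pos += 2;
        if (pos + size > stream.size()) break;
        const auto first = stream.begin() + static_cast<std::ptrdiff_t>(pos);
        packets.emplace_back(first, first + static_cast<std::ptrdiff_t>(size));
        pos += size;
    }
    return packets;
}
```

On the encoding side this would be called with `packetEncode->data` and `packetEncode->size`; on the decoding side each recovered packet could then be handed to `avcodec_send_packet` individually instead of relying on the parser to find boundaries.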

Update (May 30, 2020)

I decided to stop using FFmpeg in my project and switched to libopus 1.3.1; I wrote a wrapper around it, and it works properly.

Tags: c++, ffmpeg, resampling, opus
