FFMPEG loudnorm filter does not work in combination with silence removal filters

Problem description

I want to normalize audio files consistently for training a TTS model. The output audio files should meet the following criteria:

  1. Mono
  2. A sample rate of 22050 Hz
  3. WAV format
  4. No silence at the beginning and end of the audio clip
  5. A volume of -24 dB
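Criteria 1 to 3 can be checked directly from the WAV header. A minimal sketch using Python's standard wave module, assuming the outputs are plain PCM WAV files such as those written by ffmpeg's pcm_s16le encoder; check_format and the EXPECTED_* constants are hypothetical names:

```python
import wave

# Assumed targets, taken from criteria 1 and 2 in the list above.
EXPECTED_CHANNELS = 1   # 1. mono
EXPECTED_RATE = 22050   # 2. 22050 Hz


def check_format(path: str) -> list:
    """Return a list of problems found in `path`; empty means criteria 1-3 hold."""
    problems = []
    # 3. WAV format: checked by file name here, and by parsing the header below.
    if not path.lower().endswith(".wav"):
        problems.append("file name does not end in .wav")
    try:
        # wave only understands PCM WAV; anything else raises an error here.
        with wave.open(path, "rb") as w:
            if w.getnchannels() != EXPECTED_CHANNELS:
                problems.append(f"{w.getnchannels()} channels, expected {EXPECTED_CHANNELS}")
            if w.getframerate() != EXPECTED_RATE:
                problems.append(f"{w.getframerate()} Hz, expected {EXPECTED_RATE} Hz")
    except (wave.Error, EOFError) as exc:
        problems.append(f"not readable as a PCM WAV file: {exc}")
    return problems
```

Criteria 4 and 5 cannot be read from the header; they need an actual measurement of the audio.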

I have already met the first 4 criteria, and so far that works fine.

Normalizing the volume with -af loudnorm=I=-24:LRA=11:TP=-1.5 basically works fine as well, but not in combination with silence removal. As soon as I also remove silence with agate=threshold=0.045:attack=0.5:release=500:ratio=5000,silenceremove=start_periods=1:start_threshold=0.0075,areverse,silenceremove=start_periods=1:start_threshold=0.0075,areverse, loudness normalization stops working: the output volume is now between -25 dB and -32 dB instead of the desired -24 dB.
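The trimming part of the chain relies on silenceremove with only start_periods set removing silence at the beginning of the stream, so the audio is reversed, trimmed again and reversed back. Keeping the stages as separate pieces, as in this sketch (GATE, LOUDNORM, trim_both_ends and build_filter are illustrative names; the option values are copied from the command in the question), makes it possible to write out the trimmed-only audio and measure what loudnorm actually receives:

```python
# Illustrative constants copied from the command in the question.
GATE = "agate=threshold=0.045:attack=0.5:release=500:ratio=5000"
LOUDNORM = "loudnorm=I=-24:LRA=11:TP=-1.5"


def trim_both_ends(threshold: float = 0.0075) -> str:
    """Trim silence at both ends of the clip.

    With only start_periods set, silenceremove trims the beginning of the
    stream, so the chain reverses the audio, trims what used to be the end,
    and reverses it back into the original order.
    """
    trim_start = f"silenceremove=start_periods=1:start_threshold={threshold}"
    return ",".join([trim_start, "areverse", trim_start, "areverse"])


def build_filter(threshold: float = 0.0075, rate: int = 22050,
                 normalize: bool = True) -> str:
    """Assemble the -af chain; normalize=False yields only the gate/trim stages."""
    stages = [GATE, trim_both_ends(threshold)]
    if normalize:
        stages += [LOUDNORM, f"aresample={rate}"]
    return ",".join(stages)
```

Note that areverse has to buffer the whole clip in memory, which is unproblematic for short TTS clips like the 4.64 s file in the log.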

This is the complete ffmpeg command I use:

ffmpeg -i filename.flac -ac 1 -af agate=threshold=0.045:attack=0.5:release=500:ratio=5000,silenceremove=start_periods=1:start_threshold=0.0075,areverse,silenceremove=start_periods=1:start_threshold=0.0075,areverse,loudnorm=I=-24:LRA=11:TP=-1.5,aresample=22050 -y -hide_banner filename.wav

And this is the code I use to run it:

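import os
import subprocess

INPUT_DIR = '/home/username/all_data'
OUTPUT_DIR = '/home/username/normalized_data'

# Same -af chain as the command above: gate, trim leading silence, reverse,
# trim the former trailing silence, reverse back, normalize, resample.
AUDIO_FILTER = ('agate=threshold=0.045:attack=0.5:release=500:ratio=5000,'
                'silenceremove=start_periods=1:start_threshold=0.0075,areverse,'
                'silenceremove=start_periods=1:start_threshold=0.0075,areverse,'
                'loudnorm=I=-24:LRA=11:TP=-1.5,aresample=22050')

for filename in os.listdir(INPUT_DIR):
    # Swap the extension (e.g. .flac) for .wav, whatever its length.
    wav_filename = os.path.splitext(filename)[0] + '.wav'
    # An argument list avoids shell-quoting problems with unusual file names.
    subprocess.run(['ffmpeg', '-hide_banner', '-y',
                    '-i', os.path.join(INPUT_DIR, filename),
                    '-ac', '1', '-af', AUDIO_FILTER,
                    os.path.join(OUTPUT_DIR, wav_filename)],
                   check=True)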
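A commonly suggested alternative to single-pass loudnorm is two-pass normalization, which the loudnorm filter supports through its measured_I, measured_LRA, measured_TP, measured_thresh, offset and linear options: pass 1 runs the same chain with print_format=json and discards the audio, and pass 2 feeds the measured values back in so loudnorm can apply one linear gain. According to the filter documentation, linear mode falls back to dynamic mode if the target LRA is below the source LRA or the gain would push the true peak above TP. The sketch below assumes the gate and silenceremove stages are deterministic, so both passes see the same audio, and it also downmixes to mono at the start of the chain (an assumption, not something tested in the question). All function and constant names are hypothetical, and whether this closes the gap for these particular clips is untested:

```python
import json
import re
import subprocess

# Assumption: downmix to mono *before* gate/trim/loudnorm so that loudnorm
# measures the same mono signal that ends up in the output file.
PRE_FILTER = ("aformat=channel_layouts=mono,"
              "agate=threshold=0.045:attack=0.5:release=500:ratio=5000,"
              "silenceremove=start_periods=1:start_threshold=0.0075,areverse,"
              "silenceremove=start_periods=1:start_threshold=0.0075,areverse")
TARGET = {"I": -24, "LRA": 11, "TP": -1.5}


def loudnorm(extra=None):
    """Render a loudnorm filter string from TARGET plus optional extra options."""
    opts = {**TARGET, **(extra or {})}
    return "loudnorm=" + ":".join(f"{k}={v}" for k, v in opts.items())


def first_pass_cmd(src):
    """Pass 1: run the chain, print loudnorm's measurements, discard the audio."""
    return ["ffmpeg", "-hide_banner", "-nostats", "-i", src,
            "-af", PRE_FILTER + "," + loudnorm({"print_format": "json"}),
            "-f", "null", "-"]


def parse_measurement(stderr):
    """Return the last {...} block in ffmpeg's stderr (loudnorm's JSON report)."""
    blocks = re.findall(r"\{[^{}]*\}", stderr)
    if not blocks:
        raise ValueError("no loudnorm JSON report found")
    return json.loads(blocks[-1])


def second_pass_cmd(src, dst, m, rate=22050):
    """Pass 2: same chain, loudnorm fed with the pass-1 values in linear mode."""
    measured = {
        "measured_I": m["input_i"],
        "measured_LRA": m["input_lra"],
        "measured_TP": m["input_tp"],
        "measured_thresh": m["input_thresh"],
        "offset": m["target_offset"],
        "linear": "true",
        "print_format": "summary",
    }
    return ["ffmpeg", "-hide_banner", "-nostats", "-y", "-i", src,
            "-af", f"{PRE_FILTER},{loudnorm(measured)},aresample={rate}",
            "-ac", "1", dst]


def normalize(src, dst):
    """Run both passes; requires ffmpeg on PATH."""
    first = subprocess.run(first_pass_cmd(src), capture_output=True,
                           text=True, check=True)
    subprocess.run(second_pass_cmd(src, dst, parse_measurement(first.stderr)),
                   check=True)
```

Inside the loop from the question, this would replace the single ffmpeg call with normalize(os.path.join(INPUT_DIR, filename), os.path.join(OUTPUT_DIR, wav_filename)). The summary loudnorm prints at the end of pass 2 reports the normalization type, which shows whether linear mode was actually used.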

Edit:

Here is the full log of the ffmpeg command:

username@pop-os:~$ ffmpeg -i /home/username/audios/filename.flac -ac 1 -af agate=threshold=0.045:attack=0.5:release=500:ratio=5000,silenceremove=start_periods=1:start_threshold=0.0075,areverse,silenceremove=start_periods=1:start_threshold=0.0075,areverse,loudnorm=I=-24:LRA=11:TP=-1.5,aresample=22050 /home/username/result.wav
ffmpeg version 4.2.4-1ubuntu0.1 Copyright (c) 2000-2020 the FFmpeg developers
  built with gcc 9 (Ubuntu 9.3.0-10ubuntu2)
  configuration: --prefix=/usr --extra-version=1ubuntu0.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-avresample --disable-filter=resample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librsvg --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-nvenc --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared
  libavutil      56. 31.100 / 56. 31.100
  libavcodec     58. 54.100 / 58. 54.100
  libavformat    58. 29.100 / 58. 29.100
  libavdevice    58.  8.100 / 58.  8.100
  libavfilter     7. 57.100 /  7. 57.100
  libavresample   4.  0.  0 /  4.  0.  0
  libswscale      5.  5.100 /  5.  5.100
  libswresample   3.  5.100 /  3.  5.100
  libpostproc    55.  5.100 / 55.  5.100
Input #0, flac, from '/home/mareike/tts_data/save/audios_flac/0a6c8520-7536-11eb-8338-b7015f354987.flac':
  Duration: 00:00:04.64, start: 0.000000, bitrate: 1090 kb/s
    Stream #0:0: Audio: flac, 44100 Hz, stereo, s32 (24 bit)
Stream mapping:
  Stream #0:0 -> #0:0 (flac (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, wav, to '/home/mareike/result_0a6c8520-7536-11eb-8338-b7015f354987.wav':
  Metadata:
    ISFT            : Lavf58.29.100
    Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 22050 Hz, mono, s16, 352 kb/s
    Metadata:
      encoder         : Lavc58.54.100 pcm_s16le
size=     138kB time=00:00:03.19 bitrate= 353.0kbits/s speed=14.3x    
video:0kB audio:138kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.055375%

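Can anyone tell me what I am doing wrong and how I can finally normalize the volume to -24 dB (in combination with silence removal)? Any help is appreciated, thank you very much!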
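The question does not say how the -25 dB to -32 dB figures were measured. loudnorm's I target is EBU R128 integrated loudness in LUFS, so a peak or RMS reading from another tool is not directly comparable. One way to check the result on the same scale is ffmpeg's ebur128 filter, which prints a summary containing the integrated loudness (I) at the end of the run. A minimal sketch; measure_cmd, integrated_loudness, meets_target and the 0.5 LU tolerance are hypothetical choices:

```python
import re
import subprocess


def measure_cmd(path):
    """ffmpeg command that runs the EBU R128 meter (ebur128) and discards output."""
    return ["ffmpeg", "-hide_banner", "-nostats", "-i", path,
            "-af", "ebur128", "-f", "null", "-"]


def integrated_loudness(stderr):
    """Return the last 'I: <value> LUFS' figure, i.e. the summary's integrated loudness.

    Per-frame log lines also contain 'I:', but the summary is printed last.
    """
    values = re.findall(r"\bI:\s+(-?\d+(?:\.\d+)?) LUFS", stderr)
    if not values:
        raise ValueError("no integrated loudness found")
    return float(values[-1])


def meets_target(path, target=-24.0, tolerance=0.5):
    """Measure `path` (requires ffmpeg on PATH) and compare it with `target`."""
    res = subprocess.run(measure_cmd(path), capture_output=True, text=True, check=True)
    return abs(integrated_loudness(res.stderr) - target) <= tolerance
```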

Tags: python, python-3.x, audio, ffmpeg, normalization
