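python - Extract a dB spectrogram from an audio file, denoise the spectrogram, and convert it back to audio
Question
I am trying to apply some image processing techniques to a spectrogram created from an audio file. In this example, I want to apply a denoising algorithm to the spectrogram and then invert it back to audio. How can this be done correctly, so that I can manipulate the spectrogram and then get back to audio without losing most of the signal's original quality? Obviously I am doing something wrong here, so any help would be greatly appreciated.
I used part of the code found here: How can I save a Librosa spectrogram as a specific sized image?
This is the code I am working with: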
!pip install librosa --upgrade
import librosa
import matplotlib.pyplot as plt
import numpy as np
import librosa.display
from IPython.display import Audio,display
from scipy.io.wavfile import write
import skimage.io
from skimage.color import rgb2gray
import cv2
def scale_minmax(X, min=0.0, max=1.0):
    X_std = (X - X.min()) / (X.max() - X.min())
    X_scaled = X_std * (max - min) + min
    return X_scaled
def spectrogram_image(y, sr, out, hop_length, n_mels):
    stft = np.abs(librosa.stft(y=y, n_fft=hop_length*2, hop_length=hop_length))
    amp2db = librosa.amplitude_to_db(stft, ref=np.max)
    # min-max scale to fit inside 8-bit range
    img = scale_minmax(amp2db, 0, 255).astype(np.uint8)
    img = np.flip(img, axis=0) # put low frequencies at the bottom in image
    img = 255-img # invert. make black==more energy
    # save as PNG
    skimage.io.imsave(out, img)
    return img
if __name__ == '__main__':
    # settings
    hop_length = 512 # number of samples per time-step in spectrogram
    n_mels = 128 # number of bins in spectrogram. Height of image
    time_steps = 384 # number of time-steps. Width of image
    # load audio. Using example from librosa
    path = librosa.util.example_audio_file()
    y, sr = librosa.load(path)
    out = 'out.png'
    # extract a fixed length window
    start_sample = 0 # starting at beginning
    length_samples = time_steps*hop_length
    window = y[start_sample:start_sample+length_samples]
    # convert to PNG
    img_png = spectrogram_image(window, sr=sr, out=out, hop_length=hop_length, n_mels=n_mels)
    print('wrote file', out)
    converted_img = cv2.cvtColor(img_png, cv2.COLOR_GRAY2BGR)
    dst = cv2.fastNlMeansDenoisingColored(converted_img, None, 10, 10, 7, 21)
    dst = img_gray = rgb2gray(dst)
    #dst = scale_minmax(dst, 0, 1.0).astype(np.float64)
    dst = np.flip(dst, axis=0) # do i need this???
    fig = plt.figure(figsize=(32,16))
    plt.subplot(211)
    plt.imshow(img_png)
    plt.subplot(212)
    plt.imshow(dst)
    plt.show()
    y = librosa.amplitude_to_db(dst)
    y_hat = librosa.istft(y)
    #y_hat = librosa.griffinlim(y)
    audio1 = Audio(y_hat, rate=sr)
    display(audio1)
    write("/content/XXX.wav", sr, y_hat)
Solution