How to match an audio clip within an audio clip using Python?

Problem description

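I am trying to use Librosa to locate a short mp3 jingle inside a longer mp3 audio clip. However, I'm struggling to get it working and I'm not sure where to go next. Here is my code so far, based on this StackOverflow answer, though I'm open to another method or library for detecting where the jingle occurs.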

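# Imports used by the code below
import librosa
import numpy as np
from scipy.ndimage import generate_binary_structure, maximum_filter

# Load the audio as a waveform
# Store the sampling rate

JingleWave, JingleSR = librosa.load("short.mp3")
EpisodeWave, EpisodeSR = librosa.load("long.mp3")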

# Magnitude spectrograms of both files
# I noticed through debugging that the length of these arrays is the same
# despite the files having very different lengths
# (len() reports axis 0, the frequency bins; the time frames are on axis 1)

JingleSpectogram = np.abs(librosa.stft(JingleWave))
EpisodeSpectogram = np.abs(librosa.stft(EpisodeWave))

# Define the binary structure used as the footprint
# This is the part that is most likely to be faulty, as I mostly did it because
# maximum_filter requires a footprint

structure = generate_binary_structure(2, 1)

# Find local peaks to create constellation maps (2D boolean images that are
# True only where a bin equals the maximum of its neighborhood)

JingleCM = maximum_filter(JingleSpectogram, footprint=structure) == JingleSpectogram
EpisodeCM = maximum_filter(EpisodeSpectogram, footprint=structure) == EpisodeSpectogram

# Get the number of time frames in each constellation map
# (axis 0 is frequency, axis 1 is time)

JingleLength = JingleCM.shape[1]
EpisodeLength = EpisodeCM.shape[1]

# Keep track of what segments match the most

scores = []

# Compare audio to find matching audio

for offset in range(EpisodeLength - JingleLength + 1):
    EpisodeExcerpt = EpisodeCM[:, offset:offset + JingleLength]
    score = np.sum(np.multiply(EpisodeExcerpt, JingleCM))
    scores.append(score)

# Find when the highest score happens

highestScore = max(scores)
bestOffset = scores.index(highestScore)

# Convert the best offset (in STFT frames) into the jingle's start time in seconds
print(librosa.frames_to_time(bestOffset, sr=EpisodeSR))
print(highestScore)

I'm just a beginner at programming, so any help is greatly appreciated.

Tags: python, numpy, audio, audio-processing, librosa

Solution
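
Since the question is open to other methods, here is a minimal sketch of one alternative: cross-correlating the two waveforms directly instead of comparing spectrogram peaks. It assumes the jingle in the episode is essentially the same recording as the short clip and that both waveforms share one sampling rate (which is the case when both are loaded with librosa.load and its default resampling). The helper name find_clip_start is hypothetical and is not part of Librosa or SciPy.

```python
import numpy as np
from scipy.signal import correlate


def find_clip_start(long_wave, short_wave, sr):
    """Return (start_seconds, peak_score) for the best match of short_wave.

    Hypothetical helper for illustration; assumes both waveforms are 1-D
    arrays sampled at the same rate `sr`.
    """
    long_wave = np.asarray(long_wave, dtype=np.float64)
    short_wave = np.asarray(short_wave, dtype=np.float64)
    if len(short_wave) > len(long_wave):
        raise ValueError("short_wave must not be longer than long_wave")

    # Slide the short clip over the long one. mode="valid" keeps only the
    # offsets where the short clip fits entirely inside the long clip, so
    # index k of `scores` corresponds to a start at sample k.
    scores = correlate(long_wave, short_wave, mode="valid", method="fft")

    best_sample = int(np.argmax(scores))
    return best_sample / sr, float(scores[best_sample])


if __name__ == "__main__":
    # Synthetic demo: hide a 0.5 s noise burst 2 s into 5 s of quieter noise.
    sr = 8000
    rng = np.random.default_rng(0)
    jingle = rng.standard_normal(sr // 2)
    episode = 0.1 * rng.standard_normal(5 * sr)
    episode[2 * sr:2 * sr + len(jingle)] += jingle

    start, score = find_clip_start(episode, jingle, sr)
    print(f"jingle starts at about {start:.3f} s")
```

With the variables from the question, the call would be find_clip_start(EpisodeWave, JingleWave, EpisodeSR). The score is an unnormalized correlation, so very loud passages in the episode can produce high values on their own. If the jingle was re-encoded or mixed under speech, the peak will be weaker, and a spectrogram-based comparison may be more robust.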

