python - Using speaker diarization results with a speech recognition API
Problem description
I'm trying to learn more about speaker diarization and speech recognition. I started following this tutorial and was able to obtain the speaker labelling for my audio.
According to the tutorial, you use the Google Speech API: each audio segment is sent to the API and transcribed, and this is exactly where I'm stuck!
According to the tutorial, all you have to do is:
- Get a Google / IBM Watson speech-to-text API (done)
(I have completed this step and obtained a Watson API key and URL!)
1. For each tuple element 'ele' in the labelling file, extract ele[0] as the speaker label, ele[1] as the start time, and ele[2] as the end time.
(I don't understand this step at all... I tried the following, but I'm not sure if this is what they mean:)
for ele in labelling:
    speaker_label = ele[0]
    start_time = ele[1]
    end_time = ele[2]
2. Trim your original audio file from the start time to the end time. You can use ffmpeg for this task.
(This step depends on step 1, but I don't understand it either, because I don't know how to use ffmpeg or how to apply it to this project.)
3. Pass the trimmed audio file obtained in the previous step to the Google / IBM Watson API, which will return a text transcript of that audio segment.
(I just need to understand the context, i.e. what the code for passing the segmented audio should look like.)
4. Write the transcripts to a text file along with the speaker labels and save it.
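For step 2, a minimal sketch of the ffmpeg invocation (the input path matches the question's file; the segment_N.wav naming scheme is my own placeholder) builds one trim command per labelled segment, using -ss and -to to select the slice:

```python
import subprocess

def trim_command(src, start_time, end_time, dst):
    # -ss/-to select the slice to keep; -y overwrites an existing output file
    return ["ffmpeg", "-y", "-i", src,
            "-ss", str(start_time), "-to", str(end_time), dst]

# hypothetical labelling, shaped like create_labelling's output below
labelling = [("0", 0.0, 4.2), ("1", 4.2, 9.7)]

for i, (speaker_label, start_time, end_time) in enumerate(labelling):
    cmd = trim_command("Audio files/testForTheOthers.wav",
                       start_time, end_time, f"segment_{i}.wav")
    # subprocess.run(cmd, check=True)  # uncomment once ffmpeg is on PATH
    print(cmd)
```

The ffmpeg-python package imported in the question's code can express the same thing, but calling the ffmpeg binary directly via subprocess keeps the command visible.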
Any help would be greatly appreciated!
My full code:
from resemblyzer import preprocess_wav, VoiceEncoder
from pathlib import Path
from resemblyzer.audio import sampling_rate
from spectralcluster import SpectralClusterer
import ffmpeg
from ibm_watson import SpeechToTextV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
# IBM Watson components (not used yet, as this part is not implemented)
authenticator = IAMAuthenticator('Key here')
speech_to_text = SpeechToTextV1(
    authenticator=authenticator
)
speech_to_text.set_service_url('URL HERE')
#-------------------------------------------------------
#From the tutorial this part is to get the audio file and to process it
# give the file path to your audio file
audio_file_path = 'Audio files/testForTheOthers.wav'
wav_fpath = Path(audio_file_path)
wav = preprocess_wav(wav_fpath)
encoder = VoiceEncoder("cpu")
_, cont_embeds, wav_splits = encoder.embed_utterance(wav, return_partials=True, rate=16)
print(cont_embeds.shape)
#-----------------------------------------------------------------------
#From the tutorial this is the clustering part
#(some parts of the code gave me errors, which is why they are not included)
# (p_percentile=0.90, gaussian_blur_sigma=1) was removed (errors)
clusterer = SpectralClusterer(
    min_clusters=2,
    max_clusters=100,
)
labels = clusterer.predict(cont_embeds)
#-----------------------------------------------------------------------
#From the tutorial this is the labelling part
def create_labelling(labels, wav_splits):
    from resemblyzer.audio import sampling_rate
    times = [((s.start + s.stop) / 2) / sampling_rate for s in wav_splits]
    labelling = []
    start_time = 0
    for i, time in enumerate(times):
        if i > 0 and labels[i] != labels[i - 1]:
            temp = [str(labels[i - 1]), start_time, time]
            labelling.append(tuple(temp))
            start_time = time
        if i == len(times) - 1:
            temp = [str(labels[i]), start_time, time]
            labelling.append(tuple(temp))
    return labelling
labelling = create_labelling(labels, wav_splits)
print(labelling)
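To make the shape of labelling concrete for step 1, the same merge logic can be run on synthetic inputs (slice objects standing in for wav_splits, sampling_rate assumed to be 16000, which is Resemblyzer's default; the label values are made up):

```python
# Self-contained demo of the labelling shape (no Resemblyzer needed):
# same merge logic as create_labelling, sampling_rate assumed 16000.
sampling_rate = 16000
labels = [0, 0, 0, 1, 1]                                  # cluster per window
wav_splits = [slice(i * 16000, i * 16000 + 16000) for i in range(5)]

times = [((s.start + s.stop) / 2) / sampling_rate for s in wav_splits]
labelling = []
start_time = 0
for i, time in enumerate(times):
    if i > 0 and labels[i] != labels[i - 1]:
        labelling.append((str(labels[i - 1]), start_time, time))
        start_time = time
    if i == len(times) - 1:
        labelling.append((str(labels[i]), start_time, time))

print(labelling)  # → [('0', 0, 3.5), ('1', 3.5, 4.5)]
```

So each tuple is (speaker_label, start_time_seconds, end_time_seconds), which is exactly what step 1 asks you to unpack.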
#----------------------
#Me Trying to implement step 1
for ele in labelling:
    speaker_label = ele[0]
    start_time = ele[1]
    end_time = ele[2]
#-----------------------------------------------------------------------------
#After this part you are supposed to implement the rest of the tutorial
#but I'm stuck
Solution
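One way to complete steps 3 and 4, sketched under a few assumptions: the segments from step 2 have already been trimmed to files named like segment_0.wav (a naming scheme invented here), speech_to_text is the configured SpeechToTextV1 client from the question, and the response JSON follows Watson's documented results/alternatives/transcript layout (verify against the current ibm-watson SDK docs):

```python
def transcribe_segment(speech_to_text, seg_path):
    # Step 3: send one trimmed wav file to Watson and join the transcript
    # pieces it returns (one entry per recognized utterance).
    with open(seg_path, "rb") as audio:
        result = speech_to_text.recognize(
            audio=audio, content_type="audio/wav").get_result()
    return " ".join(chunk["alternatives"][0]["transcript"]
                    for chunk in result["results"])

def write_transcripts(speech_to_text, labelling, out_txt):
    # Step 4: write one 'speaker_label: transcript' line per segment.
    with open(out_txt, "w") as out:
        for i, (speaker_label, start_time, end_time) in enumerate(labelling):
            transcript = transcribe_segment(speech_to_text, f"segment_{i}.wav")
            out.write(f"{speaker_label}: {transcript}\n")
```

recognize() also accepts an optional model argument for language/bandwidth selection; check the ibm-watson SDK documentation for the current parameter list.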