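python - Speaker diarization when using Python speech recognition
Problem description
Is there an option to diarize the output (label which speaker said what) when using import speech_recognition in Python?
I would appreciate any advice on this, or on whether it is possible at all.
Also, any advice on writing this output to a text file, with a line between each new speaker, would be appreciated.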
import speech_recognition as sr
from os import path

audio_file = path.join(path.dirname(path.realpath(__file__)), "RobertP.wav")
r = sr.Recognizer()
with sr.AudioFile(audio_file) as source:
    audio = r.record(source)

try:
    txt = r.recognize_google(audio, show_all=True)
except (sr.UnknownValueError, sr.RequestError):
    # Fall back to an empty result so the write below still runs.
    print("Didn't work.")
    txt = ""

text = str(txt)
with open("tester.txt", "w") as f:
    f.write(text)
Note: apologies for being a beginner.
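Solution
Speaker diarization is currently in beta in the Google Speech-to-Text API. You can find the documentation for this feature here. The output can be processed in several ways. Below is one example (based on this Medium article):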
import io

def transcribe_file_with_diarization(speech_file):
    """Transcribe the given audio file synchronously with diarization."""
    # Written against google-cloud-speech 1.x. In 2.x and later the enum
    # lives at speech.RecognitionConfig.AudioEncoding.LINEAR16 instead.
    from google.cloud import speech_v1p1beta1 as speech
    client = speech.SpeechClient()

    with io.open(speech_file, 'rb') as audio_file:
        content = audio_file.read()
    audio = {"content": content}

    encoding = speech.enums.RecognitionConfig.AudioEncoding.LINEAR16
    sample_rate_hertz = 48000
    language_code = 'en-US'
    enable_speaker_diarization = True
    enable_automatic_punctuation = True
    diarization_speaker_count = 4

    config = {
        "encoding": encoding,
        "sample_rate_hertz": sample_rate_hertz,
        "language_code": language_code,
        "enable_speaker_diarization": enable_speaker_diarization,
        "enable_automatic_punctuation": enable_automatic_punctuation,
        # Optional:
        "diarization_speaker_count": diarization_speaker_count
    }

    print('Waiting for operation to complete...')
    # Keyword arguments work with both older and newer client versions.
    response = client.recognize(config=config, audio=audio)

    # The transcript within each result is separate and sequential per result.
    # However, the words list within an alternative includes all the words
    # from all the results thus far. Thus, to get all the words with speaker
    # tags, you only have to take the words list from the last result:
    result = response.results[-1]
    words_info = result.alternatives[0].words

    speaker1_transcript = ""
    speaker2_transcript = ""
    speaker3_transcript = ""
    speaker4_transcript = ""

    # Collect each speaker's words, then print the output:
    for word_info in words_info:
        if word_info.speaker_tag == 1:
            speaker1_transcript = speaker1_transcript + word_info.word + ' '
        if word_info.speaker_tag == 2:
            speaker2_transcript = speaker2_transcript + word_info.word + ' '
        if word_info.speaker_tag == 3:
            speaker3_transcript = speaker3_transcript + word_info.word + ' '
        if word_info.speaker_tag == 4:
            speaker4_transcript = speaker4_transcript + word_info.word + ' '

    print("speaker1: '{}'".format(speaker1_transcript))
    print("speaker2: '{}'".format(speaker2_transcript))
    print("speaker3: '{}'".format(speaker3_transcript))
    print("speaker4: '{}'".format(speaker4_transcript))
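The question also asked about writing the result to a text file with a line between each new speaker, which the example above does not cover. One possible approach, sketched below, is to collapse consecutive words that share a speaker tag into "turns" and write one turn per paragraph. The helper names `group_by_speaker` and `write_transcript` are hypothetical and not part of any library; the sketch assumes you first build a list of `(word, speaker_tag)` pairs from `words_info`, and it runs here on made-up sample data rather than a real API response.

```python
from itertools import groupby


def group_by_speaker(tagged_words):
    """Collapse (word, speaker_tag) pairs into consecutive speaker turns."""
    turns = []
    # groupby only merges adjacent items, so the speaking order is preserved.
    for tag, run in groupby(tagged_words, key=lambda pair: pair[1]):
        turns.append((tag, " ".join(word for word, _ in run)))
    return turns


def write_transcript(turns, out_path):
    """Write one 'Speaker N: text' line per turn, separated by blank lines."""
    with open(out_path, "w", encoding="utf-8") as f:
        f.write("\n\n".join(f"Speaker {tag}: {text}" for tag, text in turns))
        f.write("\n")


# Hypothetical sample data. With a real response, build the list as:
#   [(w.word, w.speaker_tag) for w in words_info]
sample = [("hello", 1), ("there", 1), ("hi", 2),
          ("how", 1), ("are", 1), ("you", 1)]
turns = group_by_speaker(sample)
write_transcript(turns, "tester.txt")
```

Unlike the per-speaker strings in the answer, this keeps the conversation in chronological order, so a speaker who talks twice appears as two separate turns in the file.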