What is the optimal sampling rate for the Google Speech API? Can any Google employee or expert comment?

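Question

So far I have only tested one very small audio file, at 16 kHz and at 48 kHz. I would love to run a larger-scale test, but as you know, that costs money.

The 48 kHz sampling rate gave better results. However, the documentation says 16 kHz is best.

So I am a bit confused.

Here are the 16 kHz and 48 kHz FLAC files that I tested with the Google Speech-to-Text API: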

16 kHz: https://drive.google.com/file/d/1MbiW3t86W68ZqENtDqD4XdNmEV7QZbZA/view?usp=sharing

48 kHz: https://drive.google.com/file/d/1aLN1ptMJBwuYc6FdAk6CxcK1Ex4jI3vh/view?usp=sharing

Here are the transcripts they produced:

16 kHz

Hello, dear students.

 Welcome to the lecture 1 of introduction to programming course.

 In this course, you will learn how to program you will learn the fundamentals of programming. You will learn how to be a software engineer. This course is the primary the most important cause of your Carriage. Why is that because in this course you will you will learn how to do

 Programming haftar called how to compose a software. So this is your most important lesson among all of the courses you are going to take because this lesson will teach you how to program.

 okay, so if you want to be a good programmer a good software engineer you have to

 Perfect.

 This course you have to give your most attention to this.

48 kHz

Hello, dear students.

 Welcome to the lecture 1 of introduction to programming course.

 In this course, you will learn how to program you will learn the fundamentals of programming. You will learn how to be a software engineer. This course is the primary the most important course of your Carriage. Why is that because in this course you will you will learn how to do

 Programming how to code how to compose a software. So this is your most important lesson.

 Among all of the courses you are going to take because these lesson will teach you how to program.

 okay, so if you want to be a good programmer a good software engineer you have to

 Perfect.

 This course you have to give your most attention to this.
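The two transcripts are nearly identical, so the differences are easier to see with a word-level diff. The following is only a minimal sketch using Python's standard difflib module; the helper name word_diff is made up for illustration, and the two strings are one sentence copied from each transcript above.

```python
import difflib


def word_diff(a: str, b: str):
    """Return (operation, words_from_a, words_from_b) for each differing span."""
    words_a, words_b = a.split(), b.split()
    matcher = difflib.SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    return [
        (tag, " ".join(words_a[i1:i2]), " ".join(words_b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"  # keep only the spans where the transcripts disagree
    ]


# One sentence from each transcript (16 kHz first, 48 kHz second).
t16 = "Programming haftar called how to compose a software."
t48 = "Programming how to code how to compose a software."

for change in word_diff(t16, t48):
    print(change)
```

Running the same helper over the full transcripts would list every place where the 16 kHz and 48 kHz results disagree, which is a cheaper first check than a larger paid test.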

The original sampling rate of the video is 48 kHz.

So can any expert or employee comment on this?

These are the ffmpeg options I used to create the 16 kHz and 48 kHz FLAC files:

-af aformat=s16:16000:mono
-af aformat=s16:48000:mono
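Only the filter options are given above, so the rest of the command line is not known. As a rough sketch, a full invocation might be assembled as below. The file names lecture.mp4, lecture_16k.flac and lecture_48k.flac are hypothetical, and the -vn flag (drop the video stream) is an assumption rather than something taken from the question. The function only builds and prints the command; it does not run ffmpeg.

```python
import shlex
from typing import List


def build_flac_command(src: str, dst: str, sample_rate_hz: int) -> List[str]:
    """Build an ffmpeg argument list that writes mono, 16-bit audio at the given rate."""
    return [
        "ffmpeg",
        "-i", src,        # input file (hypothetical name)
        "-vn",            # assumption: ignore the video stream
        "-af", f"aformat=s16:{sample_rate_hz}:mono",  # filter from the question
        dst,              # output container is chosen from the .flac extension
    ]


for rate, out in ((16000, "lecture_16k.flac"), (48000, "lecture_48k.flac")):
    print(shlex.join(build_flac_command("lecture.mp4", out, rate)))
```

The resulting list can be passed to subprocess.run if ffmpeg is installed.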

Tags: google-cloud-platform, google-api, speech-to-text, google-speech-api, google-speech-to-text-api

Answer


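16 kHz is only the recommended minimum sampling rate for Speech-to-Text transcription, not an upper limit. The documentation says:

We recommend that the sample rate of audio files that you use for transcription with Speech-to-Text be at least 16 kHz. Sample rates found in audio files are typically 16 kHz, 32 kHz, 44.1 kHz, and 48 kHz. Because intelligibility is greatly affected by the frequency range, especially in the higher frequencies, a sample rate of less than 16 kHz results in an audio file that has little or no information above 8 kHz. This can prevent Speech-to-Text from correctly transcribing spoken audio. Speech intelligibility requires information throughout the 2 kHz to 4 kHz range, although the harmonics (multiples) of those frequencies in the higher range are also important for preserving speech intelligibility. Therefore, keeping the sample rate to a minimum of 16 kHz is a good practice.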
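The 8 kHz figure in that passage follows from the Nyquist limit: audio sampled at a given rate can only represent frequencies up to half that rate. As a small illustration (the function name nyquist_khz is made up for this sketch), the highest representable frequency for a few common sample rates can be computed like this:

```python
def nyquist_khz(sample_rate_khz: float) -> float:
    """Highest frequency (in kHz) that audio at this sample rate can represent."""
    return sample_rate_khz / 2


# 8 kHz is included as an example of a rate below the recommended minimum.
for rate in (8, 16, 32, 44.1, 48):
    limit = nyquist_khz(rate)
    # Compare against the 8 kHz bandwidth that a 16 kHz recording provides.
    note = "meets" if limit >= 8 else "falls short of"
    print(f"{rate} kHz sampling -> content up to {limit} kHz ({note} the 16 kHz guideline)")
```

By this measure, a 48 kHz file keeps content up to 24 kHz, so it satisfies the "at least 16 kHz" guidance and there is no reason to expect downsampling to 16 kHz to improve results.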

