google-cloud-platform - What is the best sample rate for the Google Speech API? Can any Google employee or expert comment?
Question
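So far I have tested one very small audio file at 16 kHz and at 48 kHz. I would love to run a larger test, but as you know, that costs money.
The 48 kHz sample rate gave better results. However, the documentation says 16 kHz is best.
So I am a bit confused.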
Here are the 16 kHz and 48 kHz flac files I tested with the Google Speech-to-Text API:
16 kHz: https://drive.google.com/file/d/1MbiW3t86W68ZqENtDqD4XdNmEV7QZbZA/view?usp=sharing
48 kHz: https://drive.google.com/file/d/1aLN1ptMJBwuYc6FdAk6CxcK1Ex4jI3vh/view?usp=sharing
Here are the transcripts that were produced:
16 kHz
Hello, dear students.
Welcome to the lecture 1 of introduction to programming course.
In this course, you will learn how to program you will learn the fundamentals of programming. You will learn how to be a software engineer. This course is the primary the most important cause of your Carriage. Why is that because in this course you will you will learn how to do
Programming haftar called how to compose a software. So this is your most important lesson among all of the courses you are going to take because this lesson will teach you how to program.
okay, so if you want to be a good programmer a good software engineer you have to
Perfect.
This course you have to give your most attention to this.
48 kHz
Hello, dear students.
Welcome to the lecture 1 of introduction to programming course.
In this course, you will learn how to program you will learn the fundamentals of programming. You will learn how to be a software engineer. This course is the primary the most important course of your Carriage. Why is that because in this course you will you will learn how to do
Programming how to code how to compose a software. So this is your most important lesson.
Among all of the courses you are going to take because these lesson will teach you how to program.
okay, so if you want to be a good programmer a good software engineer you have to
Perfect.
This course you have to give your most attention to this.
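The two transcripts above differ in only a few places, which are easy to miss by eye. One way to locate them is a word-level diff; the following is a minimal sketch using Python's standard difflib module. The helper name word_diff is hypothetical, and the two sample strings are single sentences copied verbatim from the transcripts.

```python
import difflib


def word_diff(first: str, second: str) -> list:
    """Return (tag, words_in_first, words_in_second) for each span where the texts differ."""
    a, b = first.split(), second.split()
    # autojunk=False stops very frequent words from being treated as junk in long transcripts
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


if __name__ == "__main__":
    # One sentence from each transcript, copied verbatim from the API output above.
    at_16k = "Programming haftar called how to compose a software."
    at_48k = "Programming how to code how to compose a software."
    for tag, old, new in word_diff(at_16k, at_48k):
        print(tag, old, "->", new)
```

A diff like this only shows where the outputs disagree. Deciding which version is correct still requires listening to the audio or comparing against a reference transcript.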
The original sample rate of the video is 48 kHz.
So can any expert or Google employee comment on this?
These are the ffmpeg options I used to produce the 16 kHz and 48 kHz flac files:
-af aformat=s16:16000:mono
-af aformat=s16:48000:mono
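Only the filter options are shown above, so the rest of the command line is not known. The following sketch shows how a full invocation might be assembled around them. The input and output file names and the -vn flag (which drops the video stream) are assumptions, not taken from the question, and the helper name flac_command is hypothetical.

```python
import shlex


def flac_command(src: str, dst: str, sample_rate: int) -> list:
    """Build an ffmpeg argument list that writes 16-bit mono FLAC at sample_rate Hz."""
    return [
        "ffmpeg",
        "-i", src,   # input video (hypothetical file name supplied by the caller)
        "-vn",       # assumption: drop the video stream and keep audio only
        "-af", f"aformat=s16:{sample_rate}:mono",  # filter string from the question
        dst,         # the .flac extension makes ffmpeg write FLAC
    ]


if __name__ == "__main__":
    for rate in (16000, 48000):
        cmd = flac_command("lecture.mp4", f"lecture_{rate}.flac", rate)
        # Print instead of running, so the sketch works without ffmpeg installed.
        print(shlex.join(cmd))
```

To run a command for real, the returned list can be passed to subprocess.run(cmd, check=True). Building the arguments as a list avoids shell quoting problems with file names.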
Solution
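16 kHz is just the recommended sample rate for transcription with Speech-to-Text. [1]
"We recommend a sample rate of at least 16 kHz in the audio files that you use for transcription with Speech-to-Text. Sample rates found in audio files are typically 16 kHz, 32 kHz, 44.1 kHz, and 48 kHz. Because intelligibility is greatly affected by the frequency range, especially in the higher frequencies, a sample rate of less than 16 kHz results in an audio file that has little or no information above 8 kHz. This can prevent Speech-to-Text from correctly transcribing spoken audio. Speech intelligibility requires information throughout the 2 kHz to 4 kHz range, although the harmonics (multiples) of those frequencies in the higher range are also important for preserving speech intelligibility. Therefore, keeping the sample rate to a minimum of 16 kHz is a good practice."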
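The 8 kHz figure in the quoted passage follows from the Nyquist limit: audio sampled at a given rate can only carry frequencies below half that rate. As a small illustration, the sketch below computes that limit for the common sample rates listed above. The function names are hypothetical, and the 6 kHz test tone is just an example of a harmonic of a frequency in the 2 kHz to 4 kHz speech band.

```python
def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency a given sample rate can represent (half the rate)."""
    return sample_rate_hz / 2.0


def can_represent(sample_rate_hz: float, frequency_hz: float) -> bool:
    """True if frequency_hz lies strictly below the Nyquist limit."""
    return frequency_hz < nyquist_hz(sample_rate_hz)


if __name__ == "__main__":
    for rate in (8000, 16000, 32000, 44100, 48000):
        print(
            rate, "Hz sample rate ->", nyquist_hz(rate), "Hz upper limit;",
            "6 kHz harmonic kept:", can_represent(rate, 6000),
        )
```

By the same arithmetic, resampling a 48 kHz recording down to 16 kHz discards everything above 8 kHz. Whether that loss explains the transcript differences in the question is not something this sketch can show.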