How do I perform speaker diarization with IBM Speech to Text?

Problem description

I am trying to perform speaker diarization with IBM Speech to Text. I send my audio file through the API and get back a result in the following JSON format.

{
  "results": [
    {
      "alternatives": [
        {
          "timestamps": [
            [
              "hello",
              0.68,
              1.19
            ],
            [
              "yeah",
              1.47,
              1.91
            ],
            [
              "yeah",
              1.96,
              2.12
            ],
            [
              "how's",
              2.12,
              2.59
            ],
            [
              "Billy",
              2.59,
              3.17
            ],
            [
              "good",
              4.01,
              4.30
            ]
          ],
          "confidence": 0.82,
          "transcript": "hello yeah yeah how's Billy good "
        }
      ],
      "final": true
    }
  ],
  "result_index": 0,
  "speaker_labels": [
    {
      "from": 0.68,
      "to": 1.19,
      "speaker": 2,
      "confidence": 0.52,
      "final": false
    },
    {
      "from": 1.47,
      "to": 1.93,
      "speaker": 1,
      "confidence": 0.62,
      "final": false
    },
    {
      "from": 1.96,
      "to": 2.12,
      "speaker": 2,
      "confidence": 0.51,
      "final": false
    },
    {
      "from": 2.12,
      "to": 2.59,
      "speaker": 2,
      "confidence": 0.51,
      "final": false
    },
    {
      "from": 2.59,
      "to": 3.17,
      "speaker": 2,
      "confidence": 0.51,
      "final": false
    },
    {
      "from": 4.01,
      "to": 4.30,
      "speaker": 1,
      "confidence": 0.63,
      "final": true
    }
  ]
}

But I want the output in this format:

Speaker 2 - "Hello?"
Speaker 1 - "Yeah?"
Speaker 2 - "Yeah, how's Billy?"
Speaker 1 - "Good."

Is there a way to get the result in this format directly, or do I have to write my own code? Here is my code:

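import json

# speech_to_text is assumed to be an already authenticated
# ibm_watson.SpeechToTextV1 client created earlier in the script.
with open('/content/test.mp3', 'rb') as audio_file:
    speech_recognition_results = speech_to_text.recognize(
        audio=audio_file,
        content_type='audio/mp3',
        word_alternatives_threshold=0.9,
        speaker_labels=True  # ask the service to label each word with a speaker
    ).get_result()
print(json.dumps(speech_recognition_results, indent=2))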

Tags: python, machine-learning, nlp, ibm-cloud, speech-to-text

Solution
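
Judging from the response shown in the question, the service returns the word timestamps and the speaker labels as two separate lists, so one option is to merge them in your own code. The following is a minimal sketch, not an official IBM helper: the function names `speaker_turns` and `format_turns` are made up for illustration, and the code assumes that every entry in `speaker_labels` has the same `from` time as exactly one word in `timestamps`, as is the case in the sample above.

```python
def speaker_turns(response):
    """Group recognized words into consecutive turns per speaker.

    Assumption: each "speaker_labels" entry shares its "from" time with
    exactly one word in "timestamps", as in the sample response.
    """
    # Look up the speaker by word start time (rounded to avoid float noise).
    speaker_at = {
        round(label["from"], 2): label["speaker"]
        for label in response.get("speaker_labels", [])
    }
    turns = []  # each item is [speaker, [word, word, ...]]
    for result in response.get("results", []):
        # Use the top alternative of each result.
        for word, start, _end in result["alternatives"][0].get("timestamps", []):
            speaker = speaker_at.get(round(start, 2))  # None if no label matches
            if turns and turns[-1][0] == speaker:
                turns[-1][1].append(word)  # same speaker keeps talking
            else:
                turns.append([speaker, [word]])  # speaker changed: start a new turn
    return [(speaker, " ".join(words)) for speaker, words in turns]


def format_turns(turns):
    """Render turns as lines of the form: Speaker 2 - "hello"."""
    return "\n".join(f'Speaker {speaker} - "{text}"' for speaker, text in turns)
```

With the result from the question, `print(format_turns(speaker_turns(speech_recognition_results)))` should print four lines: `Speaker 2 - "hello"`, `Speaker 1 - "yeah"`, `Speaker 2 - "yeah how's Billy"` and `Speaker 1 - "good"`. The capitalization and punctuation in the desired output ("Hello?", "Good.") are not present in the transcript, so they would have to be added in a separate step.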

