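python - How do I reconstruct a conversation from Watson Speech-to-Text output?
Problem description
I have JSON output from Watson's Speech-to-Text service, which I have converted to a list and then to a Pandas dataframe.
I am trying to work out how to reconstruct the conversation (with timings), along these lines:
Speaker 0: said this [00.01 - 00.12]
Speaker 1: said that [00.12 - 00.22]
Speaker 0: said something else [00.22 - 00.56]
My dataframe has one row per word, with columns for the word, its start and end times, and the speaker label (0 or 1).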
words = [['said', 0.01, 0.06, 0],['this', 0.06, 0.12, 0],['said', 0.12,
0.15, 1],['that', 0.15, 0.22, 1],['said', 0.22, 0.31, 0],['something',
0.31, 0.45, 0],['else', 0.45, 0.56, 0]]
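Ideally, I want to create the following, where words spoken by the same speaker are grouped together and the group is broken when the next speaker cuts in:
grouped_words = [[['said','this'], 0.01, 0.12, 0],[['said','that'], 0.12,
0.22, 1],[['said','something','else'], 0.22, 0.56, 0]]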
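The grouping described above can be sketched in plain Python, without Pandas, using itertools.groupby, which splits a sequence into runs of consecutive items sharing a key. This is only a minimal sketch: group_by_speaker and format_turn are hypothetical helper names, and the output format assumes the times are seconds printed with two decimals, as in the example lines of the question.

```python
from itertools import groupby

words = [['said', 0.01, 0.06, 0], ['this', 0.06, 0.12, 0],
         ['said', 0.12, 0.15, 1], ['that', 0.15, 0.22, 1],
         ['said', 0.22, 0.31, 0], ['something', 0.31, 0.45, 0],
         ['else', 0.45, 0.56, 0]]


def group_by_speaker(words):
    """Collapse consecutive words with the same speaker label into one turn."""
    grouped = []
    # groupby starts a new run every time the key (speaker label) changes
    for speaker, run in groupby(words, key=lambda w: w[3]):
        run = list(run)
        # Start time comes from the first word, end time from the last word
        grouped.append([[w[0] for w in run], run[0][1], run[-1][2], speaker])
    return grouped


def format_turn(turn):
    """Render one turn as 'Speaker N: text [start - end]' (assumed format)."""
    text, start, end, speaker = turn
    return f"Speaker {speaker}: {' '.join(text)} [{start:05.2f} - {end:05.2f}]"


grouped_words = group_by_speaker(words)
for turn in grouped_words:
    print(format_turn(turn))
```

Because groupby only merges adjacent items, a speaker who talks again later gets a separate turn, which is the behaviour asked for here.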
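Update: as requested, a sample of the JSON file I get is available at https://github.com/cookie1986/STT_test
Solution
It should be fairly straightforward to load the speaker labels into a Pandas dataframe, which gives you a simple tabular view, and then identify where the speaker changes.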
speakers=pd.DataFrame(jsonconvo['speaker_labels']).loc[:,['from','speaker','to']]
convo=pd.DataFrame(jsonconvo['results'][0]['alternatives'][0]['timestamps'])
speakers=speakers.join(convo)
Output:
   from  speaker    to          0     1     2
0  0.01        0  0.06       said  0.01  0.06
1  0.06        0  0.12       this  0.06  0.12
2  0.12        1  0.15       said  0.12  0.15
3  0.15        1  0.22       that  0.15  0.22
4  0.22        0  0.31       said  0.22  0.31
5  0.31        0  0.45  something  0.31  0.45
6  0.45        0  0.56       else  0.45  0.56
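The join above pairs the two tables by row position and reads only results[0]. As a hedged sketch, assuming a longer recording is split over several entries in results and that each speaker label carries the same from/to times as the word it belongs to (as in the sample here), the words can instead be matched to speakers by their times. The function name words_with_speakers and the two-result sample are hypothetical.

```python
def words_with_speakers(stt):
    """Pair each timestamped word with its speaker label by (start, end) time."""
    # Collect [word, start, end] triples from the first alternative of every result
    timestamps = [ts
                  for result in stt['results']
                  for ts in result['alternatives'][0]['timestamps']]
    # Index the speaker labels by their (from, to) pair
    speaker_at = {(label['from'], label['to']): label['speaker']
                  for label in stt.get('speaker_labels', [])}
    # A word without a matching label gets None as its speaker
    return [[word, start, end, speaker_at.get((start, end))]
            for word, start, end in timestamps]


# Hypothetical sample in which the words are split across two results
sample = {
    'results': [
        {'alternatives': [{'timestamps': [['said', 0.01, 0.06],
                                          ['this', 0.06, 0.12]]}]},
        {'alternatives': [{'timestamps': [['said', 0.12, 0.15],
                                          ['that', 0.15, 0.22]]}]},
    ],
    'speaker_labels': [
        {'from': 0.01, 'to': 0.06, 'speaker': 0},
        {'from': 0.06, 'to': 0.12, 'speaker': 0},
        {'from': 0.12, 'to': 0.15, 'speaker': 1},
        {'from': 0.15, 'to': 0.22, 'speaker': 1},
    ],
}

paired = words_with_speakers(sample)
print(paired)
```

The result has the same [word, start, end, speaker] shape as the words list in the question, so it can be passed straight to pd.DataFrame.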
From there, you can identify the speaker changes and collapse the dataframe with a quick loop:
# Index labels of the rows where the speaker differs from the previous row
ChangeSpeaker=speakers.loc[speakers['speaker'].shift()!=speakers['speaker']].index
pieces=[]
for counter in range(0,len(ChangeSpeaker)):
    currentindex=ChangeSpeaker[counter]
    try:
        # .loc slices are inclusive, so stop one row before the next change
        nextIndex=ChangeSpeaker[counter+1]-1
        temp=speakers.loc[currentindex:nextIndex,:]
    except IndexError:
        # Last turn: there is no further change, so take every remaining row
        temp=speakers.loc[currentindex:,:]
    pieces.append(pd.DataFrame([[temp.head(1)['from'].values[0],temp.tail(1)['to'].values[0],temp.head(1)['speaker'].values[0],temp[0].tolist()]],columns=['from','to','speaker','transcript']))
# DataFrame.append was removed in pandas 2.0, so concatenate the one-row pieces instead
Transcript=pd.concat(pieces)
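You take the start time from the first row of the temporary dataframe (hence head) and the end time from its last row (hence tail). To handle the last speaker turn, where looking up the next change would normally raise an index out-of-bounds error, you can use try/except.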
Output:
   from    to  speaker               transcript
0  0.01  0.12        0             [said, this]
0  0.12  0.22        1             [said, that]
0  0.22  0.56        0  [said, something, else]
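As an alternative to the loop, the same collapse can be sketched with a groupby. This is a swapped-in technique rather than the method used above: a cumulative sum over the "speaker changed" flags gives every turn its own id, and the rows are then aggregated per id. The column names word, from, to and speaker, and the helper name turn, are assumptions based on the dataframe described in the question.

```python
import pandas as pd

words = [['said', 0.01, 0.06, 0], ['this', 0.06, 0.12, 0],
         ['said', 0.12, 0.15, 1], ['that', 0.15, 0.22, 1],
         ['said', 0.22, 0.31, 0], ['something', 0.31, 0.45, 0],
         ['else', 0.45, 0.56, 0]]
df = pd.DataFrame(words, columns=['word', 'from', 'to', 'speaker'])

# True wherever the speaker differs from the previous row; the running sum
# of those flags numbers the turns 1, 2, 3, ...
turn = (df['speaker'] != df['speaker'].shift()).cumsum().rename('turn')

# 'from' is a Python keyword, so the named aggregations are passed as a dict
turns = (
    df.groupby(turn)
      .agg(**{
          'from': ('from', 'first'),      # start time of the first word
          'to': ('to', 'last'),           # end time of the last word
          'speaker': ('speaker', 'first'),
          'transcript': ('word', list),   # all words of the turn, in order
      })
      .reset_index(drop=True)
)
print(turns)
```

Unlike the loop, this version numbers the resulting rows 0, 1, 2 and needs no special case for the last turn.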
The full code is here:
import json
import pandas as pd
jsonconvo=json.loads("""{
"results": [
{
"alternatives": [
{
"timestamps": [
[
"said",
0.01,
0.06
],
[
"this",
0.06,
0.12
],
[
"said",
0.12,
0.15
],
[
"that",
0.15,
0.22
],
[
"said",
0.22,
0.31
],
[
"something",
0.31,
0.45
],
[
"else",
0.45,
0.56
]
],
"confidence": 0.85,
"transcript": "said this said that said something else "
}
],
"final": true
}
],
"result_index": 0,
"speaker_labels": [
{
"from": 0.01,
"to": 0.06,
"speaker": 0,
"confidence": 0.55,
"final": false
},
{
"from": 0.06,
"to": 0.12,
"speaker": 0,
"confidence": 0.55,
"final": false
},
{
"from": 0.12,
"to": 0.15,
"speaker": 1,
"confidence": 0.55,
"final": false
},
{
"from": 0.15,
"to": 0.22,
"speaker": 1,
"confidence": 0.55,
"final": false
},
{
"from": 0.22,
"to": 0.31,
"speaker": 0,
"confidence": 0.55,
"final": false
},
{
"from": 0.31,
"to": 0.45,
"speaker": 0,
"confidence": 0.55,
"final": false
},
{
"from": 0.45,
"to": 0.56,
"speaker": 0,
"confidence": 0.54,
"final": false
}
]
}""")
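# Keep only the timing and speaker columns of the speaker labels
speakers=pd.DataFrame(jsonconvo['speaker_labels']).loc[:,['from','speaker','to']]
# One row per word: columns 0, 1, 2 hold the word, its start time and its end time
convo=pd.DataFrame(jsonconvo['results'][0]['alternatives'][0]['timestamps'])
speakers=speakers.join(convo)
# Index labels of the rows where the speaker differs from the previous row
ChangeSpeaker=speakers.loc[speakers['speaker'].shift()!=speakers['speaker']].index
pieces=[]
for counter in range(0,len(ChangeSpeaker)):
    currentindex=ChangeSpeaker[counter]
    try:
        # .loc slices are inclusive, so stop one row before the next change
        nextIndex=ChangeSpeaker[counter+1]-1
        temp=speakers.loc[currentindex:nextIndex,:]
    except IndexError:
        # Last turn: there is no further change, so take every remaining row
        temp=speakers.loc[currentindex:,:]
    pieces.append(pd.DataFrame([[temp.head(1)['from'].values[0],temp.tail(1)['to'].values[0],temp.head(1)['speaker'].values[0],temp[0].tolist()]],columns=['from','to','speaker','transcript']))
# DataFrame.append was removed in pandas 2.0, so concatenate the one-row pieces instead
Transcript=pd.concat(pieces)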