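python - Python dictionary iteration not working as expected
Problem description
I am trying to modify a JSON file of transcription data so that the conversation fragments are combined into sentences.
Here is a link to my input data: https://jsoneditoronline.org/#left=cloud.34cfd15f2c1f461f9e7a7ab57431de79
Output data: https://jsoneditoronline.org/#left=cloud.99a89b483ae84c7f8913da5ecfd3f4a3
My goal is to go through each item in the input array, combining all of its conversation items and checking whether the next item comes from a new speaker. If it does, I reset the string; otherwise I keep appending new items until the speaker changes or the string grows to 500 characters. For some reason, as you can see in the output, my data keeps repeating.
Here is my code: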
import json

with open('input-data.json', 'r') as f:
    text = json.load(f)

segment_string = ''
current_speaker = ''
sentimentData_full = {}
sentimentData_final = []

for item in text:
    conversation_segment_list = item['conversation_items']
    speaker = item['speaker_label']
    for segment in conversation_segment_list:
        if len(segment_string) >= 500 or speaker != current_speaker:
            segment_string = ''
            segment_string += f"{segment['content']} "
            current_speaker = speaker
        else:
            segment_string += f"{segment['content']} "
            continue
    sentimentData = {}
    sentimentData_full['speaker_label'] = speaker
    sentimentData_full['segment_string'] = segment_string
    sentimentData_final.append(sentimentData_full.copy())
    sentimentData_full = {}

app_json = json.dumps(sentimentData_final)
with open('output-data.json', 'w') as f:
    f.write(app_json)
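I have been working on this for many hours, and any help would be greatly appreciated. Here is also an example of what is currently wrong (in case my explanation was not clear enough):
Current, incorrect output: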
[
{
"speaker_label": "spk_0",
"segment_string": "mhm . "
},
{
"speaker_label": "spk_0",
"segment_string": "mhm . You have reached a as in so far . This is Donna . I'll be assisting you with your inquiries today . Please be informed that this call is being recorded and monitored for quality assurance purposes . How may I help you ? "
},
{
"speaker_label": "spk_1",
"segment_string": "Um , well , I bought . All right , I got this , um , essence of argon oil , um , "
},
{
"speaker_label": "spk_1",
"segment_string": "Um , well , I bought . All right , I got this , um , essence of argon oil , um , for shipping , handling and handling costs . 599 a sample of it . "
},
{
"speaker_label": "spk_1",
"segment_string": "Um , well , I bought . All right , I got this , um , essence of argon oil , um , for shipping , handling and handling costs . 599 a sample of it . And , um , if I want to cancel the order , I had to do it within , "
}
]
Expected output:
[
{
"speaker_label": "spk_0",
"segment_string": "mhm . You have reached a as in so far . This is Donna . I'll be assisting you with your inquiries today . Please be informed that this call is being recorded and monitored for quality assurance purposes . How may I help you ? "
},
{
"speaker_label": "spk_1",
"segment_string": "Um , well , I bought . All right , I got this , um , essence of argon oil , um , for shipping , handling and handling costs . 599 a sample of it . And , um , if I want to cancel the order , I had to do it within , "
}
]
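Solution
I could not debug your code, but I wrote a version from scratch. The keys can be adjusted to your requirements; for now, I use the speaker as the key.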
import json
import re
from collections import defaultdict

# Maps each speaker label to the list of tokens that speaker said.
speakers = defaultdict(list)

with open("try.json", "r") as f:
    text = json.load(f)

for item in text:
    conversation_segment_list = item["conversation_items"]
    speaker = item["speaker_label"]
    for segment in conversation_segment_list:
        speakers[speaker].append(segment["content"])

# Join the tokens, drop the whitespace before ".", "," and "?",
# and keep only the first 500 characters per speaker.
speakers = {
    speaker: re.sub(r"[\s]([.,?])", r"\1", " ".join(words))[:500]
    for speaker, words in speakers.items()
}
print(speakers)
Output:
{
"spk_0": "mhm. You have reached a as in so far. This is Donna. I'll be assisting you with your inquiries today. Please be informed that this call is being recorded and monitored for quality assurance purposes. How may I help you? Okay, I didn't have I'm happy to assist you. Um, for me to be able to pull up your, uh, subscription here, could you kind of provide me your first and your last name? Oh, l y and then Yes, l a k e. Yes. Okay. Just, uh, go ahead to pull up here. A subscription here or your account",
"spk_1": "Um, well, I bought. All right, I got this, um, essence of argon oil, um, for shipping, handling and handling costs. 599 a sample of it. And, um, if I want to cancel the order, I had to do it within, uh, 15 days. And so that is, um when I want I wanted to do, I didn't want to. I didn't want to get, you know, like a monthly for what is it at $3 a month? Okay, I can't afford that. I'm Carolyn. C a R O L Y N lake l a k e It's, uh, Lake 3921 at hotmail dot com. Uh, 3 10 Warren Avenue number 2 July Wy",
}
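Note that grouping by speaker collects everything a speaker says across the whole call and truncates it at 500 characters, so the order of turns is lost. In the question's code, the repetition appears to come from appending one record per input item while `segment_string` is only reset on a speaker change or at 500 characters, so each record carries the text accumulated so far. If the goal is the one described in the question (one record per run of consecutive segments from the same speaker, split once the string reaches 500 characters), a minimal sketch could look like the following. It assumes each item has `speaker_label` and `conversation_items` with a `content` field, as in the code above; `merge_segments` and `max_chars` are names made up for this illustration.

```python
def merge_segments(items, max_chars=500):
    """Merge consecutive segments from the same speaker into one record.

    A new record is started when the speaker changes or when the
    current record's string has reached max_chars characters.
    """
    merged = []
    for item in items:
        speaker = item["speaker_label"]
        for segment in item["conversation_items"]:
            last = merged[-1] if merged else None
            if (
                last is None
                or last["speaker_label"] != speaker
                or len(last["segment_string"]) >= max_chars
            ):
                # Start a new record instead of re-appending the old one.
                last = {"speaker_label": speaker, "segment_string": ""}
                merged.append(last)
            # Same separator as in the question: content plus a space.
            last["segment_string"] += f"{segment['content']} "
    return merged


if __name__ == "__main__":
    # Tiny hand-written sample in the assumed input shape.
    sample = [
        {"speaker_label": "spk_0",
         "conversation_items": [{"content": "mhm"}, {"content": "."}]},
        {"speaker_label": "spk_0",
         "conversation_items": [{"content": "How"}, {"content": "may"}]},
        {"speaker_label": "spk_1",
         "conversation_items": [{"content": "Um"}, {"content": ","}]},
    ]
    for record in merge_segments(sample):
        print(record)
```

To use it on the real file, load the data with `json.load` as above, call `merge_segments(text)`, and write the result with `json.dump`. Because a record is created only when a new run starts, no partial strings are emitted along the way.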