Python dictionary iteration not working as expected

Problem description

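I am trying to modify a JSON file containing transcription data so that the conversation fragments of each speaker are combined into a single string.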

Here is a link to my input data: https://jsoneditoronline.org/#left=cloud.34cfd15f2c1f461f9e7a7ab57431de79

Output data: https://jsoneditoronline.org/#left=cloud.99a89b483ae84c7f8913da5ecfd3f4a3

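My goal is to combine all the conversation items of each entry in the input array, checking whether the next entry comes from a new speaker. If it does, I reset the string; otherwise I keep appending items until the speaker changes or the string grows to 500 characters. For some reason, as you can see in the output, my data keeps repeating.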

Here is my code:

import json



with open('input-data.json', 'r') as f:
    text = json.load(f)
    
segment_string = ''
current_speaker = ''
sentimentData_full = {}
sentimentData_final = []
for item in text: 
    conversation_segment_list = item['conversation_items']
    speaker = item['speaker_label']
    for segment in conversation_segment_list:
        if len(segment_string) >= 500 or speaker != current_speaker:     
            segment_string = ''
            segment_string += f"{segment['content']} "
            current_speaker = speaker
        else:
            segment_string += f"{segment['content']} "
            continue
    sentimentData = {}
    sentimentData_full['speaker_label'] = speaker
    sentimentData_full['segment_string'] = segment_string                
    sentimentData_final.append(sentimentData_full.copy())
    sentimentData_full = {}       

app_json = json.dumps(sentimentData_final)
with open('output-data.json', 'w') as f:
    f.write(app_json)

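I have been working on this for many hours, and any help would be greatly appreciated. Here is also an example of what is currently wrong (in case my explanation is not clear enough):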

Current, incorrect output:

[
  {
    "speaker_label": "spk_0",
    "segment_string": "mhm . "
  },
  {
    "speaker_label": "spk_0",
    "segment_string": "mhm . You have reached a as in so far . This is Donna . I'll be assisting you with your inquiries today . Please be informed that this call is being recorded and monitored for quality assurance purposes . How may I help you ? "
  },
  {
    "speaker_label": "spk_1",
    "segment_string": "Um , well , I bought . All right , I got this , um , essence of argon oil , um , "
  },
  {
    "speaker_label": "spk_1",
    "segment_string": "Um , well , I bought . All right , I got this , um , essence of argon oil , um , for shipping , handling and handling costs . 599 a sample of it . "
  },
  {
    "speaker_label": "spk_1",
    "segment_string": "Um , well , I bought . All right , I got this , um , essence of argon oil , um , for shipping , handling and handling costs . 599 a sample of it . And , um , if I want to cancel the order , I had to do it within , "
  } 
]

Expected output:

[
  {
    "speaker_label": "spk_0",
    "segment_string": "mhm . You have reached a as in so far . This is Donna . I'll be assisting you with your inquiries today . Please be informed that this call is being recorded and monitored for quality assurance purposes . How may I help you ? "
  },
  {
    "speaker_label": "spk_1",
    "segment_string": "Um , well , I bought . All right , I got this , um , essence of argon oil , um , for shipping , handling and handling costs . 599 a sample of it . And , um , if I want to cancel the order , I had to do it within , "
  } 
]

Tags: python, json, dictionary

Solution


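I could not debug your code, but I wrote a version from scratch. The keys can be adjusted to suit your requirements; for now, I use the speaker as the key.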

import json
import re
from collections import defaultdict

speakers = defaultdict(list)

with open("try.json", "r") as f:
    text = json.load(f)

for item in text:
    conversation_segment_list = item["conversation_items"]
    speaker = item["speaker_label"]
    for segment in conversation_segment_list:
        speakers[speaker].append(segment["content"])

speakers = {
    speaker: re.sub(r"[\s]([.,?])", r"\1", " ".join(words))[:500]
    for speaker, words in speakers.items()
}

print(speakers)

Output:

{
    "spk_0": "mhm. You have reached a as in so far. This is Donna. I'll be assisting you with your inquiries today. Please be informed that this call is being recorded and monitored for quality assurance purposes. How may I help you? Okay, I didn't have I'm happy to assist you. Um, for me to be able to pull up your, uh, subscription here, could you kind of provide me your first and your last name? Oh, l y and then Yes, l a k e. Yes. Okay. Just, uh, go ahead to pull up here. A subscription here or your account",
    "spk_1": "Um, well, I bought. All right, I got this, um, essence of argon oil, um, for shipping, handling and handling costs. 599 a sample of it. And, um, if I want to cancel the order, I had to do it within, uh, 15 days. And so that is, um when I want I wanted to do, I didn't want to. I didn't want to get, you know, like a monthly for what is it at $3 a month? Okay, I can't afford that. I'm Carolyn. C a R O L Y N lake l a k e It's, uh, Lake 3921 at hotmail dot com. Uh, 3 10 Warren Avenue number 2 July Wy",
}
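
Note that grouping by speaker as above merges every turn of a speaker into one string, so the order of turns is lost and the text is simply cut at 500 characters. The repetition in the question's output appears to come from appending one record per input item, after the inner loop, even when the speaker has not changed. Below is a minimal sketch of an alternative that keeps the turn order and only emits a record when the speaker changes or the string has reached the limit. The function name `merge_segments`, the `max_len` parameter, and the sample data are illustrative assumptions, not part of the original code; the input shape is inferred from the keys used in the question.

```python
def merge_segments(items, max_len=500):
    """Merge consecutive segments of the same speaker into one record.

    A record is emitted only when the speaker changes or the accumulated
    string has already reached max_len characters, and once more at the end.
    """
    merged = []
    current_speaker = None
    segment_string = ""

    for item in items:
        speaker = item["speaker_label"]
        for segment in item["conversation_items"]:
            # Flush the accumulated text before starting a new record.
            if segment_string and (
                speaker != current_speaker or len(segment_string) >= max_len
            ):
                merged.append(
                    {"speaker_label": current_speaker,
                     "segment_string": segment_string}
                )
                segment_string = ""
            current_speaker = speaker
            segment_string += f"{segment['content']} "

    # Emit whatever is left after the last item.
    if segment_string:
        merged.append(
            {"speaker_label": current_speaker, "segment_string": segment_string}
        )
    return merged


# Hypothetical sample in the shape the question's code expects.
sample = [
    {"speaker_label": "spk_0",
     "conversation_items": [{"content": "mhm"}, {"content": "."}]},
    {"speaker_label": "spk_0",
     "conversation_items": [{"content": "How"}, {"content": "may"},
                            {"content": "I"}, {"content": "help"}]},
    {"speaker_label": "spk_1",
     "conversation_items": [{"content": "Um"}, {"content": ","}]},
]
result = merge_segments(sample)
```

With this sample, the two consecutive `spk_0` items end up in a single record, followed by one record for `spk_1`. Because the length check happens before each append, a record can exceed `max_len` by the length of its last segment; if a hard cap is needed, the check would have to account for the incoming segment as well.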
