Reading and manipulating SageMaker JSON output

Question

I deployed a HuggingFace Transformer model on SageMaker as a batch transform job. My output file is a .jsonl.out file that looks like this:

{"SageMakerOutput":[{"label":"LABEL_8","score":0.9152628183364868}],"inputs":"test"}
{"SageMakerOutput":[{"label":"LABEL_8","score":0.9769203066825867}],"inputs":"Alles OK"}

The problem is that I only want the following output:

LABEL_8, test
LABEL_8, Alles OK

and I want to return it as a .csv or .xlsx file. I tried something like this:

batch_transform_result = []
with open(output_file) as f:
    for line in f:
        # converts jsonline array to normal array
        line = "[" + line.replace("[","").replace("]","") + "]"
        batch_transform_result = literal_eval(line) 

I also tried adding more .replace() calls to clean up the text as it is read, but that did not work. Any suggestions?

Tags: python, json

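Answer

I think the following should work for you. Here lst stands in for the lines of the output file after each one has been parsed into a dict: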

lst = [
    {"SageMakerOutput": [{"label": "LABEL_8", "score": 0.9152628183364868}], "inputs": "test"},
    {"SageMakerOutput": [{"label": "LABEL_8", "score": 0.9769203066825867}], "inputs": "Alles OK"}
]

# Take the label of the first prediction together with the original input text.
result = [(entry['SageMakerOutput'][0]['label'], entry['inputs']) for entry in lst]

print(result)

Output

[('LABEL_8', 'test'), ('LABEL_8', 'Alles OK')]
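
To go from the file on disk to a .csv, the same extraction can be applied line by line. The following is a minimal sketch using only the standard library, assuming every non-empty line of the .jsonl.out file is a standalone JSON object shaped like the examples in the question. The function names extract_rows and write_csv and the "label"/"inputs" header row are illustrative choices, not part of any SageMaker API.

```python
import csv
import json


def extract_rows(jsonl_path):
    """Return (label, inputs) tuples, one per non-empty line of a JSON Lines file."""
    rows = []
    with open(jsonl_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            entry = json.loads(line)  # each line is parsed as its own JSON object
            # Assumption: SageMakerOutput is a non-empty list; keep its first label.
            rows.append((entry["SageMakerOutput"][0]["label"], entry["inputs"]))
    return rows


def write_csv(rows, csv_path):
    """Write the (label, inputs) rows to a CSV file with a header row."""
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["label", "inputs"])
        writer.writerows(rows)
```

For example, write_csv(extract_rows(output_file), "result.csv") would produce the CSV, where output_file is the path from the question and "result.csv" is a placeholder name. Parsing each line with json.loads avoids the string surgery with .replace(), which would also strip any square brackets that appear inside the input text itself.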
