Extracting API request logs in CloudWatch

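Problem description

I am looking for a way to extract every field from the payload below in CloudWatch. The payload is not valid JSON. Any suggestions on how to convert it to valid JSON? Or, alternatively, how to use regular expressions to store each field in its own variable?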

log = "{Date=Wed, 03 Mar 2021 01:33:41 GMT, Content-Type=application/json, Content-Length=11841, Connection=keep-alive, x-amzn-RequestId=427382d7-1234-5678-1234-a2022c4d0796, x-amzn-Remapped-Content-Length=0, X-Amz-Executed-Version=$LATEST, X-Amzn-Trace-Id=root=1-123ee774-1234c16635364cda21e42155;sampled=0}"

Desired output:

{"Date": "Wed, 03 Mar 2021 01:33:41 GMT", "Content-Type": "application/json", "Content-Length": 11841, "Connection": "keep-alive", "x-amzn-RequestId": "427382d7-1234-5678-1234-a2022c4d0796", "x-amzn-Remapped-Content-Length": 0, "X-Amz-Executed-Version": "$LATEST", "X-Amzn-Trace-Id": "root=1-123ee774-1234c16635364cda21e42155;sampled=0"}

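If a direct conversion is not possible, I would like to use a regular expression for each field, for example (the log separates name and value with "=" and ends each field with ","):

request_id = re.search(r'\bx-amzn-RequestId=(\w+(?:-\w+)+)', log).group(1)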

Tags: python, regex, string, transform, etl

Solution


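Because YAML is a superset of JSON, a YAML parser will accept this brace-delimited string: it reads it as a mapping whose keys are the individual key=value pairs, which you can then split on "=". You can do it like this: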

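import yaml  # PyYAML

log = "{Date=Wed, 03 Mar 2021 01:33:41 GMT, Content-Type=application/json, Content-Length=11841, Connection=keep-alive, x-amzn-RequestId=427382d7-1234-5678-1234-a2022c4d0796, x-amzn-Remapped-Content-Length=0, X-Amz-Executed-Version=$LATEST, X-Amzn-Trace-Id=root=1-123ee774-1234c16635364cda21e42155;sampled=0}"
# Remove the first comma (the one inside the Date value) so YAML does not split the date
# into two entries; "Wed, 03 Mar 2021 01:33:41 GMT" becomes "Wed 03 Mar 2021 01:33:41 GMT".
log = log.replace(",", "", 1)
# safe_load avoids the Loader argument that yaml.load requires in recent PyYAML versions.
# Each "key=value" pair is parsed as a mapping key whose value is None.
json_data = yaml.safe_load(log)
final_dict = {}
for key in json_data:
    # Split on the first "=" only, so values that contain "=" (X-Amzn-Trace-Id) stay intact.
    name, value = str(key).split("=", 1)
    final_dict[name] = value  # note: every value is kept as a string
print(final_dict)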
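
If you would rather not depend on a YAML parser, the same idea can be sketched with only the standard library. This is not part of the answer above, just one possible approach under stated assumptions: field names start with a letter and contain only letters, digits, and hyphens, and no value contains ", " followed by such a name and "=". The names FIELD_SEPARATOR and parse_header_log are made up for this illustration.

```python
import json
import re

log = "{Date=Wed, 03 Mar 2021 01:33:41 GMT, Content-Type=application/json, Content-Length=11841, Connection=keep-alive, x-amzn-RequestId=427382d7-1234-5678-1234-a2022c4d0796, x-amzn-Remapped-Content-Length=0, X-Amz-Executed-Version=$LATEST, X-Amzn-Trace-Id=root=1-123ee774-1234c16635364cda21e42155;sampled=0}"

# Assumption: a new field begins after ", " only when a header-style name and "=" follow.
# The comma inside the Date value is followed by "03", so it is not treated as a separator.
FIELD_SEPARATOR = re.compile(r", (?=[A-Za-z][A-Za-z0-9-]*=)")


def parse_header_log(text):
    """Turn "{k1=v1, k2=v2}" into a dict; all-digit values become ints."""
    body = text.strip()
    if body.startswith("{") and body.endswith("}"):
        body = body[1:-1]
    result = {}
    for field in FIELD_SEPARATOR.split(body):
        # partition splits on the first "=" only, so "root=1-...;sampled=0" stays whole.
        key, _, value = field.partition("=")
        result[key] = int(value) if value.isdecimal() else value
    return result


parsed = parse_header_log(log)
print(json.dumps(parsed))
```

Unlike the YAML approach, this keeps the comma in the Date value and converts Content-Length and x-amzn-Remapped-Content-Length to integers, which matches the desired output in the question. If your real logs contain values that break the separator assumption, the pattern would need adjusting.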
