Python - Counting entries in a structured log file

Problem description

Given an array of log strings:

log = [
    '[WARNING] 403 Forbidden: No token in request parameters',
    '[ERROR] 500 Server Error: int is not subscriptable',
    '[INFO] 200 OK: Login Successful',
    '[INFO] 200 OK: User sent a message',
    '[ERROR] 500 Server Error: int is not subscriptable'
]

I am trying to get better at using dictionaries in Python, and I would like to iterate over this array and print out something like this:

{'WARNING': {'403': {'Forbidden': {'No token in request parameters': 1}}},
'ERROR': {'500': {'Server Error': {'int is not subscriptable': 2}}},
'INFO': {'200': {'OK': {'Login Successful': 1, 'User sent a message': 1}}}}

Essentially, I want to return a dictionary of log statistics in the format above. I started writing my method and got this far:

def logInfo(logs):
    dct = {}

    for log in logs:
        log = log.strip().split()
        if log[2] == "Server":
            log[2] = "Server Error:"
            log.remove(log[3])
        #print(log)
        joined = " ".join(log[3:])
        if log[0] not in dct:
            log[0] = log[0].strip('[').strip(']')
            dct[log[0]] = {}
            if log[1] not in dct[log[0]]:
                dct[log[0]][log[1]] = {}
                if log[2] not in dct[log[0]][log[1]]:
                    dct[log[0]][log[1]][log[2]] = {}
                    if joined not in dct:
                        dct[log[0]][log[1]][log[2]][joined] = 1
                    else:
                        dct[log[0]][log[1]][log[2]][joined] += 1
                else:
                    dct[joined].append(joined)
    print(dct)

Instead, it prints:

{'WARNING': {'403': {'Forbidden:': {'No token in request parameters': 1}}}, 'ERROR': {'500': {'Server Error:': {'int is not subscriptable': 1}}}, 'INFO': {'200': {'OK:': {'User sent a message': 1}}}}

The method itself is also quite long. Can anyone help, or point me towards a more proficient way of handling this?

Tags: python, dictionary, multidimensional-array

Solution


I went through your code, found and fixed a few bugs, and it now runs correctly.

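  • First, the if statements do not need to be nested, so I put them all at the same level. Each test creates an empty dict under the key when the key is missing, so by the time the next if runs, its parent key is guaranteed to exist.
  • You tested log[0] not in dct before calling strip('[').strip(']'). The bracketed string (for example '[INFO]') never matches the stripped keys, so the test always succeeded and overwrote the data collected earlier. I moved the strip above the test and marked it in logInfo below.
  • I don't know why you tested joined not in dct. The test should be against dct[log[0]][log[1]][log[2]]. I fixed it and marked it in logInfo below.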
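
The second point can be checked in isolation. This is only a minimal sketch, assuming a dictionary that already holds one stripped key; the names raw, key, missing_before_strip and missing_after_strip are illustrative and do not appear in the function:

```python
# State after the first '[INFO]' line has been processed:
# the key was stored without its brackets.
dct = {'INFO': {'200': {'OK:': {'Login Successful': 1}}}}
raw = '[INFO]'

# Testing the raw token compares '[INFO]' with 'INFO', so it reports
# the key as missing even though it is already there.
missing_before_strip = raw not in dct

# Stripping first makes the test compare like with like.
key = raw.strip('[]')
missing_after_strip = key not in dct

print(missing_before_strip)  # True
print(missing_after_strip)   # False
```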
def logInfo(logs):
    dct = {}

    for log in logs:
        log = log.strip().split()
        if log[2] == "Server":
            # Merge the "Server" and "Error:" tokens into one reason phrase
            log[2] = "Server Error:"
            log.remove(log[3])
        # Drop the trailing colon so the key is 'Forbidden', not 'Forbidden:'
        log[2] = log[2].rstrip(':')
        joined = " ".join(log[3:])

        # Moved up: strip the brackets before testing membership in dct
        log[0] = log[0].strip('[').strip(']')
        if log[0] not in dct:
            dct[log[0]] = {}
        if log[1] not in dct[log[0]]:
            dct[log[0]][log[1]] = {}
        if log[2] not in dct[log[0]][log[1]]:
            dct[log[0]][log[1]][log[2]] = {}
        # Test the innermost dict here, not the root dct
        # (was: if joined not in dct)
        if joined not in dct[log[0]][log[1]][log[2]]:
            dct[log[0]][log[1]][log[2]][joined] = 1
        else:
            dct[log[0]][log[1]][log[2]][joined] += 1

    print(dct)
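
As for a shorter way to write this, one possible sketch uses a regular expression to split each line and dict.setdefault to build the nested levels. It is not part of the fix above. The names count_logs and LINE_RE are made up for this example, and the pattern assumes every line has the layout "[LEVEL] CODE Reason: message", with the reason ending at the first colon:

```python
import re

# Assumed line layout: "[LEVEL] CODE Reason phrase: message"
LINE_RE = re.compile(r'\[(\w+)\]\s+(\d+)\s+(.+?):\s*(.*)')


def count_logs(logs):
    """Return nested counts keyed by level, status code, reason and message."""
    counts = {}
    for line in logs:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip lines that do not fit the assumed layout
        level, code, reason, message = match.groups()
        # Collapse repeated whitespace inside the reason phrase
        reason = ' '.join(reason.split())
        # setdefault returns the existing child dict, or inserts an empty one
        bucket = counts.setdefault(level, {}).setdefault(code, {}).setdefault(reason, {})
        bucket[message] = bucket.get(message, 0) + 1
    return counts


log = [
    '[WARNING] 403 Forbidden: No token in request parameters',
    '[ERROR] 500 Server Error: int is not subscriptable',
    '[INFO] 200 OK: Login Successful',
    '[INFO] 200 OK: User sent a message',
    '[ERROR] 500 Server Error: int is not subscriptable',
]
print(count_logs(log))
```

Because the pattern captures the reason phrase as a whole, no special case for "Server Error" is needed, and the function returns the dictionary instead of only printing it.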
