How to check the next letter after detecting the current letter in Python?

Problem description

May  1 00:00:00 date=2018-04-30 time=23:59:59 dev=A devid=1234 msg="test 1"
May  1 00:00:00 date=2018-04-31 time=00:00:01 dev=A devid=1234 msg="test 2"

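Above is a sample of a log file. I am trying to convert it to CSV by scanning each line letter by letter for = and saving what follows as the column value for that row.

I managed to capture columnValue when the value after = is not a string. Below is the part of the code that extracts the values. In some parts of a line, the = is followed by a quoted string that contains spaces. This breaks the extraction, because a space makes the code start looking for a new column. Is it possible to check whether the next letter is "\"", and then keep saving letter by letter until the next "\"", so that I can save the column value as a string?

I am using Python 2.7.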

import csv

def outputCSV(log_file_path, outputCSVName, colValueSet):
    data = []
    f = open(log_file_path, "r")
    values = set() # create empty set for all column values
    content = f.readlines()
    content = [x.strip() for x in content] #List of lines to iterate through
    colValueSet.add("postingDate")
    for line in content:
        new_dict = dict.fromkeys(colValueSet, "")
        new_dict["postingDate"]= line[0:16]
        findingColHeader = True # we have to find the columns first
        findingColValue = False # After column found, starting finding values
        col_value = "" # Empty at first
        value = "" # Empty value at first
        start = False
        for letter in line:
            if findingColHeader:
                if letter == " ":
                    # space means start taking in new value
                    # data is in this structure with space prior to column names -> " column=value"
                    start = True
                    col_value = ""
                elif letter == "=":
                    findingColValue = True
                    start = False
                    findingColHeader = False
                elif start:
                    col_value += letter
            elif findingColValue:
                if letter == " ":
                    new_dict[col_value] = value
                    value = ""
                    col_value = ""
                    findingColHeader = True
                    start = True
                    findingColValue = False
                else:
                    value += letter
        data += [new_dict]
    with open(outputCSVName, 'wb') as csvfile:
        fieldnames = list(colValueSet)
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()
        for row in data:
            writer.writerow(row)
    print("Writing Complete")

# findColumnValues(a) would calculate all column value from the file path
outputCSV("ttest.log", "MyProcessedLog.csv", findColumnValues("test.log"))

Tags: python, parsing

Solution


You could try something like this:

>>> a = 'May 1 00:00:00 date=2018-04-30 time=23:59:59 dev=A devid=1234 msg="test 1" '
>>> a.split('=')
['May 1 00:00:00 date', '2018-04-30 time', '23:59:59 dev', 'A devid', '1234 msg', '"test 1" ']
>>> parts = a.split('=')
>>> b = []
>>> for i,j in zip(parts, parts[1:]) :
...     b.append( (i[i.rfind(' ')+1:], j[:j.rfind(' ')]) )
... 
>>> b
[('date', '2018-04-30'), ('time', '23:59:59'), ('dev', 'A'), ('devid', '1234'), ('msg', '"test 1"')]
>>> 

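I could write a cute one-liner, but I think this version is easier to follow, because you can see all the intermediate results and grasp the main idea: split the line at the = signs, take the last word of each piece as the key, and use the rest of the next piece as the value.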
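
Note that the session above relies on the trailing space at the end of a: without it, j.rfind(' ') on the last piece would cut the value down to '"test'. If you would rather stay with the letter-by-letter scan from the question, one possible sketch is to keep a flag that records whether you are currently inside double quotes, and to treat a space as the end of a value only when that flag is off. The function name parse_pairs and its variables are made up for this illustration, and it assumes that keys never contain spaces and that quotes inside a value are not escaped.

```python
def parse_pairs(line):
    """Scan `line` one character at a time and return (key, value) pairs.

    Assumes the key=value layout of the sample log: keys contain no
    spaces, and a value is either a bare token or wrapped in double quotes.
    """
    pairs = []
    token = ""         # characters collected for the current key or value
    key = None         # None while reading a key, set once "=" is seen
    in_quotes = False  # True between an opening and a closing double quote
    for ch in line:
        if key is None:
            if ch == " ":
                token = ""              # not a key (e.g. part of the timestamp)
            elif ch == "=":
                key, token = token, ""  # key complete, start reading the value
            else:
                token += ch
        elif ch == '"':
            in_quotes = not in_quotes   # toggle; the quote itself is dropped
        elif ch == " " and not in_quotes:
            pairs.append((key, token))  # an unquoted space ends the value
            key, token = None, ""
        else:
            token += ch                 # includes spaces inside quotes
    if key is not None:
        pairs.append((key, token))      # flush the last pair at end of line
    return pairs


line = 'May  1 00:00:00 date=2018-04-30 time=23:59:59 dev=A devid=1234 msg="test 1"'
print(parse_pairs(line))
# [('date', '2018-04-30'), ('time', '23:59:59'), ('dev', 'A'), ('devid', '1234'), ('msg', 'test 1')]
```

Unlike the split version, this drops the surrounding quotes from msg and does not need a trailing space. dict(parse_pairs(line)) gives a row that can be merged into the new_dict from the question before it is passed to writer.writerow.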

