Group by date and count matching words in a Python log file

Problem description

I have a myFileMonitor.log file with the data below. I want to group the data by date, count how many times words such as 'Created', 'modified', 'moved', and 'deleted' appear in the log for each date, and save the result in a csv file. Please help.

myFileMonitor.log:
2020-09-25 16:31:58 - Security Alert! ' C:/Users/khond/Downloads/New folder ' has been Created!!
2020-09-25 16:32:11 - Security Alert! Files/Folder moved ' C:/Users/khond/Downloads/New folder ' to ' C:/Users/khond/Downloads/Test1 '
2020-09-25 16:32:12 - Security Alert! ' C:/Users/khond/Downloads/Test1 - Copy ' has been Created!!
2020-09-25 16:32:13 - Security Alert! ' C:/Users/khond/Downloads/Test1 - Copy ' has been modified!
2020-09-25 16:32:30 - Security Alert! ' C:/Users/khond/Downloads/Code.png ' has been Created!!
2020-09-25 16:32:30 - Security Alert! ' C:/Users/khond/Downloads/Code.png ' has been modified!
2020-09-25 16:32:30 - Security Alert! ' C:/Users/khond/Downloads/Code.png ' has been modified!
2020-09-25 16:32:30 - Security Alert! ' C:/Users/khond/Downloads/Code.png ' has been modified!
2020-09-25 16:32:30 - Security Alert! ' C:/Users/khond/Downloads/Code.png ' has been modified!
2020-09-25 16:32:30 - Security Alert! ' C:/Users/khond/Downloads/PsedoCode.png ' has been Created!!
2020-09-25 16:32:30 - Security Alert! ' C:/Users/khond/Downloads/PsedoCode.png ' has been modified!
2020-09-25 16:32:30 - Security Alert! ' C:/Users/khond/Downloads/PsedoCode.png ' has been modified!
2020-09-25 16:32:30 - Security Alert! ' C:/Users/khond/Downloads/PsedoCode.png ' has been modified!
2020-09-25 16:32:30 - Security Alert! ' C:/Users/khond/Downloads/PsedoCode.png ' has been modified!
2020-09-25 16:33:56 - Security Alert! Files/Folder deleted: C:/Users/khond/Downloads/Test1!
2020-09-25 16:34:04 - Security Alert! Files/Folder moved ' C:/Users/khond/Downloads/Test1 - Copy ' to ' C:/Users/khond/Downloads/Test1 '
2020-09-25 16:34:05 - Security Alert! Files/Folder deleted: C:/Users/khond/Downloads/Code.png!
2020-09-25 16:34:11 - Security Alert! Files/Folder moved ' C:/Users/khond/Downloads/PsedoCode.png ' to ' C:/Users/khond/Downloads/Code.png '
2020-09-25 16:34:11 - Security Alert! ' C:/Users/khond/Downloads/Code.png ' has been modified!
2020-09-30 19:02:45 - Security Alert! ' C:/Users/khond/Downloads/New folder ' has been Created!!
2020-09-30 19:02:52 - Security Alert! ' C:/Users/khond/Downloads/New folder - Copy ' has been Created!!
2020-09-30 19:02:53 - Security Alert! ' C:/Users/khond/Downloads/New folder - Copy ' has been modified!
2020-09-30 19:02:53 - Security Alert! ' C:/Users/khond/Downloads/New folder - Copy (2) ' has been Created!!
2020-09-30 19:02:54 - Security Alert! ' C:/Users/khond/Downloads/New folder - Copy (2) ' has been modified!
2020-09-30 19:02:55 - Security Alert! Files/folder deleted: C:/Users/khond/Downloads/New folder - Copy (2)!
2020-09-30 19:03:07 - Security Alert! ' C:/Users/khond/Downloads/New Rich Text Document.rtf ' has been Created!!
2020-09-30 19:03:07 - Security Alert! ' C:/Users/khond/Downloads/New Rich Text Document.rtf ' has been modified!
2020-09-30 19:03:13 - Security Alert! Files/folder deleted: C:/Users/khond/Downloads/New Rich Text Document.rtf!
2020-09-30 19:03:16 - Security Alert! Files/folder deleted: C:/Users/khond/Downloads/New folder - Copy!
I want the output as below:
2020-09-25,Created,4
2020-09-25,modified,9
2020-09-25,deleted,2
2020-09-25,moved,3
2020-09-30,Created,4
2020-09-30,modified,3
2020-09-30,deleted,3

I am new to Python and tried writing the following function:

def collect_data():
    try:
        file_name = 'myFileMonitor.log'
        with open(file_name) as f:
            contents = f.read()
            count_for_deleted = contents.count("deleted")
            count_for_created = contents.count("created")
            count_for_modified = contents.count("modified")
            count_for_moved = contents.count("moved")
        print(count_for_deleted)
        print(count_for_created)
        print(count_for_modified)
        print(count_for_moved)

        occurrences = defaultdict(lambda: defaultdict(int))

        with open('myFileMonitor.log', 'r') as f:
            for line in f.readlines():
                date = line.split(' ')[0]
                name = line.split(' - ')[1].split(': ')[0]
                occurrences[date][name] += 1

        for elem in occurrences:
            print(elem[0], ' :: ', elem[1])

            #print(occurrences)
    except FileNotFoundError:
        print("Exception error: File not found!")

Tags: python, python-3.x, group-by, count

Solution


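You are on the right track with defaultdict, but there is a simpler way to check whether a keyword appears in a line: the in keyword. If you know the keywords in advance, this is the code I would use: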

from collections import defaultdict

def collect_data():
    try:
        # occurrences[date][keyword] -> number of lines on that date containing the keyword
        occurrences = defaultdict(lambda: defaultdict(int))
        keys = {'Created', 'modified', 'deleted', 'moved'}
        with open('myFileMonitor.log', 'r') as f:
            for line in f:
                # The date is the first space-separated field of each line
                date = line.split(' ')[0]
                for key in keys:
                    if key in line:
                        occurrences[date][key] += 1

        # Print one "date,keyword,count" row for every combination that occurred
        for date in occurrences:
            for key in occurrences[date]:
                print(date+','+key+','+str(occurrences[date][key]))

    except FileNotFoundError:
        print("Exception error: File not found!")

collect_data()

Output:

2020-09-25,Created,4
2020-09-25,moved,3
2020-09-25,modified,10
2020-09-25,deleted,2
2020-09-30,Created,4
2020-09-30,modified,3
2020-09-30,deleted,3
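
One caveat with the in check: it is case-sensitive and matches substrings, so 'created' would not match 'Created', and a word such as 'removed' would be counted as 'moved'. The following is a minimal sketch of a stricter variant, assuming the log format shown in the question. The helper name count_events, the regular expression, and the third sample line (a made-up file called removed.txt) are illustrative assumptions, not part of the original answer. Note that this sketch lowercases the keywords, so it reports 'created' rather than 'Created'.

```python
import re
from collections import defaultdict

# Whole-word, case-insensitive match for the four event keywords.
EVENT_RE = re.compile(r"\b(created|modified|deleted|moved)\b", re.IGNORECASE)

def count_events(lines):
    """Return {date: {keyword: count}} for an iterable of log lines."""
    occurrences = defaultdict(lambda: defaultdict(int))
    for line in lines:
        # Split "YYYY-MM-DD HH:MM:SS" from the message at the first " - ".
        timestamp, _, message = line.partition(" - ")
        date = timestamp.split(" ")[0]
        for match in EVENT_RE.finditer(message):
            occurrences[date][match.group(1).lower()] += 1
    return occurrences

sample = [
    "2020-09-25 16:31:58 - Security Alert! ' C:/Users/khond/Downloads/New folder ' has been Created!!",
    "2020-09-25 16:32:13 - Security Alert! ' C:/Users/khond/Downloads/Test1 - Copy ' has been modified!",
    # Hypothetical line: "removed" must not be counted as "moved".
    "2020-09-25 16:40:00 - Security Alert! ' C:/Users/khond/Downloads/removed.txt ' has been modified!",
]
counts = count_events(sample)
print(dict(counts["2020-09-25"]))
```

A keyword that appears as a whole word inside a file path would still be counted, so this is only a partial safeguard.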

You can also define the order in which the dates and keys are printed, or sort them before the loop if necessary.
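
The ordering idea above, together with the csv file requested in the question, might look like the sketch below. It assumes the same keywords and log format as the answer; the function names count_by_date and write_counts, and the choice to write into an in-memory buffer for the demonstration, are assumptions made for illustration. The sample lines are copied from the log in the question and deliberately given out of date order.

```python
import csv
import io
from collections import defaultdict

# Fixed keyword order for the output rows (a list, not a set, so order is stable).
KEYS = ["Created", "modified", "deleted", "moved"]

def count_by_date(lines, keys=KEYS):
    """Count, per date, the lines that contain each keyword."""
    occurrences = defaultdict(lambda: defaultdict(int))
    for line in lines:
        date = line.split(" ")[0]
        for key in keys:
            if key in line:
                occurrences[date][key] += 1
    return occurrences

def write_counts(occurrences, out, keys=KEYS):
    """Write date,keyword,count rows to a file-like object, sorted by date."""
    writer = csv.writer(out, lineterminator="\n")
    for date in sorted(occurrences):  # ISO dates sort chronologically as strings
        for key in keys:
            if occurrences[date].get(key):  # skip keywords that never occurred
                writer.writerow([date, key, occurrences[date][key]])

sample = [
    "2020-09-30 19:02:45 - Security Alert! ' C:/Users/khond/Downloads/New folder ' has been Created!!",
    "2020-09-25 16:31:58 - Security Alert! ' C:/Users/khond/Downloads/New folder ' has been Created!!",
    "2020-09-25 16:32:11 - Security Alert! Files/Folder moved ' C:/Users/khond/Downloads/New folder ' to ' C:/Users/khond/Downloads/Test1 '",
    "2020-09-25 16:32:13 - Security Alert! ' C:/Users/khond/Downloads/Test1 - Copy ' has been modified!",
    "2020-09-25 16:33:56 - Security Alert! Files/Folder deleted: C:/Users/khond/Downloads/Test1!",
]

buffer = io.StringIO()
write_counts(count_by_date(sample), buffer)
csv_text = buffer.getvalue()
print(csv_text, end="")
```

To produce an actual file, the buffer could be replaced with something like open('counts.csv', 'w', newline='') (the file name is hypothetical), and the sample list with the open log file, since a file object is also an iterable of lines.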

