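python - Group by date and count matching words in a Python log file
Question
I have a myFileMonitor.log file with the data below. I want to group the entries by date, count how many times words such as 'Created', 'modified', 'moved', and 'deleted' appear in the log for each date, and save the result to a CSV file. Please help.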
myFileMonitor.log:
2020-09-25 16:31:58 - Security Alert! ' C:/Users/khond/Downloads/New folder ' has been Created!!
2020-09-25 16:32:11 - Security Alert! Files/Folder moved ' C:/Users/khond/Downloads/New folder ' to ' C:/Users/khond/Downloads/Test1 '
2020-09-25 16:32:12 - Security Alert! ' C:/Users/khond/Downloads/Test1 - Copy ' has been Created!!
2020-09-25 16:32:13 - Security Alert! ' C:/Users/khond/Downloads/Test1 - Copy ' has been modified!
2020-09-25 16:32:30 - Security Alert! ' C:/Users/khond/Downloads/Code.png ' has been Created!!
2020-09-25 16:32:30 - Security Alert! ' C:/Users/khond/Downloads/Code.png ' has been modified!
2020-09-25 16:32:30 - Security Alert! ' C:/Users/khond/Downloads/Code.png ' has been modified!
2020-09-25 16:32:30 - Security Alert! ' C:/Users/khond/Downloads/Code.png ' has been modified!
2020-09-25 16:32:30 - Security Alert! ' C:/Users/khond/Downloads/Code.png ' has been modified!
2020-09-25 16:32:30 - Security Alert! ' C:/Users/khond/Downloads/PsedoCode.png ' has been Created!!
2020-09-25 16:32:30 - Security Alert! ' C:/Users/khond/Downloads/PsedoCode.png ' has been modified!
2020-09-25 16:32:30 - Security Alert! ' C:/Users/khond/Downloads/PsedoCode.png ' has been modified!
2020-09-25 16:32:30 - Security Alert! ' C:/Users/khond/Downloads/PsedoCode.png ' has been modified!
2020-09-25 16:32:30 - Security Alert! ' C:/Users/khond/Downloads/PsedoCode.png ' has been modified!
2020-09-25 16:33:56 - Security Alert! Files/Folder deleted: C:/Users/khond/Downloads/Test1!
2020-09-25 16:34:04 - Security Alert! Files/Folder moved ' C:/Users/khond/Downloads/Test1 - Copy ' to ' C:/Users/khond/Downloads/Test1 '
2020-09-25 16:34:05 - Security Alert! Files/Folder deleted: C:/Users/khond/Downloads/Code.png!
2020-09-25 16:34:11 - Security Alert! Files/Folder moved ' C:/Users/khond/Downloads/PsedoCode.png ' to ' C:/Users/khond/Downloads/Code.png '
2020-09-25 16:34:11 - Security Alert! ' C:/Users/khond/Downloads/Code.png ' has been modified!
2020-09-30 19:02:45 - Security Alert! ' C:/Users/khond/Downloads/New folder ' has been Created!!
2020-09-30 19:02:52 - Security Alert! ' C:/Users/khond/Downloads/New folder - Copy ' has been Created!!
2020-09-30 19:02:53 - Security Alert! ' C:/Users/khond/Downloads/New folder - Copy ' has been modified!
2020-09-30 19:02:53 - Security Alert! ' C:/Users/khond/Downloads/New folder - Copy (2) ' has been Created!!
2020-09-30 19:02:54 - Security Alert! ' C:/Users/khond/Downloads/New folder - Copy (2) ' has been modified!
2020-09-30 19:02:55 - Security Alert! Files/folder deleted: C:/Users/khond/Downloads/New folder - Copy (2)!
2020-09-30 19:03:07 - Security Alert! ' C:/Users/khond/Downloads/New Rich Text Document.rtf ' has been Created!!
2020-09-30 19:03:07 - Security Alert! ' C:/Users/khond/Downloads/New Rich Text Document.rtf ' has been modified!
2020-09-30 19:03:13 - Security Alert! Files/folder deleted: C:/Users/khond/Downloads/New Rich Text Document.rtf!
2020-09-30 19:03:16 - Security Alert! Files/folder deleted: C:/Users/khond/Downloads/New folder - Copy!
I want the output as below:
2020-09-25,Created,4
2020-09-25,modified,9
2020-09-25,deleted,2
2020-09-25,moved,3
2020-09-30,Created,4
2020-09-30,modified,3
2020-09-30,deleted,3
I am new to Python and tried writing the following function:
    def collect_data():
        try:
            file_name = 'myFileMonitor.log'
            with open(file_name) as f:
                contents = f.read()
                count_for_deleted = contents.count("deleted")
                count_for_created = contents.count("created")
                count_for_modified = contents.count("modified")
                count_for_moved = contents.count("moved")
                print(count_for_deleted)
                print(count_for_created)
                print(count_for_modified)
                print(count_for_moved)
            occurrences = defaultdict(lambda: defaultdict(int))
            with open('myFileMonitor.log', 'r') as f:
                for line in f.readlines():
                    date = line.split(' ')[0]
                    name = line.split(' - ')[1].split(': ')[0]
                    occurrences[date][name] += 1
            for elem in occurrences:
                print(elem[0], ' :: ', elem[1])
            #print(occurrences)
        except FileNotFoundError:
            print("Exception error: File not found!")
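Answer
You were on the right track with defaultdict, but there is a simpler way to check whether a key appears in a line: the in keyword. If you know the keywords in advance, this is the code I would use: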
    from collections import defaultdict

    def collect_data():
        try:
            # occurrences[date][keyword] -> number of lines containing that keyword
            occurrences = defaultdict(lambda: defaultdict(int))
            keys = {'Created', 'modified', 'deleted', 'moved'}
            with open('myFileMonitor.log', 'r') as f:
                for line in f:
                    # The date is the first space-separated field of each line
                    date = line.split(' ')[0]
                    for key in keys:
                        if key in line:
                            occurrences[date][key] += 1
            # Print one "date,keyword,count" row per combination that occurred
            for date in occurrences:
                for key in occurrences[date]:
                    print(date+','+key+','+str(occurrences[date][key]))
        except FileNotFoundError:
            print("Exception error: File not found!")
Output:
2020-09-25,Created,4
2020-09-25,moved,3
2020-09-25,modified,10
2020-09-25,deleted,2
2020-09-30,Created,4
2020-09-30,modified,3
2020-09-30,deleted,3
You can also go further, for example by defining the order in which dates and keys are printed, or by sorting before the loop if necessary.
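The ordering idea above, together with the CSV output the question asks for, could be sketched roughly as follows. This is only a minimal sketch, not part of the original answer: the helper names count_events, to_rows, and write_csv and the output file name summary.csv are assumptions made for illustration. It uses a tuple instead of a set for the keywords so that the rows for each date always appear in the same order.

```python
import csv
from collections import defaultdict

# A tuple (not a set) so the keyword order is the same on every run.
KEYWORDS = ('Created', 'modified', 'deleted', 'moved')


def count_events(lines):
    """Count the lines that mention each keyword, grouped by date."""
    occurrences = defaultdict(lambda: defaultdict(int))
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        date = line.split(' ')[0]
        for key in KEYWORDS:
            if key in line:
                occurrences[date][key] += 1
    return occurrences


def to_rows(occurrences):
    """Flatten the counts into (date, keyword, count) rows, sorted by date."""
    return [
        (date, key, occurrences[date][key])
        for date in sorted(occurrences)
        for key in KEYWORDS
        if key in occurrences[date]
    ]


def write_csv(rows, csv_path):
    """Write the rows to a CSV file (hypothetical helper for this sketch)."""
    with open(csv_path, 'w', newline='') as f:
        csv.writer(f).writerows(rows)


if __name__ == '__main__':
    try:
        with open('myFileMonitor.log') as f:
            rows = to_rows(count_events(f))
        write_csv(rows, 'summary.csv')  # assumed output file name
    except FileNotFoundError:
        print("Exception error: File not found!")
```

Because ISO-style dates such as 2020-09-25 sort correctly as plain strings, sorted() is enough to order the rows by date. Note that, like the answer's code, this counts substring matches, so a path containing one of the keywords would also be counted.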