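perl - AWK - Delete all log file lines except the last occurrence, based on time
Problem description
I have a file that collects Option 82 data from our DHCP servers. The lines in these files are similar in every respect except for the timestamp and the server they came from. I need to delete all "related" lines, keeping only the last occurrence of each set of similar lines, based on time.
My raw file looks like this: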
Aug 1 16:23:05 serverA dhcpd: Service A OPTION-82 | IP =192.168.1.100 | MAC=70:73:cb:b3:3c:58 | CIRCUIT-ID=0a:00:3e:bb:7d:fe | REMOTE-ID=0a:00:3e:bb:73:4a
Aug 1 16:24:55 serverB dhcpd: Service B OPTION-82 | IP =192.168.1.100 | MAC=38:71:de:4b:f2:46 | CIRCUIT-ID=0a:00:3e:bb:7d:fe | REMOTE-ID=0a:00:3e:bb:73:4a
Jul 27 16:37:46 serverA dhcpd: Service A OPTION-82 | IP =192.168.2.100 | MAC=3c:90:66:64:c7:20 | CIRCUIT-ID=0a:00:3e:bb:a2:37 | REMOTE-ID=0a:00:3e:bb:c1:3f
Jul 31 13:20:11 serverB dhcpd: Service B OPTION-82 | IP =192.168.2.100 | MAC=3c:90:66:64:c7:20 | CIRCUIT-ID=0a:00:3e:bb:a2:37 | REMOTE-ID=0a:00:3e:bb:c1:3f
Jul 27 16:37:46 serverB dhcpd: Service A OPTION-82 | IP =192.168.3.100 | MAC=3c:90:66:64:c7:20 | CIRCUIT-ID=0a:00:3e:bb:a2:37 | REMOTE-ID=0a:00:3e:bb:c1:3f
Jul 31 13:20:11 serverA dhcpd: Service A OPTION-82 | IP =192.168.3.100 | MAC=3c:90:66:64:c7:20 | CIRCUIT-ID=0a:00:3e:bb:a2:37 | REMOTE-ID=0a:00:3e:bb:c1:3f
After processing, I need to end up with this:
Aug 1 16:24:55 serverB dhcpd: Service B OPTION-82 | IP =192.168.1.100 | MAC=38:71:de:4b:f2:46 | CIRCUIT-ID=0a:00:3e:bb:7d:fe | REMOTE-ID=0a:00:3e:bb:73:4a
Jul 31 13:20:11 serverB dhcpd: Service B OPTION-82 | IP =192.168.2.100 | MAC=3c:90:66:64:c7:20 | CIRCUIT-ID=0a:00:3e:bb:a2:37 | REMOTE-ID=0a:00:3e:bb:c1:3f
Jul 31 13:20:11 serverA dhcpd: Service A OPTION-82 | IP =192.168.3.100 | MAC=3c:90:66:64:c7:20 | CIRCUIT-ID=0a:00:3e:bb:a2:37 | REMOTE-ID=0a:00:3e:bb:c1:3f
Here are some of the things I've tried so far, but they seem to delete every instance of certain lines, and the finished file is missing data we need.
/bin/awk '!_[$9]++' rawfile
/bin/awk 'NR == FNR {if (z[$9]) y[z[$9]]; z[$9] = FNR; next} !(FNR in y)' rawfile rawfile
tac rawfile | awk '!seen[$9]++' | tac > finished_file
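All three attempts key on $9. With awk's default whitespace splitting, $9 in these lines is the literal | separator, which is identical on every line, so each command ends up keeping just a single line. A small helper that numbers the fields makes this visible; with the IP in $11, the tac attempt then does what was intended. Both function names are hypothetical, and $11 assumes the line layout is exactly as in the sample.

```shell
# Hypothetical helper: print every whitespace-separated field of the first
# line together with its number, to see what $N actually refers to.
show_fields() {
  head -n 1 "$1" | awk '{ for (i = 1; i <= NF; i++) print i ": " $i }'
}

# Hypothetical fix of the tac attempt: key on $11 ("=192.168.x.x") instead
# of $9 (always "|"). Reverse the file, keep the first line seen per key
# (i.e. the last one in the original file), then restore the original order.
keep_last_by_ip_field() {
  tac "$1" | awk '!seen[$11]++' | tac
}
```

Usage: show_fields rawfile, then keep_last_by_ip_field rawfile > finished_file.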
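I'm by no means an awk expert. I just found these by googling and tried them, so any help would be greatly appreciated. I'm also open to other text-processing tools, not just awk.
Solution
Based on the discussion in the comments, the input file is actually sorted by timestamp in ascending order, and you want to match on the IP.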
$ cat input.txt
Aug 1 16:23:05 serverA dhcpd: Service A OPTION-82 | IP =192.168.1.100 | MAC=70:73:cb:b3:3c:58 | CIRCUIT-ID=0a:00:3e:bb:7d:fe | REMOTE-ID=0a:00:3e:bb:73:4a
Aug 1 16:24:55 serverB dhcpd: Service B OPTION-82 | IP =192.168.1.100 | MAC=38:71:de:4b:f2:46 | CIRCUIT-ID=0a:00:3e:bb:7d:fe | REMOTE-ID=0a:00:3e:bb:73:4a
Jul 27 16:37:46 serverA dhcpd: Service A OPTION-82 | IP =192.168.2.100 | MAC=3c:90:66:64:c7:20 | CIRCUIT-ID=0a:00:3e:bb:a2:37 | REMOTE-ID=0a:00:3e:bb:c1:3f
Jul 27 16:37:46 serverB dhcpd: Service A OPTION-82 | IP =192.168.3.100 | MAC=3c:90:66:64:c7:20 | CIRCUIT-ID=0a:00:3e:bb:a2:37 | REMOTE-ID=0a:00:3e:bb:c1:3f
Jul 31 13:20:11 serverB dhcpd: Service B OPTION-82 | IP =192.168.2.100 | MAC=3c:90:66:64:c7:20 | CIRCUIT-ID=0a:00:3e:bb:a2:37 | REMOTE-ID=0a:00:3e:bb:c1:3f
Jul 31 13:20:11 serverA dhcpd: Service A OPTION-82 | IP =192.168.3.100 | MAC=3c:90:66:64:c7:20 | CIRCUIT-ID=0a:00:3e:bb:a2:37 | REMOTE-ID=0a:00:3e:bb:c1:3f
$ perl -ne '/\bIP\s*=\s*([\d.]+)\b/||next;$x{$1}=$_}{print $x{$_} for sort keys %x' input.txt
Aug 1 16:24:55 serverB dhcpd: Service B OPTION-82 | IP =192.168.1.100 | MAC=38:71:de:4b:f2:46 | CIRCUIT-ID=0a:00:3e:bb:7d:fe | REMOTE-ID=0a:00:3e:bb:73:4a
Jul 31 13:20:11 serverB dhcpd: Service B OPTION-82 | IP =192.168.2.100 | MAC=3c:90:66:64:c7:20 | CIRCUIT-ID=0a:00:3e:bb:a2:37 | REMOTE-ID=0a:00:3e:bb:c1:3f
Jul 31 13:20:11 serverA dhcpd: Service A OPTION-82 | IP =192.168.3.100 | MAC=3c:90:66:64:c7:20 | CIRCUIT-ID=0a:00:3e:bb:a2:37 | REMOTE-ID=0a:00:3e:bb:c1:3f
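The one-liner above relies on the file already being in ascending timestamp order, so the last line for an IP is also the newest. If that ever stops being true, a sketch that compares the syslog timestamps directly might look like the following. keep_latest_per_ip is a hypothetical name, and the sketch assumes that every entry is from the same year (syslog timestamps carry no year) and that the timestamp occupies the first three fields.

```shell
# Hypothetical helper: for each IP, keep the line with the latest timestamp,
# wherever it appears in the file.
# Assumptions: all entries are from the same year, and fields 1-3 hold the
# syslog timestamp, e.g. "Aug 1 16:23:05".
keep_latest_per_ip() {
  awk '
    BEGIN {
      # Map month abbreviations to 1..12.
      split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", m, " ")
      for (i = 1; i <= 12; i++) mon[m[i]] = i
    }
    # Only lines that contain an "IP =" value are considered.
    match($0, /IP *= *[0-9.]+/) {
      ip = substr($0, RSTART, RLENGTH)
      sub(/IP *= */, "", ip)
      # Fixed-width key such as "080116:23:05" compares correctly as a string.
      ts = sprintf("%02d%02d%s", mon[$1], $2, $3)
      # ">=" lets a later line win when two timestamps are equal.
      if (!(ip in best) || ts >= best[ip]) { best[ip] = ts; line[ip] = $0 }
    }
    # for (k in a) visits keys in no particular order.
    END { for (ip in line) print line[ip] }
  ' "$1"
}
```

Usage: keep_latest_per_ip rawfile | sort > finished_file. The sort is only there because the END loop emits lines in an unspecified order.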
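Note: sort keys %x is not perfect, because it orders the output lines alphabetically by the IP string (so, for example, 192.168.10.x would come before 192.168.2.x). If you need the same order as in the original file, please say so and, as I said in the comments, show a more representative sample of the input (and output) data. See also: Minimal, Complete, and Verifiable example.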
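To keep the surviving lines in their original file order rather than sorting them by IP, one option is a two-pass awk program, essentially the second attempt from the question with a real IP key instead of $9. keep_last_per_ip is a hypothetical name; like the Perl answer, the sketch assumes later lines are newer and extracts the address from the "IP =" part of each line.

```shell
# Hypothetical helper: keep only the last line for each IP, preserving the
# original line order. The file is read twice, so it must be a regular file.
keep_last_per_ip() {
  awk '
    # Store the IP of the current line in the global "key"; return 0 if the
    # line has no "IP =" value.
    function getkey() {
      if (!match($0, /IP *= *[0-9.]+/)) return 0
      key = substr($0, RSTART, RLENGTH)
      sub(/IP *= */, "", key)
      return 1
    }
    # Pass 1: remember the line number of the last line for each IP.
    NR == FNR { if (getkey()) last[key] = FNR; next }
    # Pass 2: print a line only if it is that last occurrence.
    getkey() && last[key] == FNR
  ' "$1" "$1"
}
```

Usage: keep_last_per_ip rawfile > finished_file. Memory use is one line number per distinct IP, and lines without an IP are dropped, as in the Perl version.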