AWK - Delete all log file lines based on time, except the last occurrence

Problem description

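I have a file that collects Option 82 data from our DHCP servers. The file contains lines that are identical in every respect except for the timestamp and the server they came from. I need to delete all of the "related" lines except the last occurrence, by time, of each group of similar lines.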

My raw file looks like this:

 Aug  1 16:23:05 serverA dhcpd: Service A OPTION-82 | IP =192.168.1.100 | MAC=70:73:cb:b3:3c:58 | CIRCUIT-ID=0a:00:3e:bb:7d:fe | REMOTE-ID=0a:00:3e:bb:73:4a
 Aug  1 16:24:55 serverB dhcpd: Service B OPTION-82 | IP =192.168.1.100 | MAC=38:71:de:4b:f2:46 | CIRCUIT-ID=0a:00:3e:bb:7d:fe | REMOTE-ID=0a:00:3e:bb:73:4a
 Jul 27 16:37:46 serverA dhcpd: Service A OPTION-82 | IP =192.168.2.100 | MAC=3c:90:66:64:c7:20 | CIRCUIT-ID=0a:00:3e:bb:a2:37 | REMOTE-ID=0a:00:3e:bb:c1:3f
 Jul 31 13:20:11 serverB dhcpd: Service B OPTION-82 | IP =192.168.2.100 | MAC=3c:90:66:64:c7:20 | CIRCUIT-ID=0a:00:3e:bb:a2:37 | REMOTE-ID=0a:00:3e:bb:c1:3f 
 Jul 27 16:37:46 serverB dhcpd: Service A OPTION-82 | IP =192.168.3.100 | MAC=3c:90:66:64:c7:20 | CIRCUIT-ID=0a:00:3e:bb:a2:37 | REMOTE-ID=0a:00:3e:bb:c1:3f
 Jul 31 13:20:11 serverA dhcpd: Service A OPTION-82 | IP =192.168.3.100 | MAC=3c:90:66:64:c7:20 | CIRCUIT-ID=0a:00:3e:bb:a2:37 | REMOTE-ID=0a:00:3e:bb:c1:3f

After text processing, I need to end up with this:

  Aug  1 16:24:55 serverB dhcpd: Service B OPTION-82 | IP =192.168.1.100 | MAC=38:71:de:4b:f2:46 | CIRCUIT-ID=0a:00:3e:bb:7d:fe | REMOTE-ID=0a:00:3e:bb:73:4a
  Jul 31 13:20:11 serverB dhcpd: Service B OPTION-82 | IP =192.168.2.100 | MAC=3c:90:66:64:c7:20 | CIRCUIT-ID=0a:00:3e:bb:a2:37 | REMOTE-ID=0a:00:3e:bb:c1:3f
  Jul 31 13:20:11 serverA dhcpd: Service A OPTION-82 | IP =192.168.3.100 | MAC=3c:90:66:64:c7:20 | CIRCUIT-ID=0a:00:3e:bb:a2:37 | REMOTE-ID=0a:00:3e:bb:c1:3f

Here are some of the things I have tried so far, but they seem to delete every instance of some lines, and the finished file is missing data we need.

 /bin/awk '!_[$9]++' rawfile
 /bin/awk 'NR == FNR {if (z[$9]) y[z[$9]]; z[$9] = FNR; next} !(FNR in y)' rawfile rawfile
 tac rawfile | awk '!seen[$9]++' | tac > finished_file

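I am by no means an awk expert. I just found these by googling and tried them, so any help would be greatly appreciated. I am also open to text-processing tools other than awk.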

Tags: perl, awk, sed

Solution


Based on the discussion in the comments, the input file is actually sorted in ascending timestamp order, and you want to match on the IP.

$ cat input.txt 
 Aug  1 16:23:05 serverA dhcpd: Service A OPTION-82 | IP =192.168.1.100 | MAC=70:73:cb:b3:3c:58 | CIRCUIT-ID=0a:00:3e:bb:7d:fe | REMOTE-ID=0a:00:3e:bb:73:4a
 Aug  1 16:24:55 serverB dhcpd: Service B OPTION-82 | IP =192.168.1.100 | MAC=38:71:de:4b:f2:46 | CIRCUIT-ID=0a:00:3e:bb:7d:fe | REMOTE-ID=0a:00:3e:bb:73:4a
 Jul 27 16:37:46 serverA dhcpd: Service A OPTION-82 | IP =192.168.2.100 | MAC=3c:90:66:64:c7:20 | CIRCUIT-ID=0a:00:3e:bb:a2:37 | REMOTE-ID=0a:00:3e:bb:c1:3f
 Jul 27 16:37:46 serverB dhcpd: Service A OPTION-82 | IP =192.168.3.100 | MAC=3c:90:66:64:c7:20 | CIRCUIT-ID=0a:00:3e:bb:a2:37 | REMOTE-ID=0a:00:3e:bb:c1:3f
 Jul 31 13:20:11 serverB dhcpd: Service B OPTION-82 | IP =192.168.2.100 | MAC=3c:90:66:64:c7:20 | CIRCUIT-ID=0a:00:3e:bb:a2:37 | REMOTE-ID=0a:00:3e:bb:c1:3f 
 Jul 31 13:20:11 serverA dhcpd: Service A OPTION-82 | IP =192.168.3.100 | MAC=3c:90:66:64:c7:20 | CIRCUIT-ID=0a:00:3e:bb:a2:37 | REMOTE-ID=0a:00:3e:bb:c1:3f
$ perl -ne '/\bIP\s*=\s*([\d.]+)\b/||next;$x{$1}=$_}{print $x{$_} for sort keys %x' input.txt 
 Aug  1 16:24:55 serverB dhcpd: Service B OPTION-82 | IP =192.168.1.100 | MAC=38:71:de:4b:f2:46 | CIRCUIT-ID=0a:00:3e:bb:7d:fe | REMOTE-ID=0a:00:3e:bb:73:4a
 Jul 31 13:20:11 serverB dhcpd: Service B OPTION-82 | IP =192.168.2.100 | MAC=3c:90:66:64:c7:20 | CIRCUIT-ID=0a:00:3e:bb:a2:37 | REMOTE-ID=0a:00:3e:bb:c1:3f 
 Jul 31 13:20:11 serverA dhcpd: Service A OPTION-82 | IP =192.168.3.100 | MAC=3c:90:66:64:c7:20 | CIRCUIT-ID=0a:00:3e:bb:a2:37 | REMOTE-ID=0a:00:3e:bb:c1:3f

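Note: sort keys %x is not perfect, because it orders the output lines by their IP key compared as a plain string. If you need the same order as in the original file, please say so and, as I said in the comments, show a more representative sample of input (and output) data. See also Minimal, Complete, and Verifiable Example.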
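
If the original order matters, one possible approach is an awk sketch like the one below. It assumes, as above, that lines for the same IP appear in ascending time order, so the last line seen for an IP is the one to keep. The function name last_per_ip and the array names last and order are made up for illustration. It also avoids the problem with the attempts in the question: with awk's default whitespace splitting, $9 on these lines is the literal | character, so every line ends up with the same key.

```shell
# Hypothetical helper: keep only the last line seen for each IP,
# printing the survivors in the order each IP first appears.
last_per_ip() {
  awk '
    # Locate "IP =a.b.c.d" anywhere in the line; lines without it are skipped.
    match($0, /IP *= *[0-9.]+/) {
      ip = substr($0, RSTART, RLENGTH)      # e.g. "IP =192.168.1.100"
      sub(/^IP *= */, "", ip)               # strip the label, keep the address
      if (!(ip in last)) order[++n] = ip    # remember first-seen order
      last[ip] = $0                         # later lines overwrite earlier ones
    }
    END { for (i = 1; i <= n; i++) print last[order[i]] }
  ' "$@"
}
```

Called as last_per_ip input.txt, this prints one line per IP. Extracting the address with match() rather than a field number keeps the key independent of how many words precede the IP column.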

