bash - Decoding only part of a text email file for processing in bash
Problem description
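I am scanning the subdirectories of /home/vmail/ for received email text files and deleting any whose content matches a given string. The optimized script below came from an earlier answer.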
my_new_del() {
    find /home/vmail -type f -name '*.some.file.pattern*' -exec grep -i -H -l -s "$1" {} + |
        xargs rm -f
}
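It works like a charm and deletes the files that match the string I pass in. However, I just realized that the content of some files is Base64-encoded. Here is a spam email whose body is spam, yet the file looks like this: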
Return-Path: <Bartybeve@aznetwork.net>
X-Original-To: info@my_domain.com
Delivered-To: info@my_domain.com
Received: by some.qdmn.com (Postfix, from userid 5000)
id D47C87F8CB; Thu, 11 Oct 2018 04:21:11 -0400 (EDT)
X-Original-To: info@my_domain.com
Delivered-To: info@my_domain.com
Received: from vlan131-44.aznetwork.net (unknown [185.129.1.44])
by some.qdmn.com (Postfix) with ESMTP id 1F1077F8C9
for info@my_domain.com Thu, 11 Oct 2018 04:21:05 -0400 (EDT)
Received: from unknown (60.233.87.144)
by mmx09.tilkbans.com with ESMTP; Thu, 11 Oct 2018 00:16:37 -0700
Received: from unknown (124.156.103.124)
by mailout.endmonthnow.com with ASMTP; Thu, 11 Oct 2018 00:10:28 -0700
Message-ID: <7B6B9A4E.9D85F307@aznetwork.net>
Date: Thu, 11 Oct 2018 00:10:28 -0700
Reply-To: "Anja" <Bartybeve@aznetwork.net>
From: "Anja" <Bartybeve@aznetwork.net>
User-Agent: Opera/7.02 (Windows ME; U)
MIME-Version: 1.0
To: "Anja" <info@my_domain.com>
Subject: I could not resist and pass by!
Content-Type: text/html;
charset="iso-8859-1"
Content-Transfer-Encoding: base64
PCFkb2N0eXBlIGh0bWw+DQo8aHRtbD4NCjxoZWFkPg0KPG1ldGEgY2hhcnNldD0idXRmLTgiPg0K
PC9oZWFkPg0KDQo8Ym9keT4NCjxwPjx0YWJsZSB3aWR0aD0iMTMlIiBib3JkZXI9IjAiPjx0Ym9k
eT48dHI+PHRkPjwvdGQ+PHRkPjwvdGQ+PHRkPjwvdGQ+PHRkPjwvdGQ+PHRkPjwvdGQ+PC90cj48
L3Rib2R5PjwvdGFibGU+PC9wPg0KPHA+V2FudCBtZT8gd2FubmEgZnVjayBtZT8gT2hoaGguLi4u
IG9rLCBjb21lIHRvIG1lICkpIEhlcmUgbXkgZm90byBhbmQgYWRkcmVzcywgZmluZCBtZSA6KSA8
L3A+DQo8cD48dGFibGUgd2lkdGg9IjcyJSIgYm9yZGVyPSIwIj48dGJvZHk+PHRyPjx0ZD48L3Rk
PjwvdHI+PC90Ym9keT48L3RhYmxlPjwvcD4NCjxhICAgaHJlZj0iaHR0cDovL2xvdmVmb3J5b3Uu
c3UiIHRhcmdldD0iX2JsYW5rIiBzdHlsZT0iZm9udC13ZWlnaHQ6IG5vcm1hbDtsZXR0ZXItc3Bh
Y2luZzogbm9ybWFsO2xpbmUtaGVpZ2h0OiAxMDAlO3RleHQtZGVjb3JhdGlvbjogbm9uZTtjb2xv
cjogIzc3NzsiPmh0dHA6Ly9sb3ZlZm9yeW91LnN1PC9hPg0KPHA+PHRhYmxlIHdpZHRoPSIyNyUi
IGJvcmRlcj0iMCI+PHRib2R5Pjx0cj48dGQ+PC90ZD48dGQ+PC90ZD48dGQ+PC90ZD48dGQ+PC90
ZD48L3RyPjwvdGJvZHk+PC90YWJsZT48L3A+DQo8YSBocmVmPSJodHRwOi8vbG92ZWZvcnlvdS5z
dSI+PGltZyBzcmM9Imh0dHBzOi8vNzgubWVkaWEudHVtYmxyLmNvbS83ZTU3ZjBlMDUzZWNlYjA2
MGQwZDMyMzQ3NmQxZWI3MS90dW1ibHJfb3kycmd4TkRFYzF3MmtqZGRvMV80MDAuZ2lmIiBhbHQ9
ImNsaWNrIGhlcmUgYW5kIHNlZSBteSBwaG90byIgYm9yZGVyPSIwIiA+PC9hPg0KPHA+PHRhYmxl
IHdpZHRoPSI3NiUiIGJvcmRlcj0iMCI+PHRib2R5Pjx0cj48dGQ+PC90ZD48dGQ+PC90ZD48dGQ+
PC90ZD48dGQ+PC90ZD48dGQ+PC90ZD48L3RyPjwvdGJvZHk+PC90YWJsZT48L3A+DQo8YSBocmVm
PSJodHRwOi8vbG92ZWZvcnlvdS5zdSI+dW5zdWJzY3JpYmU8L2E+DQo8cD48dWw+PC91bD48L3A+
DQo8L2JvZHk+DQo8L2h0bWw+DQo=
As a result, when I use my aliased bash command to find files whose content matches a string, the email file above is not flagged.
I know I can decode the message with echo 'some-base64-encoded-text' | base64 --decode. An online decoding tool confirmed that the decoded text does contain part of the spam message.
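As a quick illustration of that decoding step, the sketch below decodes only the first Base64 line of the sample email above. The variable names are illustrative, and `tr -d '\r'` is added on the assumption that the decoded HTML uses CRLF line endings, as it does in this sample.

```shell
# First line of the Base64 body from the sample email above.
encoded='PCFkb2N0eXBlIGh0bWw+DQo8aHRtbD4NCjxoZWFkPg0KPG1ldGEgY2hhcnNldD0idXRmLTgiPg0K'

# Decode it and strip the carriage returns left over from CRLF line endings.
decoded=$(printf '%s\n' "$encoded" | base64 --decode | tr -d '\r')

# Show only the first decoded line.
first_line=$(printf '%s\n' "$decoded" | head -n 1)
printf '%s\n' "$first_line"
```

For this input the first decoded line is `<!doctype html>`, which shows that the body is ordinary HTML once decoded and can be searched with grep like any other text.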
My idea is to first grep for Content-Transfer-Encoding: base64, then find the index of that string, decode the message from that point on, echo the result, grep it for a match, and delete the file if a match is found.
But is there a simple way to do this on the fly?
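The idea described above could be sketched in plain shell roughly as follows. This is only a sketch under stated assumptions: the function names `decoded_body` and `body_matches` are hypothetical, each message is assumed to be single-part (headers, one blank line, then one body), and only the body is searched, not the headers. Multipart messages would need a real MIME parser.

```shell
# Print the body of a mail file, Base64-decoded when the headers say so.
# Assumption: single-part message, headers end at the first blank line.
decoded_body() {
    # The header block is everything before the first blank line.
    if awk '/^\r?$/ { exit } { print }' "$1" |
        grep -qi '^Content-Transfer-Encoding:[[:space:]]*base64'; then
        # Decode everything after the first blank line; ignore decoder
        # complaints about malformed input.
        awk 'body { print } /^\r?$/ { body = 1 }' "$1" |
            tr -d '\r' | base64 --decode 2>/dev/null
    else
        # Not Base64: pass the body through unchanged.
        awk 'body { print } /^\r?$/ { body = 1 }' "$1"
    fi
}

# Succeed if the (decoded) body of file $1 matches pattern $2, ignoring case.
body_matches() {
    decoded_body "$1" | grep -i -e "$2" >/dev/null
}

# Demo on a throwaway message (hypothetical content, not real mail).
tmp=$(mktemp)
printf 'Subject: test\nContent-Transfer-Encoding: base64\n\n%s\n' \
    "$(printf 'click here and see my photo' | base64)" > "$tmp"
if body_matches "$tmp" 'see my photo'; then
    echo "delete: $tmp"   # a real run would call rm -f -- "$tmp" here
fi
rm -f -- "$tmp"
```

The demo only prints the file name instead of deleting anything, so the matching logic can be checked on real mail before any `rm` is enabled.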
Solution
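Here is some Perl. It requires MIME::Base64 ( cpan install MIME::Base64 ), which has shipped with Perl as a core module since 5.8, so a separate install is normally unnecessary.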
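#!/usr/bin/env perl
use strict;
use warnings;
use autodie;
use MIME::Base64;

# Paragraph mode: each read returns one block of text up to the next blank line.
$/ = "";

for my $file (@ARGV) {
    open my $fh, "<", $file;
    my @paragraphs = <$fh>;
    close $fh;

    # The first paragraph is the header block; the rest is the body.
    my $header = shift(@paragraphs) // "";
    my $content;
    if ($header =~ /^Content-Transfer-Encoding:\s*base64/mi) {
        # A Base64 body contains no blank lines, so it is a single paragraph.
        $content = decode_base64($paragraphs[0] // "");
    }
    else {
        $content = join "\n\n", @paragraphs;
    }

    # The search pattern is passed in through the environment variable "pattern".
    if ($content =~ /$ENV{pattern}/) {
        print "delete: $file\n";
        ## unlink $file;   # uncomment to really delete the file
    }
}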
Then you can run it like this (note the {} placeholder, which -exec ... + requires):
find ... -exec env pattern="$1" perl email_scanner.pl {} +