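regex - Search for a string with Perl and save only the matched string in an array
Problem description
I have a huge data file and I am trying to search each line for a string, saving only the matched part in an array rather than the whole line.
Here is the code I tried: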
use strict;
use warnings;
use Data::Dumper;

my $start_run = time();
while (<DATA>) {
    my $line = $_;
    if ($line =~ m/Date/) {
        # grep filters whole list elements, so this returns the entire line, not just the match
        my @result = grep(/Date/, $line);
        print @result;
    }
}

my $end_run  = time();
my $run_time = sprintf "%.2f", (($end_run - $start_run) / 60);
print "Elapsed: $run_time minutes\n";
__DATA__
ServerName: (DESCRIPTION=(CONNECT_TIMEOUT=60)(RETRY_COUNT=5)(ADDRESS=(PROTOCOL=TCP)(HOST=xbian.dbaas.ing.net)(PORT=121))(CONNECT_DATA=(SERVER=DEDI)(SERVICE_NAME=pmx0))) ServerType: Oracle DatabaseName: MX_FN_OWNER RDBMSAccess: NATIVE_OCI ConnectionName: Mx0_MUXFO_1_1 ConnectionNo: 1 Date: 2020-03-29 08:58:10
insert into MX_FN_OWNER.TRN_EDBF (TIMESTAMP,M_IDENTITY,M_REFERENCE,M_USER,M_GROUP,M_DESK,M_DATE_SYS,M_DATE_CMP,M_TIME_CMP,M_SDATE_CMP,M_STIME_CMP,M_COMMENT,M_ERROR,M_START_END,M_TIME_CPU,M_TIME_SYB,M_TIME_ELAP,M_SCRPT_NAME,M_UNIT_NAME,M_ERR_COUNT,M_NPID) values (0,TRN_EODA_DBFS.nextval,:1,:2,:3,:4,:5,:6,:7,:8,:9,:10,:11,:12,:13,:14,:15,:16,:17,:18,:19) (Bulk_Copy begin, 19 columns, 1 Flush size)
==============================================
ServerName: (DESCRIPTION=(CONNECT_TIMEOUT=60)(RETRY_COUNT=5)(ADDRESS=(PROTOCOL=TCP)(HOST=xb305-scan.net)(PORT=121))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=pmx02fn))) ServerType: Oracle DatabaseName: MX_FN_OWNER RDBMSAccess: NATIVE_OCI ConnectionName: Mx0_MXFO_168991_1 ConnectionNo: 1 Date: 2020-03-29 09:21:10
Mux execution time: 00:00:00 3 ms
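As the sample shows, the ServerName lines carry a date. I am only interested in that date and its time, so that I can subtract the times of two such lines and save the result. But when I try grep, the output is the whole line, and since there is no delimiter I cannot split the line.
Is there a way to get the date associated with each line, e.g. Date: 2020-03-29 09:21:10?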
Conversion script:
#!/usr/bin/perl
use strict;
use warnings;
use DateTime::Format::Strptime;

# pattern takes strptime directives (%Y, %m, %d, ...), not a regular expression
my $parser = DateTime::Format::Strptime->new(
    pattern  => '%Y-%m-%d %H:%M:%S',
    on_error => 'croak',
);
my $dt = $parser->parse_datetime('2020-03-29 08:58:10');
print "$dt\n";
Thanks
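Solution
You can match the date-like pattern and use \K to reset the match buffer, so that only the date-time is reported as the match.
Note that the pattern does not validate the date-time itself.
Then add the whole match, $&, to the array.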
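A minimal sketch contrasting \K with a capture group on one shortened sample line (the variable names are illustrative). Both leave only the date-time, so the choice between them is mostly a matter of taste:

```perl
use strict;
use warnings;

# One sample line from the question's data (shortened)
my $line = "ServerName: ... ConnectionNo: 1 Date: 2020-03-29 08:58:10\n";

# Option 1: \K discards "Date: " from the reported match, so $& is just the date-time
my $with_k;
if ($line =~ /\bDate:\h+\K\d{4}-\d{2}-\d{2}\h+\d{2}:\d{2}:\d{2}/) {
    $with_k = $&;
}

# Option 2: a capture group; a match in list context returns the captured text
my ($with_capture) = $line =~ /\bDate:\h+(\d{4}-\d{2}-\d{2}\h+\d{2}:\d{2}:\d{2})/;

print "$with_k\n$with_capture\n";
```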
\bDate:\h+\K\d{4}-\d{2}-\d{2}\h+\d{2}:\d{2}:\d{2}$
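Explanation
- \bDate:\h+ matches Date: followed by 1 or more horizontal whitespace characters
- \K resets the match buffer, so everything matched so far is left out of $&
- \d{4}-\d{2}-\d{2}\h+\d{2}:\d{2}:\d{2} matches a date-time-like pattern
- $ asserts the end of the string, which works if the value is always at the end of the line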
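A small sketch of how the $ anchor behaves, using two made-up lines. Lines read with <DATA> still end in a newline, and $ may match just before that final newline, so no chomp is needed. A date followed by more text is rejected:

```perl
use strict;
use warnings;

my $re = qr/\bDate:\h+\K\d{4}-\d{2}-\d{2}\h+\d{2}:\d{2}:\d{2}$/;

# Date at the end, followed only by the newline that <DATA> keeps
my $ends_with_date = "ConnectionNo: 1 Date: 2020-03-29 09:21:10\n";
# Date followed by more text, so the $ anchor makes the match fail
my $date_in_middle = "Date: 2020-03-29 09:21:10 ConnectionNo: 1\n";

my $m1 = $ends_with_date =~ $re ? 1 : 0;
my $m2 = $date_in_middle =~ $re ? 1 : 0;
print "end: $m1, middle: $m2\n";   # prints "end: 1, middle: 0"
```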
For example, applied to the question's data:
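use strict;
use warnings;
use Time::Piece;    # core module: strptime(), and subtracting two objects gives a Time::Seconds

my @arr;
# Expects the question's __DATA__ section at the end of the script
while (my $line = <DATA>) {
    # \K drops "Date: " from the reported match, so $& holds only the date-time
    if ($line =~ m/\bDate:\h+\K\d{4}-\d{2}-\d{2}\h+\d{2}:\d{2}:\d{2}$/) {
        push @arr, $&;
    }
}

# Compare each date-time with the next one
for my $i (0 .. $#arr - 1) {
    my $currentDateTime = Time::Piece->strptime($arr[$i],     "%Y-%m-%d %H:%M:%S");
    my $nextDateTime    = Time::Piece->strptime($arr[$i + 1], "%Y-%m-%d %H:%M:%S");
    my $diff    = $nextDateTime - $currentDateTime;
    my $minutes = $diff->minutes;
    print "$minutes minutes\n";
}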
Output
23 minutes
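The question also asks to save the differences rather than only print them. Here is a minimal sketch that assumes the date-times have already been extracted into an array. The two values are the ones from the sample data, and @times and @gaps are illustrative names. It stores each gap in seconds using the core Time::Piece module:

```perl
use strict;
use warnings;
use Time::Piece;

# Date-times as extracted by the regex (values from the question's sample data)
my @stamps = ('2020-03-29 08:58:10', '2020-03-29 09:21:10');

# Parse once, then keep the gap between consecutive entries, in seconds
my @times = map { Time::Piece->strptime($_, '%Y-%m-%d %H:%M:%S') } @stamps;
my @gaps  = map { ($times[$_ + 1] - $times[$_])->seconds } 0 .. $#times - 1;

print "$_ seconds\n" for @gaps;   # prints "1380 seconds"
```

Storing seconds keeps the values numeric, so they can be summed or compared later and converted to minutes only when printing.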
You can narrow the date pattern using ranges (it still does not validate the date):
\bDate:\h+\K\d{4}-(?:1[0-2]|0?[1-9])-(?:3[01]|[12][0-9]|0?[1-9])\h+(?:2[0-3]|[01]?[0-9]):[0-5]?[0-9]:[0-5]?[0-9]$
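A minimal sketch with hypothetical test strings showing what "still does not validate" means. The narrowed pattern rejects an impossible month, but it still accepts an impossible day such as February 31:

```perl
use strict;
use warnings;

# The narrowed pattern: ranges for month, day, hour, minute and second
my $narrow = qr/\bDate:\h+\K\d{4}-(?:1[0-2]|0?[1-9])-(?:3[01]|[12][0-9]|0?[1-9])\h+(?:2[0-3]|[01]?[0-9]):[0-5]?[0-9]:[0-5]?[0-9]$/;

# Hypothetical test strings: one real timestamp and two impossible ones
my %samples = (
    'Date: 2020-03-29 09:21:10' => 'real timestamp',
    'Date: 2020-13-29 09:21:10' => 'month 13',
    'Date: 2020-02-31 09:21:10' => 'February 31',
);

# 1 if the pattern matches, 0 otherwise
my %result;
$result{$_} = ($_ =~ $narrow) ? 1 : 0 for keys %samples;

for my $s (sort keys %samples) {
    printf "%-15s %s\n", $samples{$s}, $result{$s} ? 'match' : 'no match';
}
```

Real calendar validation needs a date parser, not a regex.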