PowerShell: complex regular expression with multiple groups

Question

I need some help with my regular expression.

My code looks like this (I haven't gotten very far yet):

$source_file = "\\server\minified.txt"
$sf_content = gc $source_file -raw

$sections = $sf_content | select-string -AllMatches '(?smi)(^\s+\d+:\d+\s+AM\s+\w+\s+ACCOUNT ACTIVITY\s-\s)(\w+\s+\w+$)(.+?(Start Account\s\d+)(.+?Elapsed))'
$sections

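The file looks like this:

[Image: source content example]

With my regex I can already get the first and last name from the "ACCOUNT ACTIVITY - <person's name>" string circled in red at the top of the image.

My end goal is to match each blue box as a single regex match, capturing everything from the date at the top left down to "1 Accounts worked per hour". Then I want to get the information from the second red circle: grab the start time at the beginning of that line, then find the last instance of the same "Start account 54321234" line, so that I can subtract the first time from the last.

So, for each blue box, get the information from the red circles. For each red circle that contains "Start account", take the blue circle minus the green circle.

I would like to try regex groups. If I can't work that out, I would put each blue-box match into an array and, for each item in the array, run further regexes to get what I want.

My code is incomplete. I'm not sure how to write the regex, so I will keep updating the script as I do my own research.

If anyone has pointers, I'd appreciate it.

Here is the source content as text: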

   05/07/20                                                       Acme, Inc.                                                          PAGE 1
    9:48 AM  ABC                                          ACCOUNT ACTIVITY - Bart Simpson

The time ELAPSED since the previous line is printed as HOURS:MINUTES:SECONDS.
      DATE     TIME     ELAPSED   ACTION


    04/16/20  8:06:50      0:00   Enter Account Screen
-------------------------------------------------------------------------------
              8:06:53      0:03   Start account 12345678  ROSS, BOB N
              8:07:24      0:31   Finished account in 31 seconds
-------------------------------------------------------------------------------
              8:07:26      0:02   Start account 54321234  DOE, JOHN
              8:07:27      0:01   Finished account in 1 seconds
-------------------------------------------------------------------------------
              8:07:28      0:02   Start account 54321234  DOE, JOHN
              8:10:26      0:01   Finished account in 1 seconds
-------------------------------------------------------------------------------
    05/06/20  4:55:49      5:08   Leave Account Screen     9:33 Elapsed 
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
    05/06/20  4:55:55      0:06   Leave Account Screen
-------------------------------------------------------------------------------

                                      DAILY TOTALS
                        5:33:46 - Time on Account screen for the day.
                              3 Calls             1 Calls per hour
                              3 Contacts          1 Contacts per hour
                              3 Accounts worked   1 Accounts worked per hour
   05/07/20                                                       Acme, Inc.                                                          PAGE 1
    9:48 AM  ABC                                          ACCOUNT ACTIVITY - Lisa Simpson

The time ELAPSED since the previous line is printed as HOURS:MINUTES:SECONDS.
      DATE     TIME     ELAPSED   ACTION


    04/16/20  8:06:50      0:00   Enter Account Screen
-------------------------------------------------------------------------------
              8:06:53      0:03   Start account 6543212  DOE, JANE
              8:07:24      0:31   Finished account in 31 seconds
-------------------------------------------------------------------------------
              8:07:26      0:02   Start account 88888888  DEER, JOHN
              8:07:27      1:01   Finished account in 1 seconds
-------------------------------------------------------------------------------
    05/06/20  4:55:49      5:08   Leave Account Screen    10:33 Elapsed 
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
    05/06/20  4:55:55      0:06   Leave Account Screen
-------------------------------------------------------------------------------

                                      DAILY TOTALS
                        5:33:46 - Time on Account screen for the day.
                              3 Calls             1 Calls per hour
                              3 Contacts          1 Contacts per hour
                              3 Accounts worked   1 Accounts worked per hour

Tags: regex, powershell

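Answer

You are going to struggle with a single regex here. It seems to keep repeating the second capture group. I tried for a while, adding named groups for the parts you care about, but the regex below only picks out the first match. Anyone who considers themselves a "regex king", please look away.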

(?smi)(^\s+\d+:\d+\s+(AM|PM)\s+\w+\s+ACCOUNT ACTIVITY\s-\s)(?<name>\w+\s+\w+$)(.+?(?<begin>\d+:\d+:\d+)(\s+\d:\d+\s+)(?<acctnumber>Start Account\s\d+)(\s+)(?<account>\w+,\s\w+(\s[A-Za-z]|))\s+(?<end>.+?\d:\d+))
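
The fallback mentioned in the question (one match per report block, then a second regex pass inside each block) may be easier to get working than one large pattern. The following is only a minimal sketch under assumptions: `$report` is a trimmed, hypothetical sample of the source text, and the names `$blockPattern`, `$startPattern`, and `$results` are made up for illustration. Note that `[regex]::Matches` is case-sensitive by default, so the patterns follow the casing used in the file.

```powershell
# Hypothetical sample: a trimmed copy of the report text from the question.
$report = @'
    9:48 AM  ABC                                          ACCOUNT ACTIVITY - Bart Simpson
              8:06:53      0:03   Start account 12345678  ROSS, BOB N
              8:07:24      0:31   Finished account in 31 seconds
              8:07:26      0:02   Start account 54321234  DOE, JOHN
              8:07:28      0:02   Start account 54321234  DOE, JOHN
                              3 Accounts worked   1 Accounts worked per hour
    9:48 AM  ABC                                          ACCOUNT ACTIVITY - Lisa Simpson
              8:06:53      0:03   Start account 6543212  DOE, JANE
                              3 Accounts worked   1 Accounts worked per hour
'@

# Pass 1: one match per report block, from the header line to the end of the totals.
$blockPattern = '(?ms)^\s*\d+:\d+\s+(?:AM|PM)\s+\w+\s+ACCOUNT ACTIVITY\s-\s(?<name>[^\r\n]+).*?Accounts worked per hour'

# Pass 2: every "Start account" line inside a single block.
$startPattern = '(?m)^\s*(?<time>\d+:\d+:\d+)\s+\d+:\d+\s+Start account\s+(?<acct>\d+)'

$results = foreach ($block in [regex]::Matches($report, $blockPattern)) {
    foreach ($m in [regex]::Matches($block.Value, $startPattern)) {
        [pscustomobject]@{
            Customer = $block.Groups['name'].Value.Trim()
            Account  = $m.Groups['acct'].Value
            Start    = [timespan]$m.Groups['time'].Value   # e.g. 08:06:53
        }
    }
}
$results
```

Each output row carries the customer name, the account number, and the start time as a `TimeSpan`, so later grouping and subtraction do not need any further regex work.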

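Alternatively, you can supply a template that picks out every field you might be interested in and use ConvertFrom-String. The key is to label each item you want uniquely, inside curly braces. You then have to mark the first item in the template with an asterisk, so using the example above you would end up with something like this.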

$template = @"
   05/07/20                                                       Acme, Inc.                                                          PAGE 1
    9:48 AM  ABC                                          ACCOUNT ACTIVITY - {customer*:Bart Simpson}

The time ELAPSED since the previous line is printed as HOURS:MINUTES:SECONDS.
      DATE     TIME     ELAPSED   ACTION


    04/16/20  8:06:50      0:00   Enter Account Screen
-------------------------------------------------------------------------------
              {begin1:8:06:53}      0:03   {accNum1:Start account 12345678}  {name1:ROSS, BOB N}
              {end1:8:07:24}      0:31   Finished account in 31 seconds
-------------------------------------------------------------------------------
              {begin2:8:07:26}      0:02   {accNum2:Start account 54321234}  {name2:DOE, JOHN}
              {end2:8:07:27}      0:01   Finished account in 1 seconds
-------------------------------------------------------------------------------
              {begin3:8:07:28}      0:02   {accNum3:Start account 54321234}  {name3:DOE, JOHN}
              {end3:8:10:26}      0:01   Finished account in 1 seconds
-------------------------------------------------------------------------------
    05/06/20  4:55:49      5:08   Leave Account Screen     9:33 Elapsed 
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
    05/06/20  4:55:55      0:06   Leave Account Screen
-------------------------------------------------------------------------------

                                      DAILY TOTALS
                        5:33:46 - Time on Account screen for the day.
                              3 Calls             1 Calls per hour
                              3 Contacts          1 Contacts per hour
                              3 Accounts worked   1 Accounts worked per hour
   05/07/20                                                       Acme, Inc.                                                          PAGE 1
    9:48 AM  ABC                                          ACCOUNT ACTIVITY - {customer*:Lisa Simpson}

The time ELAPSED since the previous line is printed as HOURS:MINUTES:SECONDS.
      DATE     TIME     ELAPSED   ACTION


    04/16/20  8:06:50      0:00   Enter Account Screen
-------------------------------------------------------------------------------
              {begin1:8:06:53}      0:03   {accNum1:Start account 6543212}  {name1:DOE, JANE}
              {end1:8:07:24}      0:31   Finished account in 31 seconds
-------------------------------------------------------------------------------
              {begin2:8:07:26}      0:02   {accNum2:Start account 88888888}  {name2:DEER, JOHN}
              {end2:8:07:27}      1:01   Finished account in 1 seconds
-------------------------------------------------------------------------------
              {begin3:\s}      0:02   {accNum3:\s}  {name3:\s}
              {end3:\s}      1:01   Finished account in 1 seconds
-------------------------------------------------------------------------------
    05/06/20  4:55:49      5:08   Leave Account Screen    10:33 Elapsed 
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
    05/06/20  4:55:55      0:06   Leave Account Screen
-------------------------------------------------------------------------------

                                      DAILY TOTALS
                        5:33:46 - Time on Account screen for the day.
                              3 Calls             1 Calls per hour
                              3 Contacts          1 Contacts per hour
                              3 Accounts worked   1 Accounts worked per hour
"@

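In your last example I added a third group containing the regex whitespace token (\s), so that the second group's data is not repeated in the third group.

You can then pipe your complete input through the cmdlet, using the -TemplateContent parameter to apply your template. The parsed data should come out the other side.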

$data = Get-Content $source_file -Raw   # read the whole report as a single string
$data | ConvertFrom-String -TemplateContent $template

customer : Bart Simpson
begin1   : 8:06:53
accNum1  : Start account 12345678
name1    : ROSS, BOB N
end1     : 8:07:24
begin2   : 8:07:26
accNum2  : Start account 54321234
name2    : DOE, JOHN
end2     : 8:07:27
begin3   : 8:07:28
accNum3  : Start account 54321234
name3    : DOE, JOHN
end3     : 8:10:26

customer : Lisa Simpson
begin1   : 8:06:53
accNum1  : Start account 6543212
name1    : DOE, JANE
end1     : 8:07:24
begin2   : 8:07:26
accNum2  : Start account 88888888
name2    : DEER, JOHN
end2     : 8:07:27

You can then loop over the output objects and compare your data.
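
That loop could look roughly like the sketch below. It rests on several assumptions: `$parsed` is a hand-built stand-in for the ConvertFrom-String output shown above (as far as I know the cmdlet ships only with Windows PowerShell 5.x, so the stand-in keeps the example runnable elsewhere), the property names match the template labels, and "last minus first" is read as the last end time minus the first begin time for each account number. The names `$visits` and `$summary` are made up for illustration.

```powershell
# Stand-in for the ConvertFrom-String output (assumption: same property names).
$parsed = @(
    [pscustomobject]@{ customer = 'Bart Simpson'
        begin1 = '8:06:53'; accNum1 = 'Start account 12345678'; end1 = '8:07:24'
        begin2 = '8:07:26'; accNum2 = 'Start account 54321234'; end2 = '8:07:27'
        begin3 = '8:07:28'; accNum3 = 'Start account 54321234'; end3 = '8:10:26' }
    [pscustomobject]@{ customer = 'Lisa Simpson'
        begin1 = '8:06:53'; accNum1 = 'Start account 6543212';  end1 = '8:07:24'
        begin2 = '8:07:26'; accNum2 = 'Start account 88888888'; end2 = '8:07:27' }
)

$summary = foreach ($row in $parsed) {
    # Turn the numbered properties (begin1, accNum1, end1, ...) into one object per visit.
    $visits = foreach ($i in 1..3) {
        $acc = $row."accNum$i"
        if ($acc -match '\d+') {          # skips slots that are missing or empty
            [pscustomobject]@{
                Account = $Matches[0]
                Begin   = [timespan]$row."begin$i"
                End     = [timespan]$row."end$i"
            }
        }
    }

    # Per account number: last end time minus first begin time.
    foreach ($g in ($visits | Group-Object Account)) {
        $first = ($g.Group | Sort-Object Begin | Select-Object -First 1).Begin
        $last  = ($g.Group | Sort-Object End   | Select-Object -Last 1).End
        [pscustomobject]@{
            Customer = $row.customer
            Account  = $g.Name
            Elapsed  = $last - $first
        }
    }
}
$summary
```

With the sample rows, account 54321234 under Bart Simpson spans 8:07:26 to 8:10:26, so its Elapsed value is 00:03:00. If the difference should instead run between the first and last start times, replacing `End` with `Begin` in the `$last` line is the only change needed.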

