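regex - PowerShell: complex regular expression with multiple groups
Problem description
I need some help with my regular expression.
My code looks like this (I haven't gotten very far yet):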
$source_file = "\\server\minified.txt"
$sf_content = gc $source_file -raw
$sections = $sf_content | select-string -AllMatches '(?smi)(^\s+\d+:\d+\s+AM\s+\w+\s+ACCOUNT ACTIVITY\s-\s)(\w+\s+\w+$)(.+?(Start Account\s\d+)(.+?Elapsed))'
$sections
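The file looks like this:
With my regex I can get the first and last name from the "ACCOUNT ACTIVITY - person name" string circled in red at the top of the screenshot.
My end goal is to match each blue box with a regex, capturing everything from the date in the top-left corner down to "1 Accounts worked per hour". Then I want the information in the second red circle: the start time at the beginning of the line, and then the last instance of the same "Start account 54321234" line, so that I can subtract the first time from the last.
So, for each blue box, get the information from the red circles. For each red circle that contains "Start account", take the blue circle minus the green circle.
I would like to try this with regex groups. If I can't work that out, I'd put each blue-box match into an array and run further regexes against each item in the array to get what I want.
My code is incomplete, and I'm not sure how to write the regex, so I'll keep updating the script as I do my own research.
If anyone has pointers, I'd appreciate it.
Here is the source content in text form: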
05/07/20 Acme, Inc. PAGE 1
9:48 AM ABC ACCOUNT ACTIVITY - Bart Simpson
The time ELAPSED since the previous line is printed as HOURS:MINUTES:SECONDS.
DATE TIME ELAPSED ACTION
04/16/20 8:06:50 0:00 Enter Account Screen
-------------------------------------------------------------------------------
8:06:53 0:03 Start account 12345678 ROSS, BOB N
8:07:24 0:31 Finished account in 31 seconds
-------------------------------------------------------------------------------
8:07:26 0:02 Start account 54321234 DOE, JOHN
8:07:27 0:01 Finished account in 1 seconds
-------------------------------------------------------------------------------
8:07:28 0:02 Start account 54321234 DOE, JOHN
8:10:26 0:01 Finished account in 1 seconds
-------------------------------------------------------------------------------
05/06/20 4:55:49 5:08 Leave Account Screen 9:33 Elapsed
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
05/06/20 4:55:55 0:06 Leave Account Screen
-------------------------------------------------------------------------------
DAILY TOTALS
5:33:46 - Time on Account screen for the day.
3 Calls 1 Calls per hour
3 Contacts 1 Contacts per hour
3 Accounts worked 1 Accounts worked per hour
05/07/20 Acme, Inc. PAGE 1
9:48 AM ABC ACCOUNT ACTIVITY - Lisa Simpson
The time ELAPSED since the previous line is printed as HOURS:MINUTES:SECONDS.
DATE TIME ELAPSED ACTION
04/16/20 8:06:50 0:00 Enter Account Screen
-------------------------------------------------------------------------------
8:06:53 0:03 Start account 6543212 DOE, JANE
8:07:24 0:31 Finished account in 31 seconds
-------------------------------------------------------------------------------
8:07:26 0:02 Start account 88888888 DEER, JOHN
8:07:27 1:01 Finished account in 1 seconds
-------------------------------------------------------------------------------
05/06/20 4:55:49 5:08 Leave Account Screen 10:33 Elapsed
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
05/06/20 4:55:55 0:06 Leave Account Screen
-------------------------------------------------------------------------------
DAILY TOTALS
5:33:46 - Time on Account screen for the day.
3 Calls 1 Calls per hour
3 Contacts 1 Contacts per hour
3 Accounts worked 1 Accounts worked per hour
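Solution
You are going to struggle with a regex here. It seems to repeat the second capture group. I tried for a while, adding named groups for the matches you care about, and ended up using this regex only to pick out the first match. Any "regex kings", please look away.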
(?smi)(^\s+\d+:\d+\s+(AM|PM)\s+\w+\s+ACCOUNT ACTIVITY\s-\s)(?<name>\w+\s+\w+$)(.+?(?<begin>\d+:\d+:\d+)(\s+\d:\d+\s+)(?<acctnumber>Start Account\s\d+)(\s+)(?<account>\w+,\s\w+(\s[A-Za-z]|))\s+(?<end>.+?\d:\d+))
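One way to sidestep the repeated-group problem is to match one "Start account" line at a time and read the named groups from each match, instead of describing the whole report in a single expression. The sketch below is only an illustration: the pattern is a simplified, hypothetical one (not the regex above), and the sample text is trimmed from the question on the assumption that lines carry no leading whitespace.

```powershell
# Sample lines trimmed from the report in the question
# (assumption: no leading whitespace before the times).
$sample = @'
9:48 AM ABC ACCOUNT ACTIVITY - Bart Simpson
04/16/20 8:06:50 0:00 Enter Account Screen
8:06:53 0:03 Start account 12345678 ROSS, BOB N
8:07:24 0:31 Finished account in 31 seconds
8:07:26 0:02 Start account 54321234 DOE, JOHN
8:07:27 0:01 Finished account in 1 seconds
'@

# Hypothetical simplified pattern: one match per "Start account" line.
# (?m) makes ^ and $ match at line boundaries, (?i) ignores case.
$pattern = '(?mi)^(?<begin>\d+:\d+:\d+)\s+\d+:\d+\s+Start account\s+(?<acctnumber>\d+)\s+(?<account>[^\r\n]+?)\s*$'

# Turn every match into an object by reading its named groups.
$rows = foreach ($m in [regex]::Matches($sample, $pattern)) {
    [pscustomobject]@{
        Begin      = $m.Groups['begin'].Value
        AcctNumber = $m.Groups['acctnumber'].Value
        Account    = $m.Groups['account'].Value
    }
}
$rows
```

Each element of `$rows` then carries the start time, account number and account name for one line, so the per-account logic can live in ordinary PowerShell rather than in the expression itself.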
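Alternatively, you can supply a template that picks out every field you might be interested in and use ConvertFrom-String. The key is to label each item you want uniquely, inside curly braces. You then have to mark the first item in the template with an asterisk, so using the example above you would have something like this.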
$template = @"
05/07/20 Acme, Inc. PAGE 1
9:48 AM ABC ACCOUNT ACTIVITY - {customer*:Bart Simpson}
The time ELAPSED since the previous line is printed as HOURS:MINUTES:SECONDS.
DATE TIME ELAPSED ACTION
04/16/20 8:06:50 0:00 Enter Account Screen
-------------------------------------------------------------------------------
{begin1:8:06:53} 0:03 {accNum1:Start account 12345678} {name1:ROSS, BOB N}
{end1:8:07:24} 0:31 Finished account in 31 seconds
-------------------------------------------------------------------------------
{begin2:8:07:26} 0:02 {accNum2:Start account 54321234} {name2:DOE, JOHN}
{end2:8:07:27} 0:01 Finished account in 1 seconds
-------------------------------------------------------------------------------
{begin3:8:07:28} 0:02 {accNum3:Start account 54321234} {name3:DOE, JOHN}
{end3:8:10:26} 0:01 Finished account in 1 seconds
-------------------------------------------------------------------------------
05/06/20 4:55:49 5:08 Leave Account Screen 9:33 Elapsed
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
05/06/20 4:55:55 0:06 Leave Account Screen
-------------------------------------------------------------------------------
DAILY TOTALS
5:33:46 - Time on Account screen for the day.
3 Calls 1 Calls per hour
3 Contacts 1 Contacts per hour
3 Accounts worked 1 Accounts worked per hour
05/07/20 Acme, Inc. PAGE 1
9:48 AM ABC ACCOUNT ACTIVITY - {customer*:Lisa Simpson}
The time ELAPSED since the previous line is printed as HOURS:MINUTES:SECONDS.
DATE TIME ELAPSED ACTION
04/16/20 8:06:50 0:00 Enter Account Screen
-------------------------------------------------------------------------------
{begin1:8:06:53} 0:03 {accNum1:Start account 6543212} {name1:DOE, JANE}
{end1:8:07:24} 0:31 Finished account in 31 seconds
-------------------------------------------------------------------------------
{begin2:8:07:26} 0:02 {accNum2:Start account 88888888} {name2:DEER, JOHN}
{end2:8:07:27} 1:01 Finished account in 1 seconds
-------------------------------------------------------------------------------
{begin3:\s} 0:02 {accNum3:\s} {name3:\s}
{end3:\s} 1:01 Finished account in 1 seconds
-------------------------------------------------------------------------------
05/06/20 4:55:49 5:08 Leave Account Screen 10:33 Elapsed
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
05/06/20 4:55:55 0:06 Leave Account Screen
-------------------------------------------------------------------------------
DAILY TOTALS
5:33:46 - Time on Account screen for the day.
3 Calls 1 Calls per hour
3 Contacts 1 Contacts per hour
3 Accounts worked 1 Accounts worked per hour
"@
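In your last example I added a third group containing the regex whitespace token (\s), so that it doesn't repeat the second group's data in the third group.
You can then pipe your full input through the cmdlet, using the -TemplateContent parameter to apply your template. You should get the data out the other side.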
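# Get your data; here the report is read as one string from the path used in the question.
$data = Get-Content -Path $source_file -Raw
$data | ConvertFrom-String -TemplateContent $template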
customer : Bart Simpson
begin1 : 8:06:53
accNum1 : Start account 12345678
name1 : ROSS, BOB N
end1 : 8:07:24
begin2 : 8:07:26
accNum2 : Start account 54321234
name2 : DOE, JOHN
end2 : 8:07:27
begin3 : 8:07:28
accNum3 : Start account 54321234
name3 : DOE, JOHN
end3 : 8:10:26
customer : Lisa Simpson
begin1 : 8:06:53
accNum1 : Start account 6543212
name1 : DOE, JANE
end1 : 8:07:24
begin2 : 8:07:26
accNum2 : Start account 88888888
name2 : DEER, JOHN
end2 : 8:07:27
You can then loop over the output objects and compare your data.
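As a rough sketch of that last step, the begin and end strings can be parsed as `[timespan]` values and subtracted. The `$record` object below is a hand-built stand-in for one object from the output above, so the snippet runs without the source file, and the fixed range of three slots simply mirrors the template. Treat both as assumptions.

```powershell
# Stand-in for one object emitted by ConvertFrom-String
# (values copied from the sample output; built by hand for illustration).
$record = [pscustomobject]@{
    customer = 'Bart Simpson'
    begin1 = '8:06:53'; end1 = '8:07:24'
    begin2 = '8:07:26'; end2 = '8:07:27'
    begin3 = '8:07:28'; end3 = '8:10:26'
}

$culture = [cultureinfo]::InvariantCulture

$durations = foreach ($i in 1..3) {
    $begin = $record."begin$i"
    $end   = $record."end$i"

    # Skip slots that are missing or blank (a customer with fewer accounts).
    if ([string]::IsNullOrWhiteSpace($begin) -or [string]::IsNullOrWhiteSpace($end)) { continue }

    [pscustomobject]@{
        Customer = $record.customer
        Slot     = $i
        # h:mm:ss strings parse directly as TimeSpan values.
        Elapsed  = [timespan]::Parse($end, $culture) - [timespan]::Parse($begin, $culture)
    }
}
$durations
```

To get "last time minus first time" for a repeated account number instead, the same parsing applies; only the choice of which begin and end values to pair changes.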