Regex: match from pattern X up to the next occurrence of pattern X

Problem description

I've been trying to work out the regex, but I keep failing.

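I need to split a text file into groups, where each group starts at a 6-digit number (such as 000001) and runs until the next 6-digit number.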

From the data below, one group would be the following:

000001  10_SEC_SLATE_-_ACT_1.NEW.02      V     C        01:00:00:00 01:00:08:00 00:59:50:00 00:59:58:00 
*FROM CLIP NAME:  10 SEC SLATE - ACT 1.NEW.02 
*SOURCE FILE: 10 SEC SLATE - ACT 1.NEW.02
Full data:

TITLE:   Cities_of_the_Underworld_Ep_101_Lock_Cut_210512
FCM: NON-DROP FRAME
000001  10_SEC_SLATE_-_ACT_1.NEW.02      V     C        01:00:00:00 01:00:08:00 00:59:50:00 00:59:58:00 
*FROM CLIP NAME:  10 SEC SLATE - ACT 1.NEW.02 
*SOURCE FILE: 10 SEC SLATE - ACT 1.NEW.02
000002  KARGA7_SLATE.MOV                 V     C        01:00:00:00 01:00:10:00 00:59:50:00 01:00:00:00 
*FROM CLIP NAME:  KARGA7_SLATE.NEW.01 
*SOURCE FILE: KARGA7_SLATE.MOV
000003  KARGA7_SLATE.MOV                 A     C        01:00:00:00 01:00:10:00 00:59:50:00 01:00:00:00 
*FROM CLIP NAME:  KARGA7_SLATE.NEW.01 
*SOURCE FILE: KARGA7_SLATE.MOV
000004  KARGA7_SLATE.MOV                 A2    C        01:00:00:00 01:00:10:00 00:59:50:00 01:00:00:00 
*FROM CLIP NAME:  KARGA7_SLATE.NEW.01 
*SOURCE FILE: KARGA7_SLATE.MOV
000005  B004_C009_12071C                 V     C        10:17:25:18 10:17:26:15 01:00:00:00 01:00:00:12 
M2      B004_C009_12071C                          045.1 10:17:25:18 
*FROM CLIP NAME:  LOS1_201207_B01009.NEW.01 
*SOURCE FILE: B004_C009_12071C

Tags: javascript, regex, algorithm, text, pattern-matching

Solution


We can try calling match with the following regex pattern:

\b\d{6}\b[\s\S]*?(?=\b\d{6}\b|$)

This matches from a 6-digit number up to, but not including, the next such number or the end of the input.
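The word boundaries matter for this data. \b sits between a word character and a non-word character, and _ counts as a word character, so digit runs glued to underscores (such as 210512 at the end of the TITLE line, or 201207 in LOS1_201207_B01009) cannot start a group. A quick check of just the start token, using lines copied from the question (the samples array is only for illustration):

```javascript
// Start token from the answer's pattern (non-global, so .test() keeps no state).
const startToken = /\b\d{6}\b/;

// Lines copied from the question's data.
const samples = [
  "000001  10_SEC_SLATE_-_ACT_1.NEW.02      V     C", // event number at line start: matches
  "TITLE:   Cities_of_the_Underworld_Ep_101_Lock_Cut_210512", // "_" before 210512 means no \b there
  "*FROM CLIP NAME:  LOS1_201207_B01009.NEW.01", // 201207 is wrapped in "_", so no match
  "M2      B004_C009_12071C                          045.1 10:17:25:18", // no bounded 6-digit run
];

const results = samples.map((line) => startToken.test(line));
console.log(results.join(", ")); // true, false, false, false
```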

var input = `
TITLE:   Cities_of_the_Underworld_Ep_101_Lock_Cut_210512 
FCM: NON-DROP FRAME
000001  10_SEC_SLATE_-_ACT_1.NEW.02      V     C        01:00:00:00 01:00:08:00 00:59:50:00 00:59:58:00 
*FROM CLIP NAME:  10 SEC SLATE - ACT 1.NEW.02 
*SOURCE FILE: 10 SEC SLATE - ACT 1.NEW.02
000002  KARGA7_SLATE.MOV                 V     C        01:00:00:00 01:00:10:00 00:59:50:00 01:00:00:00 
*FROM CLIP NAME:  KARGA7_SLATE.NEW.01 
*SOURCE FILE: KARGA7_SLATE.MOV
000003  KARGA7_SLATE.MOV                 A     C        01:00:00:00 01:00:10:00 00:59:50:00 01:00:00:00 
*FROM CLIP NAME:  KARGA7_SLATE.NEW.01 
*SOURCE FILE: KARGA7_SLATE.MOV
000004  KARGA7_SLATE.MOV                 A2    C        01:00:00:00 01:00:10:00 00:59:50:00 01:00:00:00 
*FROM CLIP NAME:  KARGA7_SLATE.NEW.01 
*SOURCE FILE: KARGA7_SLATE.MOV
000005  B004_C009_12071C                 V     C        10:17:25:18 10:17:26:15 01:00:00:00 01:00:00:12 
M2      B004_C009_12071C                          045.1 10:17:25:18 
*FROM CLIP NAME:  LOS1_201207_B01009.NEW.01 
*SOURCE FILE: B004_C009_12071C
`;
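// Each match starts at a 6-digit event number and stops just before the next one (or at the end of input).
const items = input.match(/\b\d{6}\b[\s\S]*?(?=\b\d{6}\b|$)/g);
console.log(items);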
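One possible refinement, assuming (as in the sample) that every event number begins its own line: \b\d{6}\b would also start a new group at any standalone 6-digit number in the middle of a line, for instance a hypothetical clip named CLIP 123456. Anchoring with ^ under the m flag limits group starts to line beginnings. Because the m flag also makes $ match at every line end, this sketch uses (?![\s\S]) to mean "end of input" instead. The edl string is a shortened excerpt of the question's data:

```javascript
// Shortened excerpt of the question's data; assumes event numbers always start a line.
const edl = [
  "TITLE:   Cities_of_the_Underworld_Ep_101_Lock_Cut_210512",
  "FCM: NON-DROP FRAME",
  "000001  10_SEC_SLATE_-_ACT_1.NEW.02      V     C        01:00:00:00 01:00:08:00 00:59:50:00 00:59:58:00",
  "*FROM CLIP NAME:  10 SEC SLATE - ACT 1.NEW.02",
  "*SOURCE FILE: 10 SEC SLATE - ACT 1.NEW.02",
  "000005  B004_C009_12071C                 V     C        10:17:25:18 10:17:26:15 01:00:00:00 01:00:00:12",
  "M2      B004_C009_12071C                          045.1 10:17:25:18",
  "*FROM CLIP NAME:  LOS1_201207_B01009.NEW.01",
  "*SOURCE FILE: B004_C009_12071C",
].join("\n");

// With the m flag, ^ means "start of a line" both at the group start and in the lookahead.
// (?![\s\S]) means "no character follows", i.e. end of input; $ would also stop at line ends here.
const groupPattern = /^\d{6}\b[\s\S]*?(?=^\d{6}\b|(?![\s\S]))/gm;

// match() returns null when nothing matches, hence the ?? fallback; trim() drops the trailing newline.
const groups = (edl.match(groupPattern) ?? []).map((g) => g.trim());
console.log(groups.map((g) => g.slice(0, 6)).join(", ")); // 000001, 000005
```

The TITLE and FCM lines come before the first event number, so they are not part of any group, just as with the original pattern.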

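Note that we use [\s\S]*? in the pattern as a stand-in for dot-all mode: [\s\S] matches any character, including newlines, so a match can span multiple lines.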
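On engines that support ES2018, the s (dotAll) flag makes . match line terminators too, so the same pattern can be written with . instead of [\s\S]. A small sketch comparing the two forms on an excerpt of the question's data:

```javascript
// Excerpt of the question's data. The s flag needs an ES2018+ engine.
const text = [
  "000002  KARGA7_SLATE.MOV                 V     C        01:00:00:00 01:00:10:00 00:59:50:00 01:00:00:00",
  "*FROM CLIP NAME:  KARGA7_SLATE.NEW.01",
  "*SOURCE FILE: KARGA7_SLATE.MOV",
  "000003  KARGA7_SLATE.MOV                 A     C        01:00:00:00 01:00:10:00 00:59:50:00 01:00:00:00",
  "*FROM CLIP NAME:  KARGA7_SLATE.NEW.01",
  "*SOURCE FILE: KARGA7_SLATE.MOV",
].join("\n");

// [\s\S] form, as in the answer.
const classPattern = /\b\d{6}\b[\s\S]*?(?=\b\d{6}\b|$)/g;
// Same pattern using . with the s (dotAll) flag, so . also matches "\n".
// No m flag, so $ still means end of input.
const dotAllPattern = /\b\d{6}\b.*?(?=\b\d{6}\b|$)/gs;

const a = text.match(classPattern);
const b = text.match(dotAllPattern);
console.log(a.length, b.length); // 2 2
console.log(JSON.stringify(a) === JSON.stringify(b)); // true
```

The [\s\S] form works in older engines as well, which is one reason it is still common.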

