Regex - Delete everything before first match

Problem description

Really struggling with this one. I need a regular expression that removes the topmost subject/to/from/date fields from an e-mail while leaving all earlier subject/to/from/date entries within the mail chain intact. For example:

Subject: RE: Test mail
From: test@stackoverflow.com
To: test@test.com
Date: 22/06/2018 10:00:00

This is the body of e-mail #3.

Subject: RE: Test mail
From: test@test.com
To: test@stackoverflow.com
Date: 22/06/2018 09:55:00

This is the body of e-mail #2.

Subject: Test mail
From: test@stackoverflow.com
To: test@test.com
Date: 22/06/2018 09:50:00

This is the body of e-mail #1.

I'd like the regular expression to remove simply the top five lines to give:

This is the body of e-mail #3.

Subject: RE: Test mail
From: test@test.com
To: test@stackoverflow.com
Date: 22/06/2018 09:55:00

This is the body of e-mail #2.

Subject: Test mail
From: test@stackoverflow.com
To: test@test.com
Date: 22/06/2018 09:50:00

This is the body of e-mail #1.

Unfortunately, I can't write anything that specifically deletes the first five lines, as there may also be a CC field, which means it could potentially be six lines.

It therefore needs to match from the start up to the first instance of "Date:", through the end of that line, and delete all of it. Any ideas would be hugely appreciated; the closest I've got is the below, which unfortunately matches through the last instance of "Date:" rather than the first.

[\s\S]*.*Date:.*[\s\S]

Tags: regex

Solution


The regex should be constructed as follows:

  • Anchor at the start of the string.
  • Accept any content up to the first line starting with "Date: ".
  • Accept the rest of this line.
  • Accept any number of following \n chars (the end of this line and following empty lines).

Do not use the g (global) option, since you want only a single match.

One possible solution is:

/\A.+?^Date: [^\n]+\n+/ms

Details:

  • m option - multi-line (^ and $ also match at the start / end of each line).
  • s option - single-line (. also matches \n).
  • \A - Start of the whole string.
  • .+? - One or more of any chars, as few as possible (lazy), so the match stops at the first "Date: " line (due to s option, . includes \n).
  • ^ - Start of a line (due to m option).
  • Date: - Start of the "Date" line.
  • [^\n]+ - One or more chars other than \n - the actual date field.
  • \n+ - The end of line and following empty lines.
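The lazy quantifier is what makes the match stop at the first "Date:" line. As a rough illustration, here is a small sketch in Python (an assumption, since the question names no language; Python's re module happens to support \A and equivalents of the m and s options). The shortened sample text and the variable names are made up for the demonstration.

```python
import re

# Shortened two-message chain (hypothetical sample, not the full mail above).
text = (
    "Subject: RE: Test mail\n"
    "Date: 22/06/2018 10:00:00\n"
    "\n"
    "Body #2.\n"
    "\n"
    "Subject: Test mail\n"
    "Date: 22/06/2018 09:50:00\n"
    "\n"
    "Body #1.\n"
)

# re.MULTILINE corresponds to the m option, re.DOTALL to the s option.
flags = re.MULTILINE | re.DOTALL

# Lazy .+? expands only as far as needed: it stops at the first "Date: " line.
lazy = re.match(r"\A.+?^Date: [^\n]+\n+", text, flags)

# Greedy .+ grabs everything and backtracks: it stops at the last "Date: " line.
greedy = re.match(r"\A.+^Date: [^\n]+\n+", text, flags)

lazy_rest = text[lazy.end():]      # begins at "Body #2."
greedy_rest = text[greedy.end():]  # only "Body #1.\n" is left

print(lazy_rest)
print(greedy_rest)
```

The greedy variant behaves like the attempt in the question: it swallows the whole chain and only gives back enough to find the last "Date:" line.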

Since you specified neither the host language nor the regex flavor, I assumed PCRE, which supports all the features used.
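If the host language turns out to be Python, the same pattern could be applied roughly as below. This is only a sketch under that assumption: the function name strip_top_header, the constant HEADER_RE, and the CC address in the sample are hypothetical, and count=1 plays the role of leaving out the g option.

```python
import re

# Same pattern as above; re.MULTILINE = m option, re.DOTALL = s option.
HEADER_RE = re.compile(r"\A.+?^Date: [^\n]+\n+", re.MULTILINE | re.DOTALL)


def strip_top_header(mail: str) -> str:
    """Remove everything up to and including the first 'Date:' line
    and the blank lines that follow it. Text without a match is
    returned unchanged."""
    # count=1: a single replacement only (no "global" behaviour).
    return HEADER_RE.sub("", mail, count=1)


# Sample with an extra CC line (hypothetical address) in the top header.
mail = (
    "Subject: RE: Test mail\n"
    "From: test@stackoverflow.com\n"
    "To: test@test.com\n"
    "CC: cc@test.com\n"
    "Date: 22/06/2018 10:00:00\n"
    "\n"
    "This is the body of e-mail #3.\n"
    "\n"
    "Subject: Test mail\n"
    "From: test@stackoverflow.com\n"
    "To: test@test.com\n"
    "Date: 22/06/2018 09:50:00\n"
    "\n"
    "This is the body of e-mail #1.\n"
)

result = strip_top_header(mail)
print(result)
```

Because the pattern keys on the first "Date:" line rather than on a line count, the top header can have five lines or six (with CC) and the earlier headers in the chain are left in place.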

