Is there an easy way to distinguish Lastnames and Firstnames using RegEx in Notepad++

Problem description

I have 20,000+ records to deal with. Doing it in multiple passes like the ones below is fine, unless of course all of it can be done in one super-efficient regex?

Sample records:

ABBEY Chantelle - 08.11.1995 - A

ANAND Toni-Grace - 04.09.1999 - A

ADCOCK ALVEY James - 12.04.1992 - C

ADLINGTON-JONES Robin Jacob Sebastian - 15.02.1999 - B

AFZAL Kiera - 25.04.2000 - B

AHMED Nisar Abu Ben Adhem - 16.08.2002 - C

AIRE-DEANE Christopher-James - 06.01.1997 - B

AL-MISRI Yaqoob - 23.07.2004 - C

ASTER Lily-May - 01.04.2010 - B

McQUEEN Stephen - 02.02.2001 - A

Desired output:

ABBEY¬Chantelle¬08.11.1995¬A

ANAND¬Toni-Grace¬04.09.1999¬A

ADCOCK ALVEY¬James¬12.04.1992¬C

ADLINGTON-JONES¬Robin¬Jacob¬Sebastian¬15.02.1999¬B

AFZAL¬Kiera¬25.04.2000¬B

AHMED¬Nisar¬Abu¬Ben¬Adhem¬16.08.2002¬C

AIRE-DEANE¬Christopher-James¬06.01.1997¬B

AL-MISRI¬Yaqoob¬23.07.2004¬C

ASTER¬Lily-May¬01.04.2010¬B

McQUEEN¬Stephen¬02.02.2001¬A

First Pass:

Second Pass:

Third Pass:

Fourth Pass:

But the regexes above can't handle these records:

ADCOCK ALVEY James - 12.04.1992 - C

ADLINGTON-JONES Robin Jacob Sebastian - 15.02.1999 - B

AHMED Nisar Abu Ben Adhem - 16.08.2002 - C

Notes:

All last names appear first, IN CAPITALS, and some may be hyphenated. First names (and second and other middle names) come next in Title Case and MAY be hyphenated too.

Match case is enabled in Notepad++ during the search and replace. None of the names contain an apostrophe (e.g. O'KEEFE); they have all been removed.
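The rules in these notes are regular enough to express as one pattern. The sketch below is only an illustration (Python is chosen for the sketch; the question itself is about Notepad++), and RECORD and split_record are hypothetical names. It assumes every line ends with " - dd.mm.yyyy - X", where X is a single capital letter, as in the samples, and it captures each field with a named group rather than rewriting the separators in place:

```python
import re

# One pattern built from the rules in the notes (hypothetical, checked only
# against the sample records):
#   last:   CAPITALS, optional "Mc" prefix, words joined by a space or hyphen
#   given:  Title Case names, joined by a space or hyphen
#   date:   dd.mm.yyyy
#   suffix: a single capital letter at the end of the line
RECORD = re.compile(
    r"(?P<last>(?:Mc)?[A-Z]+(?:[ -][A-Z]+)*)"
    r" (?P<given>[A-Z][a-z]+(?:[ -][A-Z][a-z]+)*)"
    r" - (?P<date>\d{2}\.\d{2}\.\d{4})"
    r" - (?P<suffix>[A-Z])$"
)


def split_record(line: str):
    """Return [last, given..., date, suffix], or None if the line doesn't fit."""
    m = RECORD.match(line)
    if m is None:
        return None
    # Given names are separated by single spaces; hyphenated names stay whole.
    given = m.group("given").split(" ")
    return [m.group("last"), *given, m.group("date"), m.group("suffix")]


if __name__ == "__main__":
    for line in ["ADCOCK ALVEY James - 12.04.1992 - C",
                 "McQUEEN Stephen - 02.02.2001 - A"]:
        print("¬".join(split_record(line)))
```

A two-word surname such as ADCOCK ALVEY works because a given name must start with a capital followed by lowercase letters, so backtracking stops the surname before "James". Lines that don't fit the pattern return None, which makes it easy to pick out records that need manual attention.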

Even if only the names can be sorted out, I can deal with the dates and suffixes separately. Any help would be greatly appreciated, as I'm still a novice at regex.

I also apologize in advance if I have missed an existing solution, in case I didn't use the correct tags or terminology in my searches on this site.

I've checked this question, but it didn't resolve my problem: Regular expression for first and last name

Tags: regex, notepad++

Solution


Matching names is not easy because of all the possible variations, but for the given example data you could use a pattern with \G that matches the spaces and " - " separators between the parts and replaces them with ¬.

Set the search mode to Regular expression, and either keep the (?-i) at the start of the pattern or tick the Match case checkbox.

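Find what:

(?-i)(?:^(?:Mc)?[A-Z]+(?:[ -][A-Z]+)*|\G(?!^)[A-Z][a-z]+(?:-[A-Z][a-z]+)*|\d{2}\.\d{2}\.\d{4})\K -?\h*

Replace with:

¬

The first alternative matches the capitalised last name at the start of the line (with an optional Mc prefix). The second uses \G(?!^) to continue from where the previous match ended and matches one Title Case given name, which may be hyphenated. The third matches the date. \K drops the matched name or date from the match, so only the space or " - " separator that follows it is replaced with ¬.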
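Python's re module supports neither \G nor \K, so the pattern can't be reused there as-is. The sketch below (an illustration, not part of the original answer; LAST, GIVEN, DATE and convert are hypothetical names) mimics it. Pattern.match(string, pos) only succeeds when the match starts exactly at pos, which gives the "continue where the previous match ended" behaviour of \G, and capturing each name in group 1 so that only the separator in group 2 is rewritten plays the role of \K. One simplification: the date is only looked for right after the last given name, whereas the Notepad++ pattern matches it anywhere on the line.

```python
import re

# Each pattern captures a token (group 1) followed by its separator (group 2):
# a space, an optional hyphen, then any spaces/tabs (approximating \h*).
# Only group 2 is replaced, which is what \K achieves in the Notepad++ regex.
LAST = re.compile(r"((?:Mc)?[A-Z]+(?:[ -][A-Z]+)*)( -?[ \t]*)")
GIVEN = re.compile(r"([A-Z][a-z]+(?:-[A-Z][a-z]+)*)( -?[ \t]*)")
DATE = re.compile(r"(\d{2}\.\d{2}\.\d{4})( -?[ \t]*)")


def convert(record: str, sep: str = "¬") -> str:
    """Replace the separators after the surname, given names and date with sep."""
    parts = []
    m = LAST.match(record)  # plays the role of the ^ alternative
    if m is None:
        return record
    parts.append(m.group(1) + sep)
    pos = m.end()
    # Pattern.match(string, pos) only succeeds if the match starts exactly at
    # pos, i.e. where the previous match ended: the behaviour \G provides.
    while (m := GIVEN.match(record, pos)) is not None:
        parts.append(m.group(1) + sep)
        pos = m.end()
    m = DATE.match(record, pos)
    if m is not None:
        parts.append(m.group(1) + sep)
        pos = m.end()
    parts.append(record[pos:])  # the suffix letter, left as-is
    return "".join(parts)


if __name__ == "__main__":
    print(convert("AHMED Nisar Abu Ben Adhem - 16.08.2002 - C"))
    # AHMED¬Nisar¬Abu¬Ben¬Adhem¬16.08.2002¬C
```

As in the Notepad++ version, the surname pattern first grabs too much (for example "ABBEY C"), and backtracking shortens it until a separator can follow, which is how a two-word surname like ADCOCK ALVEY stays together.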



