Notepad++ Regex: delete lines that are identical between start and defined string and clip after said string

Problem description

I am trying to remove lines that are identical from the start of the line up to a defined string within the line, and to clip everything after that string.

Example lines:

http://waterfountain.common.com/12443
http://waterfountain.common.com/243
http://waterfountain.common.com/243
http://glass.common.com/clear
http://glass.common.com/clear
http://1room.common.com/closet/empty

In this case, I'd like to compare everything from the start of the line to "common.com", delete all duplicates, and additionally clip everything after "common.com" or the "/".

The desired end result would look like this (with or without "/" at the end):

http://waterfountain.common.com/
http://glass.common.com/

I found partial solutions, but I don't know how to modify/combine them to my needs.

For example, this deletes lines that are completely identical:

^(.*?)$\s+?^(?=.*^\1$)

Edit: I tried the solution of "The fourth bird" and while it does work for the case I mentioned, some testing showed that there are cases where it fails. (Cases which I forgot to mention.)

It is possible that a number appears after the initial "//", e.g.

http://2eyes.common.com/

It's also possible that there are letters after the third "/", e.g.

http://snow.common.com/first/

Tags: regex, notepad++

Solution

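You can use a capturing group to capture the scheme and host, i.e. everything up to (but not including) the first forward slash after "//", and treat that as the defined string. This also covers hosts that start with a digit and paths that contain letters.

Then match from that point to the end of the line, and repeat the match for every directly following line that starts with the same text, using a backreference to group 1.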

^(https?://[^/\n]+)/.*(?:\R\1.*)*

Explanation

  • ^ Start of the line
  • (https?://[^/\n]+) Group 1: capture http:// or https://, then 1+ chars other than a forward slash or a newline (the host)
  • /.* Match a forward slash followed by any char except a newline, 0+ times (the rest of the line)
  • (?: Non-capturing group
    • \R\1.* Match any Unicode newline sequence, then a backreference to group 1, then the rest of that line
  • )* Close the non-capturing group and repeat it 0+ times

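In the Replace dialog, select the "Regular expression" search mode, leave ". matches newline" unchecked, and use the first capturing group, $1, as the replacement.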

Result

http://waterfountain.common.com
http://glass.common.com
http://1room.common.com
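
To try the pattern outside Notepad++, here is a minimal Python sketch of the same replacement. It is an approximation, not the Notepad++ behavior itself: Python's re module does not support \R, so \r?\n is swapped in, and the function name collapse_hosts and the sample list are made up for illustration.

```python
import re

# Same idea as the Notepad++ pattern. Python's re has no \R, so \r?\n is
# used instead (assumption: the input uses \n or \r\n line endings).
# re.MULTILINE makes ^ match at the start of every line.
PATTERN = re.compile(r"^(https?://[^/\n]+)/.*(?:\r?\n\1.*)*", re.MULTILINE)


def collapse_hosts(text: str) -> str:
    """Replace each run of consecutive lines that share the same
    scheme and host with that scheme and host alone (group 1)."""
    return PATTERN.sub(r"\1", text)


# Example lines from the question (hypothetical variable name).
sample = "\n".join([
    "http://waterfountain.common.com/12443",
    "http://waterfountain.common.com/243",
    "http://waterfountain.common.com/243",
    "http://glass.common.com/clear",
    "http://glass.common.com/clear",
    "http://1room.common.com/closet/empty",
])

print(collapse_hosts(sample))
# http://waterfountain.common.com
# http://glass.common.com
# http://1room.common.com
```

As in Notepad++, only lines that directly follow each other are merged, so lines with the same host that are scattered through the file would need to be sorted first. A line without any duplicate, such as the 1room one, is still clipped to its host.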