Extracting words from a space (comma) separated string

Problem description

I am trying to write a regex that extracts words separated by spaces (optionally a comma followed by a space), removing the 'stack' prefix from each word (if any). I am looking for a pure regex solution without any post-processing of the results, if possible. Please see the attempt below:

Input:

var x = "stackoverflow aa bbb, ccc"

Regex:

var rx = /((?:\s)?(?:stack)?(\w+))+/

Expected output:

var match = x.match(rx);
["stackoverflow aa bbb ccc", "overflow", "aa", "bbb", "ccc"]

Actual output:

["stackoverflow aa bbb ccc", " ccc", "ccc"]

Tags: javascript, regex, regex-group

Solution


One way to get the results above directly from match() output is to use a positive lookbehind. However, lookbehinds did not exist in JavaScript until ECMAScript 2018, and as far as I'm aware, at the time of writing Google Chrome was the only browser that had implemented this feature in its JavaScript engine (V8).

How is this achievable? We need two alternatives to match the words: one matches the substring that comes after stack, and the other matches any whole word as long as it does not start with stack:

/(?<=\bstack)\w+|\b(?!stack)\w+/

If spaces and commas are mandatory, take them into consideration:

/(?:(?<=\bstack)\w+|\b(?!stack)\w+)(?=[, ]|$)/
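To see what the added lookahead changes, the sketch below runs both patterns against a made-up input in which one word is followed by a semicolon rather than a comma or space. The input string and the variable names are illustrative assumptions, not part of the original question, and the code needs an engine with lookbehind support.

```javascript
// Illustrative input (an assumption): "aa" is followed by ";" instead of "," or " "
var sample = "stackoverflow aa;bb cc";

// Looser pattern: any word qualifies, whatever character follows it
var loose = /(?<=\bstack)\w+|\b(?!stack)\w+/g;

// Stricter pattern: a word must be followed by a comma, a space, or the end of input
var strict = /(?:(?<=\bstack)\w+|\b(?!stack)\w+)(?=[, ]|$)/g;

var looseWords = sample.match(loose);
var strictWords = sample.match(strict);

console.log(looseWords);  // ["overflow", "aa", "bb", "cc"]
console.log(strictWords); // ["overflow", "bb", "cc"]
```

With the stricter pattern, "aa" is skipped because the character after it is neither a comma, a space, nor the end of the string.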

JS code:

var str = "stackoverflow aa bbb, ccc"
console.log(str.match(/(?:(?<=\bstack)\w+|\b(?!stack)\w+)(?=[, ]|$)/g))
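For engines without lookbehind support, a similar result can be sketched with a capture group and a loop over exec(). This is not the pure-regex, no-post-processing solution the question asks for, and the helper names supportsLookbehind, extractWordsWithCapture and extractWords are hypothetical, introduced only for illustration.

```javascript
// Hypothetical helper: detect lookbehind support. The RegExp constructor is used
// because an unsupported regex literal would be a syntax error at parse time.
function supportsLookbehind() {
  try {
    new RegExp("(?<=a)b");
    return true;
  } catch (e) {
    return false;
  }
}

// Hypothetical fallback: consume an optional "stack" prefix outside the capture
// group and collect only the captured remainder of each word.
function extractWordsWithCapture(str) {
  var rx = /\b(?:stack)?(\w+)/g;
  var words = [];
  var m;
  while ((m = rx.exec(str)) !== null) {
    words.push(m[1]);
  }
  return words;
}

// Hypothetical wrapper: use the lookbehind pattern when available, else the fallback.
function extractWords(str) {
  if (supportsLookbehind()) {
    return str.match(new RegExp("(?<=\\bstack)\\w+|\\b(?!stack)\\w+", "g")) || [];
  }
  return extractWordsWithCapture(str);
}

console.log(extractWords("stackoverflow aa bbb, ccc"));
```

The two paths are not fully equivalent: for an input that is just the bare word "stack", the lookbehind pattern matches nothing, while the capture-group fallback returns "stack", so treat the fallback as an approximation.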

Another way would be to split on the undesired parts, but this needs more clarification of the actual requirement, as the input may contain more than just words:

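var str = "stackoverflow aa bbb, ccc"
// Splitting at the leading "stack" yields an empty first element; filter(Boolean) drops it
console.log(str.split(/\bstack|[, ]+/).filter(Boolean))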

