How can I use .replace to remove part of a string if that part exists in an array?

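Question

I want to remove several words from a string (this will run inside a for loop):

Most of the words I need to remove are listed here (this is the regex I tried):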

\b([[:<:]][0-9a-zA-z][[:>:]]|^'|about|after|all|also|[an]|and|another|any|are|[as]|at|[be]|because|been|before|being|\bbetween|both|but|by|came|can|come|could|did|do|each|for|from|get|got|had|[has]|have|he|her|here|him|himself|his|how|if|in|into|is|it|like|make|many|me|might|more|most|much|must|my|never|now|of|on|only|or|other|our|out|over|said|same|see|should|since|some|still|such|take|than|that|the|their|them|then|there|these|they|this|those|through|to|too|under|up|very|was|way|we|well|were|what|where|which|while|who|with|would|you|your)

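As you can see, I need to remove a-z, A-Z, 0-9 and several words.

As an example, I have this phrase:

"This is Stackoverflow's data and its into many sites"

My expected result is:

"This is Stackoverflow's data and its many sites"

This is what I tried: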

   let wordsHidden=["[about]","[after]","[all]","[also]","[an]","[and]","[another]","[any]","[are]","[as]","[at]","[be]","[because]","[been]","[before]","[being]","[between]","[both]","[but]","[by]","[came]","[can]","[come]","[could]","[did]","[do]","[each]","[for]","[from]","[get]","[got]","[had]","[has]","[have]","[he]","[her]","[here]","[him]","[himself]","[his]","[how]","[if]","[in]","[into]","[is]","[it]","[like]","[make]","[many]","[me]","[might]","[more]","[most]","[much]","[must]","[my]","[never]","[now]","[of]","[on]","[only]","[or]","[other]","[our]","[out]","[over]","[said]","[same]","[see]","[should]","[since]","[some]","[still]","[such]","[take]","[than]","[that]","[the]","[their]","[them]","[then]","[there]","[these]","[they]","[this]","[those]","[through]","[to]","[too]","[under]","[up]","[very]","[was]","[way]","[we]","[well]","[were]","[what]","[where]","[which]","[while]","[who]","[with]","[would]","[you]","[your]"];

   let test = wordsHidden.join("|");

  let regexorg = "/\b([[:<:]][0-9a-zA-z][[:>:]]|^'|"+test+")";
  var regex = new RegExp("/"+wordsHidden.join("|")+"/", 'g');

  let string = "DLs between data";
  console.log(string.replace(regex,''));

[Image: the regex in action]

Is there a way to treat each element of the array as a whole word and return the fully processed string?

Tags: javascript, regex

Solution


I'm not sure what you're trying to do with the start of your regex, but I have figured out a way to delete specific words (each delimited by non-word characters) from a string.

If you match only the exact words, you will be left with extra spaces, so my approach is to match a non-word character on either side of each word and to keep matching every listed word that follows in the same run. If we don't chain words like this, we won't catch adjacent words: each match needs its own non-word character on both sides, two neighbouring matches would compete for the single separator between them, and the second one would be missed.
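To make that collision concrete, here is a minimal sketch that compares an unchained pattern with a chained one. The two-word list and the sample text are hypothetical and only serve the demonstration.

```javascript
// Hypothetical two-word list and sample text, for demonstration only.
const words = ["is", "the"];
const text = "This is the data";

// Unchained: every match consumes the separator on both sides of one word.
const unchained = new RegExp("\\W(" + words.join("|") + ")\\W", "g");

// Chained: one leading separator, then a run of "word + separator" groups.
const chained = new RegExp("\\W((" + words.join("\\W)|(") + "\\W))+", "g");

const unchainedMatches = text.match(unchained);
const chainedMatches = text.match(chained);

console.log(unchainedMatches); // [" is "]      ("the" is missed)
console.log(chainedMatches);   // [" is the "]  (both words in one match)
```

In the unchained case the space after "is" is already consumed, so "the" has no leading non-word character left to match against. The full version with the complete word list follows.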

const wordsHidden = ["about","after","all","also","an","and","another","any","are","as","at","be","because","been","before","being","between","both","but","by","came","can","come","could","did","do","each","for","from","get","got","had","has","have","he","her","here","him","himself","his","how","if","in","into","is","it","like","make","many","me","might","more","most","much","must","my","never","now","of","on","only","or","other","our","out","over","said","same","see","should","since","some","still","such","take","than","that","the","their","them","then","there","these","they","this","those","through","to","too","under","up","very","was","way","we","well","were","what","where","which","while","who","with","would","you","your"];
// Builds \W((about\W)|(after\W)|...)+ : one non-word character followed by
// one or more listed words, each followed by a non-word character.
const rexString = "\\W((" + wordsHidden.join("\\W)|(") + "\\W))+";
console.log(rexString);
const regex = new RegExp(rexString, 'g');

const string = "This is the Stackoverflow's Data and its into many your your you your about you sites";
// Collect every match together with the index at which it ends.
const matches = [];
let match = regex.exec(string);
while (match !== null) {
  match.lastIndex = regex.lastIndex;
  matches.push(match);
  match = regex.exec(string);
}

let cutString = string;
// Iterate through the matches backwards, from the end of the string to the start,
// so deleting a part does not shift the indexes of the matches still to process.
for (let i = matches.length - 1; i >= 0; i--) {
  const m = matches[i];
  const beforeMatch = cutString.slice(0, m.lastIndex - m[0].length);
  const afterMatch = cutString.slice(m.lastIndex - 1); // keep the trailing "space" (it might be some other character)
  console.log(beforeMatch); console.log(m[0]); console.log(afterMatch);
  cutString = beforeMatch + afterMatch;
}
console.log(cutString);
This goes from
"This is the Stackoverflow's Data and its into many your your you your about you sites" to
"This Stackoverflow's Data its sites"
with all the matching words stripped (is, the, and, into, many, your, you, about)
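
Since the question asks about .replace specifically, here is a sketch of a shorter variant. It is not the code above but a swapped-in technique: a lookahead checks the trailing non-word character without consuming it, so adjacent words can each match on their own. The function name `removeWords` and the shortened word list are hypothetical. The words are joined without square brackets, because `[about]` in a regex is a character class that matches a single one of those letters, and the pattern string passed to `new RegExp` is written without surrounding slashes.

```javascript
// Hypothetical short list for illustration; the full wordsHidden array works the same way.
const words = ["is", "the", "and", "into", "many", "your", "you", "about"];

// Remove every listed word that has a non-word character on both sides.
// The leading separator is consumed together with the word; the trailing one
// is only checked by the lookahead, so it stays available for the next match.
function removeWords(text, wordList) {
  const pattern = new RegExp("\\W(?:" + wordList.join("|") + ")(?=\\W)", "g");
  return text.replace(pattern, "");
}

const input = "This is the Stackoverflow's Data and its into many your your you your about you sites";
const output = removeWords(input, words);
console.log(output); // "This Stackoverflow's Data its sites"
```

Like the pattern above, this requires a non-word character on both sides of a word, so a listed word at the very start or very end of the string is left in place.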
