How do I write a regular expression for special conditions in JavaScript?

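Problem description

I need help writing a regular expression pattern for these conditions:

Restrictions on hashtag characters

Length

I came up with the following, but it does not cover the last two conditions: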

const subStr = postText.split(/(?=[\s:#,+/][a-zA-Z\d]+)(#+\w{2,})/gm);

const result = _.filter(subStr, word => word.startsWith('#')).map(hashTag => hashTag.substr(1)) || [];

Edit:

Example: if I have:

const postText = "#hello12#123 #hi #£hihi #This is # #Hyvääpäivää #Dzieńdobry #जलवायुपरिवर्तन an #example of some text with #hash-tags - http://www.example.com/#anchor but dont want the link,#hashtag1,hi #123 hfg skjdf kjsdhf jsdhf kjhsdf kjhsdf khdsf kjhsdf kjhdsf hjjhjhf kjhsdjhd kjhsdfkjhsd #lasthashtag";

The result should be:

["hello12", "123", "hi", "This", "", "Hyvääpäivää", "Dzieńdobry", "जलवायुपरिवर्तन", "example", "hash", "anchor", "hashtag1", "123", "lasthashtag"]

What I currently get:

["hello12", "123", "hi", "This", "Hyv", "Dzie", "example", "hash", "anchor", "hashtag1", "123", "lasthashtag"]

Note: I do not want to use a JavaScript library.

Thanks

Tags: javascript, regex, unicode

Solution


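Assuming the characters not allowed in a hashtag are !$%^&*+. (the ones you mentioned) and , (based on your example), you can use the following regex pattern: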

/#[^\s!$%^&*+.,#]+/gm

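Note: to exclude more characters, add them to the character class as I did above. You cannot rely on alphanumeric characters alone (such as \w), because you want to support other Unicode letters, symbols, and emoji.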
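To illustrate the note above, here is a minimal sketch comparing an ASCII-only `\w` pattern with the negated character class. The variable names and short sample strings are made up for illustration, and adding `-` to the class is an assumption based on the expected output in the question (where `#hash-tags` should yield `hash`).

```javascript
// Without the `u` flag, \w matches only [A-Za-z0-9_], so it stops at the
// first non-ASCII letter. The negated class keeps going until it meets
// whitespace or one of the explicitly excluded characters.
const sample = "#Hyvääpäivää #Dzieńdobry";

const asciiOnly = sample.match(/#\w+/g);
const negatedClass = sample.match(/#[^\s!$%^&*+.,#]+/g);

console.log(asciiOnly);    // ["#Hyv", "#Dzie"]
console.log(negatedClass); // ["#Hyvääpäivää", "#Dzieńdobry"]

// Assumption: to end a hashtag at a hyphen as well, add "-" to the class.
// A hyphen placed last inside [...] is treated as a literal character.
const withHyphen = "with #hash-tags".match(/#[^\s!$%^&*+.,#-]+/g);

console.log(withHyphen);   // ["#hash"]
```

The same approach extends to any other character that should end a hashtag, such as `£` or `/`.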

JavaScript code example:

const regex = /#[^\s!$%^&*+.,#]+/gm;
const str = "#hello12#123 #hi #£hihi #This is # #Hyvääpäivää #Dzieńdobry #जलवायुपरिवर्तन an #example of some text with #hash-tags - http://www.example.com/#anchor but dont want the link,#hashtag1,hi #123 hfg skjdf kjsdhf jsdhf kjhsdf kjhsdf khdsf kjhsdf kjhdsf hjjhjhf kjhsdjhd kjhsdfkjhsd #lasthashtag";
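let m;

// With the g flag, each exec() call resumes from regex.lastIndex.
while ((m = regex.exec(str)) !== null) {
    // Guard against an infinite loop on zero-width matches.
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }

    // m[0] is the full match; this pattern has no capture groups.
    m.forEach((match) => {
        console.log("Found match: " + match);
    });
}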
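The question asks for an array of tags without the leading `#`, so the loop above could be wrapped as follows. This is only a sketch: `extractHashtags` is a hypothetical helper name, the capture group is an addition to the pattern, and `String.prototype.matchAll` assumes an ES2020-capable runtime.

```javascript
// Hypothetical helper: returns hashtag bodies without the leading "#".
// The capture group wraps the same negated character class used above.
function extractHashtags(text) {
    const pattern = /#([^\s!$%^&*+.,#]+)/g;
    // matchAll() yields one match object per hit; m[1] is the captured body.
    return Array.from(text.matchAll(pattern), (m) => m[1]);
}

const tags = extractHashtags("#hello12#123 #hi #Hyvääpäivää, #Dzieńdobry");
console.log(tags); // ["hello12", "123", "hi", "Hyvääpäivää", "Dzieńdobry"]
```

A lone `#` followed by whitespace produces no entry, because the pattern requires at least one allowed character after the `#`. No library is needed, which fits the constraint stated in the question.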

