How to detect C++ valid identifiers using regex?

Problem description

I am a beginner with regular expressions, although I know the basics of using them for searching and replacing.

I want to write a program that detects valid C++ identifiers, e.g.:

_ _f _8 my_name age_ var55 x_ a

And so on...

So I've tried this:

std::string str = "9_var 57age my_marks cat33 fit*ell +bin set_";
std::string pat = "[_a-z]+[[:alnum:]]*";
std::regex reg(pat, std::regex::icase);
std::smatch sm;
if(std::regex_search(str, sm, reg))
    std::cout << sm.str() << '\n';
else
    std::cout << "no valid C++ identifier found!\n";

The output:

_var

But as we know, a C++ identifier must not start with a digit, so 9_var must not be a candidate match. What I see here is that the regex engine takes only the substring _var from 9_var and treats it as a match. I want to discard a whole word such as "9_var". I need some way to get only the matches that start with an alphabetic character or an underscore.

So how can I write a program that detects valid identifiers? Thank you!

Tags: c++, regex

Solution


Your pattern does not check word boundaries, so it is able to match part of a string. The updated regex looks like this:

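// First character: letter or underscore; the rest may also contain digits.
// (With [[:alnum:]]* as the tail, a name such as a1_b would not match.)
std::string pat = "\\b[_a-z][_a-z0-9]*\\b";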

With only this change, the match is my_marks:

$ ./a.out 
my_marks

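If you want to find all valid identifiers, you need a loop. You will also need to filter out reserved words, but regex is not a good solution for that.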
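As a rough sketch of the loop and keyword filter described above, something along these lines might work. The function name find_identifiers and the shortened keyword list are assumptions made for illustration, not part of the original answer. The pattern spells out both letter cases instead of relying on std::regex::icase.

```cpp
#include <regex>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper (not from the original answer): returns every
// identifier-shaped token in `text` that is not in a small keyword list.
std::vector<std::string> find_identifiers(const std::string& text) {
    // First character: letter or underscore; rest: letters, digits, underscores.
    // \b on both sides rejects tokens such as "9_var" and "57age".
    static const std::regex ident(R"(\b[A-Za-z_][A-Za-z0-9_]*\b)");

    // Illustrative subset only; the real C++ keyword list is much longer.
    static const std::set<std::string> keywords = {
        "if", "else", "for", "while", "int", "return", "class"};

    std::vector<std::string> result;
    // std::sregex_iterator walks over all non-overlapping matches in order.
    for (std::sregex_iterator it(text.begin(), text.end(), ident), end;
         it != end; ++it) {
        const std::string word = it->str();
        if (keywords.count(word) == 0)  // keep only non-keywords
            result.push_back(word);
    }
    return result;
}
```

Called on the string from the question, this should yield my_marks, cat33, fit, ell, bin and set_. Note that fit*ell is reported as two separate identifiers, because * is not a word character and therefore creates a word boundary on each side. The keyword check is done with a plain set lookup rather than inside the pattern, in line with the remark that regex is a poor tool for that part.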

