parsing - A case where ANTLR4 terminates parsing successfully before the end of file is reached due to a parsing error
问题描述
I gave ANTLR4 the following parser and lexer grammar in separate files (referring to a simple grammar for BNF grammar )
parser grammar BNFParser;
options {tokenVocab = BNFLexer;}
compileUnit
: grammar_rule+
;
grammar_rule : NON_TERMINAL COLON (OR? grammar_rule_alternative)* SEMICOLON
;
grammar_rule_alternative : (NON_TERMINAL|TERMINAL)+
;
and
lexer grammar BNFLexer;
TERMINAL : [A-Z][A-Za-z0-9_]*;
NON_TERMINAL : [a-z][A-Za-z0-9_]*;
OR : '|';
COLON : ':';
SEMICOLON : ';';
WS
: [ \t\r\n]+ -> skip
;
The main program
private static void Main(string[] args) {
StreamReader reader = new StreamReader(args[0]);
AntlrInputStream stream = new AntlrInputStream(reader);
BNFLexer lexer = new BNFLexer(stream);
CommonTokenStream tokens = new CommonTokenStream(lexer);
BNFParser parser = new BNFParser(tokens);
IParseTree root = parser.compileUnit();
Console.WriteLine(root.ToStringTree());
}
Also supplied the following test file for testing the grammar
compileunit : x a
;
x : S b
;
S : compileunit f
;
Please notice from the lexer grammar that Non-Terminals begin with a lowercase letters while Terminals begin with an uppercase letter. This given grammar has an error. The third rule uses a capital letter ( S ) to define Non-Terminal S. The expected behaviour would be to report this as an error. In the contrary parsing succeeds by consuming the first 2 rules and ignoring the third for S without reporting any error. I have also seen the generated files and i noticed the following
try {
EnterOuterAlt(_localctx, 1);
{
State = 7;
_errHandler.Sync(this);
_la = _input.La(1);
do {
{
{
State = 6; grammar_rule();
}
}
State = 9;
_errHandler.Sync(this);
_la = _input.La(1);
} while ( _la==NON_TERMINAL );
}
}
catch (RecognitionException re) {
_localctx.exception = re;
_errHandler.ReportError(this, re);
_errHandler.Recover(this, re);
}
The above code shows that the parser expects a Non-Terminal symbol at the start of a grammar_rule which is what i expect. However what happens when this is not the case? Also another weird issue is that the CommonTokenStream object that contains the tokens recognized by the lexer contains only the tokens until the end of the second rule but non of the tokens of the third rule (S). Is this proper behaviour?
解决方案
Add an EOF token to your main rule (compileUnit
). That will force the parser to use all input until EOF and report an error if that didn't fully match.
推荐阅读
- python - psycopg2:如何将数据从 Python 应用程序添加到 Postgresql
- java - 为什么 saveAll() 总是插入数据而不是更新数据?
- arrays - 读取浮点数的二进制文件并将它们放入数组中
- amazon-web-services - 为什么我的 AWS S3 策略不会仅限制某些 IP 地址的访问?
- python - mplfinance 图未正确显示时间值
- reactjs - 在reactjs nextjs中将url重定向到www作为服务器端301
- python - 在 Python 中调用变量(本地与全局)
- c# - 如何在控制器中使用 IdentityServer4 获取 ASP.NET 核心中的当前用户?
- r - 在 R 中使用 plot_grid 和 get_legend 的困难
- terminal - 如何更改路径?