A case where ANTLR4 terminates parsing successfully before the end of file is reached, without reporting a parsing error

Problem description

I gave ANTLR4 the following parser and lexer grammars in separate files (a simple grammar for BNF grammars):

parser grammar BNFParser;

options {tokenVocab = BNFLexer;}

compileUnit
    :   grammar_rule+
    ;

grammar_rule : NON_TERMINAL COLON (OR? grammar_rule_alternative)* SEMICOLON
           ;

grammar_rule_alternative : (NON_TERMINAL|TERMINAL)+ 
                    ;

and

lexer grammar BNFLexer;

TERMINAL : [A-Z][A-Za-z0-9_]*;
NON_TERMINAL : [a-z][A-Za-z0-9_]*;
OR : '|';
COLON : ':';
SEMICOLON : ';';

WS
  : [ \t\r\n]+ -> skip
 ;

The main program

private static void Main(string[] args) {
    // Read the test file whose path is given as the first argument
    StreamReader reader = new StreamReader(args[0]);
    AntlrInputStream stream = new AntlrInputStream(reader);
    // Lexer -> token stream -> parser
    BNFLexer lexer = new BNFLexer(stream);
    CommonTokenStream tokens = new CommonTokenStream(lexer);
    BNFParser parser = new BNFParser(tokens);
    // Parse from the start rule and print the resulting tree
    IParseTree root = parser.compileUnit();
    Console.WriteLine(root.ToStringTree());
}

I also supplied the following test file for testing the grammar:

 compileunit : x a
        ;

 x : S b
   ;

 S : compileunit f
  ;

Note from the lexer grammar that non-terminals begin with a lowercase letter, while terminals begin with an uppercase letter. The test grammar above therefore contains an error: the third rule uses a capital letter (S) to define the non-terminal S. The expected behaviour would be to report this as an error. Instead, parsing succeeds by consuming the first two rules and ignoring the third rule for S, without reporting any error. I also looked at the generated files and noticed the following:

try {
    EnterOuterAlt(_localctx, 1);
    {
    State = 7;
    _errHandler.Sync(this);
    _la = _input.La(1);
    do {
        {
        {
        State = 6; grammar_rule();
        }
        }
        State = 9;
        _errHandler.Sync(this);
        _la = _input.La(1);
    } while ( _la==NON_TERMINAL );
    }
 }
 catch (RecognitionException re) {
     _localctx.exception = re;
     _errHandler.ReportError(this, re);
     _errHandler.Recover(this, re);
 }

The above code shows that the parser expects a non-terminal symbol at the start of a grammar_rule, which is what I expect. However, what happens when this is not the case? Another odd thing is that the CommonTokenStream object that holds the tokens recognized by the lexer contains only the tokens up to the end of the second rule, but none of the tokens of the third rule (S). Is this proper behaviour?

Tags: parsing, antlr4

Solution


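Add an EOF token to the end of your main rule (compileUnit). That forces the parser to consume all input up to EOF and to report an error if the input does not fully match. Without EOF, compileUnit is satisfied as soon as one or more grammar_rule have been matched: the generated loop simply exits when the next token is not a NON_TERMINAL, and the rule returns successfully with the rest of the input unread. This is also why the token stream looks incomplete. CommonTokenStream fetches tokens from the lexer on demand, so tokens the parser never asked for are not in it.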
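
As a minimal sketch of that change, assuming the rest of the parser grammar stays exactly as in the question, the start rule would look like this (EOF is ANTLR's built-in end-of-input token and needs no lexer rule):

```antlr
compileUnit
    :   grammar_rule+ EOF
    ;
```

With this rule, the parser should report a syntax error at `S` in the test file instead of silently stopping after the second rule, because after the last grammar_rule it now expects either another NON_TERMINAL or the end of the input.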

