The parser does not consume all tokens. Is this a bug?

Problem description

Environment: ANTLR 4.7.1

The grammar is:

grammar Whilelang;
program : seqStatement;
seqStatement: statement (';' statement)* ;
statement: ID ':=' expression                          # attrib
     | 'print' Text                                # print
     | '{' seqStatement '}'                        # block
     ;
expression: INT                                        # int
      | ID                                         # id
      | expression ('+'|'-') expression            # binOp
      | '(' expression ')'                         # expParen
      ;
bool: ('true'|'false')                                 # boolean
    | expression '=' expression                        # relOp
    | expression '<=' expression                       # relOp
    | 'not' bool                                       # not
    | bool 'and' bool                                  # and
    | '(' bool ')'                                     # boolParen
;
INT: ('0'..'9')+ ;
ID: ('a'..'z')+;
Text: '"' .*? '"';
Space: [ \t\n\r] -> skip;

The input is:

a := 1
b := 2

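According to the grammar, ANTLR 4 should report an error along the lines of "expecting ';' at line 1" for the input above. In practice no error is reported. The parser seems to accept only part of the input and does not consume all of the input tokens. Is this a bug in ANTLR 4?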

$ grun Whilelang program -trace
a := 1
b := 2
^d
enter   program, LT(1)=a
enter   seqStatement, LT(1)=a
enter   statement, LT(1)=a
consume [@0,0:0='a',<17>,1:0] rule statement
consume [@1,2:3=':=',<2>,1:2] rule statement
enter   expression, LT(1)=1
consume [@2,5:5='1',<16>,1:5] rule expression
exit    expression, LT(1)=b
exit    statement, LT(1)=b
exit    seqStatement, LT(1)=b
exit    program, LT(1)=b

Tags: antlr, antlr4

Solution


Not a bug. ANTLR is doing exactly what it was asked to do.

Given the rules

program : seqStatement;
seqStatement: statement (';' statement)* ;

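the program rule is complete as soon as one statement has been matched. The (';' statement)* part is optional, and the token after the first statement is b rather than ';', so the parser cannot validly extend the sequence. It exits the rule and stops there without reporting an error, leaving the remaining tokens unread.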

Changing to

program : seqStatement EOF;

requires the program rule to match statements until it can also match an EOF token (the lexer automatically adds an EOF at the end of the source text). With this change, the leftover b after the first statement becomes a syntax error. This is likely the behavior you are looking for.
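
To make the difference concrete, here is a minimal hand-written sketch in Python. It is not ANTLR and not generated code: it only imitates the shape statement (';' statement)* for a simplified statement of the form ID ':=' INT. The names tokenize, parse_program and require_eof are hypothetical and chosen for illustration only.

```python
import re

# Hypothetical token pattern for a small subset of the Whilelang grammar:
# ':=', ';', integers and lowercase identifiers. Whitespace is skipped.
TOKEN = re.compile(r"\s*(:=|;|[0-9]+|[a-z]+)")


def tokenize(text):
    """Split text into tokens and append an explicit EOF marker."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    tokens.append("<EOF>")
    return tokens


def parse_program(text, require_eof):
    """Parse  statement (';' statement)*  and optionally require EOF.

    Returns the number of tokens consumed (the EOF marker is not counted).
    """
    tokens = tokenize(text)
    pos = 0

    def statement():
        nonlocal pos
        # Simplified statement: ID ':=' INT
        if not (tokens[pos].isalpha()
                and tokens[pos + 1] == ":="
                and tokens[pos + 2].isdigit()):
            raise SyntaxError(f"bad statement at token {pos}")
        pos += 3

    statement()
    # The repetition is optional: with no ';' ahead, the loop simply ends.
    while tokens[pos] == ";":
        pos += 1
        statement()
    # Mirrors  program : seqStatement EOF;  when require_eof is True.
    if require_eof and tokens[pos] != "<EOF>":
        raise SyntaxError(f"expecting ';' or EOF but found {tokens[pos]!r}")
    return pos
```

Called on the input from the question with require_eof=False, the sketch consumes only the three tokens of a := 1 and returns quietly, which matches the trace above. With require_eof=True the same input raises a syntax error at b, while a := 1; b := 2 is accepted in full.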

