首页 > 解决方案 > PEG 语法有序选择失败

问题描述

我有一个使用 Python Arpeggio包的玩具 DSL 的 PEG 语法:

from arpeggio.cleanpeg import ParserPEG

grammar = """
    root    = block* EOF
    block   = header (item1+ / item2+)
    header  = "block"
    item1   = number name comment?
    item2   = number name list comment?
    number  = r"\d+"
    name    = r"\w+"
    list    = r"\[.*\]"
    comment = r"\/\/.*"
"""

doc = """
block
  5 alpha []        //
  3 beta [a, b, c]  // this is an item2

block
  6 foo
  1 bar  // This is an item1
  4 baz  // more stuff
"""

parser = ParserPEG(grammar, 'root', debug=True)
parse_tree = parser.parse(doc)
print ('Tree:', parse_tree)

这在解析测试文档时给出了奇怪的结果:它正确地未能item1在有序选择中匹配,但随后错误地声称(标有 的行xxxx)它确实匹配了选择而没有进行测试item2,这本来是匹配的。

>> Matching rule root=Sequence at position 0 => * block   5
   >> Matching rule ZeroOrMore in root at position 0 => * block   5
      >> Matching rule block=Sequence in root at position 0 => * block   5
         ?? Try match rule header=StrMatch(block) in block at position 1 =>  *block   5 
         ++ Match 'block' at 1 => ' *block*   5 '
         >> Matching rule OrderedChoice in block at position 6 =>  block*   5 alpha
            >> Matching rule OneOrMore in block at position 6 =>  block*   5 alpha
               >> Matching rule item1=Sequence in block at position 6 =>  block*   5 alpha
                  ?? Try match rule number=RegExMatch(\d+) in item1 at position 9 =>  block   *5 alpha []
                  ++ Match '5' at 9 => ' block   *5* alpha []'
                  ?? Try match rule name=RegExMatch(\w+) in item1 at position 11 => block   5 *alpha []  
                  ++ Match 'alpha' at 11 => 'block   5 *alpha* []  '
                  >> Matching rule Optional in item1 at position 16 =>    5 alpha* []       
                     ?? Try match rule comment=RegExMatch(\/\/.*) in item1 at position 17 =>   5 alpha *[]        
                     -- NoMatch at 17
                  <<- Not matched rule Optional in item1 at position 16 =>    5 alpha* []       
               <<+ Matched rule item1=Sequence in item1 at position 16 =>    5 alpha* []       
               >> Matching rule item1=Sequence in block at position 16 =>    5 alpha* []       
                  ?? Try match rule number=RegExMatch(\d+) in item1 at position 17 =>   5 alpha *[]        
                  -- NoMatch at 17
               <<- Not matched rule item1=Sequence in item1 at position 16 =>    5 alpha* []
xxxx-->     <<+ Matched rule OneOrMore in block at position 16 =>    5 alpha* []       
         <<+ Matched rule OrderedChoice in block at position 16 =>    5 alpha* []       
      <<+ Matched rule block=Sequence in block at position 16 =>    5 alpha* []       
      >> Matching rule block=Sequence in root at position 16 =>    5 alpha* []       
         ?? Try match rule header=StrMatch(block) in block at position 17 =>   5 alpha *[]        
         -- No match 'block' at 17 => '  5 alpha *[]   *     '
      <<- Not matched rule block=Sequence in block at position 16 =>    5 alpha* []       
   <<+ Matched rule ZeroOrMore in root at position 16 =>    5 alpha* []       
   ?? Try match rule EOF in root at position 17 =>   5 alpha *[]        
   !! EOF not matched.
<<- Not matched rule root=Sequence in root at position 0 => * block   5

结果,解析器无法使用如果它在失败后item2实际匹配,它将使用的 s 。item2item1

这是解析器包中的错误,还是我的语法中的错误?

请注意,反转有序选择:

    block   = header (item2+ / item1+)

正确解析示例文档。但是在玩具问题中很容易解决的异常结果在实际语法中可能更难找到。项目明确地是项目 1 或项目 2,因此检查它们的顺序应该是无关紧要的,并且解析它们的代码应该一致地工作。

标签: pythonparsingpegarpeggio

解决方案


PEG中,表达式的顺序OrderedChoice很重要。当解析器尝试item1+匹配至少一个item1成功时,整个有序选择就被认为是成功的。

一般来说,总是将更具体的匹配放在开头,而将更一般的匹配放在有序选择的结尾。

更新:在歧义检测和规则顺序对维基百科匹配部分的语言的影响中有一个很好的解释。


推荐阅读