python - PEG 语法有序选择失败
问题描述
我有一个使用 Python Arpeggio包的玩具 DSL 的 PEG 语法:
from arpeggio.cleanpeg import ParserPEG
grammar = """
root = block* EOF
block = header (item1+ / item2+)
header = "block"
item1 = number name comment?
item2 = number name list comment?
number = r"\d+"
name = r"\w+"
list = r"\[.*\]"
comment = r"\/\/.*"
"""
doc = """
block
5 alpha [] //
3 beta [a, b, c] // this is an item2
block
6 foo
1 bar // This is an item1
4 baz // more stuff
"""
parser = ParserPEG(grammar, 'root', debug=True)
parse_tree = parser.parse(doc)
print ('Tree:', parse_tree)
这在解析测试文档时给出了奇怪的结果:它正确地未能item1
在有序选择中匹配,但随后错误地声称(标有 的行xxxx
)它确实匹配了选择而没有进行测试item2
,这本来是匹配的。
>> Matching rule root=Sequence at position 0 => * block 5
>> Matching rule ZeroOrMore in root at position 0 => * block 5
>> Matching rule block=Sequence in root at position 0 => * block 5
?? Try match rule header=StrMatch(block) in block at position 1 => *block 5
++ Match 'block' at 1 => ' *block* 5 '
>> Matching rule OrderedChoice in block at position 6 => block* 5 alpha
>> Matching rule OneOrMore in block at position 6 => block* 5 alpha
>> Matching rule item1=Sequence in block at position 6 => block* 5 alpha
?? Try match rule number=RegExMatch(\d+) in item1 at position 9 => block *5 alpha []
++ Match '5' at 9 => ' block *5* alpha []'
?? Try match rule name=RegExMatch(\w+) in item1 at position 11 => block 5 *alpha []
++ Match 'alpha' at 11 => 'block 5 *alpha* [] '
>> Matching rule Optional in item1 at position 16 => 5 alpha* []
?? Try match rule comment=RegExMatch(\/\/.*) in item1 at position 17 => 5 alpha *[]
-- NoMatch at 17
<<- Not matched rule Optional in item1 at position 16 => 5 alpha* []
<<+ Matched rule item1=Sequence in item1 at position 16 => 5 alpha* []
>> Matching rule item1=Sequence in block at position 16 => 5 alpha* []
?? Try match rule number=RegExMatch(\d+) in item1 at position 17 => 5 alpha *[]
-- NoMatch at 17
<<- Not matched rule item1=Sequence in item1 at position 16 => 5 alpha* []
xxxx--> <<+ Matched rule OneOrMore in block at position 16 => 5 alpha* []
<<+ Matched rule OrderedChoice in block at position 16 => 5 alpha* []
<<+ Matched rule block=Sequence in block at position 16 => 5 alpha* []
>> Matching rule block=Sequence in root at position 16 => 5 alpha* []
?? Try match rule header=StrMatch(block) in block at position 17 => 5 alpha *[]
-- No match 'block' at 17 => ' 5 alpha *[] * '
<<- Not matched rule block=Sequence in block at position 16 => 5 alpha* []
<<+ Matched rule ZeroOrMore in root at position 16 => 5 alpha* []
?? Try match rule EOF in root at position 17 => 5 alpha *[]
!! EOF not matched.
<<- Not matched rule root=Sequence in root at position 0 => * block 5
结果,解析器无法使用如果它在失败后item2
实际匹配,它将使用的 s 。item2
item1
这是解析器包中的错误,还是我的语法中的错误?
请注意,反转有序选择:
block = header (item2+ / item1+)
正确解析示例文档。但是在玩具问题中很容易解决的异常结果在实际语法中可能更难找到。项目明确地是项目 1 或项目 2,因此检查它们的顺序应该是无关紧要的,并且解析它们的代码应该一致地工作。
解决方案
在PEG中,表达式的顺序OrderedChoice
很重要。当解析器尝试item1+
匹配至少一个item1
成功时,整个有序选择就被认为是成功的。
一般来说,总是将更具体的匹配放在开头,而将更一般的匹配放在有序选择的结尾。
更新:在歧义检测和规则顺序对维基百科匹配部分的语言的影响中有一个很好的解释。
推荐阅读
- javascript - 如何仅向移动访问者显示列表的第一项
- java - 查找从 min 到 max 范围内的素数
- webrtc - WebRTC 的延迟是多少?
- c# - srings 操作:正则表达式替换为 \T
- google-bigquery - 数据流 - 将 avro 对象存储到未知的 GCS 文件夹
- laravel - laravel:显示下拉列表中的数据
- java - 如何在 Eclipse 插件上动态或更新名称?
- javascript - Chrome 自动下载 PDF
- sql - 如何在简单的三向关系上实现基于约束的参照完整性
- acumatica - 如何自定义 InventoryID 选择器?