regex - Convert regex pattern to LL1 parser
问题描述
Background: I'm trying to solve this leetcode problem: regular-expression-matching. My approach is to implement a LL(1) parser generator, might be overkilling for this problem but it's just a brain exercise.
I only have entry level knowledge of parser theory, so excuse me if this is a silly question that I ask.
So, there's this testcase that I fail. Requirement is regex pattern should match the whole string
Regex: a*a
Input: aaa
I cannot wrap my mind about how to convert this pattern into a LL(1) parser.
The way I see it, a*a
pattern can be converted into production rules:
S -> Aa # a*a
A -> aA | ε # a*
Parse table:
a | $ | |
---|---|---|
S | S -> Aa | |
A | A -> aA | A -> ε |
Here's the steps to parse:
0: S$ aaa$ # use [S,a]
1: Aa$ aaa$ # use [A,a]
2: aAa$ aaa$ # eat 'a'
3: Aa$ aa$ # use [A,a]
4: aAa$ aa$ # eat 'a'
5: Aa$ a$ # use [A,a]
6: aAa$ a$ # eat 'a'
7: Aa$ $ # use [A,$]
8: a$ $ # Error!
The correct matching should be:
a* -> aa
a -> a
But what I get instead is:
a* -> aaa
a -> Error!
I don't know which part I'm missing .
解决方案
At parsing step 5, the parser should have used [A,$]
instead of [A,a]
. But it could only make this decision by looking ahead, or with backtracking. Both of which are beyond LL(1).
The cause of the problem is in your production rules. There is a FIRST/FOLLOW conflict: FIRST(A) contains ε, and FIRST(A) and FOLLOW(A) both contain a. This is explained with an example here:
https://inst.eecs.berkeley.edu/~cs164/sp18/discussion/04/04-solutions.pdf
In more intuitive terms, you are demanding the parser to look ahead, beyond the a
in S -> Aa
, to be able to decide how many terminals a
should be consumed by A -> aA | ε
.
Basically, an LL(1) parser cannot parse a*a
. It would have to be rewritten to aa*
first. There may be algorithms that are successful at rewriting something as simple as a*a
to aa*
, but it is impossible to rewrite every possible regex to an LL(1) grammar.
推荐阅读
- javascript - 如何使本地 API 连接到 ajax?
- xml - matlab 中的 Cplex API 无法在 .sol 文件中编写解决方案
- asp.net-core - ASP.NET CORE Razor Pages:如何避免在页面刷新时第二次执行页面处理程序?
- c++ - OpenGL:如何将多个纹理传递给具有一个变量的着色器?
- c# - 我想根据相同的值按 Solo 列分组,并通过 linq 计算其总数
- javascript - 访问 req.body 时发布请求返回 {}
- jasper-reports - jasper report studio - 如何在细节带中添加线条。以下是具有空白详细信息带的页面
- java - Spring Batch 中的每个作业调用是否都会打开一个新的数据库连接池?
- mockito - 当 andReturn 无法返回预期结果时模拟
- qt - 什么是 WaylandView?