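python - Does Group() in pyparsing need a post-processing step to produce structures specific to the parsed language?
Question
This is a follow-up to an earlier question. I wrote the pyparsing code as a one-to-one translation of the grammar.
My DSL: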
response:success
response:success AND extension:php OR extension:css
response:sucess AND (extension:php OR extension:css)
time >= 2020-01-09
time >= 2020-01-09 AND response:success OR os:windows
NOT reponse:success
response:success AND NOT os:windows
EBNF grammar for the DSL above:
<expr> ::= <or>
<or> ::= <and> (" OR " <and>)*
<and> ::= <unary> ((" AND ") <unary>)*
<unary> ::= " NOT " <unary> | <equality>
<equality> ::= (<word> ":" <word>) | <comparison>
<comparison> ::= "(" <expr> ")" | (<word> (" > " | " >= " | " < " | " <= ") <word>)+
<word> ::= ("a" | "b" | "c" | "d" | "e" | "f" | "g"
| "h" | "i" | "j" | "k" | "l" | "m" | "n"
| "o" | "p" | "q" | "r" | "s" | "t" | "u"
| "v" | "w" | "x" | "y" | "z")+
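Now I can get a list of tokens. Is the next step to generate some kind of AST/structure so that I can generate code from each node type?
After reading some of the pyparsing examples, I think I have a rough idea of how to approach this:
1) I can use Group() to group the structures that matter for code generation; each group would probably represent a node in the AST.
2) Along with Group(), I can use setParseAction() to build the Python representation of my node objects directly during the parsing phase, instead of generating an intermediate structure first.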
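To make the two ideas above concrete, here is a minimal sketch of what Group() and setParseAction() each do to the parsed tokens. The `TermNode` class and the `key:value` mini-grammar are hypothetical and used only for illustration; they are not part of the grammar in this question.

```python
import pyparsing as pp

word = pp.Word(pp.alphanums + "_")
pair = word("searchKey") + pp.Suppress(":") + word("searchValue")

# Without Group, the tokens of every pair end up in one flat list.
flat = pp.OneOrMore(pair).parseString("a:1 b:2").asList()

# With Group, each pair becomes its own sub-list (a candidate AST node).
grouped = pp.OneOrMore(pp.Group(pair)).parseString("a:1 b:2").asList()


class TermNode:
    """Hypothetical node class: built directly from the matched tokens."""

    def __init__(self, tokens):
        self.key = tokens.searchKey
        self.value = tokens.searchValue

    def __repr__(self):
        return f"TermNode({self.key!r}, {self.value!r})"


# With a class as the parse action, the matched tokens are replaced by
# an instance of that class, so no separate Group is needed.
node_pair = pair.copy().setParseAction(TermNode)
nodes = pp.OneOrMore(node_pair).parseString("a:1 b:2").asList()

print(flat)     # ['a', '1', 'b', '2']
print(grouped)  # [['a', '1'], ['b', '2']]
print(nodes)    # [TermNode('a', '1'), TermNode('b', '2')]
```

In this sketch the parse action alone already yields one object per matched pair, which is why Group() and a node class are usually alternatives rather than something to stack.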
My approach in code:
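from pyparsing import (Forward, Group, Keyword, Literal, OneOrMore, Word,
                       ZeroOrMore, alphanums)

AND = Keyword('AND')
OR = Keyword('OR')
NOT = Keyword('NOT')
word = Word(alphanums+'_')
expr = Forward()
# Per the EBNF, a comparison is either a parenthesized expression or a chain of
# relational comparisons; '>=' and '<=' are tried before '>' and '<'.
Comparison = (Literal('(') + expr + Literal(')')) | OneOrMore(word + (Literal('>=') | Literal('>') | Literal('<=') | Literal('<')) + word)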
Equality = Group((word('searchKey') + Literal(':') + word('searchValue')) | Comparison)
Unary = Forward()
unaryNot = NOT + Unary
Unary << (unaryNot | Equality)
And = Group(Unary + ZeroOrMore(AND + Unary))
Or = And + ZeroOrMore(OR + And)
expr << Or
class AndNode:
    def __init__(self, tokens):
        self.tokens = tokens.asList()

    def query(self):
        pass  # generate the relevant elasticsearch query here?


class ExactMatchNode:
    def __init__(self, tokens):
        self.tokens = tokens

    def __repr__(self):
        return "<ExactMatchNode>"

    def query(self):
        pass  # generate the relevant elasticsearch query here?

Equality.setParseAction(ExactMatchNode)
Q1 = '''response:200 AND time:22 AND rex:32 OR NOT demo:good'''
result = expr.parseString(Q1)
print(result.dump())
This is my output:
[[<ExactMatchNode>, 'AND', <ExactMatchNode>, 'AND', <ExactMatchNode>], 'OR', ['NOT', <ExactMatchNode>]]
[0]:
[<ExactMatchNode>, 'AND', <ExactMatchNode>, 'AND', <ExactMatchNode>]
[1]:
OR
[2]:
['NOT', <ExactMatchNode>]
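I am lost at this point, because how does this represent a tree structure? For example, should
[<ExactMatchNode>, 'AND', <ExactMatchNode>, 'AND', <ExactMatchNode>]
be this instead?
[AND [<ExactMatchNode>, <ExactMatchNode>, <ExactMatchNode>]]
I guess this could be done with setParseAction, but I am not sure whether that is the right direction, or whether I should start modifying my grammar now. The end goal of this DSL is to translate a given query into the Elasticsearch JSON query language.
Edit: after trying a few things, this is what I have: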
class NotNode:
    def __init__(self, tokens):
        self.negatearg = tokens
        #print(f'**** \n {self.negatearg} \n +++')

    def __repr__(self):
        return f'( NOT-> {self.negatearg} )'


class AndNode:
    def __init__(self, tokens):
        # And is wrapped in Group, so tokens[0] is the list
        # [operand, 'AND', operand, ...]; keep only the operands
        self.conds = tokens[0][::2]
        #print(f'**** \n {tokens} \n +++')

    def __repr__(self):
        return f'( AND-> {self.conds} )'

    def generate_query(self):
        result = [cond.generate_query() for cond in self.conds]
        return result


class ExactMatchNode:
    def __init__(self, tokens):
        self.tokens = tokens[0]
        #print(f'**** \n {tokens} \n +++')

    def __repr__(self):
        return f"<ExactMatchNode {self.tokens.searchKey}={self.tokens.searchValue}>"

    def generate_query(self):
        return {
            'term' : { self.tokens[0]: self.tokens[2]}
        }

unaryNot.setParseAction(NotNode)
Equality.setParseAction(ExactMatchNode)
And.setParseAction(AndNode)
I can now call <some node object>.generate_query() to get the query.
But one odd thing I noticed in the output below is:
[( AND-> [<ExactMatchNode response=200>, <ExactMatchNode time=22>, <ExactMatchNode rex=32>] ), 'OR', ( AND-> [( NOT-> ['NOT', <ExactMatchNode demo=good>] )] )]
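The second AND-> is wrapped around the NOT node.
My question is still the same: is this even the right way to use pyparsing, or am I missing something obvious and heading in the wrong direction?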
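Solution
Attaching node classes with setParseAction is the best way I have found to build an AST from a hierarchical grammar. If you use this approach, you probably won't need the Group construct. The reason you get the second And is that your parser always produces an AndNode, even when there is only a single operand with no additional AND operand.
You can expand your And expression so that the AndNode parse action class is attached only when operand AND operand is actually present (and likewise for NOT and OR), for example: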
And = (Unary + OneOrMore(AND + Unary)).addParseAction(AndNode) | Unary
Or = (And + OneOrMore(OR + And)).addParseAction(OrNode) | And
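This is how pyparsing's infixNotation handles these kinds of operators.
My version of the parser, using infixNotation (I think the classes are almost all the same; I may have tweaked the NotNode definition):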
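As a self-contained illustration of the "attach the node only when the operator is present" pattern, the following sketch wires it up for a reduced `key:value` grammar. The node classes and the `unary`, `and_`, and `or_` names are assumptions made for this example, not the asker's exact code. Because no Group is used, the parse actions receive a flat token list.

```python
import pyparsing as pp

AND, OR, NOT = map(pp.Keyword, "AND OR NOT".split())
word = ~(AND | OR | NOT) + pp.Word(pp.alphanums + "_")


class TermNode:
    def __init__(self, tokens):
        self.key, self.value = tokens[0], tokens[1]

    def __repr__(self):
        return f"<{self.key}={self.value}>"


class NotNode:
    def __init__(self, tokens):
        self.arg = tokens[1]  # tokens are ['NOT', operand]

    def __repr__(self):
        return f"(NOT {self.arg})"


class AndNode:
    def __init__(self, tokens):
        # tokens are flat: operand, 'AND', operand, ... so keep every other one
        self.conds = list(tokens[::2])

    def __repr__(self):
        return f"(AND {self.conds})"


class OrNode(AndNode):
    def __repr__(self):
        return f"(OR {self.conds})"


term = (word + pp.Suppress(":") + word).setParseAction(TermNode)
unary = pp.Forward()
unary <<= (NOT + unary).setParseAction(NotNode) | term
# The node class fires only if at least one operator is present;
# otherwise the bare operand is passed through unchanged.
and_ = (unary + pp.OneOrMore(AND + unary)).setParseAction(AndNode) | unary
or_ = (and_ + pp.OneOrMore(OR + and_)).setParseAction(OrNode) | and_

tree = or_.parseString("a:1 AND b:2 OR NOT c:3", parseAll=True)[0]
single = or_.parseString("NOT c:3", parseAll=True)[0]
print(tree)    # (OR [(AND [<a=1>, <b=2>]), (NOT <c=3>)])
print(single)  # (NOT <c=3>)
```

Note that the lone `NOT c:3` operand is no longer wrapped in an AND node, which is the oddity reported in the question.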
"""
<expr> ::= <or>
<or> ::= <and> (" OR " <and>)*
<and> ::= <unary> ((" AND ") <unary>)*
<unary> ::= " NOT " <unary> | <equality>
<equality> ::= (<word> ":" <word>) | <comparison>
<comparison> ::= "(" <expr> ")" | (<word> (" > " | " >= " | " < " | " <= ") <word>)+
<word> ::= ("a" | "b" | "c" | "d" | "e" | "f" | "g"
| "h" | "i" | "j" | "k" | "l" | "m" | "n"
| "o" | "p" | "q" | "r" | "s" | "t" | "u"
| "v" | "w" | "x" | "y" | "z")+
"""
import pyparsing as pp
NOT, AND, OR = map(pp.Keyword, "NOT AND OR".split())
word = ~(NOT | AND | OR) + pp.Word(pp.alphas.lower() + '-_')
date = pp.Regex(r"\d{4}-\d{2}-\d{2}")
operand = word | date
class ExactMatchNode:
    def __init__(self, tokens):
        self.tokens = tokens

    def __repr__(self):
        return "<ExactMatchNode>"

    def query(self):
        pass  # generate the relevant elasticsearch query here?


class ComparisonNode:
    def __init__(self, tokens):
        self.tokens = tokens

    def __repr__(self):
        return "<ComparisonNode>"

    def query(self):
        pass  # generate the relevant elasticsearch query here?


class NotNode:
    def __init__(self, tokens):
        # infixNotation groups the match, so tokens[0] is ['NOT', operand]
        self.negatearg = tokens[0][1]
        #print(f'**** \n {self.negatearg} \n +++')

    def __repr__(self):
        return f'( NOT-> {self.negatearg} )'


class AndNode:
    def __init__(self, tokens):
        # tokens[0] is [operand, 'AND', operand, ...]; keep only the operands
        self.conds = tokens[0][::2]
        #print(f'**** \n {tokens} \n +++')

    def __repr__(self):
        return f'( AND-> {self.conds} )'

    def generate_query(self):
        result = [cond.generate_query() for cond in self.conds]
        return result


class OrNode:
    def __init__(self, tokens):
        # tokens[0] is [operand, 'OR', operand, ...]; keep only the operands
        self.conds = tokens[0][::2]
        #print(f'**** \n {tokens} \n +++')

    def __repr__(self):
        return f'( OR-> {self.conds} )'

    def generate_query(self):
        result = [cond.generate_query() for cond in self.conds]
        return result

expr = pp.infixNotation(operand,
    [
        (':', 2, pp.opAssoc.LEFT, ExactMatchNode),
        (pp.oneOf("> >= < <="), 2, pp.opAssoc.LEFT, ComparisonNode),
        (NOT, 1, pp.opAssoc.RIGHT, NotNode),
        (AND, 2, pp.opAssoc.LEFT, AndNode),
        (OR, 2, pp.opAssoc.LEFT, OrNode),
    ])
expr.runTests("""\
response:success
response:success AND extension:php OR extension:css
response:sucess AND (extension:php OR extension:css)
time >= 2020-01-09
time >= 2020-01-09 AND response:success OR os:windows
NOT reponse:success
response:success AND NOT os:windows
""")
Prints:
response:success
[<ExactMatchNode>]
response:success AND extension:php OR extension:css
[( OR-> [( AND-> [<ExactMatchNode>, <ExactMatchNode>] ), <ExactMatchNode>] )]
response:sucess AND (extension:php OR extension:css)
[( AND-> [<ExactMatchNode>, ( OR-> [<ExactMatchNode>, <ExactMatchNode>] )] )]
time >= 2020-01-09
[<ComparisonNode>]
time >= 2020-01-09 AND response:success OR os:windows
[( OR-> [( AND-> [<ComparisonNode>, <ExactMatchNode>] ), <ExactMatchNode>] )]
NOT reponse:success
[( NOT-> <ExactMatchNode> )]
response:success AND NOT os:windows
[( AND-> [<ExactMatchNode>, ( NOT-> <ExactMatchNode> )] )]
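Since the stated goal is an Elasticsearch query, here is one possible sketch of filling in the query-generation methods on top of the same infixNotation grammar. The mapping is an assumption made for illustration: AND becomes `bool`/`must`, OR becomes `bool`/`should`, NOT becomes `bool`/`must_not`, and comparisons become `range` clauses. The `RANGE_OPS` table and the `to_query` helper are hypothetical names, and other mappings (for example `filter` instead of `must`) may suit a real index better.

```python
import pyparsing as pp

NOT, AND, OR = map(pp.Keyword, "NOT AND OR".split())
word = ~(NOT | AND | OR) + pp.Word(pp.alphas + "-_")
date = pp.Regex(r"\d{4}-\d{2}-\d{2}")
operand = word | date

# Assumed mapping from DSL comparison operators to range-query keys.
RANGE_OPS = {">": "gt", ">=": "gte", "<": "lt", "<=": "lte"}


class ExactMatchNode:
    def __init__(self, tokens):
        self.field, _, self.value = tokens[0][:3]  # [field, ':', value]

    def generate_query(self):
        return {"term": {self.field: self.value}}


class ComparisonNode:
    def __init__(self, tokens):
        self.field, self.op, self.value = tokens[0][:3]

    def generate_query(self):
        return {"range": {self.field: {RANGE_OPS[self.op]: self.value}}}


class NotNode:
    def __init__(self, tokens):
        self.arg = tokens[0][1]  # ['NOT', operand]

    def generate_query(self):
        return {"bool": {"must_not": [self.arg.generate_query()]}}


class AndNode:
    def __init__(self, tokens):
        self.conds = tokens[0][::2]  # drop the 'AND' keywords

    def generate_query(self):
        return {"bool": {"must": [c.generate_query() for c in self.conds]}}


class OrNode:
    def __init__(self, tokens):
        self.conds = tokens[0][::2]  # drop the 'OR' keywords

    def generate_query(self):
        return {"bool": {"should": [c.generate_query() for c in self.conds],
                         "minimum_should_match": 1}}


expr = pp.infixNotation(operand, [
    (pp.Literal(":"), 2, pp.opAssoc.LEFT, ExactMatchNode),
    (pp.oneOf("> >= < <="), 2, pp.opAssoc.LEFT, ComparisonNode),
    (NOT, 1, pp.opAssoc.RIGHT, NotNode),
    (AND, 2, pp.opAssoc.LEFT, AndNode),
    (OR, 2, pp.opAssoc.LEFT, OrNode),
])


def to_query(text):
    """Hypothetical helper: parse one DSL line and return the query dict."""
    return expr.parseString(text, parseAll=True)[0].generate_query()


print(to_query("response:success AND NOT os:windows"))
print(to_query("time >= 2020-01-09 AND response:success OR os:windows"))
```

Each node only knows how to render itself and delegates to its children, so the nesting of the `bool` clauses mirrors the nesting of the parsed tree. A bare word with no `:` or comparison would parse to a plain string rather than a node, so a real implementation would need to reject or handle that case.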