Does Group() in pyparsing need a post-processing step to generate a structure specific to the language being parsed?

Problem description

This is a follow-up to this question. I wrote the pyparsing code as a one-to-one translation of the grammar.

My DSL:

response:success
response:success AND extension:php OR extension:css
response:sucess AND (extension:php OR extension:css)
time >= 2020-01-09
time >= 2020-01-09 AND response:success OR os:windows
NOT reponse:success
response:success AND NOT os:windows

EBNF grammar for the above DSL:

<expr> ::= <or>
<or> ::= <and> (" OR " <and>)*
<and> ::= <unary> ((" AND ") <unary>)*
<unary> ::= " NOT " <unary> | <equality>
<equality> ::=  (<word> ":" <word>) | <comparison>
<comparison> ::= "(" <expr> ")" | (<word> (" > " | " >= " | " < " | " <= ") <word>)+
<word> ::= ("a" | "b" | "c" | "d" | "e" | "f" | "g"
                      | "h" | "i" | "j" | "k" | "l" | "m" | "n"
                      | "o" | "p" | "q" | "r" | "s" | "t" | "u"
                      | "v" | "w" | "x" | "y" | "z")+

Now I can get a list of tokens. Is the next step to generate some kind of AST/structure, so that I can generate code from each node type?

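After reading some pyparsing examples, I think I have a vague idea of how to approach this:

1) I can use Group() to group the related structures that matter for code generation; each group could represent one node in the AST.
2) Along with Group(), I can use setParseAction() to build the Python representation of my node objects directly during parsing, instead of producing a generic structure first.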
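To make the two ideas above concrete, here is a minimal sketch that parses the same input three ways: ungrouped, wrapped in Group(), and with a parse action that replaces each group with a Python object. The names flat_term, grouped_term, node_term and the Term class are hypothetical, not taken from the question.

```python
import pyparsing as pp

word = pp.Word(pp.alphanums + "_")

# 1) No grouping: all tokens end up in one flat list.
flat_term = word + ":" + word
# 2) Group(): each key:value term becomes its own sub-list.
grouped_term = pp.Group(word + ":" + word)


class Term:
    """Hypothetical AST node built directly by a parse action."""

    def __init__(self, tokens):
        key, _, value = tokens[0]  # tokens[0] is the Group's sub-list
        self.key, self.value = key, value

    def __repr__(self):
        return f"Term({self.key}={self.value})"


# 3) Group() + parse action: each sub-list is replaced by a Term object.
node_term = pp.Group(word + ":" + word).setParseAction(Term)

text = "response:200 os:windows"
print(pp.OneOrMore(flat_term).parseString(text).asList())
# -> ['response', ':', '200', 'os', ':', 'windows']
print(pp.OneOrMore(grouped_term).parseString(text).asList())
# -> [['response', ':', '200'], ['os', ':', 'windows']]
print(pp.OneOrMore(node_term).parseString(text).asList())
# -> [Term(response=200), Term(os=windows)]
```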

My approach in code:
from pyparsing import (Forward, Group, Keyword, Literal, OneOrMore, Word,
                       ZeroOrMore, alphanums, oneOf)

AND = Keyword('AND')
OR  = Keyword('OR')
NOT = Keyword('NOT')
word = Word(alphanums+'_')

expr = Forward()
# <comparison> ::= "(" <expr> ")" | (<word> op <word>)+
# the two alternatives are joined with '|', and oneOf matches '>=' before '>'
Comparison = (Literal('(') + expr + Literal(')')) | OneOrMore(word + oneOf('> >= < <=') + word)
Equality = Group((word('searchKey') + Literal(':') + word('searchValue')) | Comparison)
Unary = Forward()
unaryNot = NOT + Unary
Unary << (unaryNot | Equality)
And = Group(Unary + ZeroOrMore(AND + Unary))
Or = And + ZeroOrMore(OR + And)

expr << Or

class AndNode:
    def __init__(self, tokens):
        self.tokens = tokens.asList()

    def query(self):
        pass #generate the relevant elastic search query here?


class ExactMatchNode:
    def __init__(self, tokens):
        self.tokens = tokens

    def __repr__(self):
        return "<ExactMatchNode>"
    def query(self):
        pass #generate the relevant elasticsearch query here?


Equality.setParseAction(ExactMatchNode)




Q1 = '''response:200 AND time:22 AND rex:32 OR NOT demo:good'''
result = expr.parseString(Q1)

print(result.dump())

Here is my output:

[[<ExactMatchNode>, 'AND', <ExactMatchNode>, 'AND', <ExactMatchNode>], 'OR', ['NOT', <ExactMatchNode>]]
[0]:
  [<ExactMatchNode>, 'AND', <ExactMatchNode>, 'AND', <ExactMatchNode>]
[1]:
  OR
[2]:
  ['NOT', <ExactMatchNode>]

I'm lost at this point: how does this represent a tree structure? For example,

[<ExactMatchNode>, 'AND', <ExactMatchNode>, 'AND', <ExactMatchNode>]

Should it be like this instead?

[AND [<ExactMatchNode>, <ExactMatchNode>,  <ExactMatchNode>]]

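I guess this could be done with setParseAction, but I'm not sure it is the right direction. Or should I start modifying my grammar now? The end goal of this DSL is to translate a given query into the Elasticsearch JSON query language.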
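The restructuring asked about here, from the flat [x, 'AND', y, 'AND', z] to something like ('AND', [x, y, z]), can be done with a parse action that keeps every other token. A minimal sketch, assuming each operand is simplified to a single key:value string via Combine; the names term, collapse_and and and_expr are hypothetical.

```python
import pyparsing as pp

AND = pp.Keyword("AND")
word = pp.Word(pp.alphanums + "_")
# Simplified operand: "key:value" combined into one string for brevity.
term = pp.Combine(word + ":" + word)


def collapse_and(tokens):
    """Turn the flat [t1, 'AND', t2, 'AND', t3] into ('AND', [t1, t2, t3])."""
    if len(tokens) == 1:
        return tokens[0]  # lone operand: do not wrap it in an AND node
    return ("AND", list(tokens[::2]))  # even positions are the operands


and_expr = (term + pp.ZeroOrMore(AND + term)).setParseAction(collapse_and)

print(and_expr.parseString("response:200 AND time:22 AND rex:32")[0])
# -> ('AND', ['response:200', 'time:22', 'rex:32'])
print(and_expr.parseString("demo:good")[0])
# -> demo:good
```

Returning the lone operand unchanged when no AND is present keeps single terms from being wrapped in a one-element AND node.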

Edit: after trying a few things, this is what I have:

class NotNode:
    def __init__(self, tokens):
        self.negatearg = tokens
        #print(f'**** \n {self.negatearg} \n +++')

    def __repr__(self):
        return f'( NOT-> {self.negatearg} )'

class AndNode:
    def __init__(self, tokens):
        self.conds = tokens[0][::2]
        #print(f'**** \n {tokens} \n +++')

    def __repr__(self):
        return f'( AND-> {self.conds} )'

    def generate_query(self):
        result = [cond.generate_query() for cond in self.conds]
        return result


class ExactMatchNode:
    def __init__(self, tokens):
        self.tokens = tokens[0]
        #print(f'**** \n {tokens} \n +++')

    def __repr__(self):
        return f"<ExactMatchNode {self.tokens.searchKey}={self.tokens.searchValue}>"

    def generate_query(self):
        return {
                'term' : { self.tokens[0]: self.tokens[2]}
        }


unaryNot.setParseAction(NotNode)
Equality.setParseAction(ExactMatchNode)
And.setParseAction(AndNode)

I can now call <some node object>.generate_query() to get the query.

But one strange thing I noticed in the output below:

[( AND-> [<ExactMatchNode response=200>, <ExactMatchNode time=22>, <ExactMatchNode rex=32>] ), 'OR', ( AND-> [( NOT-> ['NOT', <ExactMatchNode demo=good>] )] )] 

A second AND-> node is wrapped around the NOT node.

My question is still the same: is this even the right way to use pyparsing, or am I missing something obvious and heading in the wrong direction?

Tags: python, pyparsing

Solution


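Attaching node classes with setParseAction is the best way I have found to build an AST from a hierarchical grammar. If you use this approach, you probably don't need the Group constructs. The reason you get the second And is that your parser always produces an AndNode, even when there is only a single operand with no additional "AND operand" after it.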
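A small way to see this, using a hypothetical simplified grammar: ZeroOrMore also succeeds when it matches nothing, so a parse action on the grouped AND expression fires even for input that contains no AND at all.

```python
import pyparsing as pp

AND = pp.Keyword("AND")
operand = ~AND + pp.Word(pp.alphas)  # simplified stand-in for Unary
calls = []  # records every time the "AND" parse action runs

and_expr = pp.Group(operand + pp.ZeroOrMore(AND + operand))
# The lambda returns None, so the parsed tokens are left unchanged.
and_expr.setParseAction(lambda t: calls.append(t[0].asList()))

and_expr.parseString("demo")
print(calls)  # -> [['demo']]: the action ran although no AND was present
```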

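You can extend your And expression so that the AndNode parse action class is attached only when there actually is an "operand AND operand" sequence (and likewise for NOT and OR), for example: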

And = (Unary + OneOrMore(AND + Unary)).addParseAction(AndNode) | Unary
Or = (And + OneOrMore(OR + And)).addParseAction(OrNode) | And

This is how pyparsing's infixNotation handles these kinds of operators.
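Putting that together, here is a sketch of the hand-written grammar (without infixNotation) in which each node class is attached only when its operator is actually present. This is a reconstruction, not code from the question or the answer: it assumes flat, ungrouped tokens, so the classes index tokens directly instead of tokens[0], and the comparison rule is omitted to keep it short.

```python
import pyparsing as pp

NOT, AND, OR = map(pp.Keyword, "NOT AND OR".split())
word = ~(NOT | AND | OR) + pp.Word(pp.alphanums + "_-")


class ExactMatchNode:
    def __init__(self, tokens):
        self.key, self.value = tokens[0], tokens[2]  # flat: key ':' value

    def __repr__(self):
        return f"<{self.key}={self.value}>"


class NotNode:
    def __init__(self, tokens):
        self.arg = tokens[1]  # flat: 'NOT' operand

    def __repr__(self):
        return f"NOT({self.arg})"


class AndNode:
    def __init__(self, tokens):
        self.conds = tokens[::2]  # flat: a 'AND' b 'AND' c ...

    def __repr__(self):
        return f"AND{self.conds}"


class OrNode:
    def __init__(self, tokens):
        self.conds = tokens[::2]  # flat: a 'OR' b ...

    def __repr__(self):
        return f"OR{self.conds}"


expr = pp.Forward()
equality = (word + ":" + word).addParseAction(ExactMatchNode)
atom = equality | pp.Suppress("(") + expr + pp.Suppress(")")
unary = pp.Forward()
unary <<= (NOT + unary).addParseAction(NotNode) | atom
# The node class is attached only to the branch that has 2+ operands;
# a single operand falls through to the plain alternative unchanged.
and_ = (unary + pp.OneOrMore(AND + unary)).addParseAction(AndNode) | unary
or_ = (and_ + pp.OneOrMore(OR + and_)).addParseAction(OrNode) | and_
expr <<= or_

query = "response:200 AND time:22 AND rex:32 OR NOT demo:good"
print(expr.parseString(query, parseAll=True)[0])
# -> OR[AND[<response=200>, <time=22>, <rex=32>], NOT(<demo=good>)]
```

With the same input as in the question, the NOT node now appears directly under OR instead of inside a one-element AND.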

Here is my version of the parser, using infixNotation (I think the classes are mostly the same; I may have tweaked the NotNode definition):

"""
<expr> ::= <or>
<or> ::= <and> (" OR " <and>)*
<and> ::= <unary> ((" AND ") <unary>)*
<unary> ::= " NOT " <unary> | <equality>
<equality> ::=  (<word> ":" <word>) | <comparison>
<comparison> ::= "(" <expr> ")" | (<word> (" > " | " >= " | " < " | " <= ") <word>)+
<word> ::= ("a" | "b" | "c" | "d" | "e" | "f" | "g"
                      | "h" | "i" | "j" | "k" | "l" | "m" | "n"
                      | "o" | "p" | "q" | "r" | "s" | "t" | "u"
                      | "v" | "w" | "x" | "y" | "z")+
"""

import pyparsing as pp

NOT, AND, OR = map(pp.Keyword, "NOT AND OR".split())

word = ~(NOT | AND | OR) + pp.Word(pp.alphas.lower() + '-_')
date = pp.Regex(r"\d{4}-\d{2}-\d{2}")
operand = word | date

class ExactMatchNode:
    def __init__(self, tokens):
        self.tokens = tokens

    def __repr__(self):
        return "<ExactMatchNode>"
    def generate_query(self):
        pass #generate the relevant elasticsearch query here?

class ComparisonNode:
    def __init__(self, tokens):
        self.tokens = tokens

    def __repr__(self):
        return "<ComparisonNode>"
    def generate_query(self):
        pass #generate the relevant elasticsearch query here?

class NotNode:
    def __init__(self, tokens):
        self.negatearg = tokens[0][1]
        #print(f'**** \n {self.negatearg} \n +++')

    def __repr__(self):
        return f'( NOT-> {self.negatearg} )'

class AndNode:
    def __init__(self, tokens):
        self.conds = tokens[0][::2]
        #print(f'**** \n {tokens} \n +++')

    def __repr__(self):
        return f'( AND-> {self.conds} )'

    def generate_query(self):
        result = [cond.generate_query() for cond in self.conds]
        return result

class OrNode:
    def __init__(self, tokens):
        self.conds = tokens[0][::2]
        #print(f'**** \n {tokens} \n +++')

    def __repr__(self):
        return f'( OR-> {self.conds} )'

    def generate_query(self):
        result = [cond.generate_query() for cond in self.conds]
        return result

expr = pp.infixNotation(operand,
    [
    (':', 2, pp.opAssoc.LEFT, ExactMatchNode),
    (pp.oneOf("> >= < <="), 2, pp.opAssoc.LEFT, ComparisonNode),
    (NOT, 1, pp.opAssoc.RIGHT, NotNode),
    (AND, 2, pp.opAssoc.LEFT, AndNode),
    (OR, 2, pp.opAssoc.LEFT, OrNode),
    ])


expr.runTests("""\
    response:success
    response:success AND extension:php OR extension:css
    response:sucess AND (extension:php OR extension:css)
    time >= 2020-01-09
    time >= 2020-01-09 AND response:success OR os:windows
    NOT reponse:success
    response:success AND NOT os:windows
    """)

Prints:

response:success
[<ExactMatchNode>]

response:success AND extension:php OR extension:css
[( OR-> [( AND-> [<ExactMatchNode>, <ExactMatchNode>] ), <ExactMatchNode>] )]

response:sucess AND (extension:php OR extension:css)
[( AND-> [<ExactMatchNode>, ( OR-> [<ExactMatchNode>, <ExactMatchNode>] )] )]

time >= 2020-01-09
[<ComparisonNode>]

time >= 2020-01-09 AND response:success OR os:windows
[( OR-> [( AND-> [<ComparisonNode>, <ExactMatchNode>] ), <ExactMatchNode>] )]

NOT reponse:success
[( NOT-> <ExactMatchNode> )]

response:success AND NOT os:windows
[( AND-> [<ExactMatchNode>, ( NOT-> <ExactMatchNode> )] )]
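The query stubs above can be filled in to emit Elasticsearch query DSL. The sketch below is an assumption about the desired semantics, not part of the original answer: it maps AND/OR/NOT onto a bool query's must/should/must_not clauses, maps the comparison operators onto a range query's gt/gte/lt/lte, and assumes the fields are keyword-mapped so that a term query matches exactly. The RANGE_OPS table and the query() method bodies are hypothetical.

```python
import json

import pyparsing as pp

NOT, AND, OR = map(pp.Keyword, "NOT AND OR".split())
word = ~(NOT | AND | OR) + pp.Word(pp.alphas.lower() + "-_")
date = pp.Regex(r"\d{4}-\d{2}-\d{2}")
operand = word | date

# Hypothetical mapping from DSL operators to Elasticsearch range keys.
RANGE_OPS = {">": "gt", ">=": "gte", "<": "lt", "<=": "lte"}


class ExactMatchNode:
    def __init__(self, tokens):
        self.field, _, self.value = tokens[0]  # grouped: [field, ':', value]

    def query(self):
        return {"term": {self.field: self.value}}


class ComparisonNode:
    def __init__(self, tokens):
        self.field, self.op, self.value = tokens[0]  # [field, op, value]

    def query(self):
        return {"range": {self.field: {RANGE_OPS[self.op]: self.value}}}


class NotNode:
    def __init__(self, tokens):
        self.arg = tokens[0][1]  # grouped: ['NOT', operand]

    def query(self):
        return {"bool": {"must_not": [self.arg.query()]}}


class AndNode:
    def __init__(self, tokens):
        self.conds = tokens[0][::2]

    def query(self):
        return {"bool": {"must": [c.query() for c in self.conds]}}


class OrNode:
    def __init__(self, tokens):
        self.conds = tokens[0][::2]

    def query(self):
        return {"bool": {"should": [c.query() for c in self.conds]}}


expr = pp.infixNotation(operand, [
    (":", 2, pp.opAssoc.LEFT, ExactMatchNode),
    (pp.oneOf("> >= < <="), 2, pp.opAssoc.LEFT, ComparisonNode),
    (NOT, 1, pp.opAssoc.RIGHT, NotNode),
    (AND, 2, pp.opAssoc.LEFT, AndNode),
    (OR, 2, pp.opAssoc.LEFT, OrNode),
])

tree = expr.parseString("response:success AND NOT os:windows", parseAll=True)[0]
print(json.dumps({"query": tree.query()}, indent=2))
```

For text fields that are analyzed at index time, a match query would usually be a better fit than term.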
