首页 > 解决方案 > Boost Spirit: parse boolean expression and reduce to canonical normal form

问题描述

I want to parse a common Boolean with just or, and and not operators, which I think I have done using Boost Spirit below. In phase 2 (or perhaps part of the parsing itself), I wish to transform the AST of the Boolean to disjunctive canonical normal form, which essentially "flattens" the expression and removes all grouping operators.

In one of my attempts, I created the Boost static_visitor below, named Transformer. I started by trying to eliminate double not operators by just assigning a child node to it's grandparent if the child and the parent are both not operators. My problem is referring to the parent of the current node. It seems like there is no way to refer to the current node's parent, because once a node is visited, the visit function overloads on the inner type of the 'variant' thus discarding the variant nature of the object. Any help appreciated.

struct op_or  {};
struct op_and {};
struct op_not {};

typedef std::string var;
template <typename tag> struct binop;
template <typename tag> struct uniop;

typedef boost::variant
    <
        var,
        boost::recursive_wrapper<uniop<op_not>>,
        boost::recursive_wrapper<binop<op_and>>,
        boost::recursive_wrapper<binop<op_or>>
    >
    expr;

template <typename tag> struct uniop
{
    explicit uniop(expr const& o) : exp_u(o) { }
    expr exp_u;
};

template <typename tag> struct binop
{
    explicit binop(expr const& l, expr const& r) : exp_l(l), exp_r(r) { }
    expr exp_l, exp_r;
};

struct transformer : boost::static_visitor<void>
{
    std::deque<std::reference_wrapper<expr>> stk;

    transformer(expr & e)
    {
        stk.push_back(e);
    }

    void operator()(var const& v) const { }

    void operator()(uniop<op_not> & u)
    {
        if (boost::get<uniop<op_not>>(&stk.back().get()) != nullptr)
        {
            stk.back() = u.exp_u;
        }
        else
        {
            stk.push_back(std::ref(u));  // <<=== Fails with "no matching function for call"
            boost::apply_visitor(*this, u.exp_u);
            stk.pop_back();
        }
    }
    void operator()(binop<op_and> & b)
    {
        stk.push_back(std::ref(u));
        boost::apply_visitor(*this, b.exp_l);
        boost::apply_visitor(*this, b.exp_r);
        stk.pop_back();
    }
    void operator()(binop<op_or> & b)
    {
        stk.push_back(std::ref(u));
        boost::apply_visitor(*this, b.exp_l);
        boost::apply_visitor(*this, b.exp_r);
        stk.pop_back();
    }
};

template <typename It, typename Skipper = boost::spirit::qi::space_type>
struct parser : boost::spirit::qi::grammar<It, expr(), Skipper>
{
    parser() : parser::base_type(expr_)
    {
        using namespace boost::phoenix;
        using namespace boost::spirit::qi;

        using boost::spirit::qi::_1;

        expr_  = or_.alias();

        or_  = and_ [ _val = _1 ] >> *("or" >> and_ [ _val = construct<binop<op_or>>(_val, _1) ]);
        and_ = not_ [ _val = _1 ] >> *("and" >> not_ [ _val = construct<binop<op_and>>(_val, _1) ]);
        not_ = "not" > simple [ _val = construct<uniop<op_not>>(_1) ] | simple [ _val = _1 ];

        simple =  '(' > expr_ > ')' | var_;
        var_ = lexeme[ +alpha ];
    }

private:
    boost::spirit::qi::rule<It, var() , Skipper> var_;
    boost::spirit::qi::rule<It, expr(), Skipper> not_, and_, or_, simple, expr_;
};

标签: c++grammarboost-spirit

解决方案


看来到 DCNF 的转换是 NP 完全的。因此,您可以期望做出让步。

高度简化的子任务只是消除了双重否定。看起来您试图保留一堆父表达式引用 ( stk) 但是:

  1. 您没有明确展示提取或返回简化表达式的方法(原始表达式将保持不变)
  2. 您尝试推送一个uniop<>节点作为对expr类型不匹配的节点的引用:

    stk.push_back(std::ref(u));  // <<=== Fails with "no matching function for call"
    

    对我来说,这只是另一个症状,即

    transformer(expr & e)        {
        stk.push_back(e);
    }
    

    无法递归到子表达式。如果是这样,您可以相信周围环境expr&已经在堆栈上。binop/unop 处理程序也是如此,它们都试图推送u在当时范围内甚至不存在的引用,并且可能是为了推送当前节点,该节点遇到相同类型的不匹配。

第一的:simplify

我认为以函数式风格编写这些内容要容易得多:与其“操作”对象图,不如让转换返回转换后的结果

这一次意味着您可以保持所有节点类型不变,除非您的节点类型是嵌套否定。这是它的外观:

struct simplify {
    typedef expr result_type;

    // in general, just identity transform
    template <typename E> auto operator()(E const& e) const { return e; }

    // only handle these:
    auto operator()(expr const& e) const { return apply_visitor(*this, e); }
    expr operator()(unop<op_not> const& e) const {
        if (auto nested_negation = boost::strict_get<unop<op_not>>(&e.exp_u)) {
            return nested_negation->exp_u;
        }
        return e;
    }
};

一个简单的测试程序将是:

Live On Coliru

std::vector<expr> tests {
    "a",
    NOT{"a"},
    AND{"a", "b"},
    OR{"a","b"},
    AND{NOT{"a"},NOT{"b"}},
    NOT{{NOT{"a"}}},
};

const simplifier simplify{};

for (expr const& expr : tests) {
    std::cout << std::setw(30) << str(expr) << " -> " << simplify(expr) << "\n";
}

印刷:

                       "a" -> "a"
                  NOT{"a"} -> NOT{"a"}
              AND{"a","b"} -> AND{"a","b"}
               OR{"a","b"} -> OR{"a","b"}
    AND{NOT{"a"},NOT{"b"}} -> AND{NOT{"a"},NOT{"b"}}
             NOT{NOT{"a"}} -> "a"

使用堆栈/变异

使用堆栈的类似方法**似乎*同样容易:

这里是龙

struct stack_simplifier {
    typedef void result_type;
    std::deque<std::reference_wrapper<expr>> stk;

    void operator()(expr& e) {
        stk.push_back(e);
        apply_visitor(*this, e);
        stk.pop_back();
    }

    template <typename Other>
    void operator()(Other&) {}

    void operator()(unop<op_not>& e) {
        if (auto nested_negation = boost::strict_get<unop<op_not>>(&e.exp_u)) {
            stk.back().get() = nested_negation->exp_u;
        }
    }
};

用法将不再是const(因为函数不纯),expr参数也是如此(将被变异):

for (expr expr : tests) {
    std::cout << std::setw(30) << str(expr);

    stack_simplifier{}(expr);
    std::cout << " -> " << expr << "\n";
}

它 /does/ 似乎有效(Live On Coliru),但也有明显的缺点:

  • 堆栈没有实际用途,只检查顶部元素(您可以将其替换为指向当前表达式节点的指针
  • 函子对象是非纯的/非常量的
  • 遍历时表达式树正在发生突变。这只是你调用未定义行为的定时炸弹:在

    void operator()(unop<op_not>& e) {
        if (auto nested_negation = boost::strict_get<unop<op_not>>(&e.exp_u)) {
            stk.back().get() = nested_negation->exp_u;
        }
    }
    

    在分配给堆栈顶部的表达式之后,对的引用e是悬空的。也是如此nested_negation。超出该点的取消引用是UB

  • 现在在这个简单的场景中(折叠双重否定),在精神上检查这是否真的没问题似乎并不难。错误的

    事实证明,operator=在变体调用上variant_assign,它看起来像这样:

    void variant_assign(const variant& rhs)
    {
        // If the contained types are EXACTLY the same...
        if (which_ == rhs.which_)
        {
            // ...then assign rhs's storage to lhs's content:
            detail::variant::assign_storage visitor(rhs.storage_.address());
            this->internal_apply_visitor(visitor);
        }
        else
        {
            // Otherwise, perform general (copy-based) variant assignment:
            assigner visitor(*this, rhs.which());
            rhs.internal_apply_visitor(visitor); 
        }
    }
    

    访问者有一个致命的assigner细节(选择了一个 nothrow-aware 重载):

    template <typename RhsT, typename B1, typename B2>
    void assign_impl(
          const RhsT& rhs_content
        , mpl::true_ // has_nothrow_copy
        , B1 // is_nothrow_move_constructible
        , B2 // has_fallback_type
        ) const BOOST_NOEXCEPT
    {
        // Destroy lhs's content...
        lhs_.destroy_content(); // nothrow
    
        // ...copy rhs content into lhs's storage...
        new(lhs_.storage_.address())
            RhsT( rhs_content ); // nothrow
    
        // ...and indicate new content type:
        lhs_.indicate_which(rhs_which_); // nothrow
    }
    

    OOPS原来左手边先被破坏了。然而在

        stk.back().get() = nested_negation->exp_u;
    

    右侧是左侧的子对象 (!!!)。在这里避免UB的不直观方法是临时复制¹:

        expr tmp = nested_negation->exp_u;
        stk.back().get() = tmp;
    
  • 想象一下,您正在应用像德摩根定律这样的转换。如果子表达式中包含(也)嵌套否定怎么办?

在我看来,变异方法只是不必要地容易出错。

递归的、不可变的转换又名 Joy

到目前为止,这些方法还有另一个问题。嵌套的子表达式在此处不进行转换。例如

  NOT{NOT{AND{"a",NOT{NOT{"b"}}}}} -> AND{"a",NOT{NOT{"b"}}}

而不是想要的AND{"a","b"}. 这在纯功能方法中很容易解决:

struct simplifier {
    typedef expr result_type;

    template <typename T> auto operator()(T const& v) const { return call(v); }

  private:
    auto call(var const& e) const { return e; }
    auto call(expr const& e) const {
        auto s = apply_visitor(*this, e);
        return s;
    }
    expr call(unop<op_not> const& e) const {
        if (auto nested_negation = boost::strict_get<unop<op_not>>(&e.exp_u)) {
            return call(nested_negation->exp_u);
        }

        return unop<op_not> {call(e.exp_u)};
    }
    template <typename Op> auto call(binop<Op> const& e) const {
        return binop<Op> {call(e.exp_l), call(e.exp_r)};
    }
};

一切仍然是不可变的,但我们处理所有类型的表达式以递归它们的子表达式。现在它打印:

Live On Coliru

                               "a" -> "a"
                          NOT{"a"} -> NOT{"a"}
                      AND{"a","b"} -> AND{"a","b"}
                       OR{"a","b"} -> OR{"a","b"}
            AND{NOT{"a"},NOT{"b"}} -> AND{NOT{"a"},NOT{"b"}}
                     NOT{NOT{"a"}} -> "a"
  NOT{NOT{AND{"a",NOT{NOT{"b"}}}}} -> AND{"a","b"}

为了完整起见,对“stack_simplifier”进行了类似的转换:http: //coliru.stacked-crooked.com/a/cc5627aa37f0c969


¹实际上可能会使用移动语义,但为了清楚起见,我忽略了


推荐阅读