Boost Spirit Qi grammar: adding to a list within a skipper

Question

Parsing these strings:

int main(){
    for (const std::string input: std::vector<std::string> { 
            "module simple_in_n_out();endmodule;",
            "module simple_in_n_out(in_1);endmodule;",
            "module simple_in_n_out(in_1,in_2,in_3);endmodule;",
            })
    {
        parse_verilog_file(input);
    }
    return 0;
}

succeeds for the first two inputs and for the push_back of the first string, but fails as soon as more strings have to be added to the vector:

            std::string module_name;
            stringvec module_inputs;

            module_input_list %= tok.identifier[push_back(phoenix::ref(module_inputs), _1)] % qi::lit(',');
            module_input_list.name("module_input_list");
            BOOST_SPIRIT_DEBUG_NODE(module_input_list);
            module_stmt
                %=   tok.module_ >> tok.identifier[phoenix::ref(module_name) = _1] 
                >> '(' >> -(module_input_list) >> ')'
                >> ';';
            module_stmt.name("module");
            BOOST_SPIRIT_DEBUG_NODE(module_stmt);

The output looks like this:

<module_stmt>
  <try>[module]</try>
  <module_input_list>
    <try>[)][;][endmodule][;]</try>
    <fail/>
  </module_input_list>
  <success>[endmodule][;]</success>
  <attributes>[]</attributes>
</module_stmt>
<module_stmt>
  <try>[endmodule][;]</try>
  <fail/>
</module_stmt>
TODO: put the module together now
<module_stmt>
  <try></try>
  <fail/>
</module_stmt>
-------------------------
Parsing succeeded
-------------------------
module name: simple_in_n_out
<module_stmt>
  <try>[module]</try>
  <module_input_list>
    <try>[in_1][)][;][endmodule][;]</try>
    <success>[)][;][endmodule][;]</success>
    <attributes>[]</attributes>
  </module_input_list>
  <success>[endmodule][;]</success>
  <attributes>[]</attributes>
</module_stmt>
<module_stmt>
  <try>[endmodule][;]</try>
  <fail/>
</module_stmt>
TODO: put the module together now
<module_stmt>
  <try></try>
  <fail/>
</module_stmt>
-------------------------
Parsing succeeded
-------------------------
module name: simple_in_n_out
    module input: in_1
<module_stmt>
  <try>[module]</try>
  <module_input_list>
    <try>[in_1]</try>
    <success></success>
    <attributes>[]</attributes>
  </module_input_list>
  <fail/>
</module_stmt>
-------------------------
Parsing failed
-------------------------

Full code:

#define BOOST_SPIRIT_DEBUG
#include "netlist/netlistlexer.h"
namespace verilog {
    using namespace boost::spirit;
    using boost::phoenix::val;
    using boost::spirit::ascii::char_;
    using boost::spirit::ascii::string;

    ///////////////////////////////////////////////////////////////////////////////
    //  Grammar definition
    ///////////////////////////////////////////////////////////////////////////////
    template <typename Iterator, typename Lexer>
    struct verilog_grammar
    : qi::grammar<Iterator, qi::in_state_skipper<Lexer> >
    {
        template <typename TokenDef>
        verilog_grammar(TokenDef const& tok)
        : verilog_grammar::base_type(program)
        {
            using boost::spirit::_val;
            using phoenix::push_back;
            using qi::on_error;
            using qi::fail;
            using phoenix::construct;

            program
                =   +statement
                ;


            statement
                =   module_stmt
                |   end_module_stmt
                ;


            module_input_list %= tok.identifier[push_back(phoenix::ref(module_inputs), _1)] % qi::lit(',');
            module_input_list.name("module_input_list");
            BOOST_SPIRIT_DEBUG_NODE(module_input_list);
            module_stmt
                %=   tok.module_ >> tok.identifier[phoenix::ref(module_name) = _1] 
                >> '(' >> -(module_input_list) >> ')'
                >> ';';
            module_stmt.name("module");
            BOOST_SPIRIT_DEBUG_NODE(module_stmt);
            end_module_stmt
                =   (tok.endmodule_ >> ';' | tok.endmodule_)[
                    std::cout << val("TODO: put the module together now") << "\n"
                ];
            end_module_stmt.name("end_module_stmt");

            on_error<fail>
            (
                program
            , std::cout
                    << val("Error! Expecting ")
                    << _4                               // what failed?
                    << val(" here: \"")
                    << construct<std::string>(_3, _2)   // iterators to error-pos, end
                    << val("\"")
                    << std::endl
            );
        }

        std::string module_name;
        stringvec module_inputs;
        typedef boost::variant<unsigned int, std::string> expression_type;
        typedef boost::fusion::vector<std::string,std::vector<std::string>> fustring;

        qi::rule<Iterator, qi::in_state_skipper<Lexer> > program, statement;
        qi::rule<Iterator, qi::in_state_skipper<Lexer> > module_stmt;
        qi::rule<Iterator, qi::in_state_skipper<Lexer> > module_input_list;
        qi::rule<Iterator, qi::in_state_skipper<Lexer> > end_module_stmt;
    };
} // end verilog namespace

void parse_verilog_file(std::string str){
    typedef std::string::iterator base_iterator_type;
    using namespace boost::spirit;
    typedef lex::lexertl::token<
        base_iterator_type, boost::mpl::vector<unsigned int, std::string>
    > token_type;
     typedef lex::lexertl::lexer<token_type> lexer_type;
     typedef verilog::verilog_tokens<lexer_type> verilog_tokens;
     typedef verilog_tokens::iterator_type iterator_type;
     typedef verilog::verilog_grammar<iterator_type, verilog_tokens::lexer_def> verilog_grammar;
     verilog_tokens tokens;                         // Our lexer
     verilog_grammar calc(tokens);                  // Our parser

     std::string::iterator it = str.begin();
     iterator_type iter = tokens.begin(it, str.end());
     iterator_type end = tokens.end();
     bool r = qi::phrase_parse(iter, end, calc, qi::in_state("WS")[tokens.self]);

     if (r && iter == end)
     {
         std::cout << "-------------------------\n";
         std::cout << "Parsing succeeded\n";
         std::cout << "-------------------------\n";
         std::cout << "module name: " << calc.module_name << "\n";
         for (const std::string i: calc.module_inputs){
             std::cout << "    module input: " << i << "\n";
         }
     }
     else
     {
         std::cout << "-------------------------\n";
         std::cout << "Parsing failed\n";
         std::cout << "-------------------------\n";
     }

}

int main(){
    for (const std::string input: std::vector<std::string> { 
            "module simple_in_n_out();endmodule;",
            "module simple_in_n_out(in_1);endmodule;",
            "module simple_in_n_out(in_1,in_2,in_3);endmodule;",
            })
    {
        parse_verilog_file(input);
    }
    return 0;
}

netlist/netlistlexer.h:

#ifndef NETLISTLEXER_H
#define NETLISTLEXER_H
#include <boost/config/warning_disable.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/lex_lexertl.hpp>
#include <boost/spirit/include/phoenix_core.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <boost/spirit/include/phoenix_fusion.hpp>
#include <boost/spirit/include/phoenix_stl.hpp>
#include <boost/spirit/include/phoenix_object.hpp>
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/variant/recursive_variant.hpp>
#include <boost/foreach.hpp>

#include <iostream>
#include <fstream>
#include <string>
#include <vector>
namespace fusion = boost::fusion;
namespace phoenix = boost::phoenix;
namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;
typedef std::vector<std::string> stringvec;
namespace verilog {
    using namespace boost::spirit;
    using boost::phoenix::val;
    using boost::spirit::ascii::char_;
    using boost::spirit::ascii::string;

    ///////////////////////////////////////////////////////////////////////////////
    //  Token definition
    ///////////////////////////////////////////////////////////////////////////////
    template <typename Lexer>
    struct verilog_tokens : lex::lexer<Lexer>
    {
        verilog_tokens()
        {
            // define the tokens to match
            identifier = "[a-zA-Z_][a-zA-Z0-9_]*";
            logic_op = "[\\&\\|]";
            constant = "[0-9]+";
            module_ = "module";
            assign_ = "assign";
            endmodule_ = "endmodule";
            wire_ = "wire";
            input_ = "input";
            output_ = "output";
            inout_ = "inout";
            reg_ = "reg";
            begin_ = "begin";
            end_ = "end";
            always_ = "always";
            if_ = "if";
            else_ = "else";
            parameter_ = "parameter";

            // associate the tokens and the token set with the lexer
            this->self = lex::token_def<>('(') | ')' | '{' | '}' | '=' | '[' | ']' | ';' | constant | logic_op;
            this->self += if_ | else_ | begin_ | end_ | always_ | reg_;
            this->self += module_ | endmodule_ | assign_ | wire_ | input_ | output_ | inout_;
            this->self += parameter_;
            this->self += identifier;

            // define the whitespace to ignore (spaces, tabs, newlines and C-style
            // comments)
            this->self("WS")
                =   lex::token_def<>("[ \\t\\n]+")
                |   "\\/\\*[^*]*\\*+([^/*][^*]*\\*+)*\\/"
                |   "\\/\\/[^\\r\\n\\f]*"
                |   "\\(\\*[^*]*\\*\\)"
                ;
        }

        // these tokens have no attribute
        lex::token_def<lex::omit> if_, else_, begin_, end_, endmodule_;

        // these tokens expose the iterator_range of the matched input sequence
        lex::token_def<> always_, reg_;
        lex::token_def<> module_, assign_, wire_, input_, output_, inout_;
        lex::token_def<> parameter_;

        // The following two tokens have an associated attribute type, 'identifier'
        // carries a string (the identifier name) and 'constant' carries the
        // matched integer value.
        //
        // Note: any token attribute type explicitly specified in a token_def<>
        //       declaration needs to be listed during token type definition as
        //       well (see the typedef for the token_type below).
        //
        // The conversion of the matched input to an instance of this type occurs
        // once (on first access), which makes token attributes as efficient as
        // possible. Moreover, token instances are constructed once by the lexer
        // library. From this point on tokens are passed by reference only,
        // avoiding them being copied around.
        lex::token_def<std::string> identifier;
        lex::token_def<unsigned int> constant;
        lex::token_def<std::string> logic_op;
    };
} // end verilog namespace
#endif // NETLISTLEXER_H

Tags: boost-spirit-qi

Answer


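Okay, I had to work my way through the fog of Spirit Lex¹ and a few quirks suggesting that you may not be using a standards-conforming compiler².

When I did, I noticed that the actual grammar does not use attribute propagation. Instead it uses ad-hoc semantic actions to extract some of the information³.

I am on record saying that Spirit shines at rapid prototyping once you find its sweet spot. Manual AST building based on semantic actions is not where that sweet spot is, IMO.

As a final subtle clue, I noticed that you "uselessly" include recursive_variant.hpp, which makes me think you actually wanted to use automatic attribute propagation with a recursive AST?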


First Ideas

Let's take module_stmt as an example. Instead of "haphazardly side-effecting" the module_name and module_inputs member variables of the parser, let's use an AST type:

namespace AST {
    using identifiers = stringvec;

    struct module {
        std::string name;
        identifiers inputs;
    };
}

Adapt it for automatic propagation:

BOOST_FUSION_ADAPT_STRUCT(AST::module, name, inputs)

And rely on it:

module_input_list = tok.identifier % ',';
module_stmt       
     = tok.module_ >> tok.identifier 
    >> '(' >> -module_input_list >> ')' >> ';'
    >> tok.endmodule_ >> (';' | qi::eoi)
    ;

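Note: I had to fix the module_ token definition to be lex::omit.

Note how I folded endmodule_ into the rule, because that is the natural thing to do. Any nested (recursive) rules, such as nested statements, can naturally go there and be synthesized into AST::module.

The rule declarations can be: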

qi::rule<Iterator, AST::module(),      Skipper> module_stmt;
qi::rule<Iterator, AST::identifiers(), Skipper> module_input_list;

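Tying It Together

Of course, right now the top-level rules do not declare an attribute, so the magically synthesized AST::module instances simply vanish. That is unfortunate, but easy to fix. Extend our AST types: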

namespace AST {
    using identifiers = stringvec;

    struct module {
        std::string name;
        identifiers inputs;
    };

    using statement = boost::make_recursive_variant<
        module // module_stmt
    >::type;

    using statements = std::vector<statement>;

    struct program {
        statements body;
    };
}

That will do for this rather simple Verilog program. We extend the rules:

qi::rule<Iterator, AST::program(),     Skipper> program;
qi::rule<Iterator, AST::statements(),  Skipper> statements;
qi::rule<Iterator, AST::statement(),   Skipper> statement;
qi::rule<Iterator, AST::module(),      Skipper> module_stmt;
qi::rule<Iterator, AST::identifiers(), Skipper> module_input_list;

You will notice the pattern of matching each rule to its corresponding AST node. The rules themselves don't change:

program            = statements;
statements         = +statement;
statement          = module_stmt;

module_input_list = tok.identifier % ',';
module_stmt       
     = tok.module_ >> tok.identifier 
    >> '(' >> -module_input_list >> ')' >> ';'
    >> tok.endmodule_ >> (';' | qi::eoi)
    ;

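Note: I introduced statements for consistency. It also sidesteps the pitfall of propagating into single-element adapted Fusion sequences⁴.

Now we can pass an AST::program attribute to the parser call: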

AST::program program;
if (qi::parse(iter, end, calc, program)) {
    for (auto& stmt : program.body) {
        if (auto* module = boost::get<AST::module>(&stmt)) {
            std::cout << "module name: " << module->name << "\n";
            for (std::string const& i : module->inputs) {
                std::cout << "    module input: " << i << "\n";
            }
        }
    }
}

This prints the same output as before, as expected:

Live On Wandbox

-------------------------
module simple_in_n_out();endmodule;
Parsing succeeded
module name: simple_in_n_out
-------------------------
-------------------------
module simple_in_n_out(in_1);endmodule;
Parsing succeeded
module name: simple_in_n_out
    module input: in_1
-------------------------
-------------------------
module simple_in_n_out(in_1,in_2,in_3);endmodule;
Parsing failed
-------------------------

Debugging the Failure

Uncommenting #define BOOST_SPIRIT_DEBUG shows where the problem is:

Live On Wandbox

<program>
  <try>[module]</try>
  <statements>
    <try>[module]</try>
    <statement>
      <try>[module]</try>
      <module_stmt>
        <try>[module]</try>
        <module_input_list>
          <try>[in_1]</try>
          <success></success>
          <attributes>[[[i, n, _, 1]]]</attributes>
        </module_input_list>
        <fail/>
      </module_stmt>
      <fail/>
    </statement>
    <fail/>
  </statements>
  <fail/>
</program>

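The problem is not in any of the rules! The parser simply fails to match ','. A quick look at the token definitions tells us why: there is no token that matches a comma... Quickly adding it: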

Live On Wandbox

-------------------------
module simple_in_n_out();endmodule;
Parsing succeeded
module name: simple_in_n_out
-------------------------
-------------------------
module simple_in_n_out(in_1);endmodule;
Parsing succeeded
module name: simple_in_n_out
    module input: in_1
-------------------------
-------------------------
module simple_in_n_out(in_1,in_2,in_3);endmodule;
Parsing succeeded
module name: simple_in_n_out
    module input: in_1
    module input: in_2
    module input: in_3
-------------------------

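BONUS

However, this "problem" highlights yet another cost of using the lexer (recall the other issue with the module_ token that I mentioned earlier). So here is the whole thing without the Lex overhead, without the Phoenix overhead, in a fraction of the code, and with full AST propagation: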

Live On Wandbox

// #define BOOST_SPIRIT_DEBUG
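#include <boost/fusion/adapted.hpp>
#include <boost/spirit/include/qi.hpp>
#include <iomanip>   // std::quoted
#include <iostream>  // std::cout
#include <sstream>   // std::ostringstream
#include <stdexcept> // std::runtime_error
#include <string>
#include <vector>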
namespace qi = boost::spirit::qi;

namespace AST {
    using identifier = std::string;
    using identifiers = std::vector<identifier>;

    struct module {
        identifier name;
        identifiers inputs;
    };

    using statement = boost::variant<module>;
    using statements = std::vector<statement>;

    struct program {
        statements body;
    };
}

BOOST_FUSION_ADAPT_STRUCT(AST::module, name, inputs)
BOOST_FUSION_ADAPT_STRUCT(AST::program, body)

namespace verilog {
    template <typename Iterator> struct verilog_grammar : qi::grammar<Iterator, AST::program()> {

        verilog_grammar() : verilog_grammar::base_type(start) {
            auto kw = [](auto p) { return qi::copy(qi::lexeme[qi::no_case[p] >> !(qi::alnum|'_') ]); };

            start      = qi::skip(skipper.alias()) [ program ];
            program    = statements > qi::eoi;
            statements = -statement % ';';
            statement  = module_stmt.alias();

            module_input_list = identifier % ',';
            module_stmt       
                 = kw("module") >> identifier 
                >> '(' >> -module_input_list >> ')' >> ';'
                >> kw("endmodule")
                ;

            // lexemes
            identifier = qi::char_("a-zA-Z_") >> *qi::char_("a-zA-Z0-9_");

            skipper = qi::char_(" \t\r\n") // added \r for consistency
                | "//" >> *~qi::char_("\r\n\f") 
                | "/*" >> *(qi::char_ - "*/") >> "*/"
                | "(*" >> *(qi::char_ - "*)") >> "*)"
                ;

            BOOST_SPIRIT_DEBUG_NODES((program)(statements)(statement)(module_stmt)(module_input_list)(identifier));
        }

      private:
        using Skipper = qi::rule<Iterator>;

        qi::rule<Iterator, AST::program()> start;
        Skipper skipper;

        qi::rule<Iterator, AST::program(),     Skipper> program;
        qi::rule<Iterator, AST::statements(),  Skipper> statements;
        qi::rule<Iterator, AST::statement(),   Skipper> statement;
        qi::rule<Iterator, AST::module(),      Skipper> module_stmt;
        qi::rule<Iterator, AST::identifiers(), Skipper> module_input_list;

        // lexemes (formerly "tokens")
        qi::rule<Iterator, AST::identifier()> identifier;
    };
} // end verilog namespace

AST::program parse_verilog_file(std::string const& str) {
    typedef std::string::const_iterator iterator;
    static const verilog::verilog_grammar<iterator> grammar; // Our parser, now stateless

    try {
        AST::program program;
        parse(str.begin(), str.end(), grammar, program);
        return program;
    } catch(qi::expectation_failure<iterator> const& ef) {
        std::ostringstream msg;
        msg << "Parsing failed: expected " << ef.what_ << " at " << std::quoted(std::string(ef.first, ef.last));
        throw std::runtime_error(msg.str());
    }
}

int main() {
    for (const std::string input : std::vector<std::string>{
             "module simple_in_n_out();endmodule;",
             "module simple_in_n_out(in_1);endmodule;",
             "module simple_in_n_out(in_1,in_2,in_3);endmodule;",
             "module a();endmodule",
             "module a();endmodule;oops",
         })
    try {
        std::cout << "-------------------------\n";
        std::cout << std::quoted(input) << "\n";

        for (auto const& stmt : parse_verilog_file(input).body) {
            if (auto* module = boost::get<AST::module>(&stmt)) {
                std::cout << "module name: " << module->name << "\n";
                for (std::string const& i : module->inputs) {
                    std::cout << "    module input: " << i << "\n";
                }
            }
        }
    } catch(std::exception const& e) {
        std::cout << e.what() << '\n';
    }
}

Prints

-------------------------
"module simple_in_n_out();endmodule;"
module name: simple_in_n_out
-------------------------
"module simple_in_n_out(in_1);endmodule;"
module name: simple_in_n_out
    module input: in_1
-------------------------
"module simple_in_n_out(in_1,in_2,in_3);endmodule;"
module name: simple_in_n_out
    module input: in_1
    module input: in_2
    module input: in_3
-------------------------
"module a();endmodule"
module name: a
-------------------------
"module a();endmodule;oops"
Parsing failed: expected <eoi> at "oops"

Notable improvements:

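  • "Keyword boundaries" are parsed correctly (that is what the kw() helper is for). This means that an identifier that merely starts with something that could be a keyword is not mistakenly tokenized as that keyword (which is what happened with the original Lex-based approach)
  • Keywords are now case-insensitive across the board (qi::no_case[]), just for demonstration
  • The skipper is easier to specify and more readable at the same time
  • Something I had already done in the Lex-based version in this answer: the skipper is now encapsulated in the grammar. I think the skipper should only be supplied by the caller when the caller might genuinely need to change it. In 99% of cases the skipper is tightly coupled to the parser, and using the wrong one would break the grammar anyway.

    As a bonus, the invocation becomes much cleaner: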

    AST::program program;
    parse(str.begin(), str.end(), grammar, program);
    return program;
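  • Related to that, note how I simplified the parse_verilog_file function into... an actual function (one that returns its result), separating producing the result from processing it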
    AST::program parse_verilog_file(std::string const& str) {
        typedef std::string::const_iterator iterator;
        static const verilog::verilog_grammar<iterator> grammar; // Our parser, now stateless

        try {
            AST::program program;
            parse(str.begin(), str.end(), grammar, program);
            return program;
        } catch(qi::expectation_failure<iterator> const& ef) {
            std::ostringstream msg;
            msg << "Parsing failed: expected " << ef.what_ << " at " << std::quoted(std::string(ef.first, ef.last));
            throw std::runtime_error(msg.str());
        }
    }
  • This in turn shows the simplified error handling: just catch the exception

  • In turn, I replaced the iter!=end check with another expectation point:

    program    = statements > qi::eoi;

This, combined with

    statements = -statement % ';';

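means that ';' is required between statements, but not at the end of the program (which I guess is what you wanted to convey with the old endmodule rule)

Also note that -statement % ';' makes empty statements acceptable. If that is not what you want, drop the '-'.

Note that the added test cases exercise and demonstrate the error detection and reporting of this logic ("module a();endmodule;oops" results in Parsing failed: expected <eoi> at "oops").

  • Any "tokens" like identifier are now "lexeme" rules, because they do not obey the skipper⁵. Debug support now seamlessly includes these tokens the way you would expect: Live On Wandbox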

    <module_input_list>
      <try>in_1,in_2,in_3);endm</try>
      <identifier>
        <try>in_1,in_2,in_3);endm</try>
        <success>,in_2,in_3);endmodul</success>
        <attributes>[[i, n, _, 1]]</attributes>
      </identifier>
      <identifier>
        <try>in_2,in_3);endmodule</try>
        <success>,in_3);endmodule;oop</success>
        <attributes>[[i, n, _, 2]]</attributes>
      </identifier>
      <identifier>
        <try>in_3);endmodule;oops</try>
        <success>);endmodule;oops</success>
        <attributes>[[i, n, _, 3]]</attributes>
      </identifier>
      <success>);endmodule;oops</success>
      <attributes>[[[i, n, _, 1], [i, n, _, 2], [i, n, _, 3]]]</attributes>
    </module_input_list>
    
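  • Oh, and the code is considerably shorter while doing more: down from 211 lines of code to 112 lines of code (-47%)

  • It compiles noticeably faster (12.1s, down from 19.7s on my system)
  • Heh, given the current feature set, this can be simplified even further: This clocks in at 90 LoC. However, I would rather encourage improving the functionality of the grammar, e.g. as done here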

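¹ Fun fact: "nobody uses it anymore". I'm not just saying that because I don't know it, and I'm not alone. From this 2017 answer:

Using Lex makes most of the sweet spot disappear, since all the "high-level" parsers (like real_parser, [u]int_parser) go out the window. The Spirit developers are on record saying they prefer not to use Lex. Moreover, Spirit X3 no longer has Lex support.

² I'm guessing MSVC, and not entirely up to date? The main culprit was ambiguous names, caused by your use of using namespace.

³ Boost Spirit: "Semantic actions are evil"?

⁴ See the many answers on SO: https://stackoverflow.com/search?q=user%3A85371+spirit+single-element

⁵ See Boost spirit skipper issues for my preferred description of how skippers, rule declarations and lexemes interact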

