Boost Spirit x3 -- Parameterizing Parsers with other Parsers

Question

I don't have a whole lot of code to show for this one because I haven't managed to get anything to work, but the high-level problem is that I am trying to create a series of parsers for a family of related languages. By that I mean the languages share many of the same constructs, but the overlap is not complete. As a simple example, say I have an AST that is parameterized by some (completely contrived in this example) 'leaf' type:

template <typename t>
struct fooT {
  std::string name;
  t leaf;
};

One language may have t instantiated as int and one as double. What I wanted to do was create a templated class or something that I could instantiate with different t's and corresponding parser rules so that I could generate a series of composed parsers.

In my real example, I have a bunch of nested structures that are the same across the languages, but only have a couple of small variations at the very edges of the AST, so if I cannot compose the parsers in a good way, I will end up duplicating a bunch of parse rules, AST nodes, etc. I have actually gotten it to work by not putting it in a class and just very carefully arranging my header files and imports so that I can have 'dangling' parser rules with special names that can be assembled. A big downside of this is that I cannot include parsers for the multiple different languages within the same program -- precisely because of the name conflict that arises.

Does anybody have any ideas how I could approach this?

Tags: c++, parsing, boost, boost-spirit-x3

Answer


The nice thing about X3 is that you can generate parsers just as easily as you define them in the first place.

E.g.

template <typename T> struct AstNode {
    std::string name;
    T leaf;
};

Now let's define a generic parser maker:

namespace Generic {
    template <typename T> auto leaf = x3::eps(false);

    template <> auto leaf<int>
        = "0x" >> x3::uint_parser<uintmax_t, 16>{};
    template <> auto leaf<std::string>
        = x3::lexeme['"' >> *~x3::char_('"') >> '"'];

    auto no_comment = x3::space;
    auto hash_comments = x3::space |
        x3::lexeme['#' >> *(x3::char_ - x3::eol)] >> (x3::eol | x3::eoi);
    auto c_style_comments = x3::space |
        "/*" >> x3::lexeme[*(x3::char_ - "*/")] >> "*/";
    auto cxx_style_comments = c_style_comments |
        x3::lexeme["//" >> *(x3::char_ - x3::eol)] >> (x3::eol | x3::eoi);

    auto name = leaf<std::string>;

    template <typename T> auto parseNode(auto heading, auto skipper) {
        return x3::skip(skipper)[
            x3::as_parser(heading) >> name >> ":" >> leaf<T>
        ];
    }
}

This allows us to compose various grammars with various leaf types and skipper styles:

namespace Language1 {
    static auto const grammar =
        Generic::parseNode<int>("value", Generic::no_comment);
}

namespace Language2 {
    static auto const grammar =
        Generic::parseNode<std::string>("line", Generic::cxx_style_comments);
}

Let's Demo:


#include <boost/spirit/home/x3.hpp>
#include <boost/fusion/adapted.hpp>
#include <cstdint>
#include <iomanip>
#include <iostream>
#include <string>
#include <string_view>
namespace x3 = boost::spirit::x3;

template <typename T> struct AstNode {
    std::string name;
    T leaf;
};

BOOST_FUSION_ADAPT_TPL_STRUCT((T), (AstNode)(T), name, leaf)

namespace Generic {
    template <typename T> auto leaf = x3::eps(false);

    template <> auto leaf<int>
        = "0x" >> x3::uint_parser<uintmax_t, 16>{};
    template <> auto leaf<std::string>
        = x3::lexeme['"' >> *~x3::char_('"') >> '"'];

    auto no_comment = x3::space;
    auto hash_comments = x3::space |
        x3::lexeme['#' >> *(x3::char_ - x3::eol)] >> (x3::eol | x3::eoi);
    auto c_style_comments = x3::space |
        "/*" >> x3::lexeme[*(x3::char_ - "*/")] >> "*/";
    auto cxx_style_comments = c_style_comments |
        x3::lexeme["//" >> *(x3::char_ - x3::eol)] >> (x3::eol | x3::eoi);

    auto name = leaf<std::string>;

    template <typename T> auto parseNode(auto heading, auto skipper) {
        return x3::skip(skipper)[
            x3::as_parser(heading) >> name >> ":" >> leaf<T>
        ];
    }
}

namespace Language1 {
    static auto const grammar =
        Generic::parseNode<int>("value", Generic::no_comment);
}

namespace Language2 {
    static auto const grammar =
        Generic::parseNode<std::string>("line", Generic::cxx_style_comments);
}

void test(auto const& grammar, std::string_view text, auto ast) {
    auto f = text.begin(), l = text.end();
    std::cout << "\nParsing: " << std::quoted(text, '\'') << "\n";
    if (parse(f, l, grammar, ast)) {
        std::cout << " -> {name:" << ast.name << ",value:" << ast.leaf << "}\n";
    } else {
        std::cout << " -- Failed " << std::quoted(text, '\'') << "\n";
    }
}

int main() {
    test(Language1::grammar, R"(value "one": 0x01)", AstNode<int>{});
    test(
        Language2::grammar,
        R"(line "Hamlet": "There is nothing either good or bad, but thinking makes it so.")",
        AstNode<std::string>{});

    test(
        Language2::grammar,
        R"(line // rejected: "Hamlet": "To be ..."
        "King Lear": /*hopefully less trite:*/"As flies to wanton boys are we to the gods")",
        AstNode<std::string>{});
}

Prints

Parsing: 'value "one": 0x01'
 -> {name:one,value:1}

Parsing: 'line "Hamlet": "There is nothing either good or bad, but thinking makes it so."'
 -> {name:Hamlet,value:There is nothing either good or bad, but thinking makes it so.}

Parsing: 'line // rejected: "Hamlet": "To be ..."
        "King Lear": /*hopefully less trite:*/"As flies to wanton boys are we to the gods"'
 -> {name:King Lear,value:As flies to wanton boys are we to the gods}

Advanced

For advanced scenarios (where rule declarations and definitions are separated across translation units and/or you need dynamic switching), you can use the x3::any_parser<> holder.

