c++ - Incorrect rule matched in a Boost Spirit X3 grammar
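Question
I am new to Spirit.
I am trying to use Spirit X3 to build an AST from a simple "Excel" formula. The grammar supports the typical operators (+, -, *, /), functions (myfunc(myparam1, myparam2)) and cell references (e.g. A1, AA234).
So an example expression to parse might be A1 + sin(A2+3).
The problem is that the xlreference rule below never matches, because the xlfunction rule takes precedence and that rule does not backtrack. I have tried using expect, but I lack good examples to get it working.
I guess this leads to another question: what is the best way to debug X3? I have seen the BOOST_SPIRIT_X3_DEBUG define, but I cannot find any examples that demonstrate its use. I also wrote an on_error method on expression_class, but it does not give a good trace. I have tried using position tagging and the with statement, but that does not provide enough information either.
Any help would be appreciated!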
x3::rule<class xlreference, ast::xlreference> const xlreference{"xlreference"};
auto const xlreference_def = +alpha > x3::uint_ > !x3::expect[char('(')];
BOOST_SPIRIT_DEFINE(xlreference);
struct identifier_class;
typedef x3::rule<identifier_class, std::string> identifier_type;
identifier_type const identifier = "identifier";
auto const identifier_def = x3::char_("a-zA-Z") > *(x3::char_("a-zA-Z") | x3::char_('_')) > !x3::expect[char('0-9')];
BOOST_SPIRIT_DEFINE(identifier);
auto const expression_def = // constadditive_expr_def
term [print_action()]
>> *( (char_('+') > term)
| (char_('-') > term)
)
;
x3::rule<xlfunction_class, ast::xlfunction> const xlfunction("xlfunction");
auto const xlfunction_def = identifier > '(' > *(expression > *(',' > expression)) > ')';
BOOST_SPIRIT_DEFINE(xlfunction);
auto const term_def = //constmultiplicative_expr_def
factor
>> *( (char_('*') > factor)
| (char_('/') > factor)
)
;
auto const factor_def = // constunary_expr_def
xlfunction [print_action()]
| '(' > expression > ')'
| (char_('-') > factor)
| (char_('+') > factor)
| x3::double_ [print_action()] | xlreference [print_action()]
;
Error handler:
struct expression_class //: x3::annotate_on_success
{
// Our error handler
template <typename Iterator, typename Exception, typename Context>
x3::error_handler_result
on_error(Iterator& q, Iterator const& last, Exception const& x, Context const& context)
{
std::cout
<< "Error! Expecting: "
<< x.which()
<< " here: \""
<< std::string(x.where(), last)
<< "\""
<< std::endl
;
return x3::error_handler_result::fail;
}
};
Position tag:
with<position_cache_tag>(std::ref(positions))
[
client::calculator_grammar::expression
];
client::ast::program ast;
bool r = phrase_parse(iter, (iterator_type const ) str.end(), parser, x3::space, ast);
if (!r) {
std::cout << "failed:" << str << "\n";
}
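Answer
Okay, taking the review one step at a time, and making up AST types along the way (because you didn't care to show them).
- This is invalid code: char('0-9'). Is that a wide-character literal? Enable your compiler warnings! You probably meant x3::char_("0-9") (two important differences!).
- !x3::expect[] is a contradiction. You can never pass that condition, because ! asserts that the lookahead does NOT match, while expect[] REQUIRES a match. So, best case, ! fails because the expectation is matched. Worst case, expect[] throws an exception because you asked it to.
- operator > is already an expectation point. For the same reason as before, this makes > !p a contradiction. Make it >> !p.
- Replace char_("0-9") with x3::digit.
- Replace char_("a-zA-Z") with x3::alpha.
- Some (many) rules need to be lexemes. That's because you invoke the grammar in a skipper context (phrase_parse with x3::space). Your identifier will silently eat whitespace because you didn't make it a lexeme. See "Boost spirit skipper issues".
- Negative lookahead assertions don't expose attributes, so ! char_('(') could (should?) be ! lit('(').
- The presence of semantic actions (by default) suppresses attribute propagation, so print_action() will cause attribute propagation to stop.
- Expectation points (operator>) by definition cannot backtrack. That's what makes them expectation points.
- Use the list operator instead of composing with the Kleene star: p >> *(',' >> p) becomes p % ','.
- That extra Kleene star was bogus. Did you mean to make the argument list optional? That's -(expression % ',').
- The chained-operator rules make it a bit cumbersome to get the AST right.
- Simplify
  factor >> *((x3::char_('*') > factor) | (x3::char_('/') > factor));
  to just
  factor >> *(x3::char_("*/") >> factor);
- What does factor_def logically match: an expression?
The first review pass yields: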
auto const xlreference_def =
x3::lexeme[+x3::alpha >> x3::uint_] >> !x3::char_('(');
auto const identifier_def =
x3::raw[x3::lexeme[x3::alpha >> *(x3::alpha | '_') >> !x3::digit]];
auto const xlfunction_def = identifier >> '(' >> -(expression % ',') >> ')';
auto const term_def = factor >> *(x3::char_("*/") >> factor);
auto const factor_def = xlfunction //
| '(' >> expression >> ')' //
| x3::double_ //
| xlreference;
auto const expression_def = term >> *(x3::char_("-+") >> term);
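More observations:
- (iterator_type const)str.end()?? Never use C-style casts. In fact, just use str.cend(), or indeed str.end() if str is suitably const.
- phrase_parse - consider not making the skipper the caller's decision, since it is logically part of your grammar.
- Many kinds of Excel expressions are not parsed: A:A, $A4, B$4, all cell ranges; I think R1C1 is often supported as well.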
Time for magic fairy dust
So, drawing on a lot of experience, I'm going to Crystal Ball™ some AST®:
namespace client::ast {
using identifier = std::string;
//using identifier = boost::iterator_range<std::string::const_iterator>;
struct string_literal : std::string {
using std::string::string;
using std::string::operator=;
friend std::ostream& operator<<(std::ostream& os, string_literal const& sl) {
return os << std::quoted(sl) ;
}
};
struct xlreference {
std::string colname;
size_t rownum;
};
struct xlfunction; // fwd
struct binary_op; // fwd
using expression = boost::variant< //
double, //
string_literal, //
identifier, //
xlreference, //
boost::recursive_wrapper<xlfunction>, //
boost::recursive_wrapper<binary_op> //
>;
struct xlfunction{
identifier name;
std::vector<expression> args;
friend std::ostream& operator<<(std::ostream& os, xlfunction const& xlf)
{
os << xlf.name << "(";
char const* sep = "";
for (auto& arg : xlf.args)
os << std::exchange(sep, ", ") << arg;
return os;
}
};
struct binary_op {
struct chained_t {
char op;
expression e;
};
expression lhs;
std::vector<chained_t> chained;
friend std::ostream& operator<<(std::ostream& os, binary_op const& bop)
{
os << "(" << bop.lhs;
for (auto& rhs : bop.chained)
os << rhs.op << rhs.e;
return os << ")";
}
};
using program = expression;
using boost::fusion::operator<<;
} // namespace client::ast
We duly adapt them as Fusion sequences:
BOOST_FUSION_ADAPT_STRUCT(client::ast::xlfunction, name, args)
BOOST_FUSION_ADAPT_STRUCT(client::ast::xlreference, colname, rownum)
BOOST_FUSION_ADAPT_STRUCT(client::ast::binary_op, lhs, chained)
BOOST_FUSION_ADAPT_STRUCT(client::ast::binary_op::chained_t, op, e)
Next, let's declare suitable rules:
x3::rule<struct identifier_class, ast::identifier> const identifier{"identifier"};
x3::rule<struct xlreference, ast::xlreference> const xlreference{"xlreference"};
x3::rule<struct xlfunction_class, ast::xlfunction> const xlfunction{"xlfunction"};
x3::rule<struct factor_class, ast::expression> const factor{"factor"};
x3::rule<struct expression_class, ast::binary_op> const expression{"expression"};
x3::rule<struct term_class, ast::binary_op> const term{"term"};
These need definitions:
auto const xlreference_def =
x3::lexeme[+x3::alpha >> x3::uint_] /*>> !x3::char_('(')*/;
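Note that the lookahead assertion (!) doesn't actually change the parse result: no cell reference is a valid identifier anyway, so the () will simply remain unparsed.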
auto const identifier_def =
x3::raw[x3::lexeme[x3::alpha >> *(x3::alpha | '_') /*>> !x3::digit*/]];
Same here. Remaining unparsed input will be checked with x3::eoi later.
I threw in a string literal, because any Excel clone would have one:
auto const string_literal =
x3::rule<struct _, ast::string_literal>{"string_literal"} //
= x3::lexeme['"' > *('\\' >> x3::char_ | ~x3::char_('"')) > '"'];
Note that this demonstrates that non-recursive, locally defined rules don't need separate definitions.
Then come the expression rules:
auto const factor_def = //
xlfunction //
| '(' >> expression >> ')' //
| x3::double_ //
| string_literal //
| xlreference //
| identifier //
;
I'd usually call this "simple expression" rather than factor.
auto const term_def = factor >> *(x3::char_("*/") >> factor);
auto const expression_def = term >> *(x3::char_("-+") >> term);
auto const xlfunction_def = identifier >> '(' >> -(expression % ',') >> ')';
These map directly onto the AST.
BOOST_SPIRIT_DEFINE(xlreference)
BOOST_SPIRIT_DEFINE(identifier)
BOOST_SPIRIT_DEFINE(xlfunction)
BOOST_SPIRIT_DEFINE(term)
BOOST_SPIRIT_DEFINE(factor)
BOOST_SPIRIT_DEFINE(expression)
'Nuff said. Now comes a bit of cargo cult: remnants of code that was never shown or used, which I'll mostly just accept and ignore here:
int main() {
std::vector<int> positions; // TODO
auto parser = x3::with<struct position_cache_tag /*TODO*/> //
(std::ref(positions)) //
[ //
x3::skip(x3::space)[ //
client::calculator_grammar::expression >> x3::eoi //
] //
];
Note, though, that x3::eoi makes the rule fail to match if the end of the input is not reached (modulo skipping).
Now, let's add some test cases!
struct {
std::string category;
std::vector<std::string> cases;
} test_table[] = {
{
"xlreference",
{"A1", "A1111", "AbCdZ9876543", "i9", "i0"},
},
{
"identifier",
{"i", "id", "id_entifier"},
},
{
"number",
{"123", "inf", "-inf", "NaN", ".99e34", "1e-8", "1.e-8", "+9"},
},
{
"binaries",
{ //
"3+4", "3*4", //
"3+4+5", "3*4*5", "3+4*5", "3*4+5", "3*4+5", //
"3+(4+5)", "3*(4*5)", "3+(4*5)", "3*(4+5)", "3*(4+5)", //
"(3+4)+5", "(3*4)*5", "(3+4)*5", "(3*4)+5", "(3*4)+5"},
},
{
"xlfunction",
{
"pi()",
"sin(4)",
R"--(IIF(A1, "Red", "Green"))--",
},
},
{
"invalid",
{
"A9()", // an xlreference may not be followed by ()
"", // you didn't specify
},
},
{
"other",
{
"A-9", // 1-letter identifier and binary operation
"1 + +9", // unary plus accepted in number rule
},
},
{
"question",
{
"myfunc(myparam1, myparam2)",
"A1",
"AA234",
"A1 + sin(A2+3)",
},
},
};
And run them:
for (auto& [cat, cases] : test_table) {
for (std::string const& str : cases) {
auto iter = begin(str), last(end(str));
std::cout << std::setw(12) << cat << ": ";
client::ast::program ast;
if (parse(iter, last, parser, ast)) {
std::cout << "parsed: " << ast;
} else {
std::cout << "failed: " << std::quoted(str);
}
if (iter == last) {
std::cout << "\n";
} else {
std::cout << " unparsed: "
<< std::quoted(std::string_view(iter, last)) << "\n";
}
}
}
Live Demo
//#define BOOST_SPIRIT_X3_DEBUG
#include <boost/fusion/adapted.hpp>
#include <boost/fusion/include/io.hpp>
#include <boost/spirit/home/x3.hpp>
#include <boost/spirit/home/x3/support/ast/position_tagged.hpp>
#include <boost/spirit/home/x3/support/utility/annotate_on_success.hpp>
#include <iomanip>
#include <iostream>
#include <string>
#include <string_view>
#include <utility> // std::exchange
#include <vector>
namespace x3 = boost::spirit::x3;
namespace client::ast {
using identifier = std::string;
//using identifier = boost::iterator_range<std::string::const_iterator>;
struct string_literal : std::string {
using std::string::string;
using std::string::operator=;
friend std::ostream& operator<<(std::ostream& os, string_literal const& sl) {
return os << std::quoted(sl) ;
}
};
struct xlreference {
std::string colname;
size_t rownum;
};
struct xlfunction; // fwd
struct binary_op; // fwd
using expression = boost::variant< //
double, //
string_literal, //
identifier, //
xlreference, //
boost::recursive_wrapper<xlfunction>, //
boost::recursive_wrapper<binary_op> //
>;
struct xlfunction{
identifier name;
std::vector<expression> args;
friend std::ostream& operator<<(std::ostream& os, xlfunction const& xlf)
{
os << xlf.name << "(";
char const* sep = "";
for (auto& arg : xlf.args)
os << std::exchange(sep, ", ") << arg;
return os;
}
};
struct binary_op {
struct chained_t {
char op;
expression e;
};
expression lhs;
std::vector<chained_t> chained;
friend std::ostream& operator<<(std::ostream& os, binary_op const& bop)
{
os << "(" << bop.lhs;
for (auto& rhs : bop.chained)
os << rhs.op << rhs.e;
return os << ")";
}
};
using program = expression;
using boost::fusion::operator<<;
} // namespace client::ast
BOOST_FUSION_ADAPT_STRUCT(client::ast::xlfunction, name, args)
BOOST_FUSION_ADAPT_STRUCT(client::ast::xlreference, colname, rownum)
BOOST_FUSION_ADAPT_STRUCT(client::ast::binary_op, lhs, chained)
BOOST_FUSION_ADAPT_STRUCT(client::ast::binary_op::chained_t, op, e)
namespace client::calculator_grammar {
struct expression_class //: x3::annotate_on_success
{
// Our error handler
template <typename Iterator, typename Exception, typename Context>
x3::error_handler_result on_error(Iterator& q, Iterator const& last,
Exception const& x,
Context const& context)
{
std::cout //
<< "Error! Expecting: " << x.which() //
<< " here: \"" << std::string(x.where(), last) //
<< "\"" << std::endl;
return x3::error_handler_result::fail;
}
};
x3::rule<struct identifier_class, ast::identifier> const identifier{"identifier"};
x3::rule<struct xlreference, ast::xlreference> const xlreference{"xlreference"};
x3::rule<struct xlfunction_class, ast::xlfunction> const xlfunction{"xlfunction"};
x3::rule<struct factor_class, ast::expression> const factor{"factor"};
x3::rule<struct expression_class, ast::binary_op> const expression{"expression"};
x3::rule<struct term_class, ast::binary_op> const term{"term"};
auto const xlreference_def =
x3::lexeme[+x3::alpha >> x3::uint_] /*>> !x3::char_('(')*/;
auto const identifier_def =
x3::raw[x3::lexeme[x3::alpha >> *(x3::alpha | '_') /*>> !x3::digit*/]];
auto const string_literal =
x3::rule<struct _, ast::string_literal>{"string_literal"} //
= x3::lexeme['"' > *('\\' >> x3::char_ | ~x3::char_('"')) > '"'];
auto const factor_def = //
xlfunction //
| '(' >> expression >> ')' //
| x3::double_ //
| string_literal //
| xlreference //
| identifier //
;
auto const term_def = factor >> *(x3::char_("*/") >> factor);
auto const expression_def = term >> *(x3::char_("-+") >> term);
auto const xlfunction_def = identifier >> '(' >> -(expression % ',') >> ')';
BOOST_SPIRIT_DEFINE(xlreference)
BOOST_SPIRIT_DEFINE(identifier)
BOOST_SPIRIT_DEFINE(xlfunction)
BOOST_SPIRIT_DEFINE(term)
BOOST_SPIRIT_DEFINE(factor)
BOOST_SPIRIT_DEFINE(expression)
} // namespace client::calculator_grammar
int main() {
std::vector<int> positions; // TODO
auto parser = x3::with<struct position_cache_tag /*TODO*/> //
(std::ref(positions)) //
[ //
x3::skip(x3::space)[ //
client::calculator_grammar::expression >> x3::eoi //
] //
];
struct {
std::string category;
std::vector<std::string> cases;
} test_table[] = {
{
"xlreference",
{"A1", "A1111", "AbCdZ9876543", "i9", "i0"},
},
{
"identifier",
{"i", "id", "id_entifier"},
},
{
"number",
{"123", "inf", "-inf", "NaN", ".99e34", "1e-8", "1.e-8", "+9"},
},
{
"binaries",
{ //
"3+4", "3*4", //
"3+4+5", "3*4*5", "3+4*5", "3*4+5", "3*4+5", //
"3+(4+5)", "3*(4*5)", "3+(4*5)", "3*(4+5)", "3*(4+5)", //
"(3+4)+5", "(3*4)*5", "(3+4)*5", "(3*4)+5", "(3*4)+5"},
},
{
"xlfunction",
{
"pi()",
"sin(4)",
R"--(IIF(A1, "Red", "Green"))--",
},
},
{
"invalid",
{
"A9()", // an xlreference may not be followed by ()
"", // you didn't specify
},
},
{
"other",
{
"A-9", // 1-letter identifier and binary operation
"1 + +9", // unary plus accepted in number rule
},
},
{
"question",
{
"myfunc(myparam1, myparam2)",
"A1",
"AA234",
"A1 + sin(A2+3)",
},
},
};
for (auto& [cat, cases] : test_table) {
for (std::string const& str : cases) {
auto iter = begin(str), last(end(str));
std::cout << std::setw(12) << cat << ": ";
client::ast::program ast;
if (parse(iter, last, parser, ast)) {
std::cout << "parsed: " << ast;
} else {
std::cout << "failed: " << std::quoted(str);
}
if (iter == last) {
std::cout << "\n";
} else {
std::cout << " unparsed: "
<< std::quoted(std::string_view(iter, last)) << "\n";
}
}
}
}
Prints
xlreference: parsed: (((A 1)))
xlreference: parsed: (((A 1111)))
xlreference: parsed: (((AbCdZ 9876543)))
xlreference: parsed: (((i 9)))
xlreference: parsed: (((i 0)))
identifier: parsed: ((i))
identifier: parsed: ((id))
identifier: parsed: ((id_entifier))
number: parsed: ((123))
number: parsed: ((inf))
number: parsed: ((-inf))
number: parsed: ((nan))
number: parsed: ((9.9e+33))
number: parsed: ((1e-08))
number: parsed: ((1e-08))
number: parsed: ((9))
binaries: parsed: ((3)+(4))
binaries: parsed: ((3*4))
binaries: parsed: ((3)+(4)+(5))
binaries: parsed: ((3*4*5))
binaries: parsed: ((3)+(4*5))
binaries: parsed: ((3*4)+(5))
binaries: parsed: ((3*4)+(5))
binaries: parsed: ((3)+(((4)+(5))))
binaries: parsed: ((3*((4*5))))
binaries: parsed: ((3)+(((4*5))))
binaries: parsed: ((3*((4)+(5))))
binaries: parsed: ((3*((4)+(5))))
binaries: parsed: ((((3)+(4)))+(5))
binaries: parsed: ((((3*4))*5))
binaries: parsed: ((((3)+(4))*5))
binaries: parsed: ((((3*4)))+(5))
binaries: parsed: ((((3*4)))+(5))
xlfunction: parsed: ((pi())
xlfunction: parsed: ((sin(((4))))
xlfunction: parsed: ((IIF((((A 1))), (("Red")), (("Green"))))
invalid: failed: "A9()" unparsed: "A9()"
invalid: failed: ""
other: parsed: ((A)-(9))
other: parsed: ((1)+(9))
question: parsed: ((myfunc((((myparam 1))), (((myparam 2)))))
question: parsed: (((A 1)))
question: parsed: (((AA 234)))
question: parsed: (((A 1))+(sin((((A 2))+(3))))
Only the two lines marked failed fail, as expected.
Debugging?
Simply uncomment
#define BOOST_SPIRIT_X3_DEBUG
and get slammed with the extra noise:
question: <expression>
<try>A1 + sin(A2+3)</try>
<term>
<try>A1 + sin(A2+3)</try>
<factor>
<try>A1 + sin(A2+3)</try>
<xlfunction>
<try>A1 + sin(A2+3)</try>
<identifier>
<try>A1 + sin(A2+3)</try>
<success>1 + sin(A2+3)</success>
<attributes>[A]</attributes>
</identifier>
<fail/>
</xlfunction>
<string_literal>
<try>A1 + sin(A2+3)</try>
<fail/>
</string_literal>
<xlreference>
<try>A1 + sin(A2+3)</try>
<success> + sin(A2+3)</success>
<attributes>[[A], 1]</attributes>
</xlreference>
<success> + sin(A2+3)</success>
<attributes>[[A], 1]</attributes>
</factor>
<success> + sin(A2+3)</success>
<attributes>[[[A], 1], []]</attributes>
</term>
<term>
<try> sin(A2+3)</try>
<factor>
<try> sin(A2+3)</try>
<xlfunction>
<try> sin(A2+3)</try>
<identifier>
<try> sin(A2+3)</try>
<success>(A2+3)</success>
<attributes>[s, i, n]</attributes>
</identifier>
<expression>
<try>A2+3)</try>
<term>
<try>A2+3)</try>
<factor>
<try>A2+3)</try>
<xlfunction>
<try>A2+3)</try>
<identifier>
<try>A2+3)</try>
<success>2+3)</success>
<attributes>[A]</attributes>
</identifier>
<fail/>
</xlfunction>
<string_literal>
<try>A2+3)</try>
<fail/>
</string_literal>
<xlreference>
<try>A2+3)</try>
<success>+3)</success>
<attributes>[[A], 2]</attributes>
</xlreference>
<success>+3)</success>
<attributes>[[A], 2]</attributes>
</factor>
<success>+3)</success>
<attributes>[[[A], 2], []]</attributes>
</term>
<term>
<try>3)</try>
<factor>
<try>3)</try>
<xlfunction>
<try>3)</try>
<identifier>
<try>3)</try>
<fail/>
</identifier>
<fail/>
</xlfunction>
<success>)</success>
<attributes>3</attributes>
</factor>
<success>)</success>
<attributes>[3, []]</attributes>
</term>
<success>)</success>
<attributes>[[[[A], 2], []], [[+, [3, []]]]]</attributes>
</expression>
<success></success>
<attributes>[[s, i, n], [[[[[A], 2], []], [[+, [3, []]]]]]]</attributes>
</xlfunction>
<success></success>
<attributes>[[s, i, n], [[[[[A], 2], []], [[+, [3, []]]]]]]</attributes>
</factor>
<success></success>
<attributes>[[[s, i, n], [[[[[A], 2], []], [[+, [3, []]]]]]], []]</attributes>
</term>
<success></success>
<attributes>[[[[A], 1], []], [[+, [[[s, i, n], [[[[[A], 2], []], [[+, [3, []]]]]]], []]]]]</attributes>
</expression>
parsed: (((A 1))+(sin((((A 2))+(3))))