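antlr - ANTLR rule to match an unquoted string or a quoted multiline string
Problem description
I'd like my grammar to match either a single-line string assignment terminated by a newline (\r\n or \n), optionally followed by a comment, or a multiline assignment delimited by double quotes. For example: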
key = value
key = spaces are allowed
key = until a new line or a comment # this is a comment
key = "you can use quotes as well" # this is a comment
key = "and
with quotes
you can also do
multiline"
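Is that possible? I've been struggling with this for a while: everything works except the multiline case. It seems simple, but the rule just won't match properly.
Addendum: this is only part of a larger grammar.
Solution
Looking at your example input: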
# This is the most simple configuration
title = "FML rulez"
# We use ISO notations only, so no local styles
releaseDateTime = 2020-09-12T06:34
# Multiline strings
description = "So,
I'm curious
where this will end."
# Shortcut string; no quotes are needed in a simple property-style assignment
# Or if a string is just one word. These strings are trimmed.
protocol = http
# Conditions allow for overriding, best match wins (most conditions)
# If multiple condition sets equally match, the first one will win.
title[env=production] = "One config file to rule them all"
title[env=production & os=osx] = "Even on Mac"
# Lists
hosts = [alpha, beta]
# Hierarchy is implemented using groups denoted by curly brackets
database {
# indenting is allowed and encouraged, but has no semantic meaning
url = jdbc://...
user = "admin"
# Strings support default encryption with an external key file, like maven
password = "FGFGGHDRG#$BRTHT%G%GFGHFH%twercgfg"
# groups can nest
dialect {
database = postgres
}
}
servers {
# This is a table:
# - the first row is a header, containing the id's
# - the remaining rows are values
| name | datacenter | maxSessions | settings |
| alpha | A | 12 | |
| beta | XYZ | 24 | |
| "sys 2" | B | 6 | |
# you can have sub blocks, which are id-less groups (id is the column)
| gamma | C | 12 | {breaker:true, timeout: 15} |
# or you reference to another block
| tango | D | 24 | $environment |
}
# environments can be easily done using conditions
environment[env=development] {
datasource = tst
}
environment[env=production] {
datesource = prd
}
I'd do something like this:
grammar TECL;
input_file
: configs EOF
;
configs
: NL* ( config ( NL+ config )* NL* )?
;
config
: property
| group
| table
;
property
: WORD conditions? ASSIGN value
;
group
: WORD conditions? NL* OBRACE configs CBRACE
;
conditions
: OBRACK property ( AMP property )* CBRACK
;
table
: row ( NL+ row )*
;
row
: PIPE ( col_value PIPE )+
;
col_value
: ~( PIPE | NL )*
;
value
: WORD
| VARIABLE
| string
| list
;
string
 : STRING   // quoted; may span several lines
 | WORD+    // unquoted; the spaces between the words are skipped by SPACES
 ;
list
: OBRACK ( value ( COMMA value )* )? CBRACK
;
ASSIGN : '=';
OBRACK : '[';
CBRACK : ']';
OBRACE : '{';
CBRACE : '}';
COMMA : ',';
PIPE : '|';
AMP : '&';
VARIABLE
: '$' WORD
;
NL
: [\r\n]+
;
STRING
 // ~[\\"] also matches \r and \n, so a quoted string can span multiple lines
 : '"' ( ~[\\"] | '\\' . )* '"'
 ;
WORD
: ~[ \t\r\n[\]{}=,|&]+
;
COMMENT
: '#' ~[\r\n]* -> skip
;
SPACES
: [ \t]+ -> skip
;
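To see why the quoted multiline value and the unquoted value both come out right, it helps to look at how the lexer picks tokens: ANTLR takes the longest match, and on a tie the rule defined first wins. The following is a rough Python emulation of the lexer rules above, not ANTLR itself; the regexes are hand-translated from the grammar, and the names RULES, SKIPPED and tokenize are made up for this sketch.

```python
import re

# Rough emulation of the TECL lexer rules: the longest match wins, and on a tie
# the rule listed first wins. Patterns are hand-translated from the grammar.
RULES = [
    ("ASSIGN", r"="),
    ("OBRACK", r"\["),
    ("CBRACK", r"\]"),
    ("OBRACE", r"\{"),
    ("CBRACE", r"\}"),
    ("COMMA", r","),
    ("PIPE", r"\|"),
    ("AMP", r"&"),
    ("VARIABLE", r"\$[^ \t\r\n\[\]{}=,|&]+"),
    ("NL", r"[\r\n]+"),
    # [^\\"] also matches \r and \n, so a quoted string may span several lines
    ("STRING", r'"(?:[^\\"]|\\.)*"'),
    ("WORD", r"[^ \t\r\n\[\]{}=,|&]+"),
    ("COMMENT", r"#[^\r\n]*"),
    ("SPACES", r"[ \t]+"),
]
SKIPPED = {"COMMENT", "SPACES"}  # the rules marked -> skip in the grammar
COMPILED = [(name, re.compile(pattern, re.DOTALL)) for name, pattern in RULES]


def tokenize(text):
    """Return (type, text) pairs for every non-skipped token in text."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_end = None, pos
        for name, regex in COMPILED:
            m = regex.match(text, pos)
            # Only a strictly longer match replaces the current best,
            # so the earlier rule keeps a tie.
            if m and m.end() > best_end:
                best_name, best_end = name, m.end()
        if best_name is None:
            raise ValueError(f"no rule matches at offset {pos}")
        if best_name not in SKIPPED:
            tokens.append((best_name, text[pos:best_end]))
        pos = best_end
    return tokens


sample = 'key = "and\nwith quotes"  # a comment\nkey = spaces are allowed\n'
for token in tokenize(sample):
    print(token)
```

The quoted value becomes a single STRING token, newline included, while "spaces are allowed" becomes three WORD tokens that the parser rule string : WORD+ groups together. One consequence of the rule order: a one-word comment written without a space after the hash, such as #foo, ties with WORD and is lexed as WORD because WORD is defined before COMMENT; adding # to WORD's excluded characters would likely avoid that.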
It parses your example into the following parse tree:
[parse tree image]
And the input:
key = value
key = spaces are allowed
key = until a new line or a comment # this is a comment
key = "you can use quotes as well" # this is a comment
key = "and
with quotes
you can also do
multiline"
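is parsed into the following:
[parse tree image]
Currently: the quoted multiline strings work, but spaces in unquoted strings don't.
As you can see in the tree above, it does work. I suspect you've merged part of this grammar into your existing grammar, and that combination is what doesn't work.
[...] Should I be inserting actions?
I wouldn't embed actions (target code) in your grammar: it makes the grammar hard to read, and it makes changing the grammar harder. It also ties your grammar to a single target language. You're better off using a listener or a visitor instead of these actions.
Good luck!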
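Whether you use a listener or a visitor, keep in mind that SPACES is skipped, so the tokens matched by string : WORD+ carry no whitespace. Concatenating their text, which is what getText() on a parser rule context does, turns "spaces are allowed" into "spacesareallowed"; this may be what the comment about spaces in unquoted strings ran into. The sketch below is plain Python, not the ANTLR runtime: the hand-written token triples and the helper names concatenated_text and value_text are hypothetical stand-ins for the text and start/stop character indices that ANTLR tokens report.

```python
# Each token is (text, start, stop) with inclusive character offsets into the
# source, mirroring what an ANTLR token reports; SPACES never become tokens.
Token = tuple[str, int, int]


def concatenated_text(tokens: list[Token]) -> str:
    """Join token texts directly: inner spaces are lost because SPACES was skipped."""
    return "".join(text for text, _, _ in tokens)


def value_text(source: str, tokens: list[Token]) -> str:
    """Slice the source from the first token's start to the last token's stop,
    keeping the original spacing between words and trimming both ends."""
    start = tokens[0][1]
    stop = tokens[-1][2]
    return source[start:stop + 1]


source = "key = spaces are allowed"
# The three WORD tokens that string : WORD+ would match on this line
words = [("spaces", 6, 11), ("are", 13, 15), ("allowed", 17, 23)]
print(concatenated_text(words))  # spacesareallowed
print(value_text(source, words))  # spaces are allowed
```

Slicing keeps the author's spacing exactly; joining the WORD texts with a single space would instead collapse runs of spaces. In ANTLR itself, the equivalent of the slice is reading the character stream between the first and last token of the rule context.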