regex - POSIX sh: find and replace with a function
Problem description
In JavaScript, you can do the following:
someComplexProcessing = (wholeMatch, group1, group2, index, mystr)=> replacement...
mystr.replace(/some.* regex(with) multiple (capture groups)/g, someComplexProcessing)
For example:
const renderTemplate = (str, env)=> str.replace(/{{(.*?)}}/g, (_, name)=> env[name])
renderTemplate('{{salut}} {{name}}!', {salut: 'Hi', name: 'Leo'}) // "Hi Leo!"
What is the best POSIX-compatible, general-purpose variant?
- reusability # eg. a function taking regex, processingFunction, and input, etc - that I could put in my .shellrc/source lib.sh or similar and reuse
- multiline # eg. if "uppercase everything between {{ and }}", `a {{b\nc}}` -> `a B\nC`
- no escape gotchas # eg. it shouldn't break if input, replacement, or regex contains special characters
- POSIX compatible # eg. running it under `docker run --rm -it alpine sh`, etc
- using regex # eg. perl regex seems like the most prominent one, please note differences from it if other is used
Bonus points for:
- no/less dependencies # eg. as portable as possible
- multiple capture groups
- performance
- security # related to no escape gotchas, eg. ok with untrusted input
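I have found a few bash solutions and some compatible edge-case solutions, though none come close to the simplicity that JS's .replace offers. Ultimately, I want to program without thinking about implementation details/gotchas, and without pulling in hundreds of MB (mostly for alpine containers, but also ubuntu/OSX), hence this attempt to build up a library of portable, POSIX-compatible snippets, functions, and patterns.
Solution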
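A not very efficient, partially input-escaped solution (it assumes the input contains no \r, and the regex arguments themselves are not escaped), with only one capture group (the middle part). It is portable, though: it uses only tr and sed (plus printf and the -z empty-string test). The sed part could possibly be swapped for something generally perl-regex compatible.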
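Because the start and end arguments reach sed unescaped, a caller who wants to match a literal string has to escape it first. A minimal sketch of one way to do that is below; `escape_sed_bre` is a hypothetical helper name that is not part of the original answer, and it assumes the result is used in a sed basic regular expression with `/` as the delimiter and that the string contains no newline.

```sh
#!/usr/bin/env sh
# Hypothetical helper (not from the original answer): escape a literal string
# so it can be embedded in a sed BRE that uses "/" as its delimiter.
# Escaped characters: [ \ . * ^ $ /
# "]" is left alone because it is already literal outside a bracket expression.
escape_sed_bre () {
  printf '%s\n' "$1" | sed 's|[[\\.*^$/]|\\&|g'
}

# Usage: replace the literal text "$5.00" without "." or "$" acting as regex.
printf '%s\n' 'price: $5.00 (approx)' | sed "s/$(escape_sed_bre '$5.00')/N/"
```

The replacement side of a sed command has its own special characters (`&`, `\`, and the delimiter), which this sketch does not cover.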
lib.sh:
#!/usr/bin/env sh
multiline_substitute_with_fn () {
  sub_start="$1"; shift; fn_name="$1"; shift; sub_end="$1"; shift; left="$(cat)";
  # uppercase () { cat | tr 'a-z' 'A-Z'; }; echo 'Hello [there]!' | multiline_substitute_with_fn '\[' uppercase '\]'
  # make single-line, sanitize input against _SUB(START|END)_, a\ra {{echo "b\rb"}} c {{echo d}} e
  left="$(echo "$left" | tr '\n' '\r' | sed 's/_SUB/_ASUB/g')"
  while [ ! -z "$left" ]; do
    left="$(echo "$left" | sed "s/$sub_start/_SUBSTART_/")" # a\ra _SUBSTART_echo "b\rb"}} c {{echo d}} e
    printf '%s' "$(echo "$left" | sed 's/_SUBSTART_.*//' | sed 's/_ASUB/_SUB/g' | tr '\r' '\n')" # a\na
    lefttmp="$(echo "$left" | sed 's/.*_SUBSTART_//' | sed "s/$sub_end/_SUBEND_/")" # echo "b\rb"_SUBEND_ c {{echo d}} e
    if [ "$lefttmp" = "$left" ]; then left=''; break; fi
    left="$lefttmp"
    middle="$(echo "$left" | sed 's/_SUBEND_.*//' | tr '\r' '\n')" # echo "b\nb"
    [ ! -z "$middle" ] && printf '%s' "$(echo "$middle" | $fn_name | sed 's/_ASUB/_SUB/g')" # b\nb
    left="$(echo "$left" | sed 's/.*_SUBEND_//')" # c {{echo d}} e
  done
}
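The function above relies on one trick to get multiline matching out of line-oriented sed: newlines are turned into \r before sed runs and turned back afterwards. A stripped-down sketch of just that step is below; `sed_across_lines` is a hypothetical name, and like the answer it assumes the input contains no \r of its own.

```sh
#!/usr/bin/env sh
# Hypothetical helper: run one sed script over the whole input as if it were
# a single line. Newlines become \r before sed and are restored afterwards,
# so a pattern such as {{.*}} can span what used to be a line break.
sed_across_lines () {
  tr '\n' '\r' | sed "$1" | tr '\r' '\n'
}

# Usage: the placeholder spans two lines but is still replaced as one match.
printf 'a {{b\nc}} d\n' | sed_across_lines 's/{{.*}}/X/'
```

Note that `.*` is greedy, so with several placeholders in the input this one-liner would swallow everything from the first `{{` to the last `}}`; the loop in the function above avoids that by handling one start/end pair per iteration.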
Usage:
cat file | multiline_substitute_with_fn 'start regex' processingFunction 'end regex'
Example usage:
#!/usr/bin/env sh
. ./lib.sh # load lib
uppercase () { cat | tr 'a-z' 'A-Z'; };
echo 'Hello [there]!' | multiline_substitute_with_fn '\[' uppercase '\]'
# -> Hello THERE!
eval_template () { # not "safe" in terms of eval
  # echo 'a\na {{echo "b\nb"}} c {{echo d}} e' | eval_template # -> 'a\na b\nb c d e'
  # hello=hi; echo '{{=$hello}} there' | eval_template # -> {{echo "$hello"}} there -> 'hi there'
  fn () {
    middle="$(cat)"
    case "$middle" in =*) middle="echo \"${middle#=}\"" ;; *);; esac # '=$a' -> 'echo "$a"'
    eval "$middle"
  }
  cat | multiline_substitute_with_fn '{{' fn '}}'
}
eval_template <<-EOF
a
a {{echo "b
b"}} c {{echo d}} e
EOF
# -> a
# a b
# b c d e
echo '{{=$salut}} {{=$name}}!' > my.template
salut=Hi; name="Leo Name";
cat my.template | eval_template
# Hi Leo Name!
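eval_template executes whatever sits between the braces, so it should not be fed untrusted templates. For the plain variable lookup done by the JavaScript renderTemplate example in the question, a sketch that avoids eval could use POSIX awk and its ENVIRON array instead of the sed loop. This is a different technique from the answer above, not the author's method; `render_template` is a hypothetical name, and the sketch assumes the variables are exported and that placeholder names contain no braces.

```sh
#!/usr/bin/env sh
# Hypothetical helper: replace each {{name}} with the value of the exported
# environment variable "name". Nothing from the input is ever executed, and
# substituted values are printed directly, so they are not expanded again.
render_template () {
  awk '
    { buf = buf $0 "\n" }                          # read the whole input
    END {
      while (match(buf, /[{][{][^{}]*[}][}]/)) {   # next {{...}} placeholder
        name = substr(buf, RSTART + 2, RLENGTH - 4)
        printf "%s%s", substr(buf, 1, RSTART - 1), ENVIRON[name]
        buf = substr(buf, RSTART + RLENGTH)
      }
      printf "%s", buf                             # text after the last match
    }'
}

# Usage: variables must be exported for awk to see them in ENVIRON.
export salut=Hi name=Leo
printf '%s\n' '{{salut}} {{name}}!' | render_template
```

Unknown names expand to an empty string, and a template can read any exported variable, so secrets should not be in the environment when rendering untrusted input.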