How to use non-printing characters such as newline (\n) and tab (\t) in jq's "join" function

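Problem description

I couldn't find this anywhere on the internet, so I thought I'd add it here as documentation.

I want to join a JSON array on the non-printing character \30 ("RecordSeparator") so that I can iterate over the result safely in bash, but I couldn't work out how to do it. I tried echo '["one","two","three"]' | jq 'join("\30")' and a few permutations of it, but none of them worked.

It turns out the solution is quite simple... (see the answer)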

Tags: json, bash, variables, jq

Solution

Use jq -j to eliminate the literal newlines between records, so that only your own delimiter is used. This works for your simple case:

#!/usr/bin/env bash
data='["one","two","three"]'
sep=$'\x1e' # works only for non-NUL characters, see NUL version below
while IFS= read -r -d "$sep" rec || [[ $rec ]]; do
  printf 'Record: %q\n' "$rec"
done < <(jq -j --arg sep "$sep" 'join($sep)' <<<"$data")

...but it also works in a more interesting scenario where naive answers fail:

#!/usr/bin/env bash
data='["two\nlines","*"]'
while IFS= read -r -d $'\x1e' rec || [[ $rec ]]; do
  printf 'Record: %q\n' "$rec"
done < <(jq -j 'join("\u001e")' <<<"$data")

returns (when run on Cygwin, hence the CRLF):

Record: $'two\r\nlines'
Record: \*
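
As for why the attempt in the question failed: jq string literals use JSON escape syntax, which as far as I know has no \30 form, so the separator has to be written as \u001e (RS is code point 30 in decimal, 1e in hex), or passed in from the shell with --arg as in the first script. A minimal sketch of the escapes that should work, assuming jq is installed; the variable names are only illustrative:

```shell
#!/usr/bin/env bash
data='["one","two","three"]'

# \t and \n are ordinary JSON escapes, so jq accepts them directly.
tabbed=$(jq -r 'join("\t")' <<<"$data")
lined=$(jq -r 'join("\n")' <<<"$data")

# Other control characters need the \uXXXX form; RS is U+001E.
rs_joined=$(jq -j 'join("\u001e")' <<<"$data")

# %q prints each value with its control characters made visible.
printf '%q\n' "$tabbed" "$lined" "$rs_joined"
```

The single quotes around the jq program matter here: they keep bash from touching the backslashes, so jq itself interprets the escapes.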

That said, if using this in anger, I would suggest using NUL delimiters, and filtering them out from the input values:

#!/usr/bin/env bash
data='["two\nlines","three\ttab-separated\twords","*","nul\u0000here"]'
while IFS= read -r -d '' rec || [[ $rec ]]; do
  printf 'Record: %q\n' "$rec"
done < <(jq -j '[.[] | gsub("\u0000"; "@NUL@")] | join("\u0000")' <<<"$data")

NUL is a good choice because it's a character that can't be stored in C strings (like the ones bash uses) at all, so nothing is lost from the range of data that can be faithfully conveyed when NULs are excised: if they did make it through to the shell, it would (depending on version) either discard them or truncate the string at the point where one first appears.
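
One detail the loops above share is the `|| [[ $rec ]]` guard. join() puts the delimiter between elements but not after the last one, so the final read hits end-of-file, returns non-zero, and still leaves the last record in rec. Here is a small sketch of the difference, with printf standing in for jq so it runs without it; emit is a hypothetical helper, not part of the scripts above:

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the jq -j output: three records with a NUL
# between them and no NUL after the last one.
emit() { printf 'one\0two\0three'; }

# Without the guard, the loop stops when read reports EOF, before the
# body has run for the final, unterminated record.
without_guard=()
while IFS= read -r -d '' rec; do
  without_guard+=("$rec")
done < <(emit)

# With the guard, a non-empty rec after a failed read still gets processed.
with_guard=()
while IFS= read -r -d '' rec || [[ $rec ]]; do
  with_guard+=("$rec")
done < <(emit)

printf 'without guard: %d records\n' "${#without_guard[@]}"
printf 'with guard:    %d records\n' "${#with_guard[@]}"
```

The unguarded loop collects two records and the guarded one collects all three. Note that the guard also means an empty final record is silently skipped, which is usually what you want here.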

