arrays - How to extract bash array elements that match a regular expression
Question
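In a bash script, an array can serve as a data set in which each element is a record and each record contains fields. For example, a record could correspond to a dog, and the fields could include "dogbreed" (the dog's presumed ancestry) and "dogfood" (what the dog likes to eat).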
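Suppose we want to know what the pitbulls in our array like to eat. There are at least two ways to do this. We can loop over the elements and apply a regular expression to each one. Or we can use printf to turn the array into a single multi-line string and then run a regular expression search over that string. The following script demonstrates that the printf method is several times faster than the loop method.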
Is there a better way to do this task in bash? My computer runs BSD Unix with GNU bash, version 3.2.57, so associative arrays are not available.
#!/bin/bash
# bash array: (1) extract elements where field1 matches a regex; (2) from only these elements, extract the value of field2.
# The fields in an array are announced by ":fieldname:" without the quotes and can appear anywhere in the element.
# Compare speed of competing methods; confirm that printf is much faster than explicit bash loop
# In this example,
# (1) we select all elements for which dogbreed equals pitbull;
# (2) then only from these elements do we extract (print) the dogfood field. If there is no dogfood field in an element, we print nothing.
# Note that if an element merely contains the string "pitbull", this does not imply that it is selected.
# Limitation/assumption: each field appears no more than once in a record.
# Limitation/assumption: no array element contains newline character / linefeed / LF.
if [[ $1 = 0 ]]
then
    method=printf
elif [[ $1 = 1 ]]
then
    method=loop
else
    printf '%s\n' "\"\$1\"=\"$1\" must be either 0 for printf or 1 for loop."
    exit 1
fi
unset longarray
declare -a longarray
longarray+=( "dog" )
for index in {1..6} # BEWARE! replacing 6 with a much larger number could define an array with more elements than exist in the universe, thereby overloading the most powerful supercomputer.
do
    longarray+=( "${longarray[@]}" )
done
longarray+=(
"pitbull licked my face :dogfood: human sweat "
":dogbreed:chihuahua : licked my face :dogfood: human sweat "
" and so they said :dogfood: Wolf's Tooth :dogdefault: well-trained :dogbreed: pitbull and what's ever"
"horse flea doggy :dogbreed: pitbull :dogtype:friendly :dogdefault:abused piggy wig" )
longarray=( "shark, great white; :dogbreed: pitbull :dogfood:bad people :nothing at all" "${longarray[@]}" )
longarray=(
"${longarray[@]}"
":dogfood: Wolf's Tooth :nothing shark, great white; :dogbreed:pitbull"
":dogfood: Absolutely no chocolate!! :dogbreed: pitbull shark, great white; :dogbreed:pitbull"
"great white shark :dogbreed:pitbull"
)
{
printf 'length of array:%s\n' "${#longarray[@]}"
declare -p method
}>/dev/stderr
time {
    if [[ $method = printf ]]
    then
        :
        perl -n -e 'use strict; use warnings; use 5.18.4; if (s/.*:dogfood:\s*([^:]+).*/$1/) { print; };' <( perl -n -e 'use strict; use warnings; use 5.18.4; if (m/.*:dogbreed:\s*pitbull\b/) { print;};' <( printf '%s\n' "${longarray[@]}" ) )
    elif [[ $method = loop ]]
    then
        for ephemeral in "${longarray[@]}"
        do
            perl -n -e 'use strict; use warnings; use 5.18.4; if (s/.*:dogfood:\s*([^:]+).*/$1/) { print; };' <( perl -n -e 'use strict; use warnings; use 5.18.4; if (m/.*:dogbreed:\s*pitbull\b/) { print;};' <( printf '%s\n' "$ephemeral" ) )
        done
    else
        declare -p method
        printf '%s\n' "must be either printf or loop."
        exit 1
    fi
}
Answer
How about a grep in a subshell to grab the relevant elements all at once?
$: ray=( $( seq 1 5000 ) )
$: echo ${#ray[@]}
5000
# pipe array elements through grep for string you want
$: subset=( $( printf '%s\n' "${ray[@]}" | grep '123' ) )
$: echo ${#subset[@]}
15
$: echo "${subset[@]}"
123 1123 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 2123 3123 4123
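One caveat with the unquoted `$( ... )` form above: its output is word-split, which is harmless for numbers but would break records that contain spaces, like the dog records in the question. A minimal sketch of one way around that in bash 3.2 (which has no `mapfile`) is to read grep's output line by line. The `records` array and the grep pattern here are illustrative assumptions, not part of the original answer.

```shell
#!/bin/bash
# Hypothetical sample records in the question's ":fieldname:" format.
records=(
    "shark, great white; :dogbreed: pitbull :dogfood:bad people :nothing at all"
    ":dogbreed:chihuahua : licked my face :dogfood: human sweat "
    "great white shark :dogbreed:pitbull"
)

# Collect matching lines one per array element, so embedded spaces survive.
# Assumes no element contains a newline (same limitation as the question).
subset=()
while IFS= read -r line
do
    subset+=( "$line" )
done < <( printf '%s\n' "${records[@]}" |
          grep -E ':dogbreed:[[:space:]]*pitbull([^[:alnum:]_]|$)' )

printf '%s\n' "${#subset[@]}"
printf '<%s>\n' "${subset[@]}"
```

The pattern uses POSIX character classes rather than `\s` and `\b`, on the assumption that this is safer across BSD and GNU grep.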
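For more specific matching, you may need layered conditions. In that case I usually use sed, because you can embed sub-conditions in a /.../{ ... } brace-block structure, though it may not be needed here.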
Try -
printf "%s\n" "${longarray[@]}" |
sed -En '/\s*:{0,1}\bdogbreed:{0,1}\s*\bpitbull\b/p'
This will match:
shark, great white; :dogbreed: pitbull :dogfood:bad people :nothing at all
and so they said :dogfood: Wolf's Tooth :dogdefault: well-trained :dogbreed: pitbull and what's ever
horse flea doggy :dogbreed: pitbull :dogtype:friendly :dogdefault:abused piggy wig
It also matches:
dogbreed pitbull
dogbreed: pitbull
dogbreed pitbull
dogbreed: pitbull
dogbreed:pitbull
:dogbreed pitbull
:dogbreed: pitbull
dogbreed pitbull:
!dogbreed pitbull!
But not:
dogbreeds pitbull
dogbreed pitbulls
mydogbreed pitbull
dogbreed::pitbull
dogbreed :pitbull
Specifications matter.
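As a rough sketch of the brace-block structure mentioned earlier, the two steps from the question (select the pitbull records, then print only their dogfood field) can be done in a single sed pass. The function name `extract_pitbull_food`, the sample `records` array, and the stricter rule that the field must be written as `:dogbreed:` are assumptions made for illustration. The dogfood capture mirrors the question's perl expression, so trailing whitespace before the next colon is kept.

```shell
#!/bin/bash
# Hypothetical sample records in the question's ":fieldname:" format.
records=(
    "shark, great white; :dogbreed: pitbull :dogfood:bad people :nothing at all"
    ":dogbreed:chihuahua : licked my face :dogfood: human sweat "
    "pitbull licked my face :dogfood: human sweat "
    "great white shark :dogbreed:pitbull"
    ":dogfood: Wolf's Tooth :nothing shark, great white; :dogbreed:pitbull"
)

# extract_pitbull_food (hypothetical helper): print the dogfood value of
# every argument whose dogbreed field is exactly "pitbull".
extract_pitbull_food() {
    printf '%s\n' "$@" |
        sed -En '/:dogbreed:[[:space:]]*pitbull([^[:alnum:]_]|$)/{
            s/.*:dogfood:[[:space:]]*([^:]+).*/\1/p
        }'
}

extract_pitbull_food "${records[@]}"
```

The outer address selects the records and the substitution inside the braces runs only on those, so a record that merely contains the word "pitbull", or a pitbull record with no dogfood field, prints nothing.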