How to extract bash array elements that match a regular expression

Problem description

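In a bash script, an array can serve as a dataset in which each element is a record and each record contains fields. For example, a record might correspond to a dog, with fields such as "dogbreed" (the dog's presumed ancestry) and "dogfood" (what the dog likes to eat).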

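Suppose we want to know what the pit bulls in our array like to eat. There are at least two ways to do this. We can loop over the elements and apply a regular expression to each one, or we can use printf to turn the array into a single multi-line string and run the regular-expression search on that string. The following script demonstrates that the printf method is several times faster than the loop method.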

Is there a better way to perform this task in bash? My computer runs BSD unix with GNU bash, version 3.2.57, so associative arrays are not available.

#!/bin/bash  
# bash array: (1) extract elements where field1 matches a regex; (2) from only these elements, extract the value of field2.
# The fields in an array are announced by ":fieldname:" without the quotes and can appear anywhere in the element. 
# Compare speed of competing methods; confirm that printf is much faster than explicit bash loop
# In this example, 
# (1) we select all elements for which dogbreed equals pitbull; 
# (2) then only from these elements do we extract (print) the dogfood field. If there is no dogfood field in an element, we print nothing.
# Note that if an element merely contains the string "pitbull", this does not imply that it is selected.  
# Limitation/assumption: each field appears no more than once in a record.
# Limitation/assumption: no array element contains newline character / linefeed / LF.  

if [[ $1 = 0 ]]
then
    method=printf
elif [[ $1 = 1 ]]
then
    method=loop
else
    printf '%s\n' "\"\$1\"=\"$1\" must be either 0 for printf or 1 for loop."
    exit 1
fi

unset longarray
declare -a longarray
longarray+=( "dog" )
for index in {1..6} # BEWARE! replacing 6 with a much larger number could define an array with more elements than exist in the universe, thereby overloading the most powerful supercomputer.
    do 
        longarray+=( "${longarray[@]}" )
    done
longarray+=( 
"pitbull licked my face :dogfood: human sweat " 
":dogbreed:chihuahua : licked my face :dogfood: human sweat " 
" and so they said :dogfood: Wolf's Tooth :dogdefault: well-trained :dogbreed: pitbull and what's ever"
"horse flea doggy :dogbreed: pitbull :dogtype:friendly :dogdefault:abused piggy wig" )
longarray=( "shark, great white; :dogbreed:    pitbull :dogfood:bad people :nothing at all" "${longarray[@]}" )
longarray=( 
"${longarray[@]}" 
":dogfood: Wolf's Tooth :nothing shark, great white; :dogbreed:pitbull"

":dogfood: Absolutely no chocolate!!  :dogbreed:   pitbull shark, great white; :dogbreed:pitbull"

"great white shark :dogbreed:pitbull"
)
{
    printf 'length of array:%s\n' "${#longarray[@]}"
    declare -p method
}>/dev/stderr

time {
if [[ $method = printf ]] 
then
    :
    perl -n -e 'use strict; use warnings; use 5.18.4; if (s/.*:dogfood:\s*([^:]+).*/$1/) { print; };' <( perl -n -e 'use strict; use warnings; use 5.18.4; if (m/.*:dogbreed:\s*pitbull\b/)  { print;};' <( printf '%s\n' "${longarray[@]}" ) )
elif [[ $method = loop ]] 
then
    for ephemeral in "${longarray[@]}"
    do
        perl -n -e 'use strict; use warnings; use 5.18.4; if (s/.*:dogfood:\s*([^:]+).*/$1/) { print; };' <( perl -n -e 'use strict; use warnings; use 5.18.4; if (m/.*:dogbreed:\s*pitbull\b/)  { print;};'  <( printf '%s\n' "$ephemeral" ) )
    done
else
    declare -p method
    printf '%s\n' "must be either printf or loop."
    exit 1
fi
}

Tags: arrays, bash, dataset

Solution


Use a subshell with grep to grab the relevant elements in one pass?

$: ray=( $( seq 1 5000 ) )
$: echo ${#ray[@]} 
5000
# pipe array elements through grep for string you want
$: subset=( $( printf '%s\n' "${ray[@]}" | grep '123' ) )
$: echo ${#subset[@]}
15
$: echo "${subset[@]}"
123 1123 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 2123 3123 4123
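
The capture above relies on word splitting, which is harmless for plain numbers but would break a record that contains spaces into several elements. A minimal sketch of one way around that on bash 3.2 (which has no mapfile) is to read grep's output line by line. The names records and subset and the sample data are illustrative assumptions, not part of the original answer.

```shell
#!/bin/bash
# Hypothetical sample records; each one contains spaces.
records=(
  "shark :dogbreed: pitbull :dogfood:bad people"
  ":dogbreed:chihuahua :dogfood: human sweat"
  "horse flea :dogbreed: pitbull :dogtype:friendly"
)

# Read grep's output one line at a time, so that spaces inside a record
# do not split it into several array elements.
subset=()
while IFS= read -r line; do
  subset+=( "$line" )
done < <( printf '%s\n' "${records[@]}" | grep ':dogbreed: *pitbull' )

printf 'matched %s records\n' "${#subset[@]}"
printf '%s\n' "${subset[@]}"
```

Like the question's script, this assumes that no array element contains a newline.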

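For more specific matches you may need layered conditions. In that case I usually use sed, because you can nest sub-conditions inside a `/.../{ ... }` brace block, although that may not be needed here.

Try: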

printf "%s\n" "${longarray[@]}" | 
  sed -En '/\s*:{0,1}\bdogbreed:{0,1}\s*\bpitbull\b/p'

This will match lines such as:

shark, great white; :dogbreed:    pitbull :dogfood:bad people :nothing at all
and so they said :dogfood: Wolf's Tooth :dogdefault: well-trained :dogbreed: pitbull and what's ever
horse flea doggy :dogbreed: pitbull :dogtype:friendly :dogdefault:abused piggy wig

It also matches:

dogbreed pitbull
dogbreed: pitbull
dogbreed      pitbull
dogbreed:      pitbull
dogbreed:pitbull
:dogbreed pitbull
:dogbreed: pitbull
dogbreed pitbull:
!dogbreed pitbull!

But not:

dogbreeds pitbull
dogbreed pitbulls
mydogbreed pitbull
dogbreed::pitbull
dogbreed :pitbull
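
One way to check lists like these is to feed the candidate strings through the same sed command and count what comes out. The sketch below assumes a sed that understands \s and \b, as GNU sed does; other implementations may behave differently. The helper count_matches and the two array names are hypothetical.

```shell
#!/bin/bash
# The pattern from the answer. \s and \b are extensions: GNU sed accepts
# them, other sed implementations may not (verify locally).
pattern='/\s*:{0,1}\bdogbreed:{0,1}\s*\bpitbull\b/p'

# count_matches is a hypothetical helper: it prints how many of its
# arguments the pattern selects.
count_matches() {
  printf '%s\n' "$@" | sed -En "$pattern" | wc -l | tr -d ' '
}

should_match=( 'dogbreed pitbull' 'dogbreed:pitbull' ':dogbreed: pitbull' '!dogbreed pitbull!' )
should_not_match=( 'dogbreeds pitbull' 'dogbreed pitbulls' 'mydogbreed pitbull' 'dogbreed::pitbull' 'dogbreed :pitbull' )

printf 'expected %s, matched %s\n' "${#should_match[@]}" "$(count_matches "${should_match[@]}")"
printf 'expected 0, matched %s\n' "$(count_matches "${should_not_match[@]}")"
```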

Specifications matter.
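
For the question's actual task (select the pitbull records, then print only their dogfood value), the nested brace form mentioned earlier could be sketched as follows. This is an illustration under assumptions, not the answer's own command: it uses POSIX character classes in place of \s and \b in the hope of being portable between GNU and BSD sed, and the sample records and the variable pitbull_food are hypothetical.

```shell
#!/bin/bash
# Hypothetical sample records in the question's ":fieldname:" format.
records=(
  "shark :dogbreed:    pitbull :dogfood:bad people :nothing at all"
  ":dogbreed:chihuahua : licked my face :dogfood: human sweat "
  "horse flea doggy :dogbreed: pitbull :dogtype:friendly"
  ":dogfood: Wolf's Tooth :nothing shark :dogbreed:pitbull"
)

# Outer address: keep only records whose dogbreed field is pitbull.
# Inner command: on those records, reduce the line to the dogfood value
# and print it; records without a dogfood field print nothing.
# POSIX character classes stand in for \s and \b here (a portability
# assumption, not something the answer specifies).
pitbull_food=$(
  printf '%s\n' "${records[@]}" |
    sed -En '
      /:dogbreed:[[:space:]]*pitbull([^[:alnum:]_]|$)/{
        s/.*:dogfood:[[:space:]]*([^:]+).*/\1/p
      }'
)

printf '%s\n' "$pitbull_food"
```

As with the perl commands in the question, the captured value keeps any whitespace that precedes the next field marker. The whole selection runs in a single sed process rather than one process per element.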

