Bash text parsing with nested multiple conditions

Problem description

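I have the following code, which checks for lines with more than 10 words and splits them at the first occurrence of a comma. It repeats this process, so every newly split line that still has more than 10 words and a comma is split as well (in the end, no line has both more than 10 words and a comma).

How can I edit this code to do the following: after all the comma splitting is done (which the current code already handles), check whether the resulting lines have more than 10 words and, if so, split them at the first occurrence of " and " (with the surrounding spaces)?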

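#!/usr/bin/env bash

input=input.txt
temp=$(mktemp "${input}.XXXX")
# Single quotes defer expansion until the trap fires; the inner quotes protect odd file names.
trap 'rm -f "$temp"' 0

# Each awk pass splits every qualifying line at its first ", ".
# awk exits 0 only if it split something, so the loop ends once a pass changes nothing.
# Note: NF >= 10 also matches lines of exactly 10 words.
while awk '
  BEGIN { retval=1 }
  NF >= 10 && /, / {
    sub(/, /, ","ORS)
    retval=0
  }
  1
  END { exit retval }
' "$input" > "$temp"; do
  mv -v "$temp" "$input"
done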
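
The loop above depends on awk's exit status: a pass returns 0 when it split at least one line and 1 when nothing changed. The sketch below is a hedged illustration of that mechanism, not part of the original script. It works on stdin/stdout instead of a temp file, and the function name `comma_pass` and the variable `changed` are made up for the example.

```shell
#!/usr/bin/env bash
# comma_pass (hypothetical name): one pass of the comma split, stdin to stdout.
# Exit status 0 means "at least one line was split", 1 means "nothing changed".
comma_pass() {
  awk '
    BEGIN { changed = 0 }
    NF >= 10 && /, / {
      sub(/, /, "," ORS)   # break the line after the first ", "
      changed = 1
    }
    1                      # print every line, split or not
    END { exit !changed }
  '
}

line='Word1 Word2 Word3 Word4, Word5 Word6 Word7 Word8 Word9 Word10, Word11 Word12 Word13 Word14 Word15 Word16'

# Feed the result of each pass into the next and report the exit status.
first=$(printf '%s\n' "$line" | comma_pass);   echo "pass 1 status: $?"
second=$(printf '%s\n' "$first" | comma_pass); echo "pass 2 status: $?"
third=$(printf '%s\n' "$second" | comma_pass); echo "pass 3 status: $?"
printf '%s\n' "$third"
```

For this 16-word line, the first two passes each split once and return 0. The third pass finds no line with 10 or more words and a ", ", so it returns 1, which is the condition that ends the `while` loop in the script.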

Sample input:

Word1 Word2 Word3 Word4, Word5 Word6 Word7 Word8 Word9

Word1 Word2 Word3 Word4, Word5 Word6 Word7 Word8 Word9 Word10 Word11

Word1 Word2 Word3 Word4, Word5 Word6 Word7 Word8 Word9 Word10, Word11 Word12 Word13 Word14 Word15 Word16 

Word1 Word2 Word3 Word4, Word5 Word6 Word7 Word8 Word9 Word10 Word11 and Word12 Word13 Word14 Word15 

Word1 Word2 Word3 Word4 and Word5

Expected output:

Word1 Word2 Word3 Word4, Word5 Word6 Word7 Word8 Word9

Word1 Word2 Word3 Word4, 
Word5 Word6 Word7 Word8 Word9 Word10 Word11

Word1 Word2 Word3 Word4,
 Word5 Word6 Word7 Word8 Word9 Word10,
 Word11 Word12 Word13 Word14 Word15 Word16 

Word1 Word2 Word3 Word4, 
Word5 Word6 Word7 Word8 Word9 Word10 Word11 and
 Word12 Word13 Word14 Word15 

Word1 Word2 Word3 Word4 and Word5

Thanks in advance!

Tags: bash, parsing, text, nested, multiple-conditions

Solution

Is this the answer you expected?

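echo "Word1 Word2 Word3 Word4, Word5 Word6 Word7 Word8 Word9 Word10, Word11 Word12 Word13 Word14 Word15 Word16 Word17 Word18 Word19 Word20 Word21 and Word22 Word23 Word24." |
grep -oE '[a-zA-Z0-9,.]+' |   # emit one word per line
awk '
BEGIN {
    cnt = 0
}
{
    str = str " " $0          # append the current word to the pending line
    if ($0 ~ /,$/){           # a word ending in a comma closes the line
        print str
        cnt = 0
        str = ""
    }
    else if (cnt < 10){       # keep collecting words
        cnt++
    }
    else {                    # the 11th word since the last break closes the line
        print str
        cnt = 0
        str = ""
    }
} END {
    print str                 # flush whatever is left
}' | sed 's/^ *//'            # drop the leading space added when joining words

Output:
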
Word1 Word2 Word3 Word4,
Word5 Word6 Word7 Word8 Word9 Word10,
Word11 Word12 Word13 Word14 Word15 Word16 Word17 Word18 Word19 Word20 Word21
and Word22 Word23 Word24.
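
The answer above regroups words by count rather than following the two-stage rule from the question. As a hedged alternative, the sketch below applies the comma splits first and the " and " splits second, in a single awk run with no temp file. The names `split_lines`, `words`, `cut`, and `min` are invented for this example. It also assumes that the " and " stage repeats on the remainder in the same way the comma stage does, which the question does not state explicitly.

```shell
#!/usr/bin/env bash
# split_lines (hypothetical helper): reads stdin, writes the split text to stdout.
split_lines() {
  awk -v min=10 '
    # Number of whitespace-separated words in s (same counting rule as NF).
    function words(s,   tmp) { return split(s, tmp, " ") }

    # While "line" has at least min words and contains "sep", cut it at the
    # first "sep". "keep" is the text left at the end of the piece cut off.
    function cut(line, sep, keep,   out, i) {
      out = ""
      while (words(line) >= min && (i = index(line, sep)) > 0) {
        out = out substr(line, 1, i - 1) keep "\n"
        line = substr(line, i + length(sep))
      }
      return out line
    }

    !NF { print; next }        # pass blank lines through unchanged
    {
      # Stage 1: comma splits until no long piece with ", " remains.
      n = split(cut($0, ", ", ","), part, "\n")
      # Stage 2: " and " splits on each piece produced by stage 1.
      for (k = 1; k <= n; k++) print cut(part[k], " and ", " and")
    }
  '
}

printf '%s\n' \
  'Word1 Word2 Word3 Word4, Word5 Word6 Word7 Word8 Word9 Word10 Word11 and Word12 Word13 Word14 Word15' \
  'Word1 Word2 Word3 Word4 and Word5' | split_lines
```

To process a file, something like `split_lines < input.txt > output.txt` should work. The threshold `min=10` mirrors the `NF >= 10` test in the question's script. The new lines carry no leading space because the separator's spaces are consumed by the split.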
