How to combine two adjacent lines based on the total word count of the two lines (recursively)

Problem description

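I am trying to merge two consecutive lines only if their combined word count is less than 20. A word here is a run of consecutive characters delimited by spaces or by the end of the line.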

Sample input:

1This line has five words.
2This line has unfortunately six words.
3This line has also six words.
4The above three lines have a total of 18 words, which is less than 20, and should be combined into one line.
5This line has only 6 words.

Expected output:

1This line has five words. 2This line has unfortunately six words. 3This line has also six words.
4The above three lines have a total of 18 words, which is less than 20, and should be combined into one line.
5This line has only 6 words.

I have the following code as a starting point, but I don't know how to write the condition so that it checks two consecutive lines.

awk '{while (sum(NF + NF+1) > 20) {sub ("\n", "")}}1'

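There are two problems. First, in while (sum(NF + NF+1) > 20) ..., how do I make it check the sum of two consecutive lines? Second, for some reason sub ("\n", "") does not remove the newline at the end of the line, even when I try it on a single line.

Thanks.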

Tags: linux, bash, awk

Solution


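awk reads its input one line at a time, and it cannot know the number of fields (words, in your terms) in the next line without reading it. So your logic cannot work.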
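
To make this concrete, here is a minimal sketch, not part of the original answer. The helper names show_counts and newline_pos and the variable prev are made up for illustration. The first helper shows that NF only ever describes the current line, so the previous line's count has to be saved in a variable by hand. The second shows that $0 arrives without its trailing newline, which would explain why sub ("\n", "") finds nothing to remove.

```shell
# Hypothetical helper: print the line number, the previous word count
# and the current word count for every input line.
show_counts() {
  awk '{
    # NF is the field count of the current record only.
    printf "%d %d %d\n", NR, prev, NF
    prev = NF   # remember it for the next record
  }' "$@"
}

# Hypothetical helper: print the position of a newline inside $0.
# awk strips the record separator while reading, so this prints 0.
newline_pos() {
  awk '{ print index($0, "\n") }' "$@"
}
```

Running printf 'a b c\nd e\n' | show_counts prints "1 0 3" and then "2 3 2": the second column is only available because the script stored it on the previous line.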

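Below is a simple way to achieve this. It buffers lines while the running word count stays below 20; as soon as a line pushes the count to 20 or more, it prints the buffer and starts a new one with that line.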

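awk '
# Running total still below 20: append this line to the buffer.
(c += NF) < 20 {
  buf = (buf sep $0)
  sep = OFS
  next
}
# Limit reached: flush the buffer and restart it with this line.
{
  if (NR > 1)
    print buf
  buf = $0
  c = NF
  sep = OFS  # keeps the separator set even if the first line was long
}
END {
  if (NR)    # print nothing for empty input
    print buf
}' file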
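
If the threshold needs to vary, the same buffering idea can be wrapped in a shell function. This is only a sketch under assumptions: the function name join_short_lines and the awk variable limit are invented here, and the function reads standard input rather than a file argument. It also sets sep after every flush and skips the final print on empty input.

```shell
# Hypothetical wrapper: join consecutive lines from stdin while their
# combined word count stays below the limit given as $1 (default 20).
join_short_lines() {
  awk -v limit="${1:-20}" '
    # Still under the limit: append the current line to the buffer.
    (count += NF) < limit {
      buf = buf sep $0
      sep = OFS
      next
    }
    # Limit reached: flush the buffer and restart it with the current line.
    {
      if (NR > 1) print buf
      buf = $0
      count = NF
      sep = OFS
    }
    # Flush whatever is left, unless there was no input at all.
    END { if (NR) print buf }
  '
}
```

For example, join_short_lines 20 < file should behave like the command above, and printf 'a b\nc d\ne f g\n' | join_short_lines 5 prints "a b c d" followed by "e f g".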
