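linux - How to combine two adjacent lines based on the total word count of those lines (recursively)
Problem description
I am trying to merge two consecutive lines only if the total number of words in both lines is less than 20 (a word being a run of consecutive characters delimited by spaces or the end of the line).
Sample input: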
1This line has five words.
2This line has unfortunately six words.
3This line has also six words.
4The above three lines have a total of 18 words, which is less than 20, and should be combined into one line.
5This line has only 6 words.
Expected output:
1This line has five words. 2This line has unfortunately six words. 3This line has also six words.
4The above three lines have a total of 18 words, which is less than 20, and should be combined into one line.
5This line has only 6 words.
I have the following code as a starting point, but I don't know how to write the condition so that it checks two consecutive lines.
awk '{while (sum(NF + NF+1) > 20) {sub ("\n", "")}}1'
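There are two problems. First, in while (sum(NF + NF+1) > 20) ..., how do I make it check the sum of two consecutive lines? Second, for some reason sub ("\n", "") does not remove the newline at the end of the line, even when I try it on a single line.
Thanks.
Solution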
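awk reads its input line by line, and it has no way of knowing the number of fields (words, in your terms) in the next line without reading it. So your logic won't work.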
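To illustrate the point, here is a minimal sketch of how a script can still compare adjacent lines: since NF only describes the current line, the previous line's count has to be saved in a variable. The names pair_counts and prev_nf are made up for this example and are not part of the original answer.

```shell
# Print the combined word count of every pair of adjacent lines.
# pair_counts and prev_nf are hypothetical names used for illustration.
pair_counts() {
  awk 'NR > 1 { print prev_nf + NF }   # previous line + current line
       { prev_nf = NF }                # remember this line for the next one
      ' "$@"
}

printf 'a b c\nd e\nf\n' | pair_counts
# prints:
# 5
# 3
```

The script looks back rather than ahead: by the time a line is read, the only other count available is the one that was stored earlier.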
Here is a simple way to do this; it just buffers lines until the word count reaches 20, prints the buffer contents, and carries on.
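awk '
# Running total still under 20 words: append this line to the buffer.
(c += NF) < 20 {
    buf = (NR > 1 ? buf OFS : "") $0
    next
}
# Limit reached: print the buffered lines and restart with this line.
{
    if (NR > 1)
        print buf
    buf = $0
    c = NF
}
# Flush whatever is left in the buffer.
END {
    if (NR)
        print buf
}' file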
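As a rough sketch of the same buffering idea, the word limit can be passed in as a parameter so that the script is easy to try on small inputs. The function name merge_lines and the awk variable limit are assumptions made for this example, not something from the original answer.

```shell
# Sketch: merge consecutive lines while their total word count stays below
# a limit (default 20). merge_lines and limit are illustrative names.
merge_lines() {
  awk -v limit="${1:-20}" '
    # The current line still fits: append it to the buffer.
    NR > 1 && c + NF < limit { buf = buf OFS $0; c += NF; next }
    # It does not fit: print the buffer before starting a new one.
    NR > 1 { print buf }
    # Start a new buffer with the current line.
    { buf = $0; c = NF }
    # Print the last buffer, unless the input was empty.
    END { if (NR) print buf }
  '
}

printf '%s\n' 'one two three' 'four five' 'six seven eight nine' | merge_lines 6
# prints:
# one two three four five
# six seven eight nine
```

With a limit of 6, the first two lines (3 + 2 words) are joined, while the third line would bring the total to 9 and therefore starts a new output line.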