How to count the frequency of an element in each line of a file using bash

Problem description

I have a file that looks like this:

1|2|3|4
1|2|3|4
1|2|3
1|2
1|2|3|4
1|2|3|4

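What I want to do is count how often | appears in each line and print a message along the lines of: every line has this many, except these lines, which have some other number.

The desired output should look like this: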

The "|" element appears 3 times in each line except in line 3 and 4 where it appears 2 and 1 times 

I'm new to bash, so any help is greatly appreciated!

Tags: bash, count

Solution


Using awk:

awk -F\| '{ if (map[NF-1]!="") { map[NF-1]=map[NF-1]","NR } else { map[NF-1]=NR } } END { for (i in map) { printf  "line(s) %s have %s occurrence(s) of |\n",map[i],i } }' file

Explanation:

awk -F\| '{                                                         # Set the field delimiter to |
            if (map[NF-1]!="") { 
                map[NF-1]=map[NF-1]","NR                            # Append this line number (NR) to the entry indexed by the number of | occurrences (NF-1)
            } 
            else { 
                map[NF-1]=NR                                         # First line with this count: store NR without a leading comma
            } 
            map1[NF-1]++                                             # Count how many lines share this number of | occurrences
           } 
       END { 
             for (i in map) { 
                printf  "line(s) %s have %s occurrence(s) of |\n",map[i],i # For each count, print the line numbers that have it
             }
             for (i in map1) {
                printf "%s line(s) have %s occurrence(s) of |, ",map1[i],i # Then print how many lines have each count, all on one line
             }
            printf "\n"
            }' file

Output:

line(s) 4 have 1 occurrence(s) of |
line(s) 3 have 2 occurrence(s) of |
line(s) 1,2,5,6 have 3 occurrence(s) of |
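
The answer relies on NF-1 being the number of | characters on a line. A minimal sketch that shows this idea in isolation, printing the count for every line, might look like the following. The function name count_pipes_per_line is hypothetical and not part of the original answer, and the handling of empty lines is an added assumption.

```shell
# count_pipes_per_line is a hypothetical helper name used only for this sketch.
count_pipes_per_line() {
  # With | as the field separator, a line containing N separators splits into
  # N+1 fields, so NF-1 is the separator count. An empty line has NF == 0,
  # which this sketch reports as 0 rather than -1 (an added assumption).
  awk -F'|' '{ print NR ": " (NF > 0 ? NF - 1 : 0) }' "$1"
}
```

Calling count_pipes_per_line file on the sample file should print one "line number: count" pair per line, for example "3: 2" for the third line.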

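The output above groups line numbers by count, whereas the question asks for a single sentence. One possible way to get closer to that sentence is sketched below, under these assumptions: the "usual" count is taken to be the most common one, a tie is broken in favor of the larger count, and multiple exceptions are simply joined with "and". The function name summarize_pipes is hypothetical.

```shell
# summarize_pipes is a hypothetical helper name, not part of the original answer.
summarize_pipes() {
  awk -F'|' '
    {
      n = (NF > 0) ? NF - 1 : 0   # an empty line has NF == 0, so treat it as zero separators
      count[NR] = n               # separators on this line
      freq[n]++                   # how many lines share this count
    }
    END {
      if (NR == 0) exit           # empty input: print nothing
      # Pick the most common count; on a tie, prefer the larger count (an arbitrary choice).
      best = -1; best_n = 0
      for (c in freq)
        if (freq[c] > best_n || (freq[c] == best_n && c + 0 > best)) {
          best_n = freq[c]; best = c + 0
        }
      # Collect the lines that deviate from the most common count.
      for (i = 1; i <= NR; i++)
        if (count[i] != best) {
          sep = (lines == "") ? "" : " and "
          lines = lines sep i
          counts = counts sep count[i]
        }
      printf "The \"|\" element appears %d times in each line", best
      if (lines != "")
        printf " except in line %s where it appears %s times", lines, counts
      printf "\n"
    }
  ' "$1"
}
```

Run as summarize_pipes file on the sample file, this should print the sentence from the question: The "|" element appears 3 times in each line except in line 3 and 4 where it appears 2 and 1 times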