Column manipulation using Bash & Awk

Problem description

Let's assume we have a file, example1.txt, consisting of a few rows.

item item item  
 A    B    C      
100  20   2       
100  22   3
100  23   4
101  26   2
102  28   2
103  29   3
103  30   2
103  32   2
104  33   2
104  34   2
104  35   2
104  36   3

I would like to run a few commands to filter the text file and add a few more columns.

First, I want to keep only the rows where item C is equal to 2. With awk I can do that as follows:

awk '$3 == 2 { print $1 "\t"  $2  "\t" $3} ' example1.txt > example2.txt

The resulting text file would then be:

item item item
 A    B    C      
100  20   2       
101  26   2
102  28   2
103  30   2
103  32   2
104  33   2
104  34   2
104  35   2
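
Run as written, the pattern $3 == 2 does not match either header line (their third fields are "item" and "C"), so example2.txt would contain only the data rows. A minimal sketch that also keeps the headers, assuming they are always the first two lines of the file; the scratch directory and the workdir variable are only there to make the example self-contained:

```shell
# Work in a scratch directory so no existing files are touched.
workdir=$(mktemp -d)

# Sample input copied from the question.
cat > "$workdir/example1.txt" <<'EOF'
item item item
 A    B    C
100  20   2
100  22   3
100  23   4
101  26   2
102  28   2
103  29   3
103  30   2
103  32   2
104  33   2
104  34   2
104  35   2
104  36   3
EOF

# NR <= 2 keeps the two header lines; $3 == 2 keeps the matching data rows.
# Assigning $1 = $1 rebuilds the record so fields are joined with OFS (a tab).
awk -v OFS='\t' 'NR <= 2 || $3 == 2 { $1 = $1; print }' \
    "$workdir/example1.txt" > "$workdir/example2.txt"

cat "$workdir/example2.txt"
```

This writes the two header lines followed by the eight rows whose third field is 2, all tab-separated.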

Now I want to compute two things:

I want to count the number of unique values in column 1 (item A).

For example2.txt above, that would be:
(100,101,102,103,104) = 5

I would also like to count how many times each item A value occurs and put that count in a new column.

The result should look like this:

item item item  item
 A    B    C     D
100  20   2      1
101  26   2      1
102  28   2      1
103  30   2      2
103  32   2      2
104  33   2      3
104  34   2      3
104  35   2      3

In the item D column (the 4th column), the first row is 1 because 100 is not repeated. In the 4th row it is 2 because 103 occurs twice, so I have put 2 in the 4th and 5th rows. Similarly, the last three rows of item D are 3, because their item A value, 104, occurs three times.

Tags: awk

Solution


Could you please try the following. If you want to save the output into the same Input_file, append > temp && mv temp Input_file to the following code.

awk '
FNR==NR{
  if($3==2){
    a[$1,$3]++
  }
  next
}
FNR==1{
  $(NF+1)="item"
  print
  next
}
FNR==2{
  $(NF+1)="D"
  print
  next
}
$3!=2{
  next
}
FNR>2{
  $(NF+1)=a[$1,$3]
}
1
' Input_file  Input_file | column -t

The output will be as follows.

item  item  item  item
A     B     C     D
100   20    2     1
101   26    2     1
102   28    2     1
103   30    2     2
103   32    2     2
104   33    2     3
104   34    2     3
104   35    2     3
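
The program above fills in column D but does not print the other figure the question asks for, the number of distinct item A values among the filtered rows. A minimal sketch of one way to get it, assuming the same two header lines and whitespace-separated fields; the scratch directory and the names workdir, seen, n and unique_count are illustrative only:

```shell
# Recreate the sample input in a scratch directory.
workdir=$(mktemp -d)
cat > "$workdir/Input_file" <<'EOF'
item item item
 A    B    C
100  20   2
100  22   3
100  23   4
101  26   2
102  28   2
103  29   3
103  30   2
103  32   2
104  33   2
104  34   2
104  35   2
104  36   3
EOF

# Skip the two header lines, keep rows where column C is 2, and count each
# column A value only the first time it is seen.
unique_count=$(awk 'FNR > 2 && $3 == 2 && !seen[$1]++ { n++ } END { print n + 0 }' \
    "$workdir/Input_file")

echo "$unique_count"
```

For the sample data this prints 5, matching (100,101,102,103,104) in the question. The n + 0 in the END block makes awk print 0 rather than an empty line when no row matches.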


Explanation: a detailed explanation of the above code.

awk '                    ##Starting awk program from here.
FNR==NR{                 ##Checking condition FNR==NR, which is TRUE only while Input_file is being read the 1st time.
  if($3==2){             ##Checking condition if 3rd field is 2 then do following.
    a[$1,$3]++           ##Creating an array a whose index is $1,$3 and incrementing its value by 1 here.
  }
  next                   ##next will skip further statements from here.
}
FNR==1{                  ##Checking condition if this is first line.
  $(NF+1)="item"         ##Adding a new field with string item in it.
  print                  ##Printing 1st line here.
  next                   ##next will skip further statements from here.
}
FNR==2{                  ##Checking condition if this is second line.
  $(NF+1)="D"            ##Adding a new field with string D in it.
  print                  ##Printing 2nd line here.
  next                   ##next will skip further statements from here.
}
$3!=2{                   ##Checking condition if 3rd field is NOT equal to 2 then do following.
  next                   ##next will skip further statements from here.
}
FNR>2{                   ##Checking condition if line number is greater than 2 then do following.
  $(NF+1)=a[$1,$3]       ##Creating new field with value of array a with index of $1,$3 here.
}
1                        ##1 will print edited/non-edited lines here.
' Input_file Input_file   ##Mentioning Input_file names 2 times here.
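
The FNR==NR test relies on NR counting records across all input files while FNR restarts at 1 for each file, so the two are equal only while the first file argument is being read. A small sketch of that behaviour, using a hypothetical three-line file named demo.txt passed twice (the file, the pass label and the passes variable are not part of the original answer):

```shell
# Create a tiny input file in a scratch directory.
workdir=$(mktemp -d)
printf 'a\nb\nc\n' > "$workdir/demo.txt"

# Label each line with the pass it belongs to and show NR and FNR next to it.
passes=$(awk '{ pass = (FNR == NR) ? "pass1" : "pass2"; print pass, NR, FNR, $0 }' \
    "$workdir/demo.txt" "$workdir/demo.txt")

echo "$passes"
```

The first three lines are labelled pass1 (NR and FNR both run 1 to 3) and the next three pass2 (NR runs 4 to 6 while FNR restarts at 1). This is why the answer can collect the counts in the first pass and append them in the second.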
