Keep all rows matching the highest value, grouped by the value in another column - Bash

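Problem description

Apologies; there should be a simple way to do what I want with some combination of sort/uniq/awk, but I can't find it.

This is part of the "clean" data table I was able to obtain (sorted by the Gene column, then by Length).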

Length  Gene                    
3013    ENSDARG00000000018      
3013    ENSDARG00000000018      
2933    ENSDARG00000000018      
2933    ENSDARG00000000018      
2933    ENSDARG00000000018      
2933    ENSDARG00000000018      
2033    ENSDARG00000000068      
2033    ENSDARG00000000068      
901     ENSDARG00000000068      
901     ENSDARG00000000068      

For each Gene value, I need to keep all rows that have the highest value in the Length column. This is the desired output:

  Length  Gene                    
  3013    ENSDARG00000000018      
  3013    ENSDARG00000000018      
  2033    ENSDARG00000000068      
  2033    ENSDARG00000000068      

The solution should work for a table with ca. 30,000 Gene values. Thank you very much for your help!

Tags: bash, sorting, awk

Solution


This simple awk command should help:

awk 'FNR==NR{a[$2]=(a[$2]>$1?a[$2]:$1);next} a[$2]==$1'  Input_file  Input_file
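To try the command on the data from the question, here is a minimal sketch. It assumes the sample table is written to a temporary file (the `sample` variable and the use of `mktemp` are illustrative choices, not part of the original answer) and that the columns are whitespace-separated, as in the excerpt.

```shell
# Write the sample table from the question to a temporary file (hypothetical location).
sample=$(mktemp)
cat > "$sample" <<'EOF'
Length  Gene
3013    ENSDARG00000000018
3013    ENSDARG00000000018
2933    ENSDARG00000000018
2933    ENSDARG00000000018
2033    ENSDARG00000000068
2033    ENSDARG00000000068
901     ENSDARG00000000068
901     ENSDARG00000000068
EOF

# Pass 1 records the largest Length per Gene; pass 2 prints the rows that match it.
result=$(awk 'FNR==NR{a[$2]=(a[$2]>$1?a[$2]:$1);next} a[$2]==$1' "$sample" "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

On this sample the result should match the desired output in the question. The header line passes through because its first field is compared with itself as a string in the second pass.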

Explanation:

awk '
FNR==NR{                              ## FNR==NR is true only while the first copy of Input_file is being read.
  a[$2]=(a[$2]>$1?a[$2]:$1)           ## Array a is indexed by Gene ($2): keep the stored value if it is greater than the current Length ($1), otherwise replace it with $1.
  next                                ## next is a built-in awk keyword that skips the remaining statements for the current line.
}
a[$2]==$1                             ## Runs during the second read of Input_file: if a[$2] equals the first field of the current line, print that line.
'  Input_file Input_file              ## Input_file is named twice so that awk reads it twice.
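Because the approach above reads the file twice, it needs a real file and cannot consume a pipe. As a possible alternative, the following single-pass sketch buffers, for each gene, only the rows that hold the largest Length seen so far. It is not from the original answer: the function name `max_rows` and the array names `max`, `rows`, and `order` are hypothetical, and it assumes the first line is a header and that Length is numeric.

```shell
# Single-pass sketch: reads standard input once, so it also works at the end of a pipe.
max_rows() {
  awk '
  NR == 1 { print; next }                 # assumption: line 1 is the header
  NF < 2  { next }                        # skip blank or incomplete lines
  !($2 in max) { order[++n] = $2 }        # remember genes in first-seen order
  !($2 in max) || $1 + 0 > max[$2] {      # new gene or new maximum: restart the buffer
    max[$2] = $1 + 0
    rows[$2] = $0
    next
  }
  $1 + 0 == max[$2] { rows[$2] = rows[$2] "\n" $0 }   # tie with the maximum: append the row
  END { for (i = 1; i <= n; i++) print rows[order[i]] }
  '
}

# Example use: max_rows < Input_file
```

Output is grouped by gene in first-seen order, which is the same as the input order when the table is already sorted by Gene. Memory use grows with the number of genes and tied rows rather than with the full table.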
