Compare 2 columns from different files and print the matching columns

Problem description

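I know similar questions have been asked, and they led me to my current code, but I still can't get the right output. The problem: if column 1 (in file 1) matches column 5 (in file 2), print all columns from file 2 plus columns 3 and 4 (from file 1) to a new file.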

File 1 (tab-delimited)

NJE_00001   rmf 6.2 Ribosome modulation factor
NJE_00002   rlm 7.1 Ribosomal RNA large subunit methyltransferase
NJE_00003   gnt 6.2 putative D-xylose utilization operon
NJE_00004   prp 4.1 2-methylisocitrate lyase

File 2 (tab-delimited)

AFC_04390   rmf 5.6 protein1    NJE_00001
AFC_04391   rlm 2.5 protein54   NJE_00002
AFC_04392   gnt 2.1 protein8    NJE_00003
AFC_04393   prp 4.1 protein5    NJE_00004

Desired output (tab-delimited)

AFC_04390   rmf 5.6 protein1    NJE_00001   6.2 Ribosome modulation factor
AFC_04391   rlm 2.5 protein54   NJE_00002   7.1 Ribosomal RNA large subunit methyltransferase
AFC_04392   gnt 2.1 protein8    NJE_00003   6.2 putative D-xylose utilization operon
AFC_04393   prp 4.1 protein5    NJE_00004   4.1 2-methylisocitrate lyase

What I tried:

awk -F '\t' 'NR==FNR {a[$1]=$3"\t"$4; next} ($5 in a) {print $1 "\t" $2 "\t" $3 "\t" $4 "\t" $5 "\t" a[$1]}' file1.tsv file2.tsv > file.out

awk -F '\t' 'NR==FNR {a[$1]=$2; next} {if ($5 in a) {print $1 "\t" $2 "\t" $3 "\t" $4 "\t" $5 "\t" a[$1]}}' file1.tsv file2.tsv > file.out

awk -F '\t' 'NR==FNR {h[$1]=$3"\t"$4; next} ($5 in h) {print $1 "\t" $2 "\t" $3 "\t" $4 "\t" $5 "\t" h[$1]}' file1.tsv file2.tsv > file.out

All of them give output identical to file 2. Any help would be greatly appreciated. Thanks!

Tags: bash, awk

Solution


Please try the following.

awk '
FNR==NR{
  val=$1
  $1=$2=""
  sub(/^ +/,"")
  a[val]=$0
  next
}
($NF in a){
  print $0,a[$NF]
}
'  Input_file1  Input_file2
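
One caveat, offered as a sketch rather than a definitive fix: as written, the program splits fields on any whitespace and joins them with a single space, so the appended columns are not tab-separated from the rest of the line. Assuming both files are strictly tab-delimited, with no trailing tab or carriage return at the end of a line, setting FS and OFS to a tab keeps the whole output tab-delimited. The names file1.tsv, file2.tsv and file.out follow the question; the scratch directory and the two-row sample are only there so the snippet can be run on its own.

```shell
# Two sample rows from the question, written to a scratch directory (illustration only).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'NJE_00001\trmf\t6.2\tRibosome modulation factor\n' >  file1.tsv
printf 'NJE_00002\trlm\t7.1\tRibosomal RNA large subunit methyltransferase\n' >> file1.tsv
printf 'AFC_04390\trmf\t5.6\tprotein1\tNJE_00001\n' >  file2.tsv
printf 'AFC_04391\trlm\t2.5\tprotein54\tNJE_00002\n' >> file2.tsv

awk '
BEGIN { FS = OFS = "\t" }   # split and join on tabs only
FNR == NR {                 # first file: keep columns 3..end, keyed by column 1
  val = $1
  $1 = $2 = ""              # emptying fields rebuilds $0 using OFS (a tab here)
  sub(/^\t+/, "")           # drop the leading tabs left by the emptied fields
  a[val] = $0
  next
}
($NF in a) {                # second file: the last column is the lookup key
  print $0, a[$NF]          # the comma inserts OFS, so the join is a tab
}
' file1.tsv file2.tsv > file.out

cat file.out
```

If the files were saved with Windows line endings, the last field would carry a carriage return and the lookup would miss, so it may be worth stripping those first.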

Explanation: a detailed, line-by-line walkthrough of the code above.

awk '                 ## Start of the awk program.
FNR==NR{              ## FNR==NR is true while the first file (Input_file1) is being read.
  val=$1              ## Save the first field of the current line as the lookup key.
  $1=$2=""            ## Empty the first and second fields; awk rebuilds $0 joined by single spaces.
  sub(/^ +/,"")       ## Remove the leading spaces left behind by the emptied fields.
  a[val]=$0           ## Store the rest of the line in array a under the key val.
  next                ## Skip the remaining rules and move on to the next input line.
}
($NF in a){           ## For Input_file2: true if the last field of the current line is a key in array a.
  print $0,a[$NF]     ## Print the current line followed by the value stored for that key.
}
'  Input_file1  Input_file2   ## The two input files, in this order.
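
For reference, the attempts in the question seem to fail for one reason only: the array is keyed by column 1 of file 1 (the NJE_ IDs), but the print statement looks it up with a[$1] while reading file 2, where $1 is an AFC_ ID. That key is never in the array, so an empty string is appended and the output looks the same as file 2. A minimal sketch of the corrected one-liner, assuming strictly tab-delimited input, is below. The second row of file2.tsv is made up to show that rows without a match are skipped.

```shell
# One sample row from the question plus one made-up non-matching row (illustration only).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'NJE_00003\tgnt\t6.2\tputative D-xylose utilization operon\n' > file1.tsv
printf 'AFC_04392\tgnt\t2.1\tprotein8\tNJE_00003\n' >  file2.tsv
printf 'AFC_09999\txyz\t1.0\tprotein0\tNJE_99999\n' >> file2.tsv   # hypothetical row, no match in file 1

# Same structure as the first attempt; only the final lookup changes from a[$1] to a[$5].
awk -F '\t' 'NR==FNR {a[$1]=$3"\t"$4; next} ($5 in a) {print $0 "\t" a[$5]}' file1.tsv file2.tsv > file.out

cat file.out
```

Printing $0 instead of listing $1 through $5 is just shorthand here, since no field of the file 2 line is modified.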
