How can I join two CSV files on a temporary common column in awk?

Problem description

I have two CSV files in the following format:

File 1

A,44
A,21
B,65
C,79

File 2

A,7
B,4
C,11

Running awk as

awk -F, 'NR==FNR{a[$1]=$0;next} ($1 in a){print a[$1]","$2 }' file1.csv file2.csv

produces

A,44,7
A,21,7
B,65,4
C,79,11

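a[$1] prints the entire line from file1. How can I omit the first column of both files (it is only needed to match the rows, not to appear in the output) and produce: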

44,7
21,7
65,4
79,11

In other words, how can I pass a single column from the first file to the print block, the same way $2 is available for the second file?

Tags: awk

Solution


Could you please try the following? It was written and tested with the shown samples only.

awk 'BEGIN{FS=OFS=","} FNR==NR{a[$1]=$2;next} ($1 in a){print $2,a[$1]}' file2 file1

Explanation: a line-by-line breakdown of the command above.

awk '                     ## Start of the awk program.
BEGIN{                    ## The BEGIN block runs once, before any input is read.
  FS=OFS=","              ## Use a comma as both the input and the output field separator.
}
FNR==NR{                  ## TRUE only while the first file argument (file2) is being read.
  a[$1]=$2                ## Store the 2nd field in array a, keyed by the 1st field.
  next                    ## Skip the remaining rules for this line.
}
($1 in a){                ## Reached only while file1 is being read; TRUE when its 1st field is a key in a.
  print $2,a[$1]          ## Print the 2nd field of file1, then the value stored from file2 for that key.
}
' file2 file1             ## Input files: file2 (the lookup) first, then file1.

The output for the shown samples is as follows.

44,7
21,7
65,4
79,11
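
The order of the two file arguments matters, and a quick comparison makes the reason visible. The sketch below recreates the sample data and runs the same program with both orders. The scratch directory and the variable names prog, lookup_first and data_first are assumptions made for this demonstration and are not part of the original answer.

```shell
# Recreate the sample inputs from the question in a scratch directory.
workdir=$(mktemp -d)
printf 'A,44\nA,21\nB,65\nC,79\n' > "$workdir/file1"
printf 'A,7\nB,4\nC,11\n' > "$workdir/file2"

# The one-liner from the answer, stored once so both runs use identical code.
prog='BEGIN{FS=OFS=","} FNR==NR{a[$1]=$2;next} ($1 in a){print $2,a[$1]}'

# file2 first: each key is stored once, then every file1 row is looked up.
lookup_first=$(awk "$prog" "$workdir/file2" "$workdir/file1")

# file1 first: the second A row overwrites the first in the array,
# so the value 44 never reaches the output.
data_first=$(awk "$prog" "$workdir/file1" "$workdir/file2")

printf 'file2 first:\n%s\n' "$lookup_first"
printf 'file1 first:\n%s\n' "$data_first"
rm -rf "$workdir"
```

With file2 first, all four rows of file1 appear in the output. With file1 first, only three lines are printed (7,21 then 4,65 then 11,79), because an array element can hold just one value per key.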


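Second solution: a more generic one, for the case where both Input_files may contain duplicate keys. It pairs the first value for A in Input_file1 with the first value for A in Input_file2, the second with the second, and so on.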

awk '
BEGIN{
  FS=OFS=","
}
FNR==NR{
  a[$1]
  b[$1,++c[$1]]=$2
  next
}
($1 in a){
  print $2,b[$1,++d[$1]]
}
' file2 file1
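
To illustrate the pairing by occurrence, here is a small sketch that wraps the second solution in a shell function and runs it twice. The function name pair_join, the scratch directory, and the file2_dups variant with two A rows are assumptions introduced for the demonstration; only the awk program itself comes from the answer.

```shell
# pair_join is a hypothetical helper name; it wraps the second solution unchanged.
# Usage: pair_join LOOKUP_FILE DATA_FILE
pair_join() {
  awk '
  BEGIN{ FS=OFS="," }
  FNR==NR{ a[$1]; b[$1,++c[$1]]=$2; next }
  ($1 in a){ print $2, b[$1,++d[$1]] }
  ' "$1" "$2"
}

workdir=$(mktemp -d)
printf 'A,44\nA,21\nB,65\nC,79\n' > "$workdir/file1"
printf 'A,7\nB,4\nC,11\n' > "$workdir/file2"            # original sample
printf 'A,7\nA,9\nB,4\nC,11\n' > "$workdir/file2_dups"  # assumed variant: two A rows

# Two A rows on each side: first pairs with first, second with second.
paired=$(pair_join "$workdir/file2_dups" "$workdir/file1")

# Original sample: file1 has a second A row but file2 does not,
# so that row is printed with an empty second field.
unpaired=$(pair_join "$workdir/file2" "$workdir/file1")

printf 'with duplicates in both files:\n%s\n' "$paired"
printf 'with the original sample:\n%s\n' "$unpaired"
rm -rf "$workdir"
```

On the original sample this version therefore does not reproduce the requested output: the second line comes out as "21," because there is no second A value to pair with. The first solution is the better fit when every row of file1 should receive the single matching value from file2.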
