Combining 2 CSV files based on datetime with shell

Problem description

Hi, I have 3 CSV files, as shown below.

datetime, forecast
2016-02-02 00:00:00, 23.34
2016-02-02 00:10:00, 29.23

timestamp, forecast, v1, v2
2016-02-02 00:00:00, 68.56, 012, .23
2016-02-02 00:10:00, 23.24, .25, .32

timestamp, forecast[ma], v1
2016-02-02 00:00:00, 56.32, 32
2016-02-02 00:10:00, 25.21, 56

I want my output to look like this:

Time, Forecast, forecast1, forecast2
2016-02-02 00:00:00, 23.34, 68.56, 56.32
2016-02-02 00:10:00, 29.23, 23.24, 25.21

I have already written Python code that combines these files into an xlsx file. Now I plan to process the files further with shell, so I want them in CSV.

I tried code like this:

join -j 2 -o 1.1,1.2,2.2 <(sort -k2 $path_DMS/$file_name) <(sort -k2 $path_ISRO/$file_name)

Thanks

Tags: shell, csv, awk

Solution


Could you please try the following (this should work in most awk implementations).

awk '
BEGIN{
  FS=OFS=", "
  print "Time, Forecast, forecast1, forecast2"
}
FNR==1{
  ++count
  next
}
count==1{
  a[$1]=$2
  next
}
count==2{
  a[$1]=a[$1] OFS $2
  next
}
count==3{
  print $1,a[$1],$2
}'  file1.csv file2.csv file3.csv

The output will be as follows.

Time, Forecast, forecast1, forecast2
2016-02-02 00:00:00, 23.34, 68.56, 56.32
2016-02-02 00:10:00, 29.23, 23.24, 25.21
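
The program above relies on every timestamp being present in all three files and on fields being separated by exactly ", ". As a minimal sketch of a slightly more tolerant variant, the script below accepts a comma followed by any number of spaces and prints a placeholder when a timestamp is missing from an earlier file. The "NA" placeholder, the array names f1 and f2, and the temporary directory are assumptions made for illustration; the sample data is copied from the question.

```shell
#!/bin/sh
# Sketch only: write the sample data from the question into a temporary directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'datetime, forecast' \
  '2016-02-02 00:00:00, 23.34' '2016-02-02 00:10:00, 29.23' > "$tmpdir/file1.csv"
printf '%s\n' 'timestamp, forecast, v1, v2' \
  '2016-02-02 00:00:00, 68.56, 012, .23' '2016-02-02 00:10:00, 23.24, .25, .32' > "$tmpdir/file2.csv"
printf '%s\n' 'timestamp, forecast[ma], v1' \
  '2016-02-02 00:00:00, 56.32, 32' '2016-02-02 00:10:00, 25.21, 56' > "$tmpdir/file3.csv"

merged=$(awk '
BEGIN {
  FS = ", *"                       # a comma followed by zero or more spaces
  OFS = ", "
  print "Time, Forecast, forecast1, forecast2"
}
FNR == 1 { ++count; next }         # header of each file: bump the file counter and skip it
count == 1 { f1[$1] = $2; next }   # forecast from the first file, keyed by timestamp
count == 2 { f2[$1] = $2; next }   # forecast from the second file, keyed by timestamp
count == 3 {
  # "NA" is an assumed placeholder for a timestamp absent from an earlier file
  v1 = ($1 in f1) ? f1[$1] : "NA"
  v2 = ($1 in f2) ? f2[$1] : "NA"
  print $1, v1, v2, $2
}' "$tmpdir/file1.csv" "$tmpdir/file2.csv" "$tmpdir/file3.csv")

printf '%s\n' "$merged"
rm -rf "$tmpdir"
```

Keeping the two forecasts in separate arrays, instead of concatenating them into one string, is what makes it possible to test each file for a missing timestamp independently.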

Explanation: here is the three-file awk program again, with a detailed comment on each line.

awk '                                                ## Start of the awk program.
BEGIN{                                               ## BEGIN block: runs once, before any input file is read.
  FS=OFS=", "                                        ## Set the input (FS) and output (OFS) field separators to ", "; see man awk.
  print "Time, Forecast, forecast1, forecast2"       ## Print the header line of the output.
}                                                    ## End of the BEGIN block.
FNR==1{                                              ## True on the first line (the header) of each input file.
  ++count                                            ## Increment count, so it holds the number of the file being read.
  next                                               ## Skip the remaining rules for this header line.
}                                                    ## End of the FNR==1 block.
count==1{                                            ## While reading file1.csv:
  a[$1]=$2                                           ## Store the forecast ($2) in array a, indexed by the timestamp ($1).
  next                                               ## Skip the remaining rules for this line.
}                                                    ## End of the count==1 block.
count==2{                                            ## While reading file2.csv:
  a[$1]=a[$1] OFS $2                                 ## Append OFS and the forecast from file2.csv to the value stored from file1.csv.
  next                                               ## Skip the remaining rules for this line.
}                                                    ## End of the count==2 block.
count==3{                                            ## While reading file3.csv:
  print $1,a[$1],$2                                  ## Print the timestamp, the values stored from file1.csv and file2.csv, and the forecast from file3.csv.
}'  file1.csv file2.csv file3.csv                    ## The three input files, in this order.
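
Since the question started from join, the same result can probably also be reached with join, as sketched below. This is an illustration under assumptions, not part of the original answer. It assumes that the comma is the field separator, so the whole timestamp (which contains a space) is field 1 and each value keeps its leading space; that the header lines are removed before joining; and that only timestamps present in all three files are wanted. The temporary directory and the sorted1.csv to sorted3.csv names are hypothetical.

```shell
#!/bin/sh
# Sketch only: sort and join must agree on collation, so pin the locale.
LC_ALL=C
export LC_ALL

# Sample data copied from the question, written to a temporary directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'datetime, forecast' \
  '2016-02-02 00:00:00, 23.34' '2016-02-02 00:10:00, 29.23' > "$tmpdir/file1.csv"
printf '%s\n' 'timestamp, forecast, v1, v2' \
  '2016-02-02 00:00:00, 68.56, 012, .23' '2016-02-02 00:10:00, 23.24, .25, .32' > "$tmpdir/file2.csv"
printf '%s\n' 'timestamp, forecast[ma], v1' \
  '2016-02-02 00:00:00, 56.32, 32' '2016-02-02 00:10:00, 25.21, 56' > "$tmpdir/file3.csv"

# Drop each header line and sort on the timestamp (field 1 when splitting on commas).
for n in 1 2 3; do
  tail -n +2 "$tmpdir/file$n.csv" | sort -t, -k1,1 > "$tmpdir/sorted$n.csv"
done

joined=$(
  echo 'Time, Forecast, forecast1, forecast2'
  # First join: timestamp, forecast from file 1, forecast from file 2.
  # Second join: read that result on stdin (-) and add the forecast from file 3.
  join -t, -o 0,1.2,2.2 "$tmpdir/sorted1.csv" "$tmpdir/sorted2.csv" |
    join -t, -o 0,1.2,1.3,2.2 - "$tmpdir/sorted3.csv"
)

printf '%s\n' "$joined"
rm -rf "$tmpdir"
```

The attempt in the question used join without -t, so fields were split on whitespace and the space inside the timestamp broke the key; splitting on the comma avoids that.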
