GROUP BY CSV columns in bash

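Problem description

I am working with a .csv file in bash and need to sum the last value of each line, grouped by the preceding fields. In other words, I need a GROUP BY over the first three columns in bash.

Example input file: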

Barcelona, Female, suspect, 2
Barcelona, Female, positive, 3
Barcelona, Female, positive, 2
Barcelona, Male, positive, 1
Barcelona, Female, suspect, 5
Madrid, Male, positive, 3
Madrid, Male, positive, 1
Barcelona, Male, positive, 4
Madrid, Female, suspect, 2

Example output file:

Barcelona, Female, suspect, 7
Barcelona, Female, positive, 5
Barcelona, Male, positive, 5
Madrid, Male, positive, 4
Madrid, Female, suspect, 2


Tags: bash, csv, sum, aggregate

Solution


GNU datamash is designed for exactly this kind of task:

datamash -t, -sg1,2,3 sum 4 < input.csv

Or with awk:

awk -F, '{ groups[$1 "," $2 "," $3] += $4}
         END { PROCINFO["sorted_in"] = "@ind_str_asc" # Sort output in GNU awk
               for (g in groups) print g "," groups[g] }' input.csv
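
The awk command above prints groups in sorted order (and only under GNU awk), while the example output in the question lists each group in the order it first appears and keeps the ", " separator. A minimal sketch of that variant follows, assuming every field is separated by exactly a comma followed by one space and that lines have no trailing whitespace or carriage returns. The function name group_sum is made up for illustration.

```shell
# Sum column 4 per (col1, col2, col3) group, printing groups in the order
# they first appear. Assumes the separator is exactly ", " (comma + space).
# group_sum is a hypothetical helper name, not part of any tool.
group_sum() {
    awk -F', ' '
        {
            key = $1 FS $2 FS $3                  # group key, e.g. "Madrid, Male, positive"
            if (!(key in sum)) order[++n] = key   # remember first appearance
            sum[key] += $4
        }
        END {
            for (i = 1; i <= n; i++) print order[i] FS sum[order[i]]
        }
    ' "$@"
}
```

Calling group_sum input.csv on the sample input should reproduce the example output. The sketch uses no GNU-specific features, so it should also run under other POSIX awk implementations; the separate order array is needed because awk does not guarantee any particular iteration order for "for (g in groups)".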
