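bash - GROUP BY on CSV columns in bash
Problem description
I am working with a .csv file in bash and need to sum the last value of each row, grouped by the preceding fields. In other words, I need a GROUP BY on the first three columns in Bash.
Sample input file: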
Barcelona, Female, suspect, 2
Barcelona, Female, positive, 3
Barcelona, Female, positive, 2
Barcelona, Male, positive, 1
Barcelona, Female, suspect, 5
Madrid, Male, positive, 3
Madrid, Male, positive, 1
Barcelona, Male, positive, 4
Madrid, Female, suspect, 2
Sample output file:
Barcelona, Female, suspect, 7
Barcelona, Female, positive, 5
Barcelona, Male, positive, 5
Madrid, Male, positive, 4
Madrid, Female, suspect, 2
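Solution
GNU datamash is designed for exactly this kind of task. Here -t, sets the comma as the field separator, -s sorts the input before grouping, and -g1,2,3 groups by the first three columns: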
datamash -t, -sg1,2,3 sum 4 < input.csv
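Or with awk (the PROCINFO line only has an effect in GNU awk; other awk implementations print the groups in an unspecified order):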
awk -F, '{ groups[$1 "," $2 "," $3] += $4}
END { PROCINFO["sorted_in"] = "@ind_str_asc" # Sort output in GNU awk
for (g in groups) print g "," groups[g] }' input.csv
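If the groups should appear in the order they first occur in the input, as in the sample output, and GNU awk may not be available, a portable variant can track that order explicitly. The following is a minimal sketch and not part of the original answer: the function name group_sum and the arrays order and sum are illustrative, and it assumes that fields are separated by a comma followed by optional spaces, as in the sample input.

```shell
# Sample data copied from the question (written to input.csv for illustration).
cat > input.csv <<'EOF'
Barcelona, Female, suspect, 2
Barcelona, Female, positive, 3
Barcelona, Female, positive, 2
Barcelona, Male, positive, 1
Barcelona, Female, suspect, 5
Madrid, Male, positive, 3
Madrid, Male, positive, 1
Barcelona, Male, positive, 4
Madrid, Female, suspect, 2
EOF

# group_sum FILE (hypothetical helper name)
# Sums column 4 per (column 1, column 2, column 3) key using POSIX awk only,
# and prints the groups in the order in which each key first appears.
group_sum() {
  awk -F', *' '
    {
      key = $1 ", " $2 ", " $3
      if (!(key in sum)) order[++n] = key   # remember first appearance
      sum[key] += $4                        # accumulate the last column
    }
    END {
      for (i = 1; i <= n; i++) print order[i] ", " sum[order[i]]
    }
  ' "$1"
}

group_sum input.csv
```

Because the field separator is the regular expression ', *', the spaces after each comma are not part of the fields, so the keys can be printed back in the same "comma plus space" layout as the input.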