awk - How to compare 2 CSV files based on multiple columns with awk?
Problem description
I have 2 CSV files in the sample format below, each with about 5000 rows:
File 1:
EMPLOYEE_NUMBER,LAST_NAME,FIRST_NAME,MIDDLE_NAME,BRANCH,DEPARTMENT,LEVEL,POSITION,EMAIL_ADDRESS
110426,Balbon,Susan,Lagat,"abc Equity Ventures, Inc.",Group Internal Audit,Supervisor,I.S. Audit Supervisor,susan.balbon@abc.com
30083,Mendezona,Bingen,Roemer,"abc Equity Ventures, Inc.",Risk Management Office,Vice President,VP - AEV Security,bing.mendezona@abc.test
110773,Casas,Joyce Grace,Bea,"abc Equity Ventures, Inc.",Tax Advisory and Compliance,Manager,Tax Counsel,joyce.grace.casas@abc.com
286,Fernandez,Mark Brian,Tato,abc Foundation Inc.,Computer Services Division,Supervisor,Senior Applications Supervisor,mark.fernandez@abc.com
291,Plando,Marilou,Polleros,"abc Equity Ventures, Inc.",Administration,Assistant Vice President,AVP - Risk Management,marilou.plando@abc.test
110813,Gemelo-Abarca,Therese Xyza,Dableo,"abc Equity Ventures, Inc.",Governance & Compliance Team,Manager,Associate General Counsel - Corporate Secretarial and Compliance,therese.xyza.abarca@abc.com
30096,Abay,Joanna Marie,Saluria,"abc Equity Ventures, Inc.",Tax Advisory and Compliance,Supervisor,Tax Compliance Officer,joanna.abay@abc.com
110711,Ostan,Margilyn,Salibio,"abc Equity Ventures, Inc.",Accounting,Staff,Senior Accountant 1,margilyn.ostan@abc.com
110732,Fumar-Gonzales,Vanessa Concepcion,Altarejos,"abc Equity Ventures, Inc.",Legal and Corporate Services,Manager,Associate General Counsel - Labor & Litigation,vanessa.gonzales@abc.com
File 2:
EMPLOYEE_NUMBER,LAST_NAME,FIRST_NAME,MIDDLE_NAME,BRANCH,DEPARTMENT,LEVEL,POSITION,EMAIL_ADDRESS
110426,Balbon,Susan,Lagat,"abc Equity Ventures, Inc.",Group Internal Audit,Supervisor,I.S. Audit Supervisor,susan.balbon@abc.com
30083,Mendezona,Bingen,Roemer,"abc Equity Ventures, Inc.",Security,Vice President,VP - AEV Security,jetee.velante@abc.com
110773,Casas,Joyce Grace,Bea,"abc Equity Ventures, Inc.",Tax Advisory and Compliance,Supervisor,Tax Counsel,joyce.grace.casas@abc.com
286,Fernandez,Mark Brian,Tato,abc Foundation Inc.,Computer Services Division,Supervisor,Senior Applications Supervisor,mark.fernandez@abc.com
291,Plando,Marilou,Polleros,"abc Equity Ventures, Inc.",Risk Management Office,Assistant Vice President,AVP - Risk Management,marilou.plando@abc.test
110866,Dugan,Belinda,Escultura,"abc Equity Ventures, Inc.",Legal Management,Vice President,Vice President for Legal Services Management,dixie.dugan@abc.test
221,Montehermoso,Gladys,Enoy,"abc Equity Ventures, Inc.",Accounting,Staff,Senior Accountant,gladys.montehermoso@abc.com
30102,Oblianda,Anna Cielo,Salud,"abc Equity Ventures, Inc.",Accounting,Supervisor,Accounting Supervisor,cielo.oblianda@abc.com
110499,Bucol,Charmaine Ann,Rebusa,"abc Equity Ventures, Inc.",Group Internal Audit,Staff,Audit Senior,charmaine.ann.bucol@abc.com
Using awk, I want to extract all rows that have the same values in the EMPLOYEE_NUMBER and EMAIL_ADDRESS columns but different values in any of the other columns.
My idea is to merge the 2 CSV files on the EMPLOYEE_NUMBER+EMAIL_ADDRESS columns and remove the duplicate rows with awk. Thanks.
The output would look like this:
EMPLOYEE_NUMBER,LAST_NAME,FIRST_NAME,MIDDLE_NAME,BRANCH,DEPARTMENT,LEVEL,POSITION,EMAIL_ADDRESS
110773,Casas,Joyce Grace,Bea,"abc Equity Ventures, Inc.",Tax Advisory and Compliance,Manager,Tax Counsel,joyce.grace.casas@abc.com
110773,Casas,Joyce Grace,Bea,"abc Equity Ventures, Inc.",Tax Advisory and Compliance,Supervisor,Tax Counsel,joyce.grace.casas@abc.com
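Solution
A short awk script is enough. It reads file1 first and stores each row under its EMPLOYEE_NUMBER + EMAIL_ADDRESS key. Then, for every file2 row whose key also appears in file1 but whose full line is different, it prints the file1 row followed by the file2 row:
awk_script:
NR==FNR{
    if(FNR==1){print}      # header, printed once from file1
    a[$1,$NF]=$0           # key: EMPLOYEE_NUMBER ($1) + EMAIL_ADDRESS (last field)
    next
}
($1,$NF) in a && a[$1,$NF]!=$0{
    print a[$1,$NF],$0     # OFS is a newline, so the two rows print on separate lines
}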
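With -F',', EMAIL_ADDRESS cannot safely be read as $9. BRANCH values such as "abc Equity Ventures, Inc." contain a comma inside quotes, and plain comma splitting turns those rows into 10 fields. None of the emails contain a comma, so $NF always lands on EMAIL_ADDRESS. The small check below runs two rows copied from file 1 through the same splitting rule. `count_fields` is a hypothetical helper name used only for this illustration:

```shell
#!/bin/sh
# Hypothetical helper: print the field count, $9 and $NF for each input row
# when splitting on every comma. -F',' does not understand CSV quoting,
# so a quoted BRANCH that contains a comma becomes two fields.
count_fields() {
  awk -F',' '{ print NF " | " $9 " | " $NF }'
}

# Two rows copied from file 1: the first has a quoted BRANCH, the second does not.
printf '%s\n' \
  '110773,Casas,Joyce Grace,Bea,"abc Equity Ventures, Inc.",Tax Advisory and Compliance,Manager,Tax Counsel,joyce.grace.casas@abc.com' \
  '286,Fernandez,Mark Brian,Tato,abc Foundation Inc.,Computer Services Division,Supervisor,Senior Applications Supervisor,mark.fernandez@abc.com' |
  count_fields
# prints:
# 10 | Tax Counsel | joyce.grace.casas@abc.com
# 9 | mark.fernandez@abc.com | mark.fernandez@abc.com
```

Because of this shift, a middle column such as DEPARTMENT is not reliably $6 under -F','. Comparing individual middle columns would need a quote-aware field splitter, for example GNU awk's FPAT variable.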
Command to run:
awk -F',' -v OFS="\n" -f awk_script file1 file2
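For reuse, the same comparison can be wrapped in a shell function with the awk program inline, so no separate awk_script file is needed. This is a sketch, not part of the original answer. `compare_csv` is a hypothetical name, and the function assumes the key is $1 plus the last field, as in the sample files. It tests membership with `($1, $NF) in row` instead of `row[...] != ""`, because merely reading a missing element in awk creates it. The comma subscript joins the two key parts with SUBSEP, so concatenations such as 1 + 12 and 11 + 2 cannot collide.

```shell
#!/bin/sh
# Hypothetical helper: print the header, then every pair of rows that share
# EMPLOYEE_NUMBER ($1) and EMAIL_ADDRESS ($NF) but differ elsewhere.
# Fields are split on every comma, so quoted commas are not parsed; only $1
# and $NF are used as the key, and those stay correct even when BRANCH
# contains a quoted comma.
compare_csv() {
  awk -F',' -v OFS='\n' '
    NR == FNR {                 # first file: remember each row by its key
      if (FNR == 1) print       # header, printed once
      row[$1, $NF] = $0
      next
    }
    ($1, $NF) in row && row[$1, $NF] != $0 {
      print row[$1, $NF], $0    # file 1 row, then file 2 row
    }
  ' "$1" "$2"
}

# Usage: compare_csv file1 file2 > changed_rows.csv
```

On the two sample files in the question, this prints the header, then the 110773 (Casas) pair and the 291 (Plando) pair. The 30083 rows are not reported because their EMAIL_ADDRESS values differ.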