Keep first duplicate and replace the rest with blank cell using Awk

Question

I have a TSV file with two columns, and the second column contains duplicate values. I would like to keep the first occurrence of each value and replace the later occurrences with blanks. For example:

Original tsv:

ahah.asd   aha
ahsjd.asd  aha
asdd.asda  aha
ajd.asd    aha
asdfk.lo   abb
hasd.pou   abb
hasd.asd   jjj
asidh.09   kkk
asdhs.97   kkk

Expected output:

ahah.asd   aha
ahsjd.asd  
asdd.asda  
ajd.asd    
asdfk.lo   abb
hasd.pou   
hasd.asd   jjj
asidh.09   kkk
asdhs.97   
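For the first part alone (no counter column), a minimal awk sketch could look like this. It assumes the file is tab-separated; `blank_dups` is a hypothetical helper name used only for illustration.

```shell
# blank_dups: hypothetical helper name (illustration only).
# Reads tab-separated lines from the files given as arguments, or from stdin.
blank_dups() {
  awk 'BEGIN { FS = OFS = "\t" }
       {
         # seen[$2]++ returns the old count: 0 (false) on the first occurrence,
         # so $2 is printed; non-zero on every later occurrence, so an empty
         # field is printed instead.
         print $1, (seen[$2]++ ? "" : $2)
       }' "$@"
}

# Usage on the question's file:
# blank_dups file
```

The array is keyed by the column-2 value, so a value is blanked on every occurrence after its first, wherever it appears in the file.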

In addition, I would like to add a third column that counts the occurrences of each column-2 value: 1 for the first occurrence, then incrementing for each repeat. For example:

ahah.asd   aha   1
ahsjd.asd        2
asdd.asda        3
ajd.asd          4
asdfk.lo   abb   1
hasd.pou         2
hasd.asd   jjj   1
asidh.09   kkk   1
asdhs.97         2

Is this possible? I would like to use awk.

Thanks

Tags: linux, bash, csv, awk

Answer


$ awk 'BEGIN{FS=OFS="\t"} {print $1, (cnt[$2]++ ? "" : $2), cnt[$2]}' file
ahah.asd        aha     1
ahsjd.asd               2
asdd.asda               3
ajd.asd         4
asdfk.lo        abb     1
hasd.pou                2
hasd.asd        jjj     1
asidh.09        kkk     1
asdhs.97                2
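The command above counts occurrences across the whole file: if a column-2 value reappears later, separated from its first run, it stays blank and its counter continues. That matches the sample data, which is grouped by column 2. If instead each new run of identical values should restart at 1, one possible variant compares each line against the previous line only. This is a sketch under that assumption; `number_runs` is a hypothetical helper name.

```shell
# number_runs: hypothetical helper name (illustration only).
# Assumes duplicates are judged against the PREVIOUS line only, so every new
# run of a column-2 value prints the value and restarts the counter at 1.
number_runs() {
  awk 'BEGIN { FS = OFS = "\t" }
       {
         # Force string comparison so values such as 01 and 1 are not equal.
         # n is the position of this line within the current run.
         n = (($2 "") == (prev "")) ? n + 1 : 1
         print $1, (n > 1 ? "" : $2), n
         prev = $2
       }' "$@"
}

# Usage on the question's file:
# number_runs file
```

For input lines `a x`, `b x`, `c y`, `d x`, the answer's command prints a blank and the counter 3 for `d`, whereas this variant prints `d`, `x`, `1`.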
