首页 > 解决方案 > Bash, join two tables of differing lengths using sort and join

问题描述

I realise this is a repeat but solutions in other posts will not work for me, apologies.

I have two tables, testcut.pileup:

chrM    1370    T       1
chrM    1371    T       1
chrM    1372    C       1
chrM    1373    T       1
chrM    1374    A       1

And testcut2.pileup:

chrM    720     T       1
chrM    721     T       1
chrM    722     C       1
chrM    723     A       0
chrM    724     C       1
chrM    1370    T       2
chrM    1371    T       3
chrM    1372    C       4
chrM    1373    T       3
chrM    1374    A       2

I want to join these datasets so that the appear alongside one another. In an ideal world these would be organised by the numbers in the second columns (as below), but I would be happy just to have them alongside one another.

Desired result:

                             chrM    720     T       1
                             chrM    721     T       1
                             chrM    722     C       1
                             chrM    723     A       0
                             chrM    724     C       1
chrM    1370    T       1    chrM    1370    T       2
chrM    1371    T       1    chrM    1371    T       3
chrM    1372    C       1    chrM    1372    C       4
chrM    1373    T       1    chrM    1373    T       3
chrM    1374    A       1    chrM    1374    A       2

So far I am trying this:

join -j 2 <(sort -k 2 testcut.pileup) <(sort -k 2 testcut2.pileup) > join.pileup

But I recieve the following errors:

join: file 1 is not in sorted order
join: file 2 is not in sorted order

If I run join straight, the numbers in the second column are shifted:

chrM 1370 T 1 720 T 1
chrM 1370 T 1 721 T 1
chrM 1370 T 1 722 C 1
chrM 1370 T 1 723 A 0
chrM 1370 T 1 724 C 1

Thanks for any help.

标签: bashawk

解决方案


awk救援!

你可以让它更通用,但我硬编码了 4 个字段

$ awk 'NR==FNR {a[$2]=$0; next} 
               {print  $2 in a?a[$2] "\t" $0:"\t\t\t\t"$0}' file{1,2} | 
  column -ts'\t'

                                chrM    720     T       1
                                chrM    721     T       1
                                chrM    722     C       1
                                chrM    723     A       0
                                chrM    724     C       1
chrM    1370    T       1       chrM    1370    T       2
chrM    1371    T       1       chrM    1371    T       3
chrM    1372    C       1       chrM    1372    C       4
chrM    1373    T       1       chrM    1373    T       3
chrM    1374    A       1       chrM    1374    A       2

推荐阅读