bash - Bash, join two tables of differing lengths using sort and join
问题描述
I realise this is a repeat but solutions in other posts will not work for me, apologies.
I have two tables, testcut.pileup:
chrM 1370 T 1
chrM 1371 T 1
chrM 1372 C 1
chrM 1373 T 1
chrM 1374 A 1
And testcut2.pileup:
chrM 720 T 1
chrM 721 T 1
chrM 722 C 1
chrM 723 A 0
chrM 724 C 1
chrM 1370 T 2
chrM 1371 T 3
chrM 1372 C 4
chrM 1373 T 3
chrM 1374 A 2
I want to join these datasets so that the appear alongside one another. In an ideal world these would be organised by the numbers in the second columns (as below), but I would be happy just to have them alongside one another.
Desired result:
chrM 720 T 1
chrM 721 T 1
chrM 722 C 1
chrM 723 A 0
chrM 724 C 1
chrM 1370 T 1 chrM 1370 T 2
chrM 1371 T 1 chrM 1371 T 3
chrM 1372 C 1 chrM 1372 C 4
chrM 1373 T 1 chrM 1373 T 3
chrM 1374 A 1 chrM 1374 A 2
So far I am trying this:
join -j 2 <(sort -k 2 testcut.pileup) <(sort -k 2 testcut2.pileup) > join.pileup
But I recieve the following errors:
join: file 1 is not in sorted order
join: file 2 is not in sorted order
If I run join
straight, the numbers in the second column are shifted:
chrM 1370 T 1 720 T 1
chrM 1370 T 1 721 T 1
chrM 1370 T 1 722 C 1
chrM 1370 T 1 723 A 0
chrM 1370 T 1 724 C 1
Thanks for any help.
解决方案
awk
救援!
你可以让它更通用,但我硬编码了 4 个字段
$ awk 'NR==FNR {a[$2]=$0; next}
{print $2 in a?a[$2] "\t" $0:"\t\t\t\t"$0}' file{1,2} |
column -ts'\t'
chrM 720 T 1
chrM 721 T 1
chrM 722 C 1
chrM 723 A 0
chrM 724 C 1
chrM 1370 T 1 chrM 1370 T 2
chrM 1371 T 1 chrM 1371 T 3
chrM 1372 C 1 chrM 1372 C 4
chrM 1373 T 1 chrM 1373 T 3
chrM 1374 A 1 chrM 1374 A 2
推荐阅读
- javascript - node stripe products list return empty
- javascript - 工厂函数中的“This”
- web-scraping - 网络从汤文件中抓取相关信息
- python - Django 模型字段包含奇怪的属性 _("private")
- reactjs - React Router - Route children 和 component props 有什么区别
- java - 如何将大型 JSON 字符串转换为 JSON 对象?
- flutter - 颤振溢出圆圈头像覆盖容器
- html - 如何使用 sed 将 CSS 样式文件内联到 HTML 文件?
- r - 从后验预测分布抽样(stan vs inla)
- html - 反应渲染问题