bash - Replacing numbers with their respective strings in awk
Problem description
I am new to bash/awk programming, and my file looks like this:
1 10032154 10032154 A C Leber_congenital_amaurosis_9 criteria_provided,_single_submitter Benign . 1
1 10032184 10032184 A G Retinal_dystrophy|Leber_congenital_amaurosis_9|not_provided criteria_provided,_multiple_submitters,_no_conflicts Pathogenic/Likely_pathogenic . 1,4
1 10032209 10032209 G A not_provided criteria_provided,_single_submitter Likely_benign . 8,64,512
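Using awk, I want to replace the numbers in the last column ($10) with their descriptions. I have stored the numbers and their definitions in two separate arrays. My idea is to replace the numbers by iterating over both arrays together. Here 0 is "unknown", 1 is "germline", 2 is "somatic", and so on.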
z=(0 1 2 4 8 16 32 64 128 256 512 1024 1073741824)
t=("unknown" "germline" "somatic" "inherited" "paternal" "maternal" "de-novo" "biparental" "uniparental" "not-tested" "tested-inconclusive" "not-reported" "other")
number=$(IFS=,; echo "${z[*]}")
def=$(IFS=,; echo "${t[*]}")
awk -v a="$number" -v b="${def}" 'BEGIN { OFS="\t" } /#/ {next}
{
    x=split(a, e, /,/)
    y=split(b, f, /,/)
    delete c
    m=split($10, c, /,/)
    for (i=1; i<=m; i++) {
        for (j=1; j<=x; j++) {
            if (c[i]==e[j]) {
                c[i]=f[j]
            }
        }
        $10+=sprintf("%s, ",c[i])
    }
    print $1, $2, $3, $4, $5, $6, $7, $8, $9, $10
}' input.vcf > output.vcf
The output should look like this:
1 10032154 10032154 A C Leber_congenital_amaurosis_9 criteria_provided,_single_submitter Benign . germline
1 10032184 10032184 A G Retinal_dystrophy|Leber_congenital_amaurosis_9|not_provided criteria_provided,_multiple_submitters,_no_conflicts Pathogenic/Likely_pathogenic . germline,inherited
1 10032209 10032209 G A not_provided criteria_provided,_single_submitter Likely_benign . paternal,biparental,tested-inconclusive
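I would be glad if you could help me!
All the best
Solution
Assuming you don't actually need the lists of numbers and names defined as 2 shell arrays for some other reason: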
$ cat tst.awk
BEGIN {
    split("0 1 2 4 8 16 32 64 128 256 512 1024 1073741824",nrsArr)
    split("unknown germline somatic inherited paternal maternal de-novo biparental uniparental not-tested tested-inconclusive not-reported other",namesArr)
    for (i in nrsArr) {
        nr2name[nrsArr[i]] = namesArr[i]
    }
}
!/#/ {
    n = split($NF,nrs,/,/)
    sub(/[^[:space:]]+$/,"")
    printf "%s", $0
    for (i=1; i<=n; i++) {
        printf "%s%s", nr2name[nrs[i]], (i<n ? "," : ORS)
    }
}
$ awk -f tst.awk input.vcf
1 10032154 10032154 A C Leber_congenital_amaurosis_9 criteria_provided,_single_submitter Benign . germline
1 10032184 10032184 A G Retinal_dystrophy|Leber_congenital_amaurosis_9|not_provided criteria_provided,_multiple_submitters,_no_conflicts Pathogenic/Likely_pathogenic . germline,inherited
1 10032209 10032209 G A not_provided criteria_provided,_single_submitter Likely_benign . paternal,biparental,tested-inconclusive
The above preserves whatever whitespace is present in the input file, in case that matters.
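If the two shell arrays from the question do have to stay, the same lookup-table idea can still be used by joining each array into a comma-separated string and passing it to awk with -v. The following is only a sketch under that assumption: the function name annotate_origin and the variable names numbers, names and codes are hypothetical, and printing unmapped codes unchanged is a choice made here, not something the question specifies. It uses [^ \t] instead of [^[:space:]] so that it also runs on awk versions without POSIX character classes.

```shell
#!/usr/bin/env bash
# Sketch: keep the two shell arrays from the question and hand them to awk
# as comma-joined strings. Requires bash because of the arrays.

z=(0 1 2 4 8 16 32 64 128 256 512 1024 1073741824)
t=(unknown germline somatic inherited paternal maternal de-novo biparental
   uniparental not-tested tested-inconclusive not-reported other)

# Join each array with commas inside a subshell so IFS is not changed globally.
numbers=$(IFS=,; echo "${z[*]}")
names=$(IFS=,; echo "${t[*]}")

# annotate_origin is a hypothetical helper name; it reads lines on stdin.
annotate_origin() {
    awk -v nrs="$numbers" -v names="$names" '
    BEGIN {
        # Build the number -> name lookup once, before any input is read.
        n = split(nrs, nrsArr, /,/)
        split(names, namesArr, /,/)
        for (i = 1; i <= n; i++) {
            nr2name[nrsArr[i]] = namesArr[i]
        }
    }
    !/#/ {
        n = split($NF, codes, /,/)
        sub(/[^ \t]+$/, "")            # drop the last field, keep the spacing
        printf "%s", $0
        for (i = 1; i <= n; i++) {
            # Assumption: codes missing from the table are printed unchanged.
            name = (codes[i] in nr2name) ? nr2name[codes[i]] : codes[i]
            printf "%s%s", name, (i < n ? "," : ORS)
        }
    }'
}

# Try it on the third sample line from the question.
result=$(printf '%s\n' '1 10032209 10032209 G A not_provided criteria_provided,_single_submitter Likely_benign . 8,64,512' | annotate_origin)
echo "$result"
```

To process a whole file, the function can be fed from a redirect, for example annotate_origin < input.vcf > output.vcf. Building nr2name once in BEGIN avoids re-splitting both lists and running a nested loop for every input line, which is what the code in the question does.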