awk: conditionally change the value of a field based on the value of another column

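Problem description

I have a table, snp150Common.txt, in which the second and third fields, $2 and $3, may or may not be equal.

If they are equal, I want $2 to become $2-1, so that: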

chr1    10177   10177   rs367896724 -   -   -/C insertion   near-gene-5
chr1    10352   10352   rs555500075 -   -   -/A insertion   near-gene-5
chr1    11007   11008   rs575272151 C   C   C/G single      near-gene-5
chr1    11011   11012   rs544419019 C   C   C/G single      near-gene-5
chr1    13109   13110   rs540538026 G   G   A/G single      intron
chr1    13115   13116   rs62635286  T   T   G/T single      intron
chr1    13117   13118   rs62028691  A   A   C/T single      intron
chr1    13272   13273   rs531730856 G   G   C/G single      ncRNA
chr1    14463   14464   rs546169444 A   A   A/T single      near-gene-3,ncRNA

becomes:

chr1    10176   10177   rs367896724 -   -   -/C insertion   near-gene-5
chr1    10351   10352   rs555500075 -   -   -/A insertion   near-gene-5
chr1    11007   11008   rs575272151 C   C   C/G single      near-gene-5
chr1    11011   11012   rs544419019 C   C   C/G single      near-gene-5
chr1    13109   13110   rs540538026 G   G   A/G single      intron
chr1    13115   13116   rs62635286  T   T   G/T single      intron
chr1    13117   13118   rs62028691  A   A   C/T single      intron
chr1    13272   13273   rs531730856 G   G   C/G single      ncRNA
chr1    14463   14464   rs546169444 A   A   A/T single      near-gene-3,ncRNA

My current command, adapted from https://askubuntu.com/a/312843:

zcat < snp150/snp150Common.txt.gz | head | awk '{ if ($2 == $3) $2=$2-1; print $0 }' | cut -f 2,3,4,5,8,9,10,12,16

gives output that is unchanged from the input:

chr1    10177   10177   rs367896724 -   -   -/C insertion   near-gene-5
chr1    10352   10352   rs555500075 -   -   -/A insertion   near-gene-5
chr1    11007   11008   rs575272151 C   C   C/G single      near-gene-5
chr1    11011   11012   rs544419019 C   C   C/G single      near-gene-5
chr1    13109   13110   rs540538026 G   G   A/G single      intron
chr1    13115   13116   rs62635286  T   T   G/T single      intron
chr1    13117   13118   rs62028691  A   A   C/T single      intron
chr1    13272   13273   rs531730856 G   G   C/G single      ncRNA
chr1    14463   14464   rs546169444 A   A   A/T single      near-gene-3,ncRNA

Any help is greatly appreciated.

Tags: unix, if-statement, awk, conditional, text-processing

Solution

This answer is based on pure speculation about the format of the source file:

$ zcat snp150/snp150Common.txt.gz | 
  awk '
  BEGIN { OFS="\t" }                       # field separators are most likely tabs
  {
      if ($3 == $4)                        # based on cut these should be compared
          $3=$3-1
      print $2,$3,$4,$5,$8,$9,$10,$12,$16  # ... and these fields printed
  }
  NR==10 { exit }'                         # this replaces head
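
The same idea can be tried on a tiny made-up input before running it on the real file. This is only a sketch under the answer's guess: the two rows below are hypothetical, and they assume a tab-separated file with one extra leading column (the value 585 is purely illustrative), so that the two coordinates are $3 and $4. Setting FS to a tab as well as OFS is an extra precaution that is not in the command above; it should keep the column numbers stable if any field turns out to be empty or to contain spaces.

```shell
# Hypothetical sample: two tab-separated rows with an assumed extra leading column,
# which puts the start and end coordinates in fields 3 and 4.
sample=$(printf '585\tchr1\t10177\t10177\trs367896724\n585\tchr1\t11007\t11008\trs575272151\n')

result=$(printf '%s\n' "$sample" | awk '
BEGIN { FS = OFS = "\t" }      # split on tabs and join output fields with tabs
$3 == $4 { $3 = $3 - 1 }       # decrement the start only when start equals end
{ print $2, $3, $4, $5 }       # print a subset of the columns, as cut did
')

printf '%s\n' "$result"
```

With this sample, the first row should come out with 10176 as its start and the second row should be left as it was. If neither row changes on the real data, the coordinates are probably in different columns than assumed here.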

Remember: practicing (anything other than sucking) makes you suck less.

