Resetting NR in awk

Problem description

cat file.txt

MNS GYPA*N  
MNS GYPA*M  c.59T>C;c.71A>G;c.72G>T
MNS GYPA*Mc c.71G>A;c.72T>G
MNS GYPA*Vw c.140C>T
MNS GYPA*Mg c.68C>A
MNS GYPA*Vr c.197C>A
MNS GYPB*Mta    c.230C>T
MNS GYPB*Ria    c.226G>A
MNS GYPB*Nya    c.138T>A
MNS GYPA*Hut    c.140C>A
.
.
.

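The second column can start with GYPA, GYPB, GYPC, GYPD, ... GYPZ. I want to assign a position counter within each GYP* group and split the third column on semicolons, like this: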

1   MNS  GYPA*N
2   MNS GYPA*M  c.59T>C
2   MNS GYPA*M  c.71A>G
2   MNS GYPA*M  c.72G>T
3   MNS GYPA*Mc c.71G>A
3   MNS GYPA*Mc c.72T>G
4   MNS GYPA*Vw c.140C>T
5   MNS GYPA*Mg c.68C>A
6   MNS GYPA*Vr c.197C>A
1   MNS GYPB*Mta    c.230C>T
2   MNS GYPB*Ria    c.226G>A
3   MNS GYPB*Nya    c.138T>A
4   MNS GYPB*Hut    c.140C>A
.
.
.

format.awk

BEGIN {FS=OFS="\t"}

$2 ~ /GYPA/
   { num=split($3,arr,/;/);
      for (i=1;i<=num;i++)
         { print NR,$1,$2,arr[i]}}

$2 ~ /GYPB/
   { num=split($3,arr,/;/);
      for (i=1;i<=num;i++)
         { print NR,$1,$2,arr[i]} }
...

I'm not sure how to reset NR when the input reaches the next GYP prefix. The prefixes GYP{A..Z} appear in order from A to Z.

Tags: awk, bash

Solution


awk '
{
  match($2,/[^*]*/)
  gy_value=substr($2,RSTART,RLENGTH)
}
gy_value!=prev_gy_value{
  count=0
}
!arr[$2]++{
  count++
}
{
  num=split($3,array,";")
  for(i=1;i<=num;i++){
    print count,$1,$2,array[i]
  }
}
NF<3{
  print count,$1,$2
}
{
  prev_gy_value=gy_value
}
' file.txt
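
The prefix extraction in the first block of the script can be tried on its own. The sketch below wraps that step in a shell function, gyp_prefix (a hypothetical name, not part of the original answer), and assumes each input line holds only the second-column value.

```shell
# gyp_prefix: hypothetical helper; prints everything before the first "*"
# of the first field on each input line.
gyp_prefix() {
  awk '{
    match($1, /[^*]*/)                  # longest leading run of non-* characters
    print substr($1, RSTART, RLENGTH)   # RSTART and RLENGTH are set by match()
  }'
}

printf '%s\n' 'GYPA*N' 'GYPB*Mta' 'GYPC' | gyp_prefix
# prints:
# GYPA
# GYPB
# GYPC
```

A value with no "*" at all is returned whole, so such a value forms its own group.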

Explanation: the same code with detailed comments.

awk '                                   ##Start of the awk program.
{
  match($2,/[^*]*/)                     ##Match everything before the first * in the 2nd field.
  gy_value=substr($2,RSTART,RLENGTH)    ##Store that prefix (e.g. GYPA) in gy_value.
}
gy_value!=prev_gy_value{                ##When the prefix differs from the prefix of the previous line:
  count=0                               ##reset count to 0.
}
!arr[$2]++{                             ##The first time a given 2nd field is seen:
  count++                               ##increase count by 1.
}
{
  num=split($3,array,";")               ##Split the 3rd field on ; into array; num holds the number of pieces.
  for(i=1;i<=num;i++){                  ##Loop over the pieces from 1 to num.
    print count,$1,$2,array[i]          ##Print count, $1, $2 and the current piece.
  }
}
NF<3{                                   ##If the line has fewer than 3 fields (no 3rd column):
  print count,$1,$2                     ##print count, $1 and $2 so the row is still numbered.
}
{
  prev_gy_value=gy_value                ##Remember the current prefix for the comparison on the next line.
}
' file.txt                              ##Name of the input file.
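
For a quick check without a file, the same counter-reset idea can be fed sample rows on standard input. This is a minimal sketch and not the original answer: it assumes the data is tab-separated (as the FS=OFS="\t" in the question's format.awk suggests), it increments the counter on every input line, and number_gyp is a hypothetical function name.

```shell
# number_gyp: hypothetical helper; reads tab-separated rows on stdin and
# prefixes each with a counter that restarts when the GYP prefix changes.
number_gyp() {
  awk '
    BEGIN { FS = OFS = "\t" }             # assumption: tab-separated input and output
    {
      match($2, /[^*]*/)                  # text before the first * in column 2
      prefix = substr($2, RSTART, RLENGTH)
      if (prefix != prev) {               # new prefix: restart the counter
        count = 0
        prev = prefix
      }
      count++
      n = split($3, variants, ";")        # n is 0 when column 3 is empty or missing
      if (n == 0) {
        print count, $1, $2               # keep the row even without variants
      }
      for (i = 1; i <= n; i++) {
        print count, $1, $2, variants[i]  # one output row per variant
      }
    }
  '
}

printf 'MNS\tGYPA*N\nMNS\tGYPA*M\tc.59T>C;c.71A>G\nMNS\tGYPB*Mta\tc.230C>T\n' | number_gyp
```

With these three sample rows, the GYPA rows are numbered 1 and 2 (the second one expanded into two lines), and the GYPB row starts again at 1.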
