awk - Resetting NR in awk
Problem description
cat file.txt
MNS GYPA*N
MNS GYPA*M c.59T>C;c.71A>G;c.72G>T
MNS GYPA*Mc c.71G>A;c.72T>G
MNS GYPA*Vw c.140C>T
MNS GYPA*Mg c.68C>A
MNS GYPA*Vr c.197C>A
MNS GYPB*Mta c.230C>T
MNS GYPB*Ria c.226G>A
MNS GYPB*Nya c.138T>A
MNS GYPB*Hut c.140C>A
.
.
.
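Values in the second column can start with GYPA, GYPB, GYPC, GYPD, ... GYPZ. I want to keep a position count for each GYP* prefix and split the third column, as follows: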
1 MNS GYPA*N
2 MNS GYPA*M c.59T>C
2 MNS GYPA*M c.71A>G
2 MNS GYPA*M c.72G>T
3 MNS GYPA*Mc c.71G>A
3 MNS GYPA*Mc c.72T>G
4 MNS GYPA*Vw c.140C>T
5 MNS GYPA*Mg c.68C>A
6 MNS GYPA*Vr c.197C>A
1 MNS GYPB*Mta c.230C>T
2 MNS GYPB*Ria c.226G>A
3 MNS GYPB*Nya c.138T>A
4 MNS GYPB*Hut c.140C>A
.
.
.
cat format.awk
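BEGIN {FS=OFS="\t"}
$2 ~ /GYPA/ {                # the "{" must be on the pattern's line, otherwise awk treats it as a separate rule
  num=split($3,arr,/;/)
  for (i=1;i<=num;i++)
    print NR,$1,$2,arr[i]    # NR counts every record read so far, so it never restarts per group
}
$2 ~ /GYPB/ {
  num=split($3,arr,/;/)
  for (i=1;i<=num;i++)
    print NR,$1,$2,arr[i]
}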
...
I'm not sure how to reset NR when it reaches the next GYP* group. The GYP{A..Z} groups appear in order from A to Z.
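NR is awk's running count of records read from the whole input. Instead of resetting it, a common approach is to keep a separate counter and set it back to zero whenever the prefix before "*" in column 2 changes. This is a minimal sketch. It assumes whitespace-separated fields (awk's default FS) and input sorted by prefix, as described above. The wrapper function gyp_positions is a hypothetical name. The sketch counts every input line, so it reproduces the expected numbering only as long as each allele appears on a single line.

```shell
#!/bin/sh
# Hypothetical wrapper: gyp_positions [file...] (reads stdin if no file is given).
# Assumes whitespace-separated columns and input grouped by GYP prefix.
gyp_positions() {
  awk '
  {
    prefix = $2
    sub(/\*.*/, "", prefix)        # keep the text before the first "*": GYPA*Mc -> GYPA
    if (prefix != prev) pos = 0    # new group: restart our own counter instead of NR
    prev = prefix
    pos++                          # one position per input line in the group
    if (NF < 3) {                  # no 3rd column (e.g. GYPA*N): print the row once
      print pos, $1, $2
      next
    }
    n = split($3, parts, ";")      # one output line per ";"-separated variant
    for (i = 1; i <= n; i++)
      print pos, $1, $2, parts[i]
  }' "$@"
}

gyp_positions <<'EOF'
MNS GYPA*N
MNS GYPA*M c.59T>C;c.71A>G;c.72G>T
MNS GYPA*Mc c.71G>A;c.72T>G
MNS GYPB*Mta c.230C>T
MNS GYPB*Ria c.226G>A
EOF
```

To run it on the real file, use gyp_positions file.txt. If the file is tab-separated, as the BEGIN block in format.awk suggests, add -F'\t' -v OFS='\t' to the awk call.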
Solution
awk '
{
match($2,/[^*]*/)
gy_value=substr($2,RSTART,RLENGTH)
}
gy_value!=prev_gy_value{
count=0
}
!arr[$2]++{
count++
}
{
num=split($3,array,";")
for(i=1;i<=num;i++){
print count,$1,$2,array[i]
}
}
NF<3{print count,$1,$2}
{
prev_gy_value=gy_value
}
' file.txt
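The solution depends on the file being grouped by prefix, which the question guarantees: gy_value!=prev_gy_value only notices a change between adjacent lines. If the rows were ever out of order, one alternative is to keep one counter per prefix in an array. The sketch also remembers the position assigned to each distinct $2, in the spirit of the !arr[$2]++ test. It is a minimal sketch that assumes whitespace-separated fields. gyp_positions_unsorted is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical wrapper: gyp_positions_unsorted [file...]
# Does not require the input to be grouped by prefix.
gyp_positions_unsorted() {
  awk '
  {
    prefix = $2
    sub(/\*.*/, "", prefix)              # GYPB*Ria -> GYPB
    if (!($2 in pos))                    # first time this exact 2nd field is seen
      pos[$2] = ++count[prefix]          # take the next position within its prefix
    if (NF < 3) {                        # nothing to split: print the row once
      print pos[$2], $1, $2
      next
    }
    n = split($3, parts, ";")
    for (i = 1; i <= n; i++)
      print pos[$2], $1, $2, parts[i]
  }' "$@"
}

gyp_positions_unsorted <<'EOF'
MNS GYPA*N
MNS GYPB*Mta c.230C>T
MNS GYPA*M c.59T>C;c.71A>G
MNS GYPB*Ria c.226G>A
EOF
```

Because positions are stored per $2, the output keeps the input order instead of grouping rows by prefix.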
Explanation: here is the same code with a detailed comment on each step.
awk ' ##Start the awk program.
{
match($2,/[^*]*/) ##Match the part of the 2nd field before the first "*".
gy_value=substr($2,RSTART,RLENGTH) ##Store that prefix (e.g. GYPA) in gy_value.
}
gy_value!=prev_gy_value{
count=0 ##The prefix changed, so a new group starts: reset count to 0.
}
!arr[$2]++{
count++ ##First time this exact 2nd field is seen: move to the next position.
}
{
num=split($3,array,";") ##Split the 3rd field on ";" into array; num holds the number of pieces.
for(i=1;i<=num;i++){ ##Loop from i=1 to num.
print count,$1,$2,array[i] ##Print the position, $1, $2 and the current piece.
}
}
NF<3{print count,$1,$2} ##Lines without a 3rd field have nothing to split: print the position, $1 and $2.
{
prev_gy_value=gy_value ##Remember this prefix so the next line can detect a group change.
}
' Input_file ##Name of the input file (file.txt in the question).
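Two built-ins do the real work here. match() sets RSTART and RLENGTH to the start and length of the leftmost longest match. split() returns the number of pieces it produced. The small sketch below applies both steps to one sample line. The shell variable name result is only for illustration.

```shell
#!/bin/sh
# Apply the match()/substr() and split() steps from the answer to a single line.
result=$(echo 'MNS GYPA*Mc c.71G>A;c.72T>G' | awk '{
  match($2, /[^*]*/)                          # longest run of non-"*" characters at the start
  print RSTART, RLENGTH, substr($2, RSTART, RLENGTH)
  n = split($3, array, ";")                   # n = number of ";"-separated variants
  print n, array[1], array[2]
}')
echo "$result"
```

This prints 1 4 GYPA on the first line and 2 c.71G>A c.72T>G on the second. That is the prefix stored in gy_value and the pieces the for loop walks over.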