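awk - Extract sequences from a data list into separate rows
Problem description
sample.txt
has tab-separated columns, and its last column is semicolon-separated.
Each number range in that column needs to be expanded into one row per value.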
cat sample.txt
2 2627 588;577
2 2629 566
2 2685 568-564
2 2771 573
2 2773 597
2 2779 533
2 2799 558
2 6919 726;740-742;777
2 7295 761;771-772
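Note that some rows may contain a range in reverse order, e.g. 568-564.
With my previous scripts I managed to split the values, but could not expand the ranges (the values separated by a dash).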
#!/bin/sh
awk -F"\t" '{print $1}' $1 >> $2 &&
awk -F"\t" '{print $2}' $1 >> $2 &&
awk -F"\t" '{print $3}' $1 >> $2 &&
sed -i "s/^M//;s/;\r//g" $2
#!/bin/awk -f
BEGIN { FS=";"; recNr=1}
!NF { ++recNr; lineNr=0; next }
{ ++lineNr }
lineNr == 1 { next }
recNr == 1 { a[lineNr] = $0 }
recNr == 2 { b[lineNr] = $0 }
recNr == 3 {
for (i=1; i<=NF; i++) {
print a[lineNr] "," b[lineNr] "," $i
}
}
Expected output
2,2627,588
2,2627,577
2,2629,566
2,2685,564
2,2685,565
2,2685,566
2,2685,567
2,2685,568
2,2771,573
2,2773,597
2,2779,533
2,2799,558
2,6919,726
2,6919,740
2,6919,741
2,6919,742
2,6919,777
2,7295,761
2,7295,771
2,7295,772
Solution
Could you please try the following (an explanation follows the code).
awk '
BEGIN{
  OFS=","
}
{
  num=split($NF,array,";")
  for(i=1;i<=num;i++){
    if(array[i]~/-/){
      split(array[i],array2,"-")
      to=array2[1]>array2[2]?array2[1]:array2[2]
      from=array2[1]<array2[2]?array2[1]:array2[2]
      while(from<=to){
        print $1,$2,from++
      }
    }
    else{
      print $1,$2,array[i]
    }
    from=to=""
  }
}
' Input_file
Explanation: the same code with detailed comments.
awk '                                           ## Start of the awk program.
BEGIN{                                          ## The BEGIN section runs once, before any input is read.
  OFS=","                                       ## Set the output field separator to a comma.
}
{
  num=split($NF,array,";")                      ## Split the last field on semicolons into array; num is the number of elements.
  for(i=1;i<=num;i++){                          ## Loop over every element of array.
    if(array[i]~/-/){                           ## If the element contains a dash, it is a range.
      split(array[i],array2,"-")                ## Split the range on the dash into array2.
      to=array2[1]>array2[2]?array2[1]:array2[2]    ## to is the larger of the two endpoints.
      from=array2[1]<array2[2]?array2[1]:array2[2]  ## from is the smaller of the two endpoints.
      while(from<=to){                          ## Walk from the smaller endpoint up to the larger one.
        print $1,$2,from++                      ## Print fields 1 and 2 with the current value, then increment it.
      }
    }
    else{                                       ## Otherwise the element is a single value.
      print $1,$2,array[i]                      ## Print fields 1 and 2 with that value.
    }
    from=to=""                                  ## Reset from and to before the next element.
  }
}
' Input_file                                    ## Name of the input file.
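The range expansion described above can also be sketched with the endpoints swapped in place rather than computed through two conditional expressions. This is only an illustrative variant, not the answer's code: the wrapper name `expand_ranges` is hypothetical, and the sketch assumes the input really is tab-separated with the ranges in the third column. Adding `+ 0` forces a numeric comparison, with the side effect that any leading zeros in the values would be dropped.

```shell
# Hypothetical wrapper; reads tab-separated rows from stdin or from file arguments.
expand_ranges() {
  awk -F'\t' -v OFS=',' '
    {
      n = split($3, parts, ";")                # items are separated by semicolons
      for (i = 1; i <= n; i++) {
        m = split(parts[i], ends, "-")         # m is 2 for a range, 1 for a single value
        lo = ends[1] + 0                       # + 0 forces a numeric value
        hi = (m == 2 ? ends[2] : ends[1]) + 0
        if (lo > hi) { tmp = lo; lo = hi; hi = tmp }   # reversed range such as 568-564
        for (v = lo; v <= hi; v++)
          print $1, $2, v
      }
    }
  ' "$@"
}

# Two rows taken from the sample data in the question.
printf '2\t2685\t568-564\n2\t6919\t726;740-742;777\n' | expand_ranges
```

With the real file this would be called as `expand_ranges sample.txt`. Treating a single value as a range whose two endpoints are equal lets one loop handle both cases.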
As per James's comment, adding a link that explains
the conditional expression (? :):
https://www.gnu.org/software/gawk/manual/html_node/Conditional-Exp.html
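As a small illustration of the conditional expression, the sketch below picks the larger of two values with `condition ? value_if_true : value_if_false`. The function name `minmax_demo` and the values 9 and 10 are made up for the example. The values are deliberately written as string constants, which awk compares as text, to show why adding `+ 0` can matter. In the answer above the endpoints come from `split()`, and numeric-looking elements produced by `split()` are normally compared as numbers, so the extra `+ 0` is a precaution rather than a fix.

```shell
# Hypothetical demo function; the values 9 and 10 are illustrative only.
minmax_demo() {
  awk 'BEGIN {
    a = "9"; b = "10"                                    # string constants
    print "text max:", (a > b ? a : b)                   # compared as text, so "9" sorts after "10"
    print "numeric max:", (a + 0 > b + 0 ? a : b)        # + 0 forces a numeric comparison
  }'
}

minmax_demo
# prints:
# text max: 9
# numeric max: 10
```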
For the sample shown, the output is as follows.
2,2627,588
2,2627,577
2,2629,566
2,2685,564
2,2685,565
2,2685,566
2,2685,567
2,2685,568
2,2771,573
2,2773,597
2,2779,533
2,2799,558
2,6919,726
2,6919,740
2,6919,741
2,6919,742
2,6919,777
2,7295,761
2,7295,771
2,7295,772