Find the durations between zeros in a dataset and their averages in a shell script

Problem description

This is related to my earlier question, "Find the durations and their maximum between the dataset in an interval in shell script".

I have a dataset:

ifile.txt
2
3
2
3
2
20
2
0
2
0
0
2
1
2
5
6
7
0
3
0
3
4
5

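I want to find the different durations between the 0 values within each interval of 6 values, together with the average of each duration.

My desired output is: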

ofile.txt
6 5.33
1 2
1 2
1 2
5 4.2
1 3
3 4

where

6 is the number of counts until next 0 within 6 values (i.e. 2,3,2,3,2,20) and 5.33 is the average value among them;
1 is the number of counts until next 0 within next 6 values (i.e. 2,0,2,0,0,2) and 2 is the average;
Next 1 and 2 are within same 6 values;
5 is the number of counts until next 0 within next 6 values (i.e. 1,2,5,6,7,0) and 4.2 is the average among them;
And so on
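
To make the grouping described above easier to see, here is a minimal sketch that prints the input in rows of 6 values. The helper name show_windows is hypothetical and not part of the original question; the sample values are the ones from ifile.txt.

```shell
# Hypothetical helper: print the input values in rows of 6.
show_windows() {
  awk '
  { row = (row == "" ? $0 : row " " $0) }   # append the value to the current row
  NR % 6 == 0 { print row; row = "" }       # every 6th line closes a row
  END { if (row != "") print row }          # print a trailing partial row
  ' "$@"
}

printf '%s\n' 2 3 2 3 2 20 2 0 2 0 0 2 1 2 5 6 7 0 3 0 3 4 5 | show_windows
```

For the sample data this gives the four groups 2 3 2 3 2 20, then 2 0 2 0 0 2, then 1 2 5 6 7 0, and finally the partial group 3 0 3 4 5.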

Based on the answer to my previous question, I am trying this:

awk '
$0!=0{
  count++
  sum=sum+$0
  found=""
}
$0==0{
  print count,max
  count=max=0
  next
}
FNR%6==0{
  print count,max
  count=max=0
  found=1
}
END{
  if(!found){
      print count,max
  }
}
'  Input_file | awk '!/^ /' | awk '$1 != 0'

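Tags: linux, shell, awk

Solution


EDIT3: One more attempt. Since the second group of 6 lines is 2 0 2 0 0 2, its output should be 1 2, 1 2, 0 0, 1 2. If that is the case (and I believe ideally it should be), then try the following.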

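awk '
{
  occur++                          # position within the current group of 6 lines
  if ($0 != 0) count++             # count only the non-zero values
  sum += $0
}
$0 == 0 || occur == 6 {            # a zero or the 6th line closes the current run
  printf("%d %0.2f\n", count, count ? sum / count : 0)
  count = sum = 0
  if (occur == 6) {
    occur = 0                      # start the next group of 6 lines
  }
}
END {
  if (occur) {                     # flush a trailing group shorter than 6 lines
    printf("%d %0.2f\n", count, count ? sum / count : 0)
  }
}
'  Input_file | awk '$1 != 0'      # drop the empty runs, which print as 0 0.00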

The output is as follows:

6 5.33
1 2.00
1 2.00
1 2.00
5 4.20
1 3.00
3 4.00
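
The output above carries trailing zeros (2.00, 4.20), whereas the desired ofile.txt shows 2 and 4.2. The following is only a sketch of one possible variation, assuming that rounding to two decimals and then dropping trailing zeros is what is wanted. The function name duration_avg is made up for illustration, and the empty-run filter is folded into the main program instead of a second awk.

```shell
# Hypothetical variant: same run logic, but averages are printed without trailing zeros.
duration_avg() {
  awk '
  {
    occur++                       # position within the current group of 6 lines
    if ($0 != 0) count++          # count only the non-zero values
    sum += $0
  }
  $0 == 0 || occur == 6 {
    # sprintf rounds to 2 decimals; adding 0 turns the string back into a number
    if (count) print count, sprintf("%.2f", sum / count) + 0
    count = sum = 0
    if (occur == 6) occur = 0
  }
  END {
    if (occur && count) print count, sprintf("%.2f", sum / count) + 0
  }
  ' "$@"
}

printf '%s\n' 2 3 2 3 2 20 2 0 2 0 0 2 1 2 5 6 7 0 3 0 3 4 5 | duration_avg
```

Because print formats non-integer numbers with awk's default OFMT, 5.33 and 4.2 come out as written and whole-number averages come out as plain integers. It can also be called with a file name, for example duration_avg Input_file.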


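The edits below may help with similar problems that differ slightly from this one, so I am keeping them here.

EDIT2: If you do not want the 6-line counter to restart whenever a zero occurs in Input_file, try the following. It keeps working in fixed blocks of exactly 6 lines and does not reset that counter on a zero.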

awk '
{
  occur++
}
$0!=0{
  count++
  sum+=$0
  found=prev_count=prev=""
}
$0==0 && occur!=6{
  printf("%d,%0.2f\n",count?count:prev_count,count?sum/count:prev)
  prev=count?sum/count:0
  prev_count=count
  count=sum=""
  found=1
  next
}
occur==6{
  printf("%d,%0.2f\n",count,count?sum/count:prev)
  prev=count?sum/count:0
  prev_count=count
  count=sum=occur=""
  found=1
}
END{
  if(!found){
      printf("%d,%0.2f\n",count?count:prev_count,count?sum/count:prev)
  }
}
'  Input_file
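
The two variants differ only in when the 6-line counter restarts. As a rough illustration (the function and variable names here are hypothetical), this sketch prints each value followed by its position under both policies: fixed never restarts on a zero, as in EDIT2, while restarting goes back to 1 after every zero, as in EDIT1.

```shell
# Hypothetical helper: show the line position under both counting policies.
window_positions() {
  awk '
  {
    fixed = (NR - 1) % 6 + 1      # position in fixed blocks of 6 lines
    restarting++                  # position since the last zero or full block
    print $0, fixed, restarting
    if ($0 == 0 || restarting == 6) restarting = 0
  }
  ' "$@"
}

printf '%s\n' 2 0 2 0 0 2 1 | window_positions
```

The second and third columns agree until the first zero and drift apart afterwards, which is why the two variants can close their groups on different lines.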


EDIT1: Could you please try the following, written and tested with the provided samples only.

awk '
{
  occur++
}
$0!=0{
  count++
  sum+=$0
  found=prev_count=prev=""
}
$0==0{
  printf("%d,%0.2f\n",count?count:prev_count,count?sum/count:prev)
  prev=count?sum/count:0
  prev_count=count
  count=sum=occur=""
  found=1
  next
}
occur==6{
  printf("%d,%0.2f\n",count,count?sum/count:prev)
  prev=count?sum/count:0
  prev_count=count
  count=sum=occur=""
  found=1
}
END{
  if(!found){
      printf("%d,%0.2f\n",count?count:prev_count,count?sum/count:prev)
  }
}
'  Input_file

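What the code takes care of:

  • It handles the case where 2 consecutive lines have the value 0: for the second zero it prints the previous count and average again.
  • It also handles edge cases such as:

    a- If the last line is not a 0, the found flag I created is used to check whether some values are still waiting to be printed.

    b- If the number of lines in Input_file is not a multiple of 6, that case is also covered by the found-flag check in the END block.

Explanation: a detailed, commented version of the above code.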
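
The flag pattern behind both edge cases can be sketched in isolation. This is a simplified illustration rather than the answer's code: it leaves out the 6-line grouping, skips empty runs, and the function name flush_demo is made up.

```shell
# Hypothetical, simplified demo of flushing pending values in the END block.
flush_demo() {
  awk '
  $0 != 0 { count++; sum += $0; found = "" }   # values are waiting to be printed
  $0 == 0 {
    if (count) printf("%d,%0.2f\n", count, sum / count)
    count = sum = 0
    found = 1                                  # nothing is pending any more
  }
  END { if (!found && count) printf("%d,%0.2f\n", count, sum / count) }
  ' "$@"
}

printf '%s\n' 2 4 0 3 5 | flush_demo   # last line is not 0: END prints the tail
printf '%s\n' 2 4 0 | flush_demo       # last line is 0: END prints nothing extra
```

The first call prints 2,3.00 followed by 2,4.00 from the END block; the second prints only 2,3.00.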

awk '                                                                      ##Start of the awk program.
{
  occur++                                                                  ##Count the lines read since the last zero or the last full group of 6.
}
$0!=0{                                                                     ##If the current line is NOT zero, do the following.
  count++                                                                  ##Increment count, the number of non-zero values in the current run.
  sum+=$0                                                                  ##Add the current value to the running sum.
  found=prev_count=prev=""                                                 ##Clear found, prev_count and prev because new values are now pending.
}                                                                          ##End of the $0!=0 block.
$0==0{                                                                     ##If the current line is zero, do the following.
  printf("%d,%0.2f\n",count?count:prev_count,count?sum/count:prev)         ##Print the count and the average (sum/count); the ternaries avoid a division by 0.
  prev=count?sum/count:0                                                   ##Save the average in prev, or 0 if count is empty.
  prev_count=count                                                         ##Save the current count in prev_count.
  count=sum=occur=""                                                       ##Reset count, sum and occur.
  found=1                                                                  ##Mark that the pending values have been printed.
  next                                                                     ##Skip the remaining rules for this line.
}                                                                          ##End of the $0==0 block.
occur==6{                                                                  ##If 6 lines have been read since the last reset, do the following.
  printf("%d,%0.2f\n",count,count?sum/count:prev)                          ##Print the count and the average (sum/count), again avoiding a division by 0.
  prev=count?sum/count:0                                                   ##Save the average in prev, or 0 if count is empty.
  prev_count=count                                                         ##Save the current count in prev_count.
  count=sum=occur=""                                                       ##Reset count, sum and occur.
  found=1                                                                  ##Mark that the pending values have been printed.
}                                                                          ##End of the occur==6 block.
END{                                                                       ##Start of the END block.
  if(!found){                                                              ##If found is empty, some values have not been printed yet.
      printf("%d,%0.2f\n",count?count:prev_count,count?sum/count:prev)     ##Print the final count and average (sum/count).
  }
}
'  Input_file                                                                ##Input_file is the name of the input file.
