
Problem description

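My simulation program writes its results to several data files. The first column indicates success (=0) or error (=1), and the second column is the simulation time in seconds.

An example of these two columns: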

1 185.48736852299064
1 199.44533672989186
1 207.35654106612733
1 213.5214031236177 
1 215.50576147950017
0 219.62444310777695
0 222.26750248416354
0 236.1402270910635 
1 238.5124609287994 
0 246.4538392581228 
.   .
.   .
.   .
1 307.482605596962
1 329.16494123373445
0 329.6454558227778 
1 330.52804695995303
0 332.0673690346546 
0 358.3001385706268 
0 359.82271742496414
1 400.8162129871805 
0 404.88783391725985
1 411.27012219170393

For the errors (the 1's), I can make a frequency plot (histogram) of the binned data:

set encoding iso_8859_1
set key left top 
set ylabel "P_{error}" 
set xlabel "Time [s]" 
set size 1.4, 1.2
set terminal postscript eps enhanced color "Helvetica" 16 
set grid ytics
set key spacing 1.5
set style fill transparent solid 0.3

`grep '^ 1' lookup-ratio-50-0.0034-50-7-20-10-3-1.txt | awk '{print $2}' > t7.dat`

stats 't7.dat' u 1
set output "t7.eps"
binwidth=2000
bin(x,width)=width*floor(x/width)
plot 't7.dat' using (bin($1,binwidth)):(1.0/STATS_records) smooth freq with boxes lc rgb "midnight-blue" title "7x7_P_error"

Result: (image not available)

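I would like to improve the gnuplot script above so that it includes the remaining data files lookup-.....-.txt and their error samples, and combines them all in the same frequency plot.

I would also like to avoid intermediate files such as t7.dat.

In addition, I would like to draw a horizontal line at the mean of the error probability.

How can I plot all the sample data in the same plot?

Regards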

Tags: shell, awk, gnuplot

Solution


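If I understand correctly, you want to build one histogram over several files, so you basically have to concatenate several data files. You could of course do this with an external program (awk, etc.) or with shell commands. Below is a possible solution using gnuplot and a system command that needs no temporary file. The system command is for Windows, but you can probably translate it to Linux easily. You may need to check that the "NaN" values do not mess up your binning and histogram results.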

### start code
reset session
# create some dummy data files
do for [i=1:5] {
    set table sprintf("lookup-blahblah_%d.txt", i)
    set samples 50
    plot '+' u (int(rand(0)+0.5)):(rand(0)*0.9+0.1) w table
    unset table
}
# end creating dummy data files

FILELIST = system("dir /B lookup*.txt")   # this is for Windows
print FILELIST

undefine $AllDataWithError
set table $AllDataWithError append
do for [i=1:words(FILELIST)] {
    plot word(FILELIST,i) u ($1==1? $1 : NaN):($1==1? $2 : NaN) w table
}
unset table

print $AllDataWithError

# ... do your binning and plotting
### end of code
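
On Linux or macOS, the same "collect only the error rows from every file" step can also be done with a shell glob and awk instead of the Windows dir /B call. The following is only a minimal sketch: the helper name error_times and the demo file names are made up for illustration, and the files are assumed to be whitespace-separated with the status flag in column 1 and the time in column 2.

```shell
#!/bin/sh
# Print the time column (field 2) of every row whose status flag (field 1) is 1.
# Accepts any number of files; awk reads them one after another.
# error_times is a hypothetical helper name, not part of gnuplot or awk.
error_times() {
    awk '$1 == 1 { print $2 }' "$@"
}

# Small self-contained demo with two throwaway files.
tmpdir=$(mktemp -d)
printf '1 185.4\n0 219.6\n1 238.5\n' > "$tmpdir/lookup-a.txt"
printf '0 329.6\n1 330.5\n'          > "$tmpdir/lookup-b.txt"

# Prints the three error times, one per line, without any intermediate file.
error_times "$tmpdir"/lookup-*.txt
rm -r "$tmpdir"
```

Where gnuplot is built with pipe support, the output of such a command can be read directly as a data source by starting the file name with "<", for example plot "< awk '$1 == 1 { print $2 }' lookup*.txt" using ..., which also avoids a file such as t7.dat.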

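Edit:

Apparently, NaN values and/or empty lines do mess up smooth freq and/or the binning. So we need to extract only the lines with an error (=1). With the code above you can merge several files into one datablock. The code below starts directly from a datablock similar to your data.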

### start of code
reset session

# create some dummy datablock with some distribution (with no negative values)
Height =3000
Pos = 6000
set table $Data
    set samples 1000
    plot '+' u (int(rand(0)+0.3)):(abs(invnorm(rand(0))*Height+Pos)) w table
unset table
# end creating dummy data

stats $Data nooutput
Datapoints = STATS_records

# get only the error lines
# plot $Data into the table $Dummy.
# If $1==1 (=Error) write the line number $0 into column 1 and value into column 2
# else write NaN into column 1 and column 2.
# Since $0 is the line number which is unique 
# 'smooth frequency' will keep these lines "as is"
# but change the NaN lines to empty lines.
Error = 1
Success = 0
set table $Dummy
    plot $Data u ($1==Error ? $0 : NaN):($1==Error ? $2 : NaN) smooth freq
unset table
# get rid of empty lines in $Dummy
# Since empty lines seem to also mess up binning you need to remove them
# by writing $Dummy into the dataset $Error via "plot ... with table".
set table $Error
   plot $Dummy u 1:2 with table
unset table

bin(x) = binwidth*floor(x/binwidth)
stats $Error nooutput
ErrorCount = STATS_records

set multiplot layout 3,1
set key outside
set label 1 sprintf("Datapoints: %g\nSuccess: %g\nError: %g",\
    Datapoints, Datapoints-ErrorCount,ErrorCount) at graph 1.02, first 0
plot $Data u 0:($1 == Success ? $2 : NaN) w impulses lc rgb "web-green" t "Success",\
    $Data u 0:($1 == Error ? -$2 : NaN) w impulses lc rgb "red" t "Error"

unset label 1
set key inside
binwidth = 1000
plot $Error using (bin($2)):(1.0/STATS_records) smooth freq with boxes t sprintf("binwidth: %d",binwidth) lc rgb "blue"

binwidth=100
set xrange[GPVAL_X_MIN:GPVAL_X_MAX] # use same xrange as graph before
plot $Error using (bin($2)):(1.0/STATS_records) smooth freq with boxes t sprintf("binwidth: %d",binwidth) lc rgb "magenta"

unset multiplot
### end of code

The result looks like this: (image not available)
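
To cross-check the binned values outside gnuplot, the rule bin(x) = binwidth*floor(x/binwidth) combined with the 1.0/STATS_records weighting can be reproduced with awk. This is a rough sketch, not part of the answer's method: the helper name bin_freq is made up, the input is assumed to hold one non-negative error time per line, and the sample values in the demo are only illustrative.

```shell
#!/bin/sh
# bin_freq WIDTH: read one value per line on stdin and print
# "bin_start relative_frequency" for every non-empty bin, sorted by bin start.
# bin_freq is a hypothetical helper name used only for this sketch.
bin_freq() {
    awk -v w="$1" '
        NF == 0 { next }            # skip empty lines
        {
            b = w * int($1 / w)     # int() truncates; same as floor() for x >= 0
            count[b]++              # number of samples falling into this bin
            n++                     # total number of samples
        }
        END {
            # each sample contributes 1/n, so the printed values sum to 1
            for (b in count) printf "%g %g\n", b, count[b] / n
        }
    ' | sort -n
}

# Demo: five error times binned with a width of 50 seconds.
printf '185.5\n199.4\n207.3\n213.5\n238.5\n' | bin_freq 50
```

The printed relative frequencies should match the box heights that smooth freq produces for the same bin width, which makes it easy to spot whether NaN or empty lines have distorted the gnuplot result.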

