首页 > 解决方案 > 给定运行时数据,如何知道排序程序是使用冒泡排序还是插入排序?

问题描述

我测量了一个排序程序/算法,并根据运行时数据,我将其缩小为两种排序算法——冒泡排序和插入排序。

有没有办法确定它是哪一个?当然不知道代码。

它们都具有相同的时间复杂度,我没有想法。

时间复杂度数据:

提前致谢!

标签: algorithmsorting

解决方案


  1. 您的 1000 个排序元素太少

    测量的时间太短而不能代表有效的测量(因为大多数时间可能不会被排序本身使用,而是初始化窗口、打开文件等......)。

    您需要至少 100 毫秒或更长的时间(1 秒是理想的)。

  2. 如果您有权访问正在排序的数据

    您可以引入对每种类型的排序都具有挑战性的数据集(并且从时间推断使用的算法)......例如,对于以相反顺序排序的数组来说,冒泡排序是最慢的......所以通过升序和降序和随机的排序数据并比较时间。让我们打电话给时间tasc,tdes,trnd并假设升序排序,那么如果涉及冒泡排序,它应该是:

    tasc O(n) < trnd  < tdes O(n^2)
    

    所以:

    tasc*n == tdes + margin_of error
    

    所以只是测试tdes/tasc接近n......有一些错误余地......

    所以你只需要创建一个样本数据,它对于特定类型的排序而不是其他类型的排序很难......并且从时间开始检测是否是这种情况,直到你发现使用了算法。

    这里有一些数据(所有时间都在[ms])我在我的冒泡排序和 asc 排序数据上测试过:

       n     tasc    tdesc    tasc*n
    1000  0.00321  2.96147  3.205750
    2000  0.00609 11.76799 12.181855
    4000  0.01186 45.58834 47.445111
    

    如果我们有复杂性的运行时更清楚O(n)

    t(O(n)) = c*n
    

    转换为具有复杂性的运行时O(n^2)(假设相同的常量 time c):

    t(O(n^2)) = c*n*n = t(O(n)) * n
    

    通过这种方式,您可以比较具有不同复杂性的时间,您只需将所有测量的时间转换为单个常见的复杂性......

  3. 如果您可以选择排序的数据大小

    然后正如评论中提到的那样,您可以推断时间的增长率随着增加n(加倍)的增加,您可以估计复杂性并从中可以看出使用了哪种算法。

    因此,让我们假设从#2开始测量的时间,那么对于 tasc ( ) O(n),恒定时间c应该是相同的O(n)

       n     tasc    c=tasc/n
    1000  0.00321 0.000003210
    2000  0.00609 0.000003045 
    4000  0.01186 0.000002965 
    

    对于 tdesc ( O(n^2)):

       n     tdesc        tdesc/n^2
    1000   2.96147 0.00000296147000
    2000  11.76799 0.00000294199750
    4000  45.58834 0.00000284927125
    

    正如您所看到的c,这两次都或多或少相同,tasc,tdesc这意味着它们符合估计的复杂性O(n),O(n^2)

然而,如果不知道被测试的应用程序做了什么,就很难确定排序之前可能会进行处理......例如,可能会扫描数据以检测表格(排序的、随机的、几乎排序的......),这是可行O(n)的结果以及数据大小可能会选择使用哪种排序算法...因此您的测量可能会测量不同的例程,从而使结果无效...

[edit1] 我有一个疯狂的想法,即自动检测复杂性

只需通过测试常数时间常数在所有测量时间与它们相应的时间之间是否或多或少相同n......这里简单的C++/VCL代码:

//$$---- Form CPP ----
//---------------------------------------------------------------------------
#include <vcl.h>
#include <math.h>
#pragma hdrstop
#include "Unit1.h"
//---------------------------------------------------------------------------
#pragma package(smart_init)
#pragma resource "*.dfm"
TForm1 *Form1;
//---------------------------------------------------------------------------
double factorial[]= // n[-],t[ms]
    {
     11,0.008,
     12,0.012,
     13,0.013,
     14,0.014,
     15,0.016,
     16,0.014,
     17,0.015,
     18,0.017,
     19,0.019,
     20,0.016,
     21,0.017,
     22,0.019,
     23,0.021,
     24,0.023,
     25,0.025,
     26,0.027,
     27,0.029,
     28,0.032,
     29,0.034,
     30,0.037,
     31,0.039,
     32,0.034,
     33,0.037,
     34,0.039,
     35,0.041,
     36,0.039,
     37,0.041,
     38,0.044,
     39,0.046,
     40,0.041,
     41,0.044,
     42,0.046,
     43,0.049,
     44,0.048,
     45,0.050,
     46,0.054,
     47,0.056,
     48,0.056,
     49,0.060,
     50,0.063,
     51,0.066,
     52,0.065,
     53,0.069,
     54,0.072,
     55,0.076,
     56,0.077,
     57,0.162,
     58,0.095,
     59,0.093,
     60,0.089,
     61,0.093,
     62,0.098,
     63,0.096,
     64,0.090,
     65,0.100,
     66,0.104,
     67,0.111,
     68,0.100,
     69,0.121,
     70,0.109,
     71,0.119,
     72,0.104,
     73,0.124,
     74,0.113,
     75,0.118,
     76,0.118,
     77,0.123,
     78,0.129,
     79,0.133,
     80,0.121,
     81,0.119,
     82,0.131,
     83,0.150,
     84,0.141,
     85,0.148,
     86,0.154,
     87,0.163,
     88,0.211,
     89,0.151,
     90,0.157,
     91,0.166,
     92,0.161,
     93,0.169,
     94,0.173,
     95,0.188,
     96,0.181,
     97,0.187,
     98,0.194,
     99,0.201,
    100,0.185,
    101,0.191,
    102,0.202,
    103,0.207,
    104,0.242,
    105,0.210,
    106,0.215,
    107,0.221,
    108,0.217,
    109,0.226,
    110,0.232,
    111,0.240,
    112,0.213,
    113,0.231,
    114,0.240,
    115,0.252,
    116,0.248,
    117,0.598,
    118,0.259,
    119,0.261,
    120,0.254,
    121,0.263,
    122,0.270,
    123,0.281,
    124,0.290,
    125,0.322,
    126,0.303,
    127,0.313,
    128,0.307,
      0,0.000
    };
//---------------------------------------------------------------------------
double sort_asc[]=
    {
    1000,0.00321,
    2000,0.00609,
    4000,0.01186,
       0,0.000
    };
//---------------------------------------------------------------------------
double sort_desc[]=
    {
    1000, 2.96147,
    2000,11.76799,
    4000,45.58834,
       0,0.000
    };
//---------------------------------------------------------------------------
double sort_rand[]=
    {
    1000, 3.205750,
    2000,12.181855,
    4000,47.445111,
       0,0.000
    };
//---------------------------------------------------------------------------
double div(double a,double b){ return (fabs(b)>1e-10)?a/b:0.0; }
//---------------------------------------------------------------------------
AnsiString get_complexity(double *dat)  // expect dat[] = { n0,t(n0), n1,t(n1), ... , 0,0 }
    {
    AnsiString O="O(?)";
    int i,e;
    double t,n,c,c0,c1,a,dc=1e+10;
    #define testbeg for (e=1,i=0;dat[i]>0.5;){ n=dat[i]; i++; t=dat[i]; i++;
    #define testend(s) if ((c<=0.0)||(n<2.0)) continue; if (e){ e=0; c0=c; c1=c; } if (c0>c) c0=c; if (c1<c) c1=c; } a=fabs(1.0-div(c0,c1)); if (dc>=a){ dc=a; O=s; }


    testbeg;            c=div(t,n);                 testend("O(n)");
    testbeg;            c=div(t,n*n);               testend("O(n^2)");
    testbeg;            c=div(t,n*n*n);             testend("O(n^3)");
    testbeg;            c=div(t,n*n*n*n);           testend("O(n^4)");
    testbeg; a=log(n);  c=div(t,a);                 testend("O(log(n))");
    testbeg; a=log(n);  c=div(t,a*a);               testend("O(log^2(n))");
    testbeg; a=log(n);  c=div(t,a*a*a);             testend("O(log^3(n))");
    testbeg; a=log(n);  c=div(t,a*a*a*a);           testend("O(log^4(n))");
    testbeg; a=log(n);  c=div(t,n*a);               testend("O(n.log(n))");
    testbeg; a=log(n);  c=div(t,n*n*a);             testend("O(n^2.log(n))");
    testbeg; a=log(n);  c=div(t,n*n*n*a);           testend("O(n^3.log(n))");
    testbeg; a=log(n);  c=div(t,n*n*n*n*a);         testend("O(n^4.log(n))");
    testbeg; a=log(n);  c=div(t,n*a*a);             testend("O(n.log^2(n))");
    testbeg; a=log(n);  c=div(t,n*n*a*a);           testend("O(n^2.log^2(n))");
    testbeg; a=log(n);  c=div(t,n*n*n*a*a);         testend("O(n^3.log^2(n))");
    testbeg; a=log(n);  c=div(t,n*n*n*n*a*a);       testend("O(n^4.log^2(n))");
    testbeg; a=log(n);  c=div(t,n*a*a*a);           testend("O(n.log^3(n))");
    testbeg; a=log(n);  c=div(t,n*n*a*a*a);         testend("O(n^2.log^3(n))");
    testbeg; a=log(n);  c=div(t,n*n*n*a*a*a);       testend("O(n^3.log^3(n))");
    testbeg; a=log(n);  c=div(t,n*n*n*n*a*a*a);     testend("O(n^4.log^3(n))");
    testbeg; a=log(n);  c=div(t,n*a*a*a*a);         testend("O(n.log^4(n))");
    testbeg; a=log(n);  c=div(t,n*n*a*a*a*a);       testend("O(n^2.log^4(n))");
    testbeg; a=log(n);  c=div(t,n*n*n*a*a*a*a);     testend("O(n^3.log^4(n))");
    testbeg; a=log(n);  c=div(t,n*n*n*n*a*a*a*a);   testend("O(n^4.log^4(n))");

    #undef testend
    #undef testbeg
    return O+AnsiString().sprintf(" error = %.6lf",dc);
    }
//---------------------------------------------------------------------------
__fastcall TForm1::TForm1(TComponent* Owner):TForm(Owner)
    {
    mm_log->Lines->Clear();
    mm_log->Lines->Add("factorial "+get_complexity(factorial));
    mm_log->Lines->Add("sort asc  "+get_complexity(sort_asc));
    mm_log->Lines->Add("sort desc "+get_complexity(sort_desc));
    mm_log->Lines->Add("sort rand "+get_complexity(sort_rand));
    }
//-------------------------------------------------------------------------

使用我的快速精确 bigint 阶乘的相关时间测量,我只使用了 8ms 以上的较大时间,以及上面输出的排序测量:

factorial O(n.log^2(n)) error = 0.665782
sort asc  O(n) error = 0.076324
sort desc O(n^2) error = 0.037886
sort rand O(n^2) error = 0.075000

该代码仅测试少数支持的复杂性并输出具有最低错误的代码(c不同之间的恒定时间变化n)...

只需忽略 VCL 的东西并将 AnsiString 转换为您想要的任何字符串或输出...


推荐阅读