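python - Pandas Rolling vs Scipy kurtosis - serious numerical inaccuracy
Problem description
First of all, I apologize that what I list below is clearly not a minimal example. I am fully aware that this does not meet SO's minimal-reproducibility requirement, but after several hours of trying to recreate the problem, it seems to me that it only appears when the calculation runs over at least several hundred values.
I have a dataframe with several million values, in which I want to compute the rolling kurtosis of each column. Initially I used pd.rolling.kurt: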
df.rolling(20, min_periods=3).kurt(bias=False)
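but noticed two serious problems with this method:
- The accuracy is unsatisfactory: although the pandas method gives a roughly correct result, deviations on the order of 1e-4 are unacceptable for my use case;
- More worrying are the kurtosis values that frequently "explode": for some reason the kurtosis values suddenly start to diverge into the +/-10,000s, completely distorting the expected output.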
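I created three series, s1, s2 and s3, with 300, 600 and 900 values respectively. (The assignments with the exact values are added at the end of this post, so as not to clutter the text.) The three series are slices of one column of the dataframe. The slices were created so that the last position is fixed, i.e. s1 holds the values from N-299 to N, s2 from N-599 to N, and s3 from N-899 to N. Running pd.rolling.kurt on the three series and printing the tail (where the problem I want to discuss appears) gives the following: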
>>> s1.rolling(20,min_periods=3).kurt().tail(10)
290 9.591067
291 9.591067
292 9.591067
293 9.591067
294 19.663666
295 14.872262
296 14.147157
297 16.716964
298 7.032522
299 19.983796
>>> s2.rolling(20,min_periods=3).kurt().tail(10)
590 9.591067
591 9.591067
592 9.591067
593 9.591067
594 19.663666
595 14.872262
596 14.147157
597 16.716964
598 7.032522
599 19.983796
>>> s3.rolling(20,min_periods=3).kurt().tail(10)
890 9.591071
891 9.591071
892 9.591071
893 9.591071
894 19.663685
895 15.248361
896 40.444894
897 1368.233241
898 251407.375343
899 902540.031652
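I performed the same calculation in Excel. For the last ten indices the kurtosis values should be as follows (I use the notation 290 / 590 / 890 to save space: the three output series should have identical values at indices 290-299, 590-599 and 890-899):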
290 / 590 / 890 9.591067361
291 / 591 / 891 9.591067361
292 / 592 / 892 9.591067361
293 / 593 / 893 9.591067361
294 / 594 / 894 19.66366573
295 / 595 / 895 14.87226197
296 / 596 / 896 14.14715754
297 / 597 / 897 16.7169886
298 / 598 / 898 7.037037037
299 / 599 / 899 20
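Looking at the output from pd.rolling.kurt, we see that the first two outputs are identical, although they do not match the actual values I calculated in Excel. The bigger problem, however, occurs in the third output, where the values explode, as if the total number of values in the series somehow influenced the kurtosis, even though in all three cases I used a rolling window of 20 and a minimum of 3 required observations. In theory, if my understanding is correct, this means that nothing other than the current row and the previous 19 rows should affect the kurtosis output. I am puzzled as to how these "exploding" values arise.
I then recalculated the kurtosis values of the same series using scipy.stats.kurtosis. This gave me the following output: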
>>> s1.rolling(20,min_periods=3).apply(lambda x: kurtosis(x, bias=False)).tail(10)
290 9.591067
291 9.591067
292 9.591067
293 9.591067
294 19.663666
295 14.872262
296 14.147158
297 16.716989
298 7.037037
299 20.000000
>>> s2.rolling(20,min_periods=3).apply(lambda x: kurtosis(x, bias=False)).tail(10)
590 9.591067
591 9.591067
592 9.591067
593 9.591067
594 19.663666
595 14.872262
596 14.147158
597 16.716989
598 7.037037
599 20.000000
>>> s3.rolling(20,min_periods=3).apply(lambda x: kurtosis(x, bias=False)).tail(10)
890 9.591067
891 9.591067
892 9.591067
893 9.591067
894 19.663666
895 14.872262
896 14.147158
897 16.716989
898 7.037037
899 20.000000
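This computes the kurtosis perfectly. However, the .apply(lambda x: kurtosis(x, ...)) construct is staggeringly inefficient compared to the vectorized pandas method, increasing the total processing time for the whole dataframe from a few minutes to more than an hour! I am fully aware that in many cases built-in vectorized solutions tend to favor speed over numerical precision, which would explain the first problem I listed above; but as for the second problem (the "exploding" values), I cannot see any justification at all.
Is there any way to compute the kurtosis efficiently without the values diverging and invalidating my entire output?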
Series definitions
Here are the exact values I used to compute the outputs above:
s1 = pd.Series([0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0001499887511247459,-7.499156348433101e-05,-3.699790962233055e-05,-1.899945851585629e-05,-8.999869502079515e-06,-4.999962500264377e-06,-1.999992000039351e-06,-9.999974999814318e-07,-9.999984999603102e-07,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.699983850190338e-05,-8.999878501628346e-06,-3.999972000122605e-06,-1.999992000039351e-06,-9.999974999814318e-07,0.0003669319382432873,-0.0001849488621671012,-9.198730581664589e-05,-4.499687272496313e-05,0.0009075453820856781,0.0004854184782060238,-0.000720221831477389,-0.000359805708801156,-0.0001799514136040646,-8.998785170075082e-05,-5.999640023402946e-05,-1.9999600008734e-05,-6.999954500263924e-06,-1.999995999958864e-06,-9.999994999391884e-07,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.001201278176363365,-0.0008013581550363867,-0.0002669288650428971,-8.89921242557729e-05,-2.899914452727788e-05,-9.99990000099588e-06,-2.999989500049026e-06,-9.999984999603102e-07,-9.999994999391884e-07,0.0,0.0,0.0,0.0,0.0,0.0,0.0005218638053935734,-0.0004638654873286288,-3.799851806232993e-05,-1.299982450270071e-05,-4.999977500118572e-06,-9.999984999603102e-07,-9.999994999391884e-07,0.0,0.0,0.0,0.0,0.0
,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0])
s2 = pd.Series([0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00
01499887511247459,-7.499156348433101e-05,-3.699790962233055e-05,-1.899945851585629e-05,-8.999869502079515e-06,-4.999962500264377e-06,-1.999992000039351e-06,-9.999974999814318e-07,-9.999984999603102e-07,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.699983850190338e-05,-8.999878501628346e-06,-3.999972000122605e-06,-1.999992000039351e-06,-9.999974999814318e-07,0.0003669319382432873,-0.0001849488621671012,-9.198730581664589e-05,-4.499687272496313e-05,0.0009075453820856781,0.0004854184782060238,-0.000720221831477389,-0.000359805708801156,-0.0001799514136040646,-8.998785170075082e-05,-5.999640023402946e-05,-1.9999600008734e-05,-6.999954500263924e-06,-1.999995999958864e-06,-9.999994999391884e-07,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.001201278176363365,-0.0008013581550363867,-0.0002669288650428971,-8.89921242557729e-05,-2.899914452727788e-05,-9.99990000099588e-06,-2.999989500049026e-06,-9.999984999603102e-07,-9.999994999391884e-07,0.0,0.0,0.0,0.0,0.0,0.0,0.0005218638053935734,-0.0004638654873286288,-3.799851806232993e-05,-1.299982450270071e-05,-4.999977500118572e-06,-9.999984999603102e-07,-9.999994999391884e-07,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0])
s3 = pd.Series([0.0006613932897393013,0.0002659978876289742,0.000658737582405648,0.0005623339888467145,0.0008417590777197284,0.000542090011101782,0.0007813756301534222,0.0003713395103963933,0.0001847566192768637,0.0005892778635844672,-0.0001955367110279687,0.0004436264576506058,0.000302660947173135,0.0007556577955957223,0.0004099113835531532,0.0002143017625986564,1.052211101549051e-05,6.481751166152551e-05,6.615670911548045e-05,-2.169766854576383e-05,-1.302819997635433e-05,-7.303052044212008e-06,-0.1163297855507419,-0.06335289603465369,-0.03314811069814094,-0.01697505737063765,-0.008591697883893402,-0.004342398361182662,-0.002157940126839023,-0.001100682037128825,-0.0005507856703497119,-0.0002554269710891206,-0.0001277329565522002,-8.395111298446951e-05,-2.189884089509773e-05,-1.094960028496637e-05,-5.479844975342307e-06,-2.739933748392279e-06,-1.369969689294177e-06,-6.799856523827107e-07,-3.399929995978179e-07,-1.79996340600251e-07,-7.999838400850306e-08,-3.999919442393075e-08,-2.999939675042158e-08,-2.007979819879551e-05,-1.004005030070562e-05,-5.52007060169889e-06,-2.760046727695654e-06,9.150125677134498e-06,4.580031464668292e-06,2.2900078662783e-06,1.150001972312828e-06,5.700004873407606e-07,2.80000120302654e-07,1.50000032247295e-07,7.000000733862829e-08,3.000000181016647e-08,2.000000056662899e-08,1.00000003333145e-08,1.000000011126989e-08,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0001499887511247459,-7.499156348433101e-05,-3.699790962233055e-05,-1.899945851585629e-05,-8.999869502079515e-06,-4.999962500264377e-06,-1.999992000039351e-06,-9.999974999814318e-07,-9.999984999603102e-07,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.699983850190338e-05,-8.999878501628346e-06,-3.999972000122605e-06,-1.999992000039351e-06,-9.999974999814318e-07,0.0003669319382432873,-0.0001849488621671012,-9.198730581664589e-05,-4.499687272496313e-05,0.0009075453820856781,0.0004854184782060238,-0.000720221831477389,-0.000359805708801156,-0.0001799514136040646,-8.998785170075082e-05,-5.999640023402946e-05,-1.9999600008734e-05,-6.999954500263924e-06,-1.999995999958864e-06,-9.999994999391884e-07,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.001201278176363365,-0.0008013581550363867,-0.0002669288650428971,-8.89921242557729e-05,-2.899914452727788e-05,-9.99990000099588e-06,-2.999989500049026e-06,-9.999984999603102e-07,-9.999994999391884e-07,0.0,0.0,0.0,0.0,0.0,0.0,0.0005218638053935734,-0.0004638654873286288,-3.799851806232993e-05,-1.299982450270071e-05,-4.999977500118572e-06,-9.999984999603102e-07,-9.999994999391884e-07,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0])
Solution
It looks like a bug in older versions of Pandas. I can reproduce it on an old installation of Python 3.6.2 64-bit on win32, with Pandas 1.0.3 and numpy 1.15.4:
>>> s3.rolling(20,min_periods=3).kurt().tail(10)
890 9.591071
891 9.591071
892 9.591071
893 9.591071
894 19.663685
895 15.248361
896 40.444894
897 1368.233241
898 251407.375343
899 902540.031652
dtype: float64
It appears to be fixed in my newer installation with Python 3.8.4 64-bit, Pandas 1.2.2 and numpy 1.20.1:
>>> s3.rolling(20,min_periods=3).kurt().tail(10)
890 9.591067
891 9.591067
892 9.591067
893 9.591067
894 19.663666
895 14.872262
896 14.147158
897 16.716989
898 7.037037
899 20.000000
dtype: float64
Both installations are on the same Windows 10 machine.
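To compare two installations like this, it helps to print the versions each interpreter actually imports. The following is only a small sketch; the helper name installed_versions is made up for illustration and is not part of any library.

```python
import importlib
import sys


def installed_versions(packages=("pandas", "numpy", "scipy")):
    """Return {package: version string, or None if it cannot be imported}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = importlib.import_module(name).__version__
        except ImportError:
            versions[name] = None
    return versions


print(sys.version)
print(installed_versions())
```

Running it once in each Python installation shows which Pandas and numpy versions are behind each result.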
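I cannot say which component (Pandas or numpy) is the cause. Since your test with scipy.stats.kurtosis gives correct results, I would suspect Pandas, but without further analysis from a Pandas expert (which I am not) I cannot be sure.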
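One plausible mechanism, offered as an assumption rather than a confirmed diagnosis: rolling statistics are often computed by keeping running sums that are updated as values enter and leave the window. If a comparatively large value passes through such a sum, the small values added while it was present lose their low-order bits, and the error remains after the large value is subtracted again. That would make the result depend on values that left the window long ago, which matches s3 behaving differently from s1 and s2. The pure-Python sketch below only illustrates this effect on a sum of fourth powers; it is not the Pandas implementation.

```python
def sliding_sum_naive(values, window):
    """Keep one running total: add the entering value, subtract the leaving one."""
    out, total = [], 0.0
    for i, v in enumerate(values):
        total += v
        if i >= window:
            total -= values[i - window]
        out.append(total)
    return out


def sliding_sum_direct(values, window):
    """Recompute the sum of every window from scratch."""
    return [sum(values[max(0, i - window + 1): i + 1]) for i in range(len(values))]


# One "large" fourth power followed by much smaller ones.
fourth_powers = [0.1 ** 4] + [(1e-6) ** 4] * 5

naive = sliding_sum_naive(fourth_powers, window=3)
direct = sliding_sum_direct(fourth_powers, window=3)

print(naive[-1])   # running total after the large value has left the window
print(direct[-1])  # sum of the three small values actually in the window
```

Here the running total ends at exactly 0.0 although the last window contains three positive values: they were absorbed while the large value was still in the total. A kurtosis estimate divides by the squared second moment, so errors of this kind in the moments can be magnified enormously.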
IMHO, the most sensible solution is to upgrade your system, or to add a fresh, separate Python installation with the latest possible Pandas version.
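If upgrading is not possible, one alternative is to recompute the moments of every window from scratch, but in vectorized form so that no Python-level lambda is called per window. The sketch below assumes NumPy 1.20 or later (for sliding_window_view). The function name rolling_kurt_two_pass and its parameters are hypothetical and not part of Pandas or scipy. It uses the same bias-corrected excess kurtosis that scipy.stats.kurtosis(x, bias=False) computes, which needs at least 4 observations, and returns NaN for constant windows.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # requires NumPy >= 1.20


def rolling_kurt_two_pass(values, window=20, min_periods=4):
    """Unbiased excess kurtosis per window, recomputed from scratch each time.

    Hypothetical helper for illustration; not part of pandas or scipy.
    """
    x = np.asarray(values, dtype=float)
    # Left-pad with NaN so every position gets a (possibly partial) window.
    padded = np.concatenate([np.full(window - 1, np.nan), x])
    win = sliding_window_view(padded, window)  # shape: (len(x), window)

    n = np.sum(~np.isnan(win), axis=1).astype(float)
    with np.errstate(divide="ignore", invalid="ignore"):
        mean = np.nansum(win, axis=1) / n
        dev = win - mean[:, None]  # centre first, then take powers
        m2 = np.nansum(dev ** 2, axis=1) / n
        m4 = np.nansum(dev ** 4, axis=1) / n
        kurt = ((n ** 2 - 1) * m4 / m2 ** 2 - 3 * (n - 1) ** 2) / ((n - 2) * (n - 3))

    # The estimator needs at least 4 observations and a non-constant window.
    valid = (n >= max(min_periods, 4)) & (m2 > 0)
    return np.where(valid, kurt, np.nan)


spike = np.zeros(25)
spike[-1] = 1.0  # last window: 19 zeros and a single 1.0
result = rolling_kurt_two_pass(spike)
print(result[-1])  # approximately 20
```

For a dataframe column this could be called as rolling_kurt_two_pass(df[col].to_numpy()). The price is memory: the intermediate arrays hold len(x) * window floats, so very long columns may need to be processed in chunks.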