Error with time-series data: bins must increase monotonically

Problem description

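I'm trying to use pd.cut, but I get an error saying that the bins must increase monotonically, even though I never mention bins anywhere in my code.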

Here is the code:

import pandas as pd
import numpy as np
from scipy.optimize import Bounds
from scipy.optimize import minimize
from google.colab import files
f= files.upload()
df=pd.read_excel(r'symbol1_second.xlsx',index_col='date',parse_dates=True)
df.head()
states=3
def get_edges(df, states):
  edges = {}
  series = df.columns
  for s in series:
    std = df[s].std()
    edges[s] = [std*(-states/2+x) for x in range(states+1)][1:-1]
    edges[s].append(df[s].max())
    edges[s].insert(0,df[s].min())
  return edges
edges = get_edges(df, states) 
g = pd.DataFrame()
for key, value in edges.items():
  g[key] = pd.cut(df[key],value,labels=False, include_lowest=True)
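To see what pd.cut actually receives as bins, the sketch below reruns the question's get_edges logic on a small made-up series. The `price` column and its values are assumptions for illustration; the real data comes from symbol1_second.xlsx. The sketch then applies the same monotonicity test that pandas runs in tile.py.

```python
import numpy as np
import pandas as pd

states = 3

def get_edges(df, states):
    # Same logic as the question: the inner edges are multiples of the
    # standard deviation around 0, then the column min/max are added at the ends.
    edges = {}
    for s in df.columns:
        std = df[s].std()
        edges[s] = [std * (-states / 2 + x) for x in range(states + 1)][1:-1]
        edges[s].append(df[s].max())
        edges[s].insert(0, df[s].min())
    return edges

# Hypothetical price-like data: every value is far from 0.
df = pd.DataFrame({"price": [100.0, 101.0, 102.0, 103.0, 104.0]})
edges = get_edges(df, states)
print(edges["price"])  # min, -0.5*std, +0.5*std, max

# The check pd.cut performs before binning:
print((np.diff(np.asarray(edges["price"], dtype="float64")) < 0).any())  # True
```

With states=3 the inner edges are -0.5·std and +0.5·std. For data centered well above 0, both inner edges fall below the column minimum, so the list is not increasing.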

Here is the error the code produces:

ValueError                                Traceback (most recent call last)
<ipython-input-6-435fbb9cd8dc> in <module>()
     20 g = pd.DataFrame()
     21 for key, value in edges.items():
---> 22   g[key] = pd.cut(df[key],value,labels=False, include_lowest=True)
     23   tm.assert_numpy_array_equal(g[key].codes,
     24                                 np.array([0, 0, 1], dtype="int8"))

/usr/local/lib/python3.6/dist-packages/pandas/core/reshape/tile.py in cut(x, bins, right, labels, retbins, precision, include_lowest, duplicates)
    253         # GH 26045: cast to float64 to avoid an overflow
    254         if (np.diff(bins.astype("float64")) < 0).any():
--> 255             raise ValueError("bins must increase monotonically.")
    256 
    257     fac, bins = _bins_to_cuts(

ValueError: bins must increase monotonically.
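The traceback shows that pandas rejects the bins when np.diff of the edges has any negative entry. It does not say which column or which edge caused the problem. A minimal sketch of a wrapper that runs the same test first and reports the column name and the offending pair is shown below. The name `cut_checked` is hypothetical and not part of pandas.

```python
import numpy as np
import pandas as pd

def cut_checked(series, bins, **kwargs):
    """Hypothetical wrapper: apply the same monotonicity test as pd.cut,
    but name the column and the first decreasing pair before failing."""
    arr = np.asarray(bins, dtype="float64")
    bad = np.flatnonzero(np.diff(arr) < 0)  # positions where an edge decreases
    if bad.size:
        i = bad[0]
        raise ValueError(
            f"bins for {series.name!r} decrease at position {i}: "
            f"{arr[i]} > {arr[i + 1]}"
        )
    return pd.cut(series, bins, **kwargs)

s = pd.Series([100.0, 101.0, 102.0, 103.0, 104.0], name="price")
try:
    cut_checked(s, [100.0, -0.79, 0.79, 104.0], labels=False, include_lowest=True)
except ValueError as err:
    print(err)  # bins for 'price' decrease at position 0: 100.0 > -0.79

codes = cut_checked(s, [100.0, 101.5, 102.5, 104.0], labels=False, include_lowest=True)
print(codes.tolist())  # [0, 0, 1, 2, 2]
```

Inside the question's loop, `cut_checked(df[key], value, labels=False, include_lowest=True)` could stand in for the direct pd.cut call while debugging.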

Tags: python, pandas

Solution


This is a bug in pandas' pd.cut. Try updating your environment to pandas 0.25.0 and check again.
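It may also be worth checking the edges themselves. get_edges places the inner edges at multiples of the standard deviation around 0 rather than around the column mean. For data far from 0, such as prices, those inner edges can fall below the column minimum. Below is a minimal sketch of a variant that assumes the intent was bands of mean ± k·std. The name `get_edges_centered`, the clipping to [min, max] and the `price` data are assumptions for illustration, not the original author's method.

```python
import numpy as np
import pandas as pd

def get_edges_centered(df, states):
    """Hypothetical variant of get_edges: inner edges at mean + k*std,
    clipped into [min, max] so the full edge list never decreases."""
    edges = {}
    for s in df.columns:
        col = df[s]
        mean, std = col.mean(), col.std()
        lo, hi = col.min(), col.max()
        inner = [mean + std * (-states / 2 + x) for x in range(states + 1)][1:-1]
        # Skewed data can push mean +/- k*std outside the observed range.
        inner = np.clip(inner, lo, hi).tolist()
        edges[s] = [lo] + inner + [hi]
    return edges

df = pd.DataFrame({"price": [100.0, 101.0, 102.0, 103.0, 104.0]})
edges = get_edges_centered(df, 3)

g = pd.DataFrame()
for key, value in edges.items():
    # duplicates="drop" merges edges that clipping made equal.
    g[key] = pd.cut(df[key], value, labels=False, include_lowest=True,
                    duplicates="drop")
print(edges["price"])
print(g["price"].tolist())  # [0, 0, 1, 2, 2]
```

Clipping keeps the edge list non-decreasing, and duplicates="drop" lets pd.cut accept edges that clipping made equal. The result may have fewer than `states` bins for strongly skewed columns.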

