python - Python - ValueError: Found array with 0 sample(s) (scale function)
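Problem description
I have been struggling to fix an error for a long time.
I think it could be solved by deleting the first row (maybe the first two rows) of the dataframe. By the way, I'm working in Google Colab.
Does anyone know how to fix this?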
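import pandas as pd
from sklearn import preprocessing

def preprocess_df(df):
    df = df.drop(columns="future")  # keyword form; positional axis is rejected by pandas 2.x
    for col in df.columns:
        if col != "target":
            df[col] = df[col].pct_change()
            df.dropna(inplace=True)
            df[col] = preprocessing.scale(df[col].values)

    df.dropna(inplace=True)
    ...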
main_df = pd.DataFrame()
ratios = ["EURCZK=X"]
for ratio in ratios:
    dataset = f'EURCZK=X/{ratio}.csv'
    df = pd.read_csv('EURCZK=X.csv', names=['Date', 'High', 'Low', 'Open', 'Close', 'Volume', 'Adj Close'], skiprows=2)
    df.rename(columns={"close": f"{ratio}_close", "volume": f"{ratio}_volume"}, inplace=True)
    df.set_index("Date", inplace=True)
    df = df[["Close", "Volume"]]
    if len(main_df) == 0:
        main_df = df
    else:
        main_df = main_df.join(df)

main_df.ffill(inplace=True)  # same as fillna(method="ffill"), which newer pandas deprecates
main_df.dropna(inplace=True)
#print(main_df.head())
main_df['future'] = main_df[f'{RATIO_TO_PREDICT}'].shift(-FUTURE_PERIOD_PREDICT)
main_df['target'] = list(map(classify, main_df[f'Close'], main_df['future']))
main_df.dropna(inplace=True)
#print(main_df.tail(10))
Date = sorted(main_df.index.values)
last_5pct = sorted(main_df.index.values)[-int(0.05*len(Date))]
validation_main_df = main_df[(main_df.index >= last_5pct)]
main_df = main_df[(main_df.index < last_5pct)]
print(preprocess_df)
print(df.head)
#imputer = imputer(missing_values="NaN", strategy="mean", axis=0)
train_x, train_y = preprocess_df(main_df)
validation_x, validation_y = preprocess_df(validation_main_df)  # Preprocess data
#print(f"train data: {len(train_x)} validation: {len(validation_x)}")
#print(f"Dont buys: {train_y.count(0)}, buys: {train_y.count(1)}")
#print(f"VALIDATION Dont buys: {validation_y.count(0)}, buys: {validation_y.count(1)}")
The output is:
<function preprocess_df at 0x7fc2568ceb70>
<bound method NDFrame.head of Close Volume future target
Date
2003-12-02 32.337502 0.0 32.580002 1
2003-12-03 32.410000 0.0 32.349998 0
2003-12-04 32.580002 0.0 32.020000 0
2003-12-05 32.349998 0.0 32.060001 0
2003-12-08 32.020000 0.0 32.099998 1
... ... ... ... ...
2020-07-28 26.263800 0.0 26.212500 0
2020-07-29 26.196301 0.0 26.238400 1
2020-07-30 26.212500 0.0 26.258400 1
2020-08-02 26.238400 0.0 26.105101 0
2020-08-03 26.258400 0.0 26.228500 0
[4302 rows x 4 columns]>
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-10-49204f0a12cf> in <module>()
80
81 #imputer = imputer(missing_values="NaN", strategy="mean", axis=0)
---> 82 train_x, train_y = preprocess_df(main_df)
83 validation_x, validation_y = preprocess_df(validation_main_df)
84
2 frames
<ipython-input-10-49204f0a12cf> in preprocess_df(df)
28 df[col] = df[col].pct_change()
29 df.dropna(inplace=True)
---> 30 df[col] = preprocessing.scale(df[col].values)
31
32 df.dropna(inplace=True)
/usr/local/lib/python3.6/dist-packages/sklearn/preprocessing/_data.py in scale(X, axis, with_mean, with_std, copy)
140 X = check_array(X, accept_sparse='csc', copy=copy, ensure_2d=False,
141 estimator='the scale function', dtype=FLOAT_DTYPES,
--> 142 force_all_finite='allow-nan')
143 if sparse.issparse(X):
144 if with_mean:
/usr/local/lib/python3.6/dist-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
584 " minimum of %d is required%s."
585 % (n_samples, array.shape, ensure_min_samples,
--> 586 context))
587
588 if ensure_min_features > 0 and array.ndim == 2:
ValueError: Found array with 0 sample(s) (shape=(0,)) while a minimum of 1 is required by the scale function.
When I remove the '#' from the line imputer = imputer(missing_values="NaN", strategy="mean", axis=0), I get NameError: name 'imputer' is not defined. The problem is that I don't know how to define this 'Imputer'.
Solution
As Joe said above, judging by the arguments passed to the imputer call, I'm guessing it is meant to be an instance of this scikit-learn class:
https://scikit-learn.org/0.16/modules/generated/sklearn.preprocessing.Imputer.html
Since scikit-learn 0.20, that class has been superseded by the SimpleImputer class (which Joe also found), and it was removed entirely in 0.22:
https://scikit-learn.org/stable/modules/generated/sklearn.impute.SimpleImputer.html
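For reference, here is roughly how the commented-out line translates to SimpleImputer. This is a minimal sketch assuming scikit-learn 0.20 or newer; the toy array X is made up for illustration. Note that SimpleImputer has no axis parameter (it always imputes column by column) and expects np.nan rather than the string "NaN".

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Rough modern equivalent of the old call:
#   imputer = Imputer(missing_values="NaN", strategy="mean", axis=0)
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")

# Made-up data with one missing value in the second column.
X = np.array([[1.0, np.nan],
              [3.0, 4.0],
              [5.0, 8.0]])

# fit() learns each column's mean; transform() fills the NaNs with it.
X_filled = imputer.fit_transform(X)
print(imputer.statistics_)  # per-column means used for filling
print(X_filled)
```

As in the question, nothing is imputed until fit/transform (or fit_transform) is actually called on the data.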
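So if you got this code from somewhere else, that source probably imported the old preprocessing.Imputer class under the lowercase name imputer. Assuming you are on a scikit-learn version older than 0.22, you could probably do the same by adding from sklearn.preprocessing import Imputer as imputer to the top of your code. (The sklearn/preprocessing/_data.py path in your traceback suggests you are on 0.22 or newer, where that import no longer works.) However, the instantiated object doesn't seem to be used for anything in the code above; fit is never called, so I don't think leaving that line commented out will cause a problem. (Again, I'm going only on the code you shared.)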
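If you do want to keep the original line working on older installations, a version-tolerant import could look like the sketch below. It assumes nothing beyond scikit-learn itself; the alias has to be created with from ... import ... as, because Imputer is a class, not a module. On 0.22 or newer the import fails and imputer is simply left as None.

```python
# Sketch: import the old class only where it still exists.
try:
    # preprocessing.Imputer was deprecated in 0.20 and removed in 0.22.
    from sklearn.preprocessing import Imputer as imputer
except ImportError:
    # Newer scikit-learn: the class is gone (sklearn.impute.SimpleImputer replaces it).
    imputer = None

if imputer is not None:
    # The call from the question; it only builds an object, nothing is fitted.
    imputer = imputer(missing_values="NaN", strategy="mean", axis=0)
else:
    print("sklearn.preprocessing.Imputer is not available in this scikit-learn version")
```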
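Instead, I suggest you inspect the contents of main_df at the point where it is passed to preprocess_df. One of its columns (a pandas.Series) has no values left after the pct_change and dropna steps, and that empty column is what makes the scale function fail. In your output, Volume is 0.0 in every row shown; pct_change on consecutive zeros computes 0/0, which is NaN, so if the whole column is zero, dropna removes every row and scale receives an empty array.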
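A small reproduction makes this easy to check. This is a sketch on a hypothetical four-row frame: the Close values are copied from the printed output, and Volume is assumed to be 0.0 throughout, as in the rows shown. The first part counts how many finite values each column keeps after pct_change (a count of 0 flags the culprit); the second part runs the same per-column steps as the loop in preprocess_df and catches the resulting ValueError.

```python
import numpy as np
import pandas as pd
from sklearn import preprocessing

# Hypothetical toy data shaped like main_df (Volume is all zeros).
df = pd.DataFrame({
    "Close": [32.337502, 32.410000, 32.580002, 32.349998],
    "Volume": [0.0, 0.0, 0.0, 0.0],
})

# Diagnostic: usable (finite) values per column after pct_change().
# 0/0 gives NaN and x/0 gives inf, so both are treated as unusable here.
changes = df.pct_change().replace([np.inf, -np.inf], np.nan)
valid_counts = changes.notna().sum()
print(valid_counts)

# Reproduction: the same per-column steps as the loop in preprocess_df.
error_message = None
try:
    for col in df.columns:
        df[col] = df[col].pct_change()
        df.dropna(inplace=True)  # drops rows where ANY column is NaN
        df[col] = preprocessing.scale(df[col].values)
except ValueError as err:
    error_message = str(err)
    print(error_message)
```

If the diagnostic reports 0 for Volume on your real main_df, one possible fix is to drop that column before calling preprocess_df, since a constant zero volume carries no information for this currency pair.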